Scraping Zhihu Answer Content with Python
2016-08-24 18:10
# Python 2 listing (urllib2 and print statements).
import urllib2

from bs4 import BeautifulSoup


class Spider():
    def __init__(self, user_agent):
        self.user_agent = user_agent

    def analyzeHtml(self, content):
        # Stop early if nothing was downloaded.
        if content is None:
            print "Empty"
            return
        print content  # dump the raw HTML for debugging
        bs = BeautifulSoup(content, "html.parser")
        title = bs.title
        # Author links: print each tag, then the text inside it.
        author = bs.find_all("a", class_="author-link")
        if author is not None:
            for a in author:
                print a
                for a_name in a.strings:
                    print a_name
        # Answer bodies: print each text fragment, then the whole tag.
        answers = bs.find_all("div", class_="zm-editable-content clearfix")
        if answers is not None:
            for answer in answers:
                for answer_detail in answer.strings:
                    print answer_detail
                print answer

    def getContentFromHost(self, url):
        # Send a browser-like User-Agent header with the request.
        header = {"User-Agent": self.user_agent}
        request = urllib2.Request(url, headers=header)
        response = urllib2.urlopen(request)
        content = response.read()
        return content


if __name__ == '__main__':
    host = "https://www.zhihu.com/question/48554642"
    user_agent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
    spider = Spider(user_agent)
    spider.analyzeHtml(spider.getContentFromHost(host))
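The listing above is written for Python 2. As a rough sketch of the same extraction step in Python 3, the block below pulls the text out of elements by tag and class using only the standard library's `html.parser`. The names `ClassTextExtractor`, `extract_text`, and `SAMPLE_HTML` are made up for illustration and are not part of the original post. The sample markup is a stand-in that reuses the class names from the listing; the real page may differ. Unlike the `class_` filter in the listing, this sketch matches any element that carries all of the listed classes.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every `tag` element carrying all wanted classes."""

    def __init__(self, tag, wanted_classes):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.wanted = set(wanted_classes.split())
        self.depth = 0      # nesting depth of `tag` inside the current match
        self.chunks = []    # text pieces of the current match
        self.results = []   # one string per matched element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth > 0:
            self.depth += 1  # nested element with the same tag name
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag != self.tag or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, tag, wanted_classes):
    """Return the text of each matching element, in document order."""
    parser = ClassTextExtractor(tag, wanted_classes)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample markup; only the class names come from the listing.
SAMPLE_HTML = """
<html><body>
<a class="author-link" href="/people/example">Example Author</a>
<div class="zm-editable-content clearfix">First <b>answer</b> text. <div>nested</div></div>
<div class="other">ignored</div>
<div class="zm-editable-content clearfix">Second answer.</div>
</body></html>
"""

authors = extract_text(SAMPLE_HTML, "a", "author-link")
answers = extract_text(SAMPLE_HTML, "div", "zm-editable-content clearfix")
print(authors)
print(answers)
```

To run this against a live page, the HTML string could come from `urllib.request.urlopen` with a `User-Agent` header, in the same way `getContentFromHost` does in the listing.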