python3 [Basics in Practice] Web Scraping for Beginners: Scraping Qiushibaike
2017-05-24 23:43
# encoding=utf8
import requests
from lxml import etree


class QiuShi(object):
    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
    }
    url = 'http://www.qiushibaike.com/text/'

    def __init__(self):
        filed = ['author', 'gender', 'age', 'joke text', 'funny votes', 'comments']
        # self.write = CSV('qiushi.csv', filed)
        print(filed)

    # Walk every listing page (pages 1 to 35)
    def totalUrl(self):
        for page in range(1, 36):
            url = self.url + 'page/{}?s=4985075'.format(page)
            # Print the page number directly; url.split('/')[-2] is always 'page'
            print('Fetching page {}'.format(page))
            self.getInfo(url)

    # Scrape the details of every post on one page
    def getInfo(self, url):
        html = requests.get(url, headers=self.headers).text
        data = etree.HTML(html)
        infos = data.xpath('//*[@class="article block untagged mb15"]')
        for info in infos:
            item = {}
            try:
                item[1] = info.xpath('div[1]/a[2]/h2/text()')[0]
                try:
                    # The gender icon's class tells the gender; its text is the age
                    age = info.xpath('div[1]/div[@class="articleGender womenIcon"]/text()')[0]
                    item[2] = 'female'
                except IndexError:
                    age = info.xpath('div[1]/div[@class="articleGender manIcon"]/text()')[0]
                    item[2] = 'male'
                item[3] = age
            except IndexError:
                # No author or gender block was found: treat the post as anonymous
                item[1] = 'anonymous user'
                item[2] = 'unknown'
                item[3] = 'unknown'
            item[4] = info.xpath('a/div/span/text()')[0].strip()
            item[5] = info.xpath('div[2]/span[1]/i/text()')[0]
            # Query relative to this post; a page-wide '//...' query always
            # returned the comment count of the first post on the page
            item[6] = info.xpath('.//*[@class="qiushi_comments"]/i/text()')[0]
            row = [item[i] for i in range(1, 7)]
            # self.write.writeRow(row)
            print(row)
            # with open('C:\\QiuShiBaiKe.cvs', 'w+') as f:
            #     # f.write('{},{},{},{},{}'.format(row, work_year, money, palace, '\n'))
            #     f.write(row+"")


if __name__ == '__main__':
    qiushi = QiuShi()
    qiushi.totalUrl()
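The commented-out lines `self.write = CSV('qiushi.csv', filed)` and `self.write.writeRow(row)` refer to a CSV helper that this post never shows. Below is a minimal sketch of what such a helper might look like, built only on the standard `csv` module. The class body, the `close` method, and the `utf-8-sig` encoding are assumptions made for illustration, not the author's original code.

```python
import csv


class CSV(object):
    """Hypothetical stand-in for the CSV helper the script refers to."""

    def __init__(self, path, fields):
        # newline='' lets the csv module control line endings itself.
        # utf-8-sig (an assumption) writes a BOM so that spreadsheet
        # programs detect the encoding of Chinese text.
        self._file = open(path, 'w', newline='', encoding='utf-8-sig')
        self._writer = csv.writer(self._file)
        self._writer.writerow(fields)  # header row

    def writeRow(self, row):
        # row is a list with one value per header field
        self._writer.writerow(row)

    def close(self):
        # Flush buffered rows and release the file handle
        self._file.close()
```

With a class like this in scope, uncommenting the two `self.write` lines would write one row per post to qiushi.csv. Calling `qiushi.write.close()` after `totalUrl()` returns makes sure the last rows reach the disk.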