A Simple Python Crawler: Douban's Most Popular Movie Reviews
2016-09-09 18:40
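This is my first time writing Python. I am posting the script as a note to myself; it is far from polished, so please bear with me.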
# -*- coding:utf-8 -*-
# Python 2 script (urllib2); requires the third-party xlwt package.
import re
import urllib2

import xlwt

# Create the workbook and write the header row:
# title, reviewer, movie, star rating, time, content.
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
sheet = book.add_sheet('movie_review', cell_overwrite_ok=True)
sheet.write(0, 0, '标题')
sheet.write(0, 1, '影评人')
sheet.write(0, 2, '电影')
sheet.write(0, 3, '星级')
sheet.write(0, 4, '时间')
sheet.write(0, 5, '内容')

baseurl = 'https://movie.douban.com/review/best/?start='

# Patterns for the listing page and for each field of a review page.
pattern_url = re.compile('<h3 class="title">.*?<a href="(.*?)/"', re.S)
pattern_title = re.compile('<span property="v:summary">(.*?)</span>', re.S)
pattern_reviewer = re.compile('<span property="v:reviewer">(.*?)</span>', re.S)
pattern_movie = re.compile('<a href="https://movie.douban.com/subject/.*?/">(.*?)</a>', re.S)
pattern_star = re.compile('<span property="v:rating" class="main-title-hide">(.*?)</span>', re.S)
pattern_time = re.compile('<p property="v:dtreviewed".*?">(.*?)</p>', re.S)
pattern_content = re.compile('<div property="v:description" class="clearfix">(.*?)</div>', re.S)

for i in range(0, 3):
    # Fetch one listing page; the start offset advances by 20 per page.
    url_list = baseurl + str(i * 20)
    request_url = urllib2.Request(url_list)
    response_url = urllib2.urlopen(request_url)
    html_url = response_url.read().decode('utf-8')
    # Collect the links to the individual reviews on this page.
    url_thispage = re.findall(pattern_url, html_url)
    for j in range(0, 10):
        # Fetch one review page and pull out each field.
        url = url_thispage[j]
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        html = response.read().decode('utf-8')
        title = re.findall(pattern_title, html)
        reviewer = re.findall(pattern_reviewer, html)
        movie = re.findall(pattern_movie, html)
        star = re.findall(pattern_star, html)
        time = re.findall(pattern_time, html)
        content = re.findall(pattern_content, html)
        # Row 0 holds the header, so review rows start at 1.
        k = i * 10 + j + 1
        sheet.write(k, 0, title[0])
        sheet.write(k, 1, reviewer[0])
        sheet.write(k, 2, movie[0])
        sheet.write(k, 3, star[0])
        sheet.write(k, 4, time[0])
        sheet.write(k, 5, content[0])
        # print k

# Raw string so the backslash in the Windows path is not read as an escape.
book.save(r'd:\test.xls')
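One fragile spot in the script above is that every field is read with `[0]`, so a review page that lacks one of the tags raises an `IndexError` and stops the whole run. The following is only a sketch of one way to guard against that, written for Python 3 rather than the Python 2 of the original. The helper names `first_match` and `extract_fields` and the `SAMPLE_HTML` string are made up for illustration; the patterns are copied from the script, and no request is sent to Douban. Unlike the original, the helper also strips surrounding whitespace.

```python
import re

# Three of the patterns from the script above, compiled once.
FIELD_PATTERNS = {
    "title": re.compile(r'<span property="v:summary">(.*?)</span>', re.S),
    "reviewer": re.compile(r'<span property="v:reviewer">(.*?)</span>', re.S),
    "star": re.compile(
        r'<span property="v:rating" class="main-title-hide">(.*?)</span>', re.S
    ),
}


def first_match(pattern, html, default=""):
    """Return the first captured group, or `default` when nothing matches."""
    match = pattern.search(html)
    return match.group(1).strip() if match else default


def extract_fields(html):
    """Hypothetical helper: map each field name to its first match."""
    return {name: first_match(p, html) for name, p in FIELD_PATTERNS.items()}


# Made-up sample page fragment; the rating tag is deliberately missing.
SAMPLE_HTML = """
<span property="v:summary">A sample title</span>
<span property="v:reviewer">someone</span>
"""

fields = extract_fields(SAMPLE_HTML)
print(fields)
```

With a helper like this, a missing field becomes an empty cell in the spreadsheet instead of an exception, and the remaining reviews are still written.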