
Scraping Baidu Tieba Images with Python 3

2016-06-08 17:29

The thread I scraped is http://tieba.baidu.com/p/3125473879?pn=2, which has about 82 pages. The code below downloads all the images from those 82 pages:

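"""Scrape images from a Baidu Tieba thread."""
# Import modules
import re
from urllib.request import urlopen, urlretrieve


# Fetch the HTML source of the page to scrape
def getHtml(url):
    page = urlopen(url)
    # Decode the bytes instead of calling str() on them, which would yield "b'...'".
    # UTF-8 is assumed here; bytes that cannot be decoded are ignored.
    html = page.read().decode('utf-8', errors='ignore')
    page.close()
    return html


# Match the image URLs in the source with a regular expression, then download them
def getImg(html, page_no):
    reg = r'<img class="BDE_Image" src="(.+?\.jpg)" '
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        # Files are named <page number>-<image index>.jpg
        urlretrieve(imgurl, 'C:\\Users\\Water\\PycharmProjects\\test\\image\\%s-%s.jpg' % (page_no, x))
        x = x + 1


# Call the functions for pages 1 to 82
i = 1
while i < 83:
    html = getHtml("http://tieba.baidu.com/p/3125473879?pn=" + str(i))
    getImg(html, i)
    print(i)  # page that has just been processed
    i += 1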
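
To see what the regular expression in getImg picks up without touching the network, it can be run against a small hand-written HTML fragment. The sketch below is only an illustration: the helper name extract_image_urls, the sample markup, and the example.com URLs are all made up for this example and are not taken from the real thread.

```python
import re

# Same pattern as in getImg: images with class "BDE_Image" whose src ends in .jpg
IMG_PATTERN = re.compile(r'<img class="BDE_Image" src="(.+?\.jpg)" ')


def extract_image_urls(html):
    """Return the src of every BDE_Image <img> tag found in html (hypothetical helper)."""
    return IMG_PATTERN.findall(html)


# Made-up fragment: two post images and one unrelated image that should be skipped
sample = (
    '<div>'
    '<img class="BDE_Image" src="http://example.com/a.jpg" width="560">'
    '<img class="other" src="http://example.com/skip.jpg" >'
    '<img class="BDE_Image" src="http://example.com/b.jpg" height="300">'
    '</div>'
)

urls = extract_image_urls(sample)
for index, url in enumerate(urls):
    # Mirrors the <page number>-<image index>.jpg naming used by the script, for page 1
    print('%s-%s.jpg' % (1, index), url)
```

Note that the pattern relies on the exact attribute order (class first, then src) and on a space after the closing quote, so it will miss tags written any other way.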

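Running the script saves the images from each page into the image directory. This is only a quick write-up; I will cover the details in a later post.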


Tags: python
