
Python3 Crawler in Practice (2): Image Crawler

2017-07-20 13:01

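The previous post (http://blog.csdn.net/nju_flepped/article/details/75452517) crawled the daily quote from ONE. The daily quotes are classics, and the picture published with each issue is just as good. This time we will crawl the picture from every issue (up to July 19, 2017).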

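With the groundwork from the previous crawler, this one is much easier. All we need to do is inspect the page source and find the tag that holds the target image.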
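That inspection can also be done from Python by listing every img tag with its position. The snippet below is only a sketch: SAMPLE_HTML is a made-up, simplified stand-in for an issue page (the real markup is not reproduced here), and list_image_sources is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one issue page (not the real markup).
SAMPLE_HTML = """
<html><body>
  <div class="header"><img src="http://example.com/logo.png"></div>
  <div class="one-titulo"> VOL.14 </div>
  <div class="picture"><img src="http://example.com/vol14.jpg"></div>
</body></html>
"""

def list_image_sources(html):
    """Return the src attribute of every <img> tag, in document order."""
    soup = BeautifulSoup(html, 'html.parser')
    return [img.get('src') for img in soup.find_all('img')]

# Print each image with its index so the target tag is easy to spot.
for position, src in enumerate(list_image_sources(SAMPLE_HTML)):
    print(position, src)
```

On this sample the picture shows up at index 1, after the logo at index 0.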

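Looking at the page source, we can see that the target image sits in an img tag, so the find_all() search function from bs4 is enough to locate it. The page contains two img tags in total, and the target image is in the second one (which is why line 22 of the code uses h[1] to take the second img tag). The code is as follows: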

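import re
from urllib import request
import requests
from bs4 import BeautifulSoup

url = 'http://wufazhuce.com/one/'  # common prefix shared by every issue URL
Path = 'B:\\pytest\\MLtest\\one_img\\'  # directory where images are saved
num = 0  # number of images downloaded so far
for i in range(14, 1775):
    s = str(i)
    currenturl = url + s  # URL of the current issue
    try:
        res = requests.get(currenturl)
        res.raise_for_status()
    except requests.RequestException as e:
        print(e)
    else:
        html = res.text
        soup = BeautifulSoup(html, 'html.parser')
        a = soup.select('.one-titulo')  # element holding the issue number
        h = soup.find_all('img')  # all img tags on the page
        imgUrl = h[1].get('src')  # URL of the picture (second img tag)
        index = re.sub(r"\D", "", a[0].string.split()[0])  # keep only the digits of the issue number
        if index == '':
            continue  # no issue number found, skip this page
        imgName = Path + 'VOL.' + index + '.jpg'  # full path of the image, including the file name
        request.urlretrieve(imgUrl, imgName)  # save the image
        num += 1
        print('Downloaded %s images so far...' % num)
print('----- Crawl finished -----')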
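The script takes h[1] and a[0] directly, so a page with fewer than two img tags or no title element raises an IndexError and stops the whole run. A minimal sketch of more defensive helpers follows. The function names are hypothetical, and the title format ('VOL.14' surrounded by whitespace) is an assumption based on how the script parses it.

```python
import os
import re

def second_or_none(items):
    """Return the second element of a list, or None if it has fewer than two."""
    return items[1] if len(items) >= 2 else None

def extract_issue_number(title_text):
    """Return the digits of the first whitespace-separated token, or '' if there are none."""
    if not title_text:
        return ''  # covers None (tag.string can be None) and empty strings
    tokens = title_text.split()
    if not tokens:
        return ''
    return re.sub(r'\D', '', tokens[0])

def build_image_path(save_dir, issue_number):
    """Build the target file path in the same VOL.<n>.jpg form the script uses."""
    return os.path.join(save_dir, 'VOL.' + issue_number + '.jpg')

print(extract_issue_number('\n  VOL.14  \n'))   # assumed title format
print(repr(extract_issue_number(None)))
print(second_or_none(['logo.png']))
```

In the loop, a None from second_or_none(h) or an empty issue number would then be a reason to continue to the next issue rather than crash.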

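Result: each downloaded image is saved under Path as VOL.<issue number>.jpg, and the script prints a running count after every download.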
