Python: Getting Started with Web Scraping

2015-10-16 14:04

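# Scrape the images from a website
import re              # regular expression library
import urllib.request  # URL handling library

def getHtml(url):
    page = urllib.request.urlopen(url)  # open the link
    html = page.read()                  # read the page content like a text file (returns bytes)
    return html.decode('utf-8', errors='ignore')  # decode so the str pattern below can match

def getImg(html):
    reg = r'<img src="(.+?\.png)" alt'  # match pattern
    imgre = re.compile(reg)             # compile into a regular expression object
    imglist = imgre.findall(html)       # find every match
    x = 0
    for imgurl in imglist:
        print("imgurl:", imgurl)
        # Download each image in turn; the page uses relative addresses, so the site prefix is added
        urllib.request.urlretrieve("http://www.uestc.edu.cn/" + imgurl, '%d.png' % x)
        x += 1

html = getHtml("http://www.uestc.edu.cn/")
getImg(html)
#print(html)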
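
The matching and prefixing steps in the script above can be tried offline, without downloading anything. The following is only a minimal sketch under a few assumptions: the helper name `extract_img_urls` and the sample markup are hypothetical, the pattern is a slightly tightened variant of the one above (`[^"]+?` instead of `.+?`, so a match cannot run across several tags), and `urljoin` from the standard library is swapped in for plain string concatenation.

```python
import re
from urllib.parse import urljoin

# Tightened variant of the script's pattern: [^"] keeps a match inside one src attribute
IMG_PATTERN = re.compile(r'<img src="([^"]+?\.png)" alt')

def extract_img_urls(html, base_url):
    """Return an absolute URL for every .png <img> matched in html (hypothetical helper)."""
    return [urljoin(base_url, src) for src in IMG_PATTERN.findall(html)]

# Hypothetical sample markup, used only for illustration
sample_html = (
    '<img src="photo.jpg" alt="not a png, so it is skipped">'
    '<img src="images/logo.png" alt="logo">'
    '<img src="/static/banner.png" alt="banner">'
)

for url in extract_img_urls(sample_html, "http://www.uestc.edu.cn/"):
    print(url)
```

Unlike plain concatenation, `urljoin` does not produce a double slash when the relative address already starts with `/`, and it leaves absolute image addresses unchanged.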

Reference link:
http://www.cnblogs.com/fnng/p/3576154.html
