
Python: Fetching a Document from the Web

2011-11-20 11:17

Fetching a document from a URL on the Web

Code:

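# Python 2 code: in Python 3, urlopen and urlretrieve live in urllib.request.
import urllib

# Fetch the page and print its contents directly.
doc = urllib.urlopen("http://www.python.org").read()
print doc

def reporthook(*a):
    # Called with (block number, block size in bytes, total size in bytes).
    print a

# Save http://www.renren.com to renren.html, calling reporthook once for each block read.
urllib.urlretrieve("http://www.renren.com", 'renren.html', reporthook)

# Save http://www.renren.com to renren.html without a progress callback.
urllib.urlretrieve("http://www.renren.com", 'renren.html')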

Result:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

... (page content omitted) ...

</body>

</html>

(0, 8192, -1)

(1, 8192, -1)

(2, 8192, -1)

(3, 8192, -1)

(4, 8192, -1)

(5, 8192, -1)
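
Each tuple printed above is the argument list that urlretrieve passed to reporthook: the number of blocks transferred so far, the block size in bytes, and the total size of the document. A total size of -1 means the size was not known. As a rough sketch (written for Python 3; the helper name describe_progress is made up for illustration and is not part of urllib), the tuples can be turned into readable progress messages like this:

```python
def describe_progress(block_num, block_size, total_size):
    """Turn one reporthook call into a readable progress message.

    Arguments follow the reporthook convention: blocks transferred so far,
    block size in bytes, and total size in bytes (-1 when unknown).
    """
    # This is an upper bound: the last block may be shorter than block_size.
    transferred = block_num * block_size
    if total_size < 0:
        return "at most %d bytes read (total size unknown)" % transferred
    transferred = min(transferred, total_size)
    percent = 100.0 * transferred / total_size if total_size else 100.0
    return "%d of %d bytes (%.0f%%)" % (transferred, total_size, percent)


# The first three tuples from the output above.
for args in [(0, 8192, -1), (1, 8192, -1), (2, 8192, -1)]:
    print(describe_progress(*args))
```

Because the total size is unknown in the run shown above, only an upper bound on the bytes read can be reported, not a percentage.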

urllib.urlopen returns a file-like object, which is why the page contents can be read with read().
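
The code in this post targets Python 2. In Python 3 the same functions are found in urllib.request (urlretrieve is listed there as a legacy interface). The following is a minimal sketch of the equivalent calls, not a definitive port. To keep it runnable without a network connection, it assumes a small temporary file served through a file:// URL as a stand-in for a web page; the helper names fetch and save and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen, urlretrieve


def fetch(url):
    """Return the raw bytes behind url."""
    # The response is file-like: it has read() and works as a context manager.
    with urlopen(url) as response:
        return response.read()


def save(url, filename, calls):
    """Save url to filename, appending every reporthook call to calls."""
    def reporthook(*a):
        calls.append(a)
    urlretrieve(url, filename, reporthook)


# Offline stand-in for a web page (hypothetical content).
workdir = tempfile.mkdtemp()
source = Path(workdir) / "page.html"
source.write_bytes(b"<html><body>hello</body></html>")
page_url = source.as_uri()  # a file:// URL

doc = fetch(page_url)
print(doc.decode("utf-8"))

calls = []
copy_path = os.path.join(workdir, "copy.html")
save(page_url, copy_path, calls)
print(calls)
```

Passing "http://www.python.org" to fetch instead of page_url would download the real page. Note that read() returns bytes in Python 3, so the result is decoded before printing.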
