
An Example of Fetching HTML Pages with Python's urllib2 Module

2016-05-03

First, put the URLs to fetch in a separate list file, one URL per line (the script reads it as u1.list):

http://www.jb51.net/article/83440.html
http://www.jb51.net/article/83437.html
http://www.jb51.net/article/83430.html
http://www.jb51.net/article/83449.html
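As a minimal sketch of this step, the contents of the list file could be turned into a clean list of URLs as shown here. The helper name `load_urls` is hypothetical and not part of the original script. It splits on any whitespace, so it accepts one URL per line as well as several URLs on a single line.

```python
def load_urls(text):
    """Return the http/https URLs found in text, split on any whitespace."""
    # str.split() with no argument also discards blank lines and extra spaces.
    return [token for token in text.split()
            if token.startswith(('http://', 'https://'))]
```

With the list file in place, it could be called as `load_urls(open('u1.list').read())`.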

Now let's look at the program. The code is as follows:

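#!/usr/bin/python
# Python 2 script: the urllib2 module does not exist in Python 3.

import os
import urllib2


def Cdown_data(fileurl, fpath, dpath):
    """Download fileurl and save it as fpath, creating dpath if needed."""
    if not os.path.exists(dpath):
        os.makedirs(dpath)
    try:
        getfile = urllib2.urlopen(fileurl)
        data = getfile.read()
        getfile.close()
        # Binary mode writes the downloaded bytes unchanged on every platform.
        with open(fpath, 'wb') as f:
            f.write(data)
    except Exception as e:
        print fileurl, 'download failed:', e


with open('u1.list') as lines:
    for line in lines:
        URI = line.strip()
        if '?' in URI or '%' in URI:
            # Skip URLs that contain a query string or percent-escapes.
            continue
        elif URI.count('/') == 2:
            # Only "scheme://host" with no path: there is no file to save.
            continue
        elif URI.count('/') > 2:
            try:
                # http://host/a/b.html -> dirpath "host/a", filepath "host/a/b.html"
                dirpath = URI.rpartition('/')[0].split('//')[1]
                filepath = URI.split('//')[1]
                if filepath:
                    print URI, filepath, dirpath
                    Cdown_data(URI, filepath, dirpath)
            except Exception:
                print URI, 'error'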
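The script mirrors each URL's host and path into a local directory tree. A minimal sketch of that mapping, isolated from the download step, might look like the following. The function name `local_paths` is hypothetical, and using `urlparse` instead of string splitting is a swapped-in technique, not the original author's method. Skipping URLs that contain `?` or `%`, and URLs that end in `/`, is an assumption about the intended filter. The import fallback lets the sketch run on both Python 2 and Python 3.

```python
import os

try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3


def local_paths(url):
    """Map a URL to (file_path, dir_path), or return None if it is skipped."""
    # Assumption: URLs with a query string or percent-escapes are skipped.
    if '?' in url or '%' in url:
        return None
    parts = urlparse(url)
    # No host, or no path beyond the host: there is no file to save.
    if not parts.netloc or parts.path in ('', '/'):
        return None
    file_path = parts.netloc + parts.path
    # Assumption: a directory-style URL has no file name to write to.
    if file_path.endswith('/'):
        return None
    return file_path, os.path.dirname(file_path)


print(local_paths('http://www.jb51.net/article/83440.html'))
```

For the first URL in the list this yields the file path `www.jb51.net/article/83440.html` inside the directory `www.jb51.net/article`, which matches the values the script passes to `Cdown_data`.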

Original article: http://www.diyoms.com/python/1806.html
