
Putting Python to work for yourself: scraping web page content

2014-06-01 11:12, 375 views
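# Python 2 script: urllib.urlopen and the print statement do not exist in Python 3.
import urllib

# Index page that lists the VOA president speeches.
url = "http://www.putclub.com/html/radio/VOA/presidentspeech/index.html"
wp = urllib.urlopen(url)
print "start download..."
content = wp.read()

# How many times the "center_box" marker appears on the index page.
print content.count("center_box")

# Drop everything before the first "center_box" marker.
content = content[content.find("center_box")+1:]

# Cut out the first link after the marker. The offsets +7 and -2 presumably
# skip the opening quote plus leading slash and the closing quote plus space.
content = content[content.find("href=")+7:content.find("target")-2]
filename = content
url = "http://www.putclub.com/" + content
print content
print url

# Download the article page that the link points to.
wp = urllib.urlopen(url)
print "start download..."
content = wp.read()
#print content
print content.count("<div class=\"content\"")

# Keep the part between the end-of-info comment and the pagination div.
#content = content[content.find("<div class=\"content\""):]
content = content[content.find("<!--info end------->"):]
content = content[:content.find("<div class=\"dede_pages\"")-1]

# Build the file name from the path after "presidentspeech/", replacing "/" with "-".
filename = filename[filename.find("presidentspeech")+len("presidentspeech/"):]
filename = filename.replace('/', "-")

# Save the extracted fragment and echo it to the console.
with open(filename, "w") as fp:
    fp.write(content)
print content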
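
The script above assumes that every find() call succeeds. If the page layout changes, find() returns -1 and the slices silently produce wrong text. Below is a minimal Python 3 sketch of the same three steps (locate the first link after a marker, cut the article body between two markers, flatten the path into a file name) with explicit checks. The helper names first_link_after, slice_between and filename_from_path, and the sample markup, are hypothetical and not part of the original script. The real page markup may differ.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.putclub.com/"  # site root used by the script above


def first_link_after(html, marker="center_box"):
    """Return the first href value that follows `marker`, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    href = html.find('href="', start)
    if href == -1:
        return None
    href += len('href="')
    end = html.find('"', href)
    if end == -1:
        return None
    return html[href:end]


def slice_between(html, begin, end):
    """Return the text from `begin` up to (not including) `end`, or None."""
    i = html.find(begin)
    if i == -1:
        return None
    j = html.find(end, i)
    if j == -1:
        return None
    return html[i:j]


def filename_from_path(path, section="presidentspeech/"):
    """Turn the part of `path` after `section` into a flat file name."""
    k = path.find(section)
    tail = path[k + len(section):] if k != -1 else path.lstrip("/")
    return tail.replace("/", "-")


if __name__ == "__main__":
    # Made-up sample markup, only to exercise the helpers offline.
    index_page = (
        '<div class="center_box">'
        '<a href="/html/radio/VOA/presidentspeech/2014/0601/1.html" '
        'target="_blank">speech</a></div>'
    )
    link = first_link_after(index_page)
    print(urljoin(BASE_URL, link))
    print(filename_from_path(link))
```

Returning None instead of slicing with -1 lets the caller stop with a clear message when a marker is missing. The download itself could be done with urllib.request.urlopen in Python 3. The page encoding would then have to be checked before decoding the bytes to text.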