Putting Python to work for yourself: scraping web page content
2014-06-01 11:12
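# Python 2 script: fetch the first speech linked on the putclub.com
# VOA president speech index page and save its text to a local file.
import urllib

# Step 1: download the index page that lists the speeches.
url = "http://www.putclub.com/html/radio/VOA/presidentspeech/index.html"
wp = urllib.urlopen(url)
print "start download..."
content = wp.read()
wp.close()
print content.count("center_box")

# Step 2: cut out the first link that follows the "center_box" marker.
# The offsets assume the link is written as href="/..." target=...:
# +7 skips 'href="/' and -2 drops the '" ' in front of 'target'.
content = content[content.find("center_box") + 1:]
content = content[content.find("href=") + 7:content.find("target") - 2]
filename = content
url = "http://www.putclub.com/" + content
print content
print url

# Step 3: download the article page behind that link.
wp = urllib.urlopen(url)
print "start download..."
content = wp.read()
wp.close()
print content.count("<div class=\"content\"")

# Step 4: keep only the text between the "info end" comment and the pager div.
content = content[content.find("<!--info end------->"):]
content = content[:content.find("<div class=\"dede_pages\"") - 1]

# Step 5: build a flat file name from the path after "presidentspeech/".
filename = filename[filename.find("presidentspeech") + len("presidentspeech/"):]
filename = filename.replace("/", "-")

# Step 6: save the extracted text and echo it.
fp = open(filename, "w")
fp.write(content)
fp.close()
print content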
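The script leans on hard-coded offsets (+7, -2, -1), so it breaks quietly if the page markup shifts or a marker is missing, because find() then returns -1. Below is a rough Python 3 sketch of the same three string-slicing steps, written as small functions that return None when a marker is not found. It runs on made-up sample HTML instead of the live site. The function names, the sample markup and the example.html path are assumptions for illustration, not taken from putclub.com.

```python
# Python 3 sketch of the slicing steps, run on in-memory sample HTML.
# All names and sample strings here are hypothetical.

INFO_END = "<!--info end------->"
PAGER = '<div class="dede_pages"'


def extract_first_link(index_html, anchor="center_box"):
    """Return the first href value that follows `anchor`, or None."""
    start = index_html.find(anchor)
    if start == -1:
        return None
    href = index_html.find('href="', start)
    if href == -1:
        return None
    href += len('href="')
    end = index_html.find('"', href)
    if end == -1:
        return None
    return index_html[href:end]


def extract_between(page_html, start_marker, end_marker):
    """Return the text from start_marker (inclusive) up to end_marker, or None."""
    start = page_html.find(start_marker)
    if start == -1:
        return None
    end = page_html.find(end_marker, start)
    if end == -1:
        return None
    return page_html[start:end]


def filename_from_path(path, section="presidentspeech/"):
    """Turn the part of `path` after `section` into a flat file name."""
    pos = path.find(section)
    tail = path[pos + len(section):] if pos != -1 else path.lstrip("/")
    return tail.replace("/", "-")


# Made-up markup that only imitates the shape the script expects.
SAMPLE_INDEX = (
    '<div class="center_box"><ul>'
    '<li><a href="/html/radio/VOA/presidentspeech/2014/0531/example.html" '
    'target="_blank">Example</a></li></ul></div>'
)
SAMPLE_PAGE = (
    '<div class="content">' + INFO_END +
    '<p>Hello</p>\n' + PAGER + '></div></div>'
)

if __name__ == "__main__":
    link = extract_first_link(SAMPLE_INDEX)
    print(link)
    print(filename_from_path(link))
    print(extract_between(SAMPLE_PAGE, INFO_END, PAGER))
```

Searching for the closing quote instead of counting characters keeps the link extraction working even if the attribute order changes. For anything beyond a one-off script, an HTML parser is usually a sturdier choice than string offsets.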