
Writing a blog on 51CTO with the Windows Live Writer client

2009-10-13 09:41
A simple little crawler that lists the news links on the 163.com front page

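```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import urllib.request

import bs4

url = 'http://www.163.com'
# Download the raw bytes of the page (give up after 10 seconds).
raw = urllib.request.urlopen(url, timeout=10).read()
# The page is decoded as GBK; errors='replace' avoids crashing on stray bytes.
content = raw.decode('gbk', errors='replace')

# Parse the HTML and select every <a> with an href that sits inside an <li>.
soup = bs4.BeautifulSoup(content, 'html.parser')
links = soup.select('li a[href]')

result = []
for link in links:
    href = link.attrs['href']
    title = link.text
    # Keep .html article pages on 163.com whose title is longer than 3 characters.
    if '.html' in href and '163.com' in href and len(title) > 3:
        result.append(link)

for link in result:
    print(link.attrs['href'], link.text)

# Use % so the count is substituted into the message.
print('Total news items: [%s]' % len(result))
```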
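The selection and filtering step can also be tried offline, without hitting the network or installing bs4. The sketch below is an assumption-laden alternative, not the author's code: it swaps bs4's `select('li a[href]')` for the standard library's `html.parser.HTMLParser`, and the names `NewsLinkParser` and `filter_news_links` are hypothetical. Unlike bs4, it does not repair sloppy markup, so it assumes each `<li>` has a closing `</li>` tag.

```python
from html.parser import HTMLParser


class NewsLinkParser(HTMLParser):
    """Hypothetical helper: collect (href, text) for <a href> tags inside an <li>.

    Roughly mimics the CSS selector 'li a[href]'; assumes </li> tags are present.
    """

    def __init__(self):
        super().__init__()
        self.li_depth = 0         # number of open <li> elements around the cursor
        self.current_href = None  # href of the <a> currently being read, if any
        self.current_text = []    # text chunks seen inside that <a>
        self.links = []           # collected (href, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
        elif tag == 'a' and self.li_depth > 0:
            href = dict(attrs).get('href')
            if href is not None:
                self.current_href = href
                self.current_text = []

    def handle_data(self, data):
        # Only text inside a matching <a> counts toward its title.
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current_href is not None:
            self.links.append((self.current_href, ''.join(self.current_text)))
            self.current_href = None
        elif tag == 'li' and self.li_depth > 0:
            self.li_depth -= 1


def filter_news_links(html):
    """Hypothetical helper: apply the crawler's filter to an HTML string."""
    parser = NewsLinkParser()
    parser.feed(html)
    parser.close()
    return [(href, title) for href, title in parser.links
            if '.html' in href and '163.com' in href and len(title) > 3]


# Made-up sample markup for illustration only.
sample = '''
<ul>
  <li><a href="https://news.163.com/a/1.html">Example headline one</a></li>
  <li><a href="https://news.163.com/">Home</a></li>
  <li><a href="https://example.com/b.html">Offsite story</a></li>
</ul>
<a href="https://news.163.com/c.html">Not inside a list item</a>
'''
for href, title in filter_news_links(sample):
    print(href, title)
```

Only the first sample link survives: the second has no `.html`, the third is not on 163.com, and the last is not inside an `<li>`. Keeping the filter in a function that takes an HTML string makes it easy to test separately from the download.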

This article is from the “Linux_Config” blog; please retain this source: http://liang1026.blog.51cto.com/10119067/1681675