
Python: Scraping the WooYun vendor list and parsing it with BeautifulSoup

2016-01-27 15:52, 218 views

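I saw a post on the SSS forum where someone scraped the WooYun vendor list with Python. I wanted some practice, so I rewrote the script myself, following that post.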

Original post: http://bbs.sssie.com/thread-965-1-1.html

#coding:utf-8
import urllib2
from bs4 import BeautifulSoup

url = 'http://wooyun.org/corps/page/'
total_page = 44
count = 1

# Named out_file so it does not shadow the Python 2 built-in `file`.
out_file = open('wooyunCS1.csv', 'w')

for num in range(1, total_page + 1):
    # Page URLs differ only in the trailing page number.
    real_url = url + str(num)
    response = urllib2.urlopen(real_url)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser', from_encoding='utf-8')
    # Look up the matching cells once per page instead of on every access.
    cells = soup('td', width='370')
    # Names sit at even indices, links at the odd index right after them.
    for i in range(0, len(cells) - 1, 2):
        name = cells[i].get_text()
        link = cells[i + 1].get_text()
        print name, ':', link
        # End each record with '\n' so every vendor gets its own row.
        out_file.write(str(count) + ',' + name.encode('utf-8') + ',' + link.encode('utf-8') + '\n')
        count += 1

out_file.close()
print "OVER"

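# Summary:
# CSV format: joining the values with + ',' + puts each value in its own column (each record also needs a trailing newline to start a new row)
# When the wanted items alternate, select them by position: even indices for one kind, odd indices for the other
# In this example, building the page URL with str(num) is simpler than using re.sub()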
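
The three points above can be tried without touching the network. The following is a minimal sketch, not the original script: it is written for Python 3 (the script above is Python 2), the helper names `pair_alternating`, `write_rows` and `page_urls` are made up for illustration, and `sample_cells` is invented data standing in for the text of the scraped `td` cells. It also swaps the manual `+ ',' +` joining for the standard `csv` module, which quotes a value when it contains a comma.

```python
import csv
import io


def pair_alternating(cells):
    """Pair items that alternate as name, link, name, link, ..."""
    # Even indices hold names, odd indices hold links. zip() stops at the
    # shorter slice, so a trailing name without a link is dropped.
    return list(zip(cells[0::2], cells[1::2]))


def write_rows(pairs, out):
    """Write numbered (name, link) records to a file-like object as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    for count, (name, link) in enumerate(pairs, start=1):
        # Each writerow() call emits one record followed by a newline.
        writer.writerow([count, name, link])


def page_urls(base, total_pages):
    """Build one URL per page by appending the page number to the base."""
    return [base + str(num) for num in range(1, total_pages + 1)]


# Hypothetical sample data; the real values come from the scraped pages.
sample_cells = [
    "Example Corp", "http://www.example.com",
    "Acme, Inc.", "http://www.example.org",
]

buffer = io.StringIO()
write_rows(pair_alternating(sample_cells), buffer)
csv_text = buffer.getvalue()
print(csv_text)
print(page_urls("http://wooyun.org/corps/page/", 2))
```

In this sample the second name contains a comma, so `csv.writer` wraps it in double quotes and the record still has three columns. Plain string joining would split it into four.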
