
Python 3.5: Scraping Result Titles and Links from a Bing Search Page

2016-09-17 18:59
A small, simple crawler that takes a search keyword and prints the title and link of each result on the first page Bing returns.

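import sys
import urllib.error
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup as BS

baseUrl = 'http://cn.bing.com/search?'
word = '鹿晗 吴亦凡 张艺兴'
print(word)

# urlencode() percent-encodes str values as UTF-8 by default,
# so the keyword does not need to be encoded to bytes first.
data = urllib.parse.urlencode({'q': word})
url = baseUrl + data
print(url)

# Send a browser-like User-Agent; the default "Python-urllib/3.x"
# agent may be served a different page or refused.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})

try:
    html = urllib.request.urlopen(req, timeout=10).read()
except urllib.error.HTTPError as e:  # HTTPError subclasses URLError, so catch it first
    print(e.code)
    sys.exit(1)  # without this, the code below would fail with NameError on html
except urllib.error.URLError as e:
    print(e.reason)
    sys.exit(1)

soup = BS(html, 'html.parser')

# Result count shown at the top of the results page
for c in soup.find_all(class_='sb_count'):
    print(c.get_text())

# Each result title is an <h2> that wraps the result link
for t in soup.find_all('h2'):
    print(t.get_text())
    a = t.find('a', href=True)
    if a:
        # Read the attribute directly: a regex over str(t) would return
        # re-serialized HTML, with '&' in the URL escaped as '&amp;'
        print(a['href'])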
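The query-string step can be checked offline. The sketch below wraps it in a hypothetical helper, `build_search_url`, and shows that passing the keyword as a `str` to `urllib.parse.urlencode` produces the same URL as encoding it to UTF-8 bytes first, and that decoding the query string gives the keyword back. The Bing base URL is the one used above; nothing here contacts the network.

```python
import urllib.parse

BASE_URL = 'http://cn.bing.com/search?'


def build_search_url(keyword):
    # urlencode() runs each value through quote_plus() with UTF-8:
    # spaces become '+', non-ASCII characters become %XX byte escapes.
    return BASE_URL + urllib.parse.urlencode({'q': keyword})


word = '鹿晗 吴亦凡 张艺兴'
url = build_search_url(word)
print(url)

# Encoding the keyword to UTF-8 bytes first gives the identical query string.
same = BASE_URL + urllib.parse.urlencode({'q': word.encode('utf-8')})
print(url == same)

# Decoding the query string recovers the original keyword.
query = urllib.parse.urlsplit(url).query
print(urllib.parse.parse_qs(query)['q'][0])
```

Because `urlencode` already handles the encoding, calling `word.encode()` beforehand is harmless but unnecessary.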


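The extraction step can also be tried without installing bs4 or hitting Bing. The following is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`; it is not the script's method. It keeps the script's assumption that each result title is an `<h2>` wrapping an `<a href>`. The class name, the function name and the sample HTML are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class H2LinkParser(HTMLParser):
    """Collect (title, href) pairs for every <h2> that wraps an <a href>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: entities in text are decoded
        self.results = []
        self._in_h2 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h2':
            self._in_h2 = True
            self._href = None
            self._text = []
        elif tag == 'a' and self._in_h2 and self._href is None:
            # attrs is a list of (name, value) pairs with entities already decoded
            self._href = dict(attrs).get('href')

    def handle_data(self, data):
        # Gather all text inside the <h2>, including nested tags like <strong>
        if self._in_h2:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'h2' and self._in_h2:
            self._in_h2 = False
            if self._href:
                self.results.append((''.join(self._text).strip(), self._href))


def parse_titles_and_links(html):
    parser = H2LinkParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.results


# Hypothetical sample markup, not real Bing output
sample = (
    '<ol><li><h2><a href="https://example.com/a?x=1&amp;y=2">'
    'First <strong>result</strong></a></h2></li>'
    '<li><h2><a href="https://example.org/b">Second result</a></h2></li></ol>'
)

for title, link in parse_titles_and_links(sample):
    print(title)
    print(link)
```

Note that the first link comes back as `https://example.com/a?x=1&y=2`: the parser decodes `&amp;` in attribute values, which is why reading the attribute beats running a regex over re-serialized HTML.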
