
Python: fetch a page and extract specific numbers to build a ranking

2013-10-21 17:22
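Goal: take the ranking in the middle section of http://ulive.univs.cn/event/event/template/index/265.shtml, scrape each entry's name, follow its link to read the vote count, and build a table that maps each entry to its votes.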

The requests and BeautifulSoup modules do all the work, and it is very quick to do.
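The scraping step relies on BeautifulSoup's findAll(attrs={'title': True}), which returns every tag that carries a title attribute. As a rough illustration that runs without network access or third-party packages, the sketch below performs the same selection for <a> tags using Python 3's standard html.parser. The class name TitledLinkCollector, the HTML snippet and the detail-page paths in it are all hypothetical; the real page's markup may differ.

```python
from html.parser import HTMLParser


class TitledLinkCollector(HTMLParser):
    """Collect (text, href) for every <a> tag that has a title attribute.

    A stdlib stand-in for soup.findAll(attrs={'title': True}), limited to <a> tags.
    """

    def __init__(self):
        super().__init__()
        self.links = []        # collected (text, href) tuples
        self._current = None   # [text, href] of the link being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "title" in attrs:
            self._current = ["", attrs.get("href")]

    def handle_data(self, data):
        # Only text inside a matching link is kept
        if self._current is not None:
            self._current[0] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current[0] = self._current[0].strip()
            self.links.append(tuple(self._current))
            self._current = None


# Hypothetical markup: two titled ranking links and one untitled link
SAMPLE = """
<ul>
  <li><a href="/event/event/template/detail/100001.shtml" title="School A">School A</a></li>
  <li><a href="/event/event/template/detail/100002.shtml" title="School B">School B</a></li>
  <li><a href="/about.shtml">About</a></li>
</ul>
"""

collector = TitledLinkCollector()
collector.feed(SAMPLE)
print(collector.links)
```

The untitled "About" link is skipped, which is the same filtering the title=True attribute match performs.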

#coding=utf-8
from bs4 import BeautifulSoup
import requests

Target = 'http://ulive.univs.cn/event/event/template/index/265.shtml'
Host = 'http://ulive.univs.cn'
s = requests.session()

r1 = s.get(Target)
soup = BeautifulSoup(r1.text, 'html.parser')

Schools = []
# Ranking links are the tags with a title attribute; the last two are not entries
Ar = soup.findAll(attrs={'title': True})
Ar = Ar[:-2]
for oc in Ar:
    # Read the link target from the tag instead of slicing its HTML string
    URL = oc.get('href')
    newURL = Host + URL
    voteId = newURL[-12:-6]  # the 6 characters just before ".shtml"
    newRequests = s.get(newURL)
    if newRequests:  # True when the HTTP status is below 400
        newSoap = BeautifulSoup(newRequests.text, 'html.parser')
        QueryPiao = newSoap.find("span", id="voteNum265-" + voteId)
        if QueryPiao:
            # int() instead of eval(): never evaluate text taken from a web page
            Schools.append((oc.text, int(QueryPiao.text.strip())))
        else:
            print oc.text + " QueryPiao is error"
    else:
        print oc.text + " is error"

# Print every entry, from the last one collected to the first
for name, votes in reversed(Schools):
    print name
    print str(votes)
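Two steps in the loop are fragile: the href is cut out of the tag's HTML by character offsets, and the vote count is converted with eval(), which would execute whatever text the page returns. Below is a hedged Python 3 sketch of sturdier helpers. The URL pattern is an assumption inferred from the newURL[-12:-6] slice (a numeric id right before .shtml), the thousands-separator handling is a guess about the counter format, and the helper names are hypothetical.

```python
import re
from urllib.parse import urljoin

HOST = "http://ulive.univs.cn"

# Assumption: detail pages end in "<digits>.shtml", matching the script's
# newURL[-12:-6] slice, which takes the 6 characters before ".shtml".
VOTE_ID_RE = re.compile(r"(\d+)\.shtml$")


def detail_url(href):
    """Resolve an href against the site root (like Host + URL, but also accepts absolute hrefs)."""
    return urljoin(HOST, href)


def vote_id(url):
    """Return the numeric id before '.shtml', or None if the URL has another shape."""
    match = VOTE_ID_RE.search(url)
    return match.group(1) if match else None


def parse_votes(text):
    """Convert the vote counter's text to an int without eval().

    Assumption: the counter may carry surrounding whitespace or thousands separators.
    Anything that is not a plain number yields None instead of being executed.
    """
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


url = detail_url("/event/event/template/detail/100001.shtml")
print(url)
print(vote_id(url), parse_votes(" 1,234 "))
```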


The final output looks like this (screenshot not preserved):



To save the output, redirect it to a file on the command line:

python QueryVote > 20131021.txt
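An alternative to shell redirection is to have the script write the file itself with an explicit encoding, so the result no longer depends on how stdout is set up. A minimal sketch, assuming the results are a list of (name, votes) tuples shaped like Schools; save_ranking is a hypothetical helper and the sample names are made up. io.open exists in Python 2.6+ and Python 3, so the sketch fits either.

```python
import io
import time


def save_ranking(schools, path):
    """Write (name, votes) pairs to a UTF-8 text file, highest votes first.

    Writing through an explicitly encoded file avoids relying on the
    encoding (or lack of one) that stdout gets under a shell redirect.
    """
    ranked = sorted(schools, key=lambda item: item[1], reverse=True)
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(u"Current Time is : %s\n" % time.strftime("%Y-%m-%d"))
        for rank, (name, votes) in enumerate(ranked, 1):
            out.write(u"%d\t%s\t%d\n" % (rank, name, votes))


# Hypothetical data; the file name mirrors the date-named file in the redirect example
save_ranking([(u"学校甲", 120), (u"学校乙", 300)], "20131021.txt")
```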

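However, this runs into an encoding problem:

Exception: 'ascii' codec can't encode characters

When stdout is redirected to a file, Python 2 cannot detect a terminal encoding, so printing unicode text falls back to the ASCII codec, which cannot represent Chinese characters and raises UnicodeEncodeError.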
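To see the failure in isolation, the snippet below (Python 3 syntax; the school name is a hypothetical example) encodes Chinese text with the ASCII codec, which raises the same UnicodeEncodeError, and then with UTF-8, which succeeds.

```python
# Encoding Chinese text with the ASCII codec fails, as it does when Python 2
# prints unicode to a redirected stdout that has no encoding of its own.
name = u"清华大学"  # hypothetical school name

try:
    name.encode("ascii")
except UnicodeEncodeError as err:
    print("fails:", err.reason)   # ordinal not in range(128)

data = name.encode("utf-8")
print(len(data))  # 12: each of these four characters takes 3 bytes in UTF-8
```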

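It is a character-set problem. Add these lines at the top of the file (Python 2 only):

import sys
reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
sys.setdefaultencoding("utf-8")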
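reload(sys) plus sys.setdefaultencoding is a widely used Python 2 workaround, but it changes the default for every implicit str/unicode conversion in the process. A narrower option is to encode only at the output boundary. The sketch below wraps a byte stream with codecs.getwriter("utf-8"); in the actual script you would wrap sys.stdout on Python 2 (or sys.stdout.buffer on Python 3) instead of the in-memory buffer used here, and utf8_writer is a hypothetical helper name. Setting the environment variable PYTHONIOENCODING=utf-8 before running the redirect command has a similar effect without code changes.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so unicode text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)


# An in-memory byte buffer stands in for stdout so the example is self-contained
buffer = io.BytesIO()
out = utf8_writer(buffer)
out.write(u"1\t清华大学\t300\n")
print(buffer.getvalue().decode("utf-8"))
```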

Then add Schools.sort(key=lambda x: x[1]) and the list is sorted by vote count, in ascending order.
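The script already imports itemgetter from operator but never uses it; itemgetter(1) is an equivalent key to lambda x: x[1]. Passing reverse=True sorts from most to fewest votes, so the ranking can be printed front to back. A small sketch with made-up (name, votes) data:

```python
from operator import itemgetter

# Hypothetical (name, votes) pairs in the same shape as Schools
schools = [(u"School A", 120), (u"School B", 300), (u"School C", 45)]

# Same ordering as key=lambda x: x[1]; itemgetter(1) fetches index 1 of each tuple
ascending = sorted(schools, key=itemgetter(1))

# reverse=True puts the most votes first
ranking = sorted(schools, key=itemgetter(1), reverse=True)
for rank, (name, votes) in enumerate(ranking, 1):
    print(rank, name, votes)
```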

The final code is:

#coding:utf-8
from bs4 import BeautifulSoup
import requests
import sys
import time

# Python 2 only: restore sys.setdefaultencoding (site.py removes it) and make
# implicit unicode conversions, including print to a redirected stdout, use UTF-8
reload(sys)
sys.setdefaultencoding("utf-8")

Target = 'http://ulive.univs.cn/event/event/template/index/265.shtml'
Host = 'http://ulive.univs.cn'
s = requests.session()

r1 = s.get(Target)
soup = BeautifulSoup(r1.text, 'html.parser')

Schools = []
# Ranking links are the tags with a title attribute; the last two are not entries
Ar = soup.findAll(attrs={'title': True})
Ar = Ar[:-2]
for oc in Ar:
    # Read the link target from the tag instead of slicing its HTML string
    newURL = Host + oc.get('href')
    voteId = newURL[-12:-6]  # the 6 characters just before ".shtml"
    newRequests = s.get(newURL)
    if newRequests:  # True when the HTTP status is below 400
        newSoap = BeautifulSoup(newRequests.text, 'html.parser')
        QueryPiao = newSoap.find("span", id="voteNum265-" + voteId)
        if QueryPiao:
            # int() instead of eval(): never evaluate text taken from a web page
            Schools.append((oc.text, int(QueryPiao.text.strip())))
        else:
            print oc.text + " QueryPiao is error"
    else:
        print oc.text + " is error"

# Most votes first, so rank 1 is printed first and no entry is skipped
Schools.sort(key=lambda x: x[1], reverse=True)

print "Current Time is : "
print time.strftime('%Y-%m-%d', time.localtime())
for t, (name, votes) in enumerate(Schools, 1):
    print ''
    print str(t)
    print name
    print str(votes)
print ''
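A printing loop of the form i = len(Schools)-1; while i > 0: i = i - 1 decrements before it reads, so it never reaches index len(Schools)-1, which after an ascending sort holds the school with the most votes. The small check below traces which indices such a loop visits compared with reversed(range(n)); both function names are hypothetical and exist only for the comparison.

```python
def visited_indices_countdown(n):
    """Indices read by: i = n - 1; while i > 0: i = i - 1; read Schools[i]."""
    seen = []
    i = n - 1
    while i > 0:
        i = i - 1
        seen.append(i)
    return seen


def visited_indices_reversed(n):
    """Every index from last to first, as reversed(range(n)) yields them."""
    return list(reversed(range(n)))


print(visited_indices_countdown(4))  # [2, 1, 0]: index 3, the top entry, is never read
print(visited_indices_reversed(4))   # [3, 2, 1, 0]
```

With a single school the countdown loop reads nothing at all, which is another reason to iterate with reversed() or enumerate() over a descending sort.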