
Counting the male-to-female ratio of CSDN technical experts with Python

2009-08-30 21:30
This isn't entirely original: I adapted luotuo512's code, so thanks to luotuo512.

Original threads:

http://topic.csdn.net/u/20090829/23/cd59c0ae-133d-46c9-86af-38cb70d23544.html

http://topic.csdn.net/u/20090830/20/930263ff-6805-450b-931c-33c65003d9a3.html

# -*- coding: utf-8 -*-
# Count the male-to-female ratio of CSDN technical experts
import re
import threading
import time
import urllib.request


class thr_stat(threading.Thread):
    def __init__(self, threadname, m, n):
        threading.Thread.__init__(self, name=threadname)
        self.m = m  # first ranking page (inclusive)
        self.n = n  # last ranking page (exclusive)
        self.male = 0
        self.female = 0

    def run(self):
        print(self.name, self.m, self.n)
        # Matches profile links of the form: a href='/<username>/profile
        namepattern = re.compile(r"a href='/\w*/profile")
        for i in range(self.m, self.n):
            sock = urllib.request.urlopen("http://hi.csdn.net/RankingStaticPage/True/3/5001/%d.htm" % i)
            # Pages are assumed to be UTF-8 encoded
            source = sock.read().decode("utf-8", "ignore")
            link = list(set(namepattern.findall(source)))  # drop duplicate links
            for j in link:
                # j[9:] strips the leading "a href='/" and leaves "<username>/profile"
                sock = urllib.request.urlopen("http://hi.csdn.net/" + j[9:])
                source = sock.read().decode("utf-8", "ignore")
                if source.find("他的博客") != -1:  # "his blog"
                    self.male += 1
                elif source.find("她的博客") != -1:  # "her blog"
                    self.female += 1

    def output(self):
        print(self.name, self.male, self.female)


def stat():
    male = 0
    female = 0
    start = 0
    stop = 5
    step = 1
    thrs = []
    # One thread per block of `step` ranking pages
    for t in range(start, stop, step):
        name = 'thr_' + str(t // step)
        thrs.append(thr_stat(name, t, t + step))
    print("start time:%s" % time.ctime())
    for t in thrs:
        t.start()
    for t in thrs:
        t.join()
    print("stop time:%s" % time.ctime())
    # Sum the per-thread counts once every thread has finished
    for t in thrs:
        male += t.male
        female += t.female
    print("male:%d" % male)
    print("female:%d" % female)


if __name__ == '__main__':
    stat()
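
To see what the regular expression and the `j[9:]` slice in `run()` do without touching the network, here is a small offline sketch. The HTML fragment and the usernames in it are made up for illustration, and `profile_paths` and `classify` are hypothetical helper names that are not part of the script; the real CSDN pages may well have looked different.

```python
import re

# Same pattern as in run(): matches the start of a profile link
NAME_PATTERN = re.compile(r"a href='/\w*/profile")

# Made-up ranking-page fragment (hypothetical usernames)
SAMPLE_PAGE = (
    "<a href='/alice/profile'>alice</a>"
    "<a href='/bob/profile'>bob</a>"
    "<a href='/alice/profile'>alice again</a>"
)


def profile_paths(page):
    """Return the unique '<username>/profile' paths found in a ranking page."""
    links = set(NAME_PATTERN.findall(page))
    # Each match starts with the 9 characters "a href='/", which the slice removes
    return sorted(link[9:] for link in links)


def classify(profile_html):
    """Hypothetical helper: guess gender from the blog link wording on a profile page."""
    if "他的博客" in profile_html:  # "his blog"
        return "male"
    if "她的博客" in profile_html:  # "her blog"
        return "female"
    return None  # neither phrase present, so the profile is not counted


print(profile_paths(SAMPLE_PAGE))
print(classify("<a>她的博客</a>"))
```

The duplicate link to `alice` is collapsed by the `set`, which is why the script counts each expert only once per ranking page.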


Output (took 40 seconds):

start time:Sun Aug 30 21:39:39 2009
thr_0 0 1
thr_1 1 2
thr_2 2 3
thr_3 3 4
thr_4 4 5
stop time:Sun Aug 30 21:40:19 2009
male:73
female:2
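
The thread names and page ranges in this output follow directly from the `start`, `stop`, and `step` values in `stat()`. A minimal sketch of that partitioning, where `page_ranges` is a hypothetical helper name that does not appear in the script:

```python
def page_ranges(start, stop, step):
    """Return (thread_name, first_page, end_page) tuples, one per worker thread."""
    return [("thr_%d" % (t // step), t, t + step) for t in range(start, stop, step)]


# Settings from the listing: pages 0 to 4, one page per thread
for name, m, n in page_ranges(0, 5, 1):
    print(name, m, n)
```

Raising `stop` to 12 and `step` to 2 gives six threads, `thr_0` through `thr_5`, each covering two ranking pages.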

More results (took 1 minute 18 seconds):

start time:Sun Aug 30 21:50:48 2009
thr_0 0 2
thr_1 2 4
thr_2 4 6
thr_3 6 8
thr_4 8 10
thr_5 10 12
stop time:Sun Aug 30 21:52:06 2009
male:174
female:6
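
The script prints raw counts only, so the ratio itself still has to be worked out. A short sketch using the counts reported by the two runs (`female_share` is a hypothetical helper name):

```python
def female_share(male, female):
    """Return the share of female profiles, in percent, among those classified."""
    total = male + female
    return 100.0 * female / total if total else 0.0


# Counts reported by the two runs
runs = [("5 pages", 73, 2), ("12 pages", 174, 6)]
for label, male, female in runs:
    share = female_share(male, female)
    print("%s: %d male, %d female, %.1f%% female" % (label, male, female, share))
```

That works out to roughly 2.7% and 3.3% female among the classified profiles. Any profile showing neither blog phrase would be left out of both counts.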