
Python multithreading summary (an experiment: checking whether URLs are online with multiple threads)

2014-10-30 09:28
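This experiment checks whether a set of URLs is online and compares a multithreaded version with a plain sequential one. The code and its output are shown below: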

I. Without multithreading, the code is as follows:

#encoding:utf-8
import threading
import urllib2

def online(url = ''):
    """Check whether the URL is online."""
    req = urllib2.Request(url)
    try:
        response = urllib2.urlopen(req)
        if response.code == 200:
            print response.geturl(),' this url is online'
        else:
            print 'not'
    except urllib2.URLError as e:
        # urllib2.HTTPError is a URLError subclass that also exposes `reason`,
        # so HTTP errors (401, 404, 500, ...) take this first branch as well.
        if hasattr(e, 'reason'):
            print url,' We failed to reach a server.'
            print 'Reason: ', e.reason
        elif hasattr(e, 'code'):
            print url,' The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code

def main():
    url_list = ['http://www.baidu.com','http://www.hitwh.edu.cn','http://www.13.com','http://www.ifeng.com','http://www.sina.com',
                'http://www.wewin.com.gr/2','http://www.ifeng.com','http://www.sina.com','http://www.zeeif.com/int/',
                'http://www.zeeif.com/websc/verification/',
                'http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html',
                'http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID',
                'http://paypel-login-resolution-center.propesage-algerie.com/ID/',
                'http://radiotransilvania.ro/clujarena/rena.php',
                'http://kuleteknik.net/wp-includes/lol3.html',
                'http://kuleteknik.net/wp-includes/lol2.html'
                ]
    # Sequential version: check the URLs one after another.
    for url in url_list:
        #t = threading.Thread(target = online,args = (url,))
        #t.start()
        online(url)

if __name__ == '__main__':
    main()

The results are as follows:
http://www.baidu.com this url is online
http://www.hitwh.edu.cn this url is online
http://www.13.com We failed to reach a server.
Reason: [Errno 11001] getaddrinfo failed
http://www.ifeng.com this url is online
http://www.sina.com.cn/ this url is online
http://www.wewin.com.gr/2 We failed to reach a server.
Reason: Unauthorized
http://www.ifeng.com this url is online
http://www.sina.com.cn/ this url is online
http://www.zeeif.com/int/ We failed to reach a server.
Reason: Not Found
http://www.zeeif.com/websc/verification/ We failed to reach a server.
Reason: Not Found
http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html We failed to reach a server.
Reason: Internal Server Error
http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID this url is online
http://paypel-login-resolution-center.propesage-algerie.com/ID/ this url is online
http://radiotransilvania.ro/clujarena/rena.php We failed to reach a server.
Reason: Not Found
http://kuleteknik.net/wp-includes/lol3.html this url is online
http://kuleteknik.net/wp-includes/lol2.html this url is online

[Finished in 5.2s]

Explanation: the run took 5.2 seconds. With more URLs to check, and especially with more of them offline, it would take even longer.

II. With multithreading, the code is as follows:

#encoding:utf-8
import threading
import urllib2

def online(url = ''):
    """Check whether the URL is online."""
    req = urllib2.Request(url)
    try:
        response = urllib2.urlopen(req)
        if response.code == 200:
            print response.geturl(),' this url is online'
        else:
            print 'not'
    except urllib2.URLError as e:
        # urllib2.HTTPError is a URLError subclass that also exposes `reason`,
        # so HTTP errors (401, 404, 500, ...) take this first branch as well.
        if hasattr(e, 'reason'):
            print url,' We failed to reach a server.'
            print 'Reason: ', e.reason
        elif hasattr(e, 'code'):
            print url,' The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code

def main():
    url_list = ['http://www.baidu.com','http://www.hitwh.edu.cn','http://www.13.com','http://www.ifeng.com','http://www.sina.com',
                'http://www.wewin.com.gr/2','http://www.ifeng.com','http://www.sina.com','http://www.zeeif.com/int/',
                'http://www.zeeif.com/websc/verification/',
                'http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html',
                'http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID',
                'http://paypel-login-resolution-center.propesage-algerie.com/ID/',
                'http://radiotransilvania.ro/clujarena/rena.php',
                'http://kuleteknik.net/wp-includes/lol3.html',
                'http://kuleteknik.net/wp-includes/lol2.html'
                ]
    # Threaded version: start one thread per URL.
    for url in url_list:
        t = threading.Thread(target = online,args = (url,))
        t.start()
        #online(url)

if __name__ == '__main__':
    main()

The results are as follows:
http://www.baidu.com this url is online
http://www.ifeng.com this url is online
http://www.13.com We failed to reach a server.
Reason: [Errno 11001] getaddrinfo failed
http://www.ifeng.com this url is online
http://paypel-login-resolution-center.propesage-algerie.com/ID/ this url is online
http://www.hitwh.edu.cn this url is online
http://login-resolution-center-case-475ec2aec1br.propesage-algerie.com/ID this url is online
http://www.sina.com.cn/ this url is online
http://www.sina.com.cn/ this url is online
http://mjgds.org/classrooms/wp-content/plugins/10421312312/19890907.html We failed to reach a server.
Reason: Internal Server Error
http://www.zeeif.com/websc/verification/ We failed to reach a server.
Reason: Not Found
http://www.zeeif.com/int/ We failed to reach a server.
Reason: Not Found
http://kuleteknik.net/wp-includes/lol2.html this url is online
http://kuleteknik.net/wp-includes/lol3.html this url is online
http://www.wewin.com.gr/2 We failed to reach a server.
Reason: Unauthorized
http://radiotransilvania.ro/clujarena/rena.php We failed to reach a server.
Reason: Not Found

[Finished in 1.7s]

Explanation: each URL check runs in its own thread, and the whole run took only 1.7 s.
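The two timings above come from the editor's "[Finished in ...]" line. A minimal sketch of measuring the same comparison inside the script is given here. It is written for Python 3, unlike the Python 2 listings above, and the names fake_check, run_sequential and run_threaded as well as the 0.05 s delay are illustrative assumptions: time.sleep stands in for the network wait so the sketch runs without network access.

```python
import threading
import time


def fake_check(url, results, delay=0.05):
    """Hypothetical stand-in for online(): sleep to imitate network latency."""
    time.sleep(delay)
    results[url] = 'online'


def run_sequential(urls):
    """Check the URLs one by one; return (elapsed seconds, results)."""
    results = {}
    start = time.perf_counter()
    for url in urls:
        fake_check(url, results)
    return time.perf_counter() - start, results


def run_threaded(urls):
    """Check each URL in its own thread; return (elapsed seconds, results)."""
    results = {}
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_check, args=(url, results))
               for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every check before stopping the clock
    return time.perf_counter() - start, results


if __name__ == '__main__':
    # Placeholder URLs; nothing is actually requested.
    urls = ['http://example.invalid/%d' % i for i in range(10)]
    print('sequential: %.2fs' % run_sequential(urls)[0])
    print('threaded:   %.2fs' % run_threaded(urls)[0])
```

Because the waits overlap, the threaded run should take roughly one delay rather than ten. Joining the threads matters here: without it the clock would stop before the checks have finished.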

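Summary:

1. When there are many URLs to check, on the order of millions, the advantage of multithreading becomes very pronounced.

2. This code creates one thread per URL. With too many URLs that approach clearly does not scale, so the checking code can be optimized.

3. When the URLs are stored in a database, writing the results back to the database efficiently also matters.

4. I do not think the online() function above is entirely correct. Because of redirects, a URL that no longer exists may be redirected to a page that does, and the check then reports it as online. This is something to improve later. Readers who have a fix are welcome to leave me a comment so we can improve together, and if I find a solution I will publish it on this blog.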
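Points 2 and 4 can be sketched together as follows. This is only one possible approach, not the author's later solution: it uses Python 3's standard library (the listings above are Python 2), caps the number of threads with concurrent.futures.ThreadPoolExecutor, and refuses to follow redirects so that a 3xx status is reported instead of the redirect target's 200. The names NoRedirect, check and check_all, the 5 s timeout and the default of 8 workers are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 3xx status is reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # no new request: urllib then raises HTTPError with the 3xx code


# Hypothetical module-level opener shared by all worker threads.
_opener = urllib.request.build_opener(NoRedirect)


def check(url, timeout=5):
    """Return (url, status): an HTTP status code or an error description."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return url, response.status
    except urllib.error.HTTPError as e:  # server answered with 3xx/4xx/5xx
        return url, e.code
    except OSError as e:  # URLError (DNS failure, refused connection) or a timeout
        return url, 'unreachable: %s' % getattr(e, 'reason', e)


def check_all(urls, max_workers=8, checker=check):
    """Run checker over urls with at most max_workers threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(checker, urls))
```

Calling check_all(url_list) with the list from the listings above would return one (url, status) pair per URL. Whether a 301 or 302 should count as "online" is a policy decision this sketch leaves to the caller.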

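Update (2014.10.30)

1. Using pycurl to check whether a URL is online is more efficient.

2. I connected the checker to a database and stored the results there (a small project of my own, already finished).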
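The post does not say which database the project uses or what its schema looks like. As a hedged illustration of storing results in bulk, here is a minimal sketch using Python 3's built-in sqlite3 module. The table name url_status, its columns and the helper save_results are assumptions, not the author's design.

```python
import sqlite3


def save_results(conn, results):
    """Insert or update (url, status) rows in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            'CREATE TABLE IF NOT EXISTS url_status ('
            'url TEXT PRIMARY KEY, status TEXT NOT NULL)'
        )
        # executemany sends all rows in one call instead of one INSERT per URL.
        conn.executemany(
            'INSERT OR REPLACE INTO url_status (url, status) VALUES (?, ?)',
            results,
        )


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')  # throwaway in-memory database for the demo
    save_results(conn, [('http://www.baidu.com', 'online'),
                        ('http://www.13.com', 'unreachable')])
    for row in conn.execute('SELECT url, status FROM url_status ORDER BY url'):
        print(row)
```

Batching the writes into one transaction is usually the main saving when many results arrive at once. With a thread pool, collecting the results first and writing them from a single thread avoids sharing one connection across threads.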