Python multithreading, multiprocessing, and coroutine performance comparison (a web crawler example)
2017-10-09 16:34
Test environment: a low-end Alibaba Cloud server, single core, 2 GB RAM.

First, the coroutine (gevent) version:
# Apply gevent's monkey patch first, so that the sockets used by
# requests become cooperative (patch_all should run before other imports).
from gevent import monkey
monkey.patch_all()

import sys
import time

import gevent
import requests
import lxml.html as HTML

# build the list of URLs to crawl; page range comes from the command line
urls = []
for i in range(int(sys.argv[1]), int(sys.argv[2])):
    url = 'http://grri94kmi4.app.tianmaying.com/songs?page=' + str(i)
    urls.append(url)

def get_data(url):
    t1 = time.time()
    res = requests.get(url)
    if res.status_code == 200:
        print(url + ' : ' + 'url open success' + ' time use: ' + str(time.time() - t1))
        html = HTML.fromstring(res.content)
        trs = html.xpath('//tbody/tr')
        data = []
        for tr in trs:
            s = {}
            s['name'] = tr.xpath('./td/a/text()')[0]
            s['url'] = tr.xpath('./td/a/@href')[0]
            s['id'] = s['url'][30:]
            s['comment'] = tr.xpath('./td[last()]/text()')[0]
            data.append(s)

if __name__ == '__main__':
    total = time.time()
    task = []
    for url in urls:
        task.append(gevent.spawn(get_data, url))
    gevent.joinall(task)
    print('total time use :', time.time() - total)
Crawling 20 links took about 4 s:
total time use : 4.873192071914673
Threads and processes perform about the same here, roughly 6 s:
import sys
import time

import requests
import lxml.html as HTML
# note: this is a *process* pool; the original aliased it as ThreadPool,
# which is misleading (multiprocessing.dummy provides the thread-backed version)
from multiprocessing import Pool

# build the list of URLs to crawl; page range comes from the command line
urls = []
for i in range(int(sys.argv[1]), int(sys.argv[2])):
    url = 'http://grri94kmi4.app.tianmaying.com/songs?page=' + str(i)
    urls.append(url)

def get_data(url):
    t1 = time.time()
    res = requests.get(url)
    if res.status_code == 200:
        print(url + ' : ' + 'url open success' + ' time use: ' + str(time.time() - t1))
        html = HTML.fromstring(res.content)
        trs = html.xpath('//tbody/tr')
        data = []
        for tr in trs:
            s = {}
            s['name'] = tr.xpath('./td/a/text()')[0]
            s['url'] = tr.xpath('./td/a/@href')[0]
            s['id'] = s['url'][30:]
            s['comment'] = tr.xpath('./td[last()]/text()')[0]
            data.append(s)

if __name__ == '__main__':
    total = time.time()
    pool = Pool()
    results = pool.map(get_data, urls)
    pool.close()
    pool.join()
    print('total time use :', time.time() - total)
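The code above only shows the process-pool run. For the thread measurement, the standard-library module multiprocessing.dummy exposes the exact same Pool API backed by threads, so the benchmark can be switched by changing one import. Here is a minimal self-contained sketch (fetch is a placeholder standing in for the real get_data above, so the snippet runs without network access):

```python
# multiprocessing.dummy.Pool has the same interface as multiprocessing.Pool,
# but its workers are threads, which is usually enough for IO-bound crawling.
from multiprocessing.dummy import Pool as ThreadPool
import time

def fetch(url):
    # placeholder for the real get_data(url); just echoes its input here
    return url

if __name__ == '__main__':
    urls = ['http://grri94kmi4.app.tianmaying.com/songs?page=%d' % i
            for i in range(1, 21)]
    total = time.time()
    pool = ThreadPool(4)              # 4 worker threads (hypothetical size)
    results = pool.map(fetch, urls)   # same call as the process-pool version
    pool.close()
    pool.join()
    print('total time use :', time.time() - total)
```

Because the API is identical, the timing numbers for threads and processes in this article come from essentially the same script, which is why they land so close together (the work is IO-bound, so the GIL barely matters).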