
Python code for scraping Google result links

2013-12-06 17:33
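A quick introduction: the program is written for Python 2.7.2. If you are on Python 3 and hit incompatibilities, port it yourself with the help of the 2-to-3 guide. It also depends on the msvcrt module, so it runs on Windows only.
The program is built on Google's JSON API, for example: https://ajax.googleapis.com/ajax
... p;rsz=8&start=1
[Screenshot]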
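To make the request format concrete, here is a minimal sketch of building the query URL and pulling the links out of a response. It targets the Python 3 standard library (urllib.parse, json) instead of the Python 2 modules used in the listing. The endpoint and the responseData / results / url field names are taken from the listing in this post; the helper names build_search_url and extract_urls and the sample payload are hypothetical, made up for illustration.

```python
import json
from urllib.parse import quote, urlencode

# Endpoint as it appears in the listing in this post.
API_ENDPOINT = "https://ajax.googleapis.com/ajax/services/search/web"


def build_search_url(keyword, start, page_size=8):
    """Return the query URL for one page of results (hypothetical helper)."""
    params = [("v", "1.0"), ("q", keyword), ("rsz", page_size), ("start", start)]
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


def extract_urls(payload):
    """Return the result URLs found in a JSON response body (hypothetical helper)."""
    data = json.loads(payload)
    # Treat a missing or null responseData as "no results" (a defensive assumption).
    response_data = data.get("responseData") or {}
    return [item["url"] for item in response_data.get("results", [])]


if __name__ == "__main__":
    # Made-up payload that only mimics the fields the listing reads.
    sample = json.dumps({"responseData": {"results": [
        {"url": "http://example.com/a"},
        {"url": "http://example.com/b"},
    ]}})
    print(build_search_url("site:example.com python", 0))
    for link in extract_urls(sample):
        print(link)
```

Keeping URL construction and response parsing in separate functions means both can be checked without touching the network.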



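1. line is the number of threads
2. key is the search keyword; Google search syntax is supported
3. How many is the number of results to fetch. The JSON API returns only 8 results per page, so each thread fetches 8 results per request
4. Press q at any time to exit immediately
5. Feel free to modify it however you like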
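The arithmetic behind item 3 can be sketched as follows. With 8 results per page, the number of requests is the requested count divided by 8 and rounded up. The sketch assumes that the API's start parameter is a zero-based result offset rather than a page number; the function names page_count and start_offsets are hypothetical.

```python
RESULTS_PER_PAGE = 8  # page size stated in the list above


def page_count(wanted):
    """Number of 8-result pages needed to cover `wanted` results (rounded up)."""
    return (wanted + RESULTS_PER_PAGE - 1) // RESULTS_PER_PAGE


def start_offsets(wanted):
    """Value of `start` for each request, assuming it is a zero-based result offset."""
    return [page * RESULTS_PER_PAGE for page in range(page_count(wanted))]


if __name__ == "__main__":
    for wanted in (8, 9, 20):
        print(wanted, page_count(wanted), start_offsets(wanted))
```

Rounding up avoids an extra, empty request when the requested count is an exact multiple of 8.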

#! /usr/bin/env python
#coding=utf-8
# Python 2.7, Windows only (msvcrt).
import urllib2, urllib, threading, Queue, os
import msvcrt
import simplejson

RESULTS_PER_PAGE = 8  # the JSON API returns 8 results per page

searchstr = raw_input("Key?:")
pagenum = raw_input("How many?:")
# Number of pages needed, rounded up.
pagenum = (int(pagenum) + RESULTS_PER_PAGE - 1) // RESULTS_PER_PAGE
line = 5  # number of worker threads

class googlesearch(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.urls = []

    def run(self):
        # Workers block in queue.get() once the queue is empty; press q to exit.
        while 1:
            self.catchURL()
            queue.task_done()

    def catchURL(self):
        # The Windows console is assumed to deliver GBK-encoded input.
        self.key = searchstr.decode('gbk').encode('utf-8')
        # Blocks until a start offset is available.
        self.page = str(queue.get())
        url = ('https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=%s&rsz=8&start=%s') % (urllib.quote(self.key), self.page)
        try:
            request = urllib2.Request(url)
            response = urllib2.urlopen(request)
            results = simplejson.load(response)
            URLinfo = results['responseData']['results']
        except Exception as e:
            print e
        else:
            for info in URLinfo:
                print info['url']

class ThreadGetKey(threading.Thread):
    def run(self):
        while 1:
            try:
                key = msvcrt.getch()  # blocks until a key is pressed
            except Exception:
                os._exit(1)
            if key == 'q':
                print "stopped by your action ( q )"
                os._exit(1)

if __name__ == '__main__':
    queue = Queue.Queue()

    # `start` is the zero-based index of the first result,
    # so page i begins at offset i * RESULTS_PER_PAGE.
    for i in range(pagenum):
        queue.put(i * RESULTS_PER_PAGE)

    ThreadGetKey().start()

    for p in range(line):
        googlesearch().start()
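The listing above hands work items (one per result page) to a fixed pool of threads through a shared queue. Below is a minimal Python 3 sketch of the same pattern, in which the workers return once the queue is empty so the caller can collect the results and finish without a key press. The run_workers function and its handler argument are hypothetical; the handler stands in for the HTTP request so that the example runs without network access.

```python
import queue
import threading


def run_workers(items, handler, num_threads=5):
    """Apply handler to every item using a fixed pool of threads (hypothetical helper)."""
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)

    results = []
    lock = threading.Lock()  # guards `results` across threads

    def worker():
        while True:
            try:
                # All items are queued before the workers start,
                # so an empty queue means there is nothing left to do.
                item = tasks.get_nowait()
            except queue.Empty:
                return
            try:
                value = handler(item)
                with lock:
                    results.append(value)
            except Exception as exc:
                # Report the failure and move on to the next item.
                print("task %r failed: %s" % (item, exc))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Stand-in handler: formats the offset instead of fetching a page.
    fetched = run_workers([0, 8, 16], lambda start: "start=%d" % start, num_threads=2)
    print(sorted(fetched))
```

Results arrive in whatever order the threads finish, so sort them if order matters.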

Reposted from: http://sb.f4ck.org/forum.php?mod=viewthread&tid=6205&highlight=python