Crawling PDFs from Google by keyword
2015-11-05 14:21
import google
import requests


def download_file(url, index):
    # Prefix the last URL path segment with the running index
    local_filename = index + "-" + url.split("/")[-1]
    # Stream the response so large PDFs are not held in memory
    r = requests.get(url, stream=True)
    with open(local_filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                f.flush()
    return local_filename


# Search for PDFs on *.gov.ph sites through the google.com.hk domain
g = google.search('site:*.gov.ph filetype:pdf', tld='com.hk')
index = 1
for url in g:
    if url.endswith(".pdf"):
        file_path = download_file(url, str(index))
        print("downloading:" + url + "->" + file_path)
        index += 1
print("all download finished")
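
The script filters results with url.endswith(".pdf") and names files with url.split("/")[-1], so a result such as .../Report.PDF?download=1 would be skipped, or saved under a name that includes the query string. A minimal sketch of a stricter check follows. The helpers is_pdf_url and local_pdf_name are hypothetical names, not part of the original script, and the sketch assumes Python 3 (urllib.parse).

```python
import os
from urllib.parse import urlsplit, unquote


def is_pdf_url(url):
    """Return True if the URL path (ignoring query and fragment) ends in .pdf."""
    return urlsplit(url).path.lower().endswith(".pdf")


def local_pdf_name(url, index):
    """Build '<index>-<basename>' from the URL path only (hypothetical helper)."""
    path = unquote(urlsplit(url).path)
    # Fall back to a default name when the path has no final segment
    base = os.path.basename(path) or "download.pdf"
    return "{}-{}".format(index, base)
```

These could stand in for the endswith test in the loop and for the filename line in download_file. Whether the extra matches are wanted depends on what the search actually returns.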