
Web scraping: calling the Baidu API from Python with requests

2018-09-04 15:17, 323 views


from urllib.request import urlopen
import requests
import json

url = "http://apis.baidu.com/txapi/mvtp/meinv"  # API endpoint
req = requests.get(url)  # general form: requests.get(url, params=params, headers=headers)

headers = {'apikey': 'your own apikey'}  # replace with your own apikey
params = {'num': '5'}  # request parameters (urlParam)

r = requests.get(url, params=params, headers=headers)
print(r)
r = r.json()  # decode the JSON body so that r['newslist'] can be read below
print(r)


def SaveImage(ImageUrl, ImgName='default.jpg'):
    # Download one image and write it to disk.
    response = requests.get(ImageUrl, stream=True)
    image = response.content
    dst = '/Users/Alan/desktop/BaiDUImages/'  # target directory
    path = dst + ImgName
    print('Save the file:', path)
    with open(path, 'wb') as img:
        img.write(image)


def run():
    # Each entry in 'newslist' carries a title and a picture URL.
    for line in r['newslist']:
        title = line['title']
        picUrl = line['picUrl']
        SaveImage(picUrl, ImgName=title + '.jpg')


run()
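The run() function above uses each title directly as a file name, which can fail when a title contains characters such as "/" or ":". The following is a minimal sketch of one way to guard against that, assuming the response has the shape used above (a 'newslist' list whose items carry 'title' and 'picUrl'). The helpers safe_filename and iter_images and the sample payload are hypothetical and are not part of the original post or of the API.

```python
import re


def safe_filename(title, ext='.jpg'):
    """Turn an arbitrary title into a file name (hypothetical helper)."""
    # Replace path separators, whitespace and other risky characters with "_".
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', title).strip('_')
    # Fall back to "default" when nothing usable is left.
    return (cleaned or 'default') + ext


def iter_images(payload):
    """Yield (file_name, url) pairs from a decoded JSON payload."""
    for item in payload.get('newslist', []):
        # Skip entries without a picture URL instead of raising KeyError.
        if 'picUrl' in item:
            yield safe_filename(item.get('title', '')), item['picUrl']


# Made-up sample data in the assumed response shape; not real API output.
sample = {
    'newslist': [
        {'title': 'a/b: c', 'picUrl': 'http://example.com/1.jpg'},
        {'title': 'no picture'},
    ]
}

pairs = list(iter_images(sample))
print(pairs)
```

With helpers like these, the loop in run() could call SaveImage(url, ImgName=file_name) for each pair instead of building the name from the raw title.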

Lesson 2:

#!/usr/bin/env python3
# author: Alan
import requests
import re

# In the browser, open Inspect Element, go to the Network tab, copy the
# request headers and turn them into a dict.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36'}

url = 'http://www.cnblogs.com/wupeiqi/articles/4938499.html'
html = requests.get(url, headers=headers)

# Find everything that starts with href=" and ends with </a>.
title = re.findall('href="(.*?)</a>', html.text)
for i in title:
    print(i)

Output:

http://www.cnblogs.com/wupeiqi/">Mr.Seven
http://www.cnblogs.com/">博客园
http://www.cnblogs.com/wupeiqi/">首页
http://i.cnblogs.com/EditPosts.aspx?opt=1">新随笔
http://msg.cnblogs.com/send/%E6%AD%A6%E6%B2%9B%E9%BD%90">联系
http://www.cnblogs.com/wupeiqi/rss">订阅
http://www.cnblogs.com/wupeiqi/rss"><img src="http://www.cnblogs.com/images/xml.gif" alt="订阅" />
http://i.cnblogs.com/">管理
http://www.cnblogs.com/wupeiqi/articles/4938499.html">Python之路【目录】
http://www.cnblogs.com/wupeiqi/articles/4906230.html">Python之路【第一篇】：Python简介和入门
http://www.cnblogs.com/wupeiqi/articles/4911365.html">Python之路【第二篇】：Python基础（一）
http://www.cnblogs.com/wupeiqi/articles/4943406.html">Python之路【第三篇】：Python基础（二）
http://www.cnblogs.com/wupeiqi/articles/4963027.html">Python之路【第四篇】：模块
http://www.cnblogs.com/wupeiqi/articles/5017742.html">Python之路【第五篇】：面向对象及相关
http://www.cnblogs.com/wupeiqi/articles/5040823.html" target="_blank">Python之路【第六篇】：Socket
http://www.cnblogs.com/wupeiqi/articles/5040827.html" target="_blank">Python之路【第七篇】：线程、进程和协程
http://www.cnblogs.com/wupeiqi/articles/5095821.html">Python之路【第八篇】：堡垒机实例以及数据库操作
http://www.cnblogs.com/wupeiqi/articles/5132791.html" target="_blank">Python之路【第九篇】：Python操作 RabbitMQ、Redis、Memcache、SQLAlchemy
http://www.cnblogs.com/wupeiqi/articles/5237672.html" target="_blank">Python之路【第十五篇】：Web框架
http://www.cnblogs.com/wupeiqi/articles/5237704.html" target="_blank">Python之路【第十六篇】：Django【基础篇】
http://www.cnblogs.com/wupeiqi/articles/5246483.html" target="_blank">Python之路【第十七篇】：Django【进阶篇】
http://www.cnblogs.com/wupeiqi/articles/5341480.html" target="_blank">Python之路【第十八篇】：Web框架们
http://www.cnblogs.com/wupeiqi/articles/4949995.html" target="_blank">计算器源码
http://www.cnblogs.com/wupeiqi/articles/4980620.html" target="_blank">装饰器
http://i.cnblogs.com/EditArticles.aspx?postid=4938499" rel="nofollow">编辑
#" onclick="AddToWz(4938499);return false;">收藏
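Each line of the output above still has the URL, the closing quote, any extra attributes and the link text joined together, because the pattern captures everything between href=" and </a>. As a hedged sketch, a pattern with two groups can separate the URL from the link text. The HTML fragment below is made up for illustration, and the name link_pattern is an assumption, not something from the original post.

```python
import re

# Made-up HTML fragment for illustration; not taken from the real page.
sample_html = (
    '<a href="http://example.com/a">First</a>'
    '<a href="http://example.com/b" target="_blank">Second</a>'
)

# Group 1 is the URL up to its closing quote.
# [^>]* skips any remaining attributes; group 2 is the link text.
link_pattern = re.compile(r'href="([^"]*)"[^>]*>(.*?)</a>', re.S)

links = link_pattern.findall(sample_html)
for href, text in links:
    print(href, text)
```

Regular expressions remain brittle on real-world HTML, so for anything beyond a quick experiment an HTML parser is usually the sturdier choice.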

Lesson 3: for pages that load their content asynchronously, you can submit data with a POST request

#!/usr/bin/env python3
# author: Alan
import requests
import re

url = 'http://www.crowdfunder.com/browse/deals'
data = {
    'entities_only': 'true',  # a string, not a boolean
    'page': '1',  # a string, not a number
}

html = requests.post(url, data=data)  # use POST here, not GET
title = re.findall('"card-title">(.*?)</div>', html.text)
for each in title:
    print(each)
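The comments above insist that the form values are strings. One way to see why this can matter is to look at how a form body gets encoded. The sketch below uses the standard library's urlencode as a stand-in for the encoding step; it sends nothing over the network, and whether this particular server would reject a capitalized "True" is an assumption.

```python
from urllib.parse import urlencode

# String values are sent exactly as written.
as_strings = urlencode({'entities_only': 'true', 'page': '1'})

# Non-string values are converted with str(), so True becomes "True".
as_values = urlencode({'entities_only': True, 'page': 1})

print(as_strings)
print(as_values)
```

Writing the values as strings keeps the submitted text under your control, in particular the lowercase 'true' that a JavaScript front end would normally send.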
