Python Crawler Study Notes (9): Requests Library Notes 4
2017-11-22 19:44
4. Advanced Operations
4.1 File Upload
import requests
# Upload a local file via the files parameter (favicon.txt must exist in the working directory)
files = {'file': open('favicon.txt', 'rb')}
response = requests.post('http://httpbin.org/post', files=files)
print(response.text)
4.2 Getting Cookies
import requests
r = requests.get('https://www.baidu.com')
print(r.cookies)
# The cookie jar iterates as name/value pairs
for key, value in r.cookies.items():
    print(key, '=====', value)
-----------------------------------
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ ===== 27315
-----------------------------------
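Cookies can also be built up locally in a RequestsCookieJar and sent with a request via the cookies parameter. A minimal offline sketch (the cookie names, values, and domain below are illustrative):

```python
from requests.cookies import RequestsCookieJar

# Build a cookie jar by hand; the same jar could be passed to
# requests.get(..., cookies=jar) to send these cookies with a request
jar = RequestsCookieJar()
jar.set('number', '12456', domain='httpbin.org', path='/')
jar.set('token', 'abc', domain='httpbin.org', path='/')

# The jar iterates like a dict of name/value pairs
for key, value in jar.items():
    print(key, '=====', value)
```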
4.3 Session Persistence
4.3.1 Plain Requests
import requests
r = requests.get('http://httpbin.org/cookies/set/number/12456')
print(r.text)
print('=================')
r1 = requests.get('http://httpbin.org/cookies')
# These are two independent requests, so the cookie set by the first is not carried over to the second
print(r1.text)
----------------------------
{
"cookies": {
"number": "12456"
}
}
=================
{
"cookies": {}
}
---------------------------
4.3.2 Requests Within a Session
import requests
# Create a Session object from requests (Session() is the documented spelling)
session = requests.Session()
print(session)
print('===========')
# Requests made through the same Session share cookies, so they behave as one session
r = session.get('http://httpbin.org/cookies/set/number/12456')
print(r.text)
print('===========')
r1 = session.get('http://httpbin.org/cookies')
print(r1.text)
------------------------------------------------
<requests.sessions.Session object at 0x00000262DE147A90>
===========
{
"cookies": {
"number": "12456"
}
}
===========
{
"cookies": {
"number": "12456"
}
}
-------------------------------------------------
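Beyond sharing cookies, a Session can carry defaults (headers, auth, proxies) that apply to every request it sends, and it works as a context manager that releases its connections on exit. A small offline sketch (the User-Agent string is made up):

```python
import requests

with requests.Session() as s:
    # Defaults set on the session apply to every request made through it
    s.headers.update({'User-Agent': 'my-crawler/1.0'})
    # Cookies can also be seeded locally before any request is made
    s.cookies.set('number', '12456')
    print(s.headers['User-Agent'])
    print(s.cookies.get('number'))
```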
4.4 Certificate Verification
4.4.1 Access With Verification Enabled
import requests
# For HTTPS requests, requests verifies the server certificate by default;
# if verification fails, an SSLError is raised
response = requests.get('https://www.12306.cn')
print(response.status_code)
-------------------------------------------------
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
-------------------------------------------------
4.4.2 Disabling Certificate Verification
import requests
# Disable verification; an InsecureRequestWarning is still emitted
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)
---------------------------------------------------------
Warning (from warnings module):
File "C:\Users\zjhzyjzhaoc1z\AppData\Roaming\Python\Python35\site-packages\urllib3\connectionpool.py", line 852
InsecureRequestWarning)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
200
---------------------------------------------------------
4.4.3 Silencing the Verification Warning
import requests
from requests.packages import urllib3
# Suppress urllib3 warnings
urllib3.disable_warnings()
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)
-----------------------------
200
-----------------------------
4.4.4 Supplying a Client Certificate
import requests
# Provide a local client certificate and key (the paths are placeholders)
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)
4.5 Proxy Settings
4.5.1 Plain Proxies
import requests
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743'}
# Pass the proxies mapping to the request
r = requests.get('https://www.taobao.com', proxies=proxies)
print(r.status_code)
4.5.2 Proxies With Username and Password
import requests
# Credentials are embedded directly in the proxy URL
proxies = {
    'http': 'http://user:password@127.0.0.1:9743/'}
r = requests.get('https://www.taobao.com', proxies=proxies)
print(r.status_code)
-----------------------
200
-----------------------
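requests can also route through a SOCKS proxy if the optional dependency is installed (pip install "requests[socks]"); the URL scheme selects the proxy type. The address below is a hypothetical local proxy, so the actual request is left commented out:

```python
# SOCKS5 proxy, selected by the URL scheme (requires requests[socks] / PySocks)
proxies = {
    'http': 'socks5://127.0.0.1:1080',
    'https': 'socks5://127.0.0.1:1080',
}
# r = requests.get('https://www.taobao.com', proxies=proxies)  # needs a live proxy
print(proxies['https'])
```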
4.6 Timeout Settings
import requests
from requests.exceptions import ReadTimeout
try:
    # Require a response within 500 ms; otherwise a ReadTimeout is raised
    r = requests.get('http://httpbin.org/get', timeout=0.5)
    print(r.status_code)
except ReadTimeout:
    print('Timeout')
-------------
200
-------------
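timeout can also be a (connect, read) tuple that bounds the two phases separately. A sketch with illustrative values (up to 3 s to establish the connection, up to 0.5 s to receive the response), with a catch-all so it degrades gracefully when offline:

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, RequestException

try:
    # (connect timeout, read timeout) as separate bounds
    r = requests.get('http://httpbin.org/get', timeout=(3.0, 0.5))
    result = r.status_code
except (ConnectTimeout, ReadTimeout):
    result = 'Timeout'
except RequestException:
    # e.g. no network at all
    result = 'Error'
print(result)
```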
4.7 Authentication
import requests
from requests.auth import HTTPBasicAuth
# HTTP Basic authentication; the shorthand auth=('user', '123') is equivalent
r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)
4.8 Exception Handling
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException
try:
    r = requests.get('http://httpbin.org/get', timeout=0.5)
    print(r.status_code)
except ReadTimeout:
    # Timed out waiting for a response
    print('Timeout')
except ConnectionError:
    # Failed to establish a connection
    print('Connection error')
except RequestException:
    # Base class: catches any other requests exception
    print('Error')
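One more exception worth knowing: a 4xx/5xx status code does not raise anything by itself; calling raise_for_status() on the response converts it into an HTTPError. A sketch using httpbin's status endpoint, with a catch-all so it degrades gracefully when offline:

```python
import requests
from requests.exceptions import HTTPError, RequestException

try:
    # httpbin echoes back the requested status code; 404 does not raise on its own
    r = requests.get('http://httpbin.org/status/404', timeout=5)
    r.raise_for_status()  # converts 4xx/5xx responses into an HTTPError
    result = r.status_code
except HTTPError:
    result = 'HTTP error'
except RequestException:
    result = 'Error'
print(result)
```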