Python Crawler Study Notes (9): Requests Library Notes, Part 4

2017-11-22 19:44

4. Advanced usage

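4.1 File upload

import requests

# Open the file in binary mode; the with-block closes it once the upload is done
with open('favicon.txt', 'rb') as f:
    # The dict key ('file') is the form field name the server receives
    files = {'file': f}
    response = requests.post('http://httpbin.org/post', files=files)

print(response.text)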
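
To see what files= puts into the request without contacting a server, the request can be built and prepared locally. This is only a sketch: the in-memory file, its name notes.txt and its content are made up for illustration and are not part of the original notes.

```python
import io
import requests

# Hypothetical in-memory file standing in for a file on disk
fake_file = io.BytesIO(b'hello upload')

# A (filename, file object, content type) tuple sets the part's metadata explicitly
files = {'file': ('notes.txt', fake_file, 'text/plain')}

# Build and prepare the request without sending it
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

# requests encodes the body as multipart/form-data and generates the boundary itself
content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])
print(b'filename="notes.txt"' in prepared.body)
```

Passing the same files dict to requests.post would send this body to the server.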

4.2 Getting cookies

import requests

r = requests.get('https://www.baidu.com')
print(r.cookies)

# r.cookies is a RequestsCookieJar; items() yields (name, value) pairs
for key, value in r.cookies.items():
    print(key, '=====', value)

-----------------------------------
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ ===== 27315
-----------------------------------
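
The cookie jar also supports dictionary-style access and can be converted to a plain dict. The sketch below builds a jar by hand so that it runs without network access; the cookie values simply mirror the output shown above.

```python
import requests
from requests.cookies import RequestsCookieJar

# Build a jar by hand; a real one would come from r.cookies
jar = RequestsCookieJar()
jar.set('BDORZ', '27315', domain='.baidu.com', path='/')

# Look up a single cookie by name
print(jar['BDORZ'])

# Convert the whole jar to a plain dict
print(requests.utils.dict_from_cookiejar(jar))

# Restrict the conversion to one domain
print(jar.get_dict(domain='.baidu.com'))
```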

4.3 Session persistence

4.3.1 Plain requests

import requests

r = requests.get('http://httpbin.org/cookies/set/number/12456')
print(r.text)
print('=================')

# These are two independent requests that share no session,
# so the cookie set above is not sent with the second one
r1 = requests.get('http://httpbin.org/cookies')
print(r1.text)

----------------------------
{
  "cookies": {
    "number": "12456"
  }
}
=================
{
  "cookies": {}
}
---------------------------

4.3.2 Requests through a session

import requests

# Create a Session object from requests
session = requests.Session()
print(session)
print('===========')

# Sending both requests through the same session keeps the cookies between them
r = session.get('http://httpbin.org/cookies/set/number/12456')
print(r.text)
print('===========')

r1 = session.get('http://httpbin.org/cookies')
print(r1.text)

------------------------------------------------
<requests.sessions.Session object at 0x00000262DE147A90>
===========
{
  "cookies": {
    "number": "12456"
  }
}
===========
{
  "cookies": {
    "number": "12456"
  }
}
-------------------------------------------------
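
How a session attaches its stored cookies can be checked offline by preparing a request instead of sending it. This is a minimal sketch: the cookie is stored by hand in place of one set by a server, and the User-Agent value is a made-up example.

```python
import requests

# The with-block closes the session's pooled connections on exit
with requests.Session() as session:
    # Store a cookie directly, standing in for one set by a server
    session.cookies.set('number', '12456')
    # Headers set here are sent with every request made through the session
    session.headers.update({'User-Agent': 'notes-example/0.1'})  # hypothetical value

    # Prepare (but do not send) a request to see what the session adds to it
    request = requests.Request('GET', 'http://httpbin.org/cookies')
    prepared = session.prepare_request(request)

print(prepared.headers['Cookie'])
print(prepared.headers['User-Agent'])
```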

4.4 Certificate verification

4.4.1 Default behavior (verification enabled)

import requests

# For HTTPS requests, requests verifies the server certificate
# and raises an exception if verification fails
response = requests.get('https://www.12306.cn')
print(response.status_code)

-------------------------------------------------
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
-------------------------------------------------
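
Instead of letting the program crash, the failure can be caught. The sketch below assumes the error surfaces as requests.exceptions.SSLError, which is how requests reports certificate failures; the helper name status_or_none and its ca_bundle parameter are made up for illustration.

```python
import requests
from requests.exceptions import SSLError

def status_or_none(url, ca_bundle=True):
    """Return the status code, or None if certificate verification fails.

    ca_bundle is passed straight to verify=: True uses the default CA
    bundle, a string is treated as the path to a CA bundle file.
    """
    try:
        return requests.get(url, verify=ca_bundle, timeout=5).status_code
    except SSLError as exc:
        print('Certificate verification failed:', exc)
        return None
```

Calling status_or_none('https://www.12306.cn') would then print the error and return None whenever the site's certificate cannot be verified.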

4.4.2 Disabling certificate verification

import requests

# verify=False turns verification off, but an InsecureRequestWarning is still emitted
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)

---------------------------------------------------------
Warning (from warnings module):
  File "C:\Users\zjhzyjzhaoc1z\AppData\Roaming\Python\Python35\site-packages\urllib3\connectionpool.py", line 852
    InsecureRequestWarning)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
200
---------------------------------------------------------

4.4.3 Suppressing the warning

# urllib3 is installed as a dependency of requests
import urllib3
import requests

# Silence urllib3 warnings, including InsecureRequestWarning
urllib3.disable_warnings()

r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)

-----------------------------
200
-----------------------------
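
urllib3.disable_warnings() switches the warning off for the whole process. When only a few calls should be silent, the standard warnings module can limit the suppression to a block. This is one possible sketch; the helper names insecure_warnings_ignored and get_unverified are made up for illustration.

```python
import warnings
from contextlib import contextmanager

import requests
from urllib3.exceptions import InsecureRequestWarning

@contextmanager
def insecure_warnings_ignored():
    """Ignore InsecureRequestWarning only inside the with-block."""
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', InsecureRequestWarning)
        yield

def get_unverified(url):
    """Fetch url with verification off, without emitting the warning."""
    with insecure_warnings_ignored():
        return requests.get(url, verify=False, timeout=5)
```

Outside the with-block the warning filters are restored, so other unverified requests still produce the warning.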

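4.4.4 Specifying a certificate manually

import requests

# Supply a local client certificate as a (certificate path, private key path) pair;
# both paths are placeholders
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)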

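4.5 Proxy settings

4.5.1 Basic proxy

import requests

# Map each URL scheme to the proxy that should handle it.
# If the local proxy itself speaks plain HTTP, an http:// proxy URL
# may be needed for the 'https' key as well.
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743',
}

# Pass the mapping to the request via proxies
r = requests.get('https://www.taobao.com', proxies=proxies)
print(r.status_code)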

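4.5.2 Proxy with username and password

import requests

# Credentials go into the proxy URL as user:password@host:port.
# Each URL scheme needs its own entry; without the 'https' key the
# HTTPS request below would bypass the proxy.
proxies = {
    'http': 'http://user:password@127.0.0.1:9743/',
    'https': 'http://user:password@127.0.0.1:9743/',
}

# Pass the mapping to the request via proxies
r = requests.get('https://www.taobao.com', proxies=proxies)
print(r.status_code)

-----------------------
200
-----------------------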
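
Passwords that contain characters such as '@' or ':' break the proxy URL unless they are percent-encoded. The helper below is a sketch of one way to build the mapping; build_proxies and the sample password are made up for illustration.

```python
from urllib.parse import quote

def build_proxies(user, password, host, port):
    """Return a proxies mapping that sends HTTP and HTTPS through one proxy."""
    # Percent-encode the credentials so '@' or ':' cannot be mistaken for URL syntax
    auth = '{}:{}'.format(quote(user, safe=''), quote(password, safe=''))
    proxy_url = 'http://{}@{}:{}'.format(auth, host, port)
    return {'http': proxy_url, 'https': proxy_url}

proxies = build_proxies('user', 'p@ss:word', '127.0.0.1', 9743)
print(proxies['https'])
```

The result is passed on as before: requests.get(url, proxies=proxies).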

4.6 Timeouts

import requests
from requests.exceptions import ReadTimeout

try:
    # The response must arrive within 500 ms; if the server is too slow
    # to respond, a ReadTimeout exception is raised
    r = requests.get('http://httpbin.org/get', timeout=0.5)
    print(r.status_code)
except ReadTimeout:
    print('Timeout')

-------------
200
-------------
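
A single number applies to both the connect phase and the read phase. The two can also be set separately with a (connect, read) tuple, and a connect timeout raises ConnectTimeout, not ReadTimeout. The sketch below uses a made-up helper name and illustrative default values.

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, Timeout

def fetch_status(url, connect_timeout=3.05, read_timeout=0.5):
    """Return the status code, or a short label naming which timeout fired."""
    try:
        # A (connect, read) tuple sets the two timeouts separately
        r = requests.get(url, timeout=(connect_timeout, read_timeout))
        return r.status_code
    except ConnectTimeout:
        return 'connect timeout'
    except ReadTimeout:
        return 'read timeout'

# Both classes derive from Timeout, so `except Timeout` would catch either one
print(issubclass(ConnectTimeout, Timeout), issubclass(ReadTimeout, Timeout))
```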

4.7 Authentication

import requests
from requests.auth import HTTPBasicAuth

# Send HTTP Basic credentials (user name, password) with the request
r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)
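
A plain (user, password) tuple is accepted as shorthand for HTTPBasicAuth. The sketch below prepares the request without sending it and decodes the resulting header to show what Basic authentication transmits.

```python
import base64
import requests

# A (user, password) tuple is shorthand for HTTPBasicAuth
request = requests.Request('GET', 'http://120.27.34.24:9001', auth=('user', '123'))
prepared = request.prepare()

header = prepared.headers['Authorization']
print(header)

# The header is "Basic " followed by base64("user:password")
decoded = base64.b64decode(header.split(' ', 1)[1]).decode('latin1')
print(decoded)
```

Because the credentials are only base64-encoded, not encrypted, Basic authentication over plain HTTP exposes them to anyone who can read the traffic.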

4.8 Exception handling

import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

# Order the handlers from most specific to most general
try:
    r = requests.get('http://httpbin.org/get', timeout=0.5)
    print(r.status_code)
except ReadTimeout:
    # The server did not respond in time
    print('Timeout')
except ConnectionError:
    # The connection could not be established
    print('Connection error')
except RequestException:
    # Any other error raised by requests
    print('Error')
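
None of the handlers above fire for an HTTP error status such as 404 or 500, because requests returns those responses normally. Calling raise_for_status() turns them into HTTPError. The helper below is a sketch of that pattern; the name fetch_text is made up for illustration.

```python
import requests
from requests.exceptions import HTTPError, RequestException

def fetch_text(url):
    """Return the response body, or None if the request fails in any way."""
    try:
        r = requests.get(url, timeout=5)
        # requests does not raise on 4xx/5xx by itself; raise_for_status() does
        r.raise_for_status()
        return r.text
    except HTTPError as exc:
        print('Bad status:', exc.response.status_code)
    except RequestException as exc:
        # Base class of the exceptions raised by requests
        print('Request failed:', exc)
    return None
```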