
Learning the Python requests Module

2016-07-14 09:03

Environment: Python 2.7, Windows 10

I. GET requests

1. Send a GET request

import requests

r = requests.get("http://www.hactcm.edu.cn")

2. Get the page text

print r.text

Output:
ï»¿<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>æ²³å—ä¸åŒ»è¯å¤§å¦ä¸æ–‡ç½‘</title>
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7" />
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><link rel="stylesheet" type="text/css" href="style/style.css">
<style>

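3. The text is garbled. Print the encoding that requests inferred for the page:

print r.encoding

Output:

ISO-8859-1

4. requests did not pick the correct encoding. It reads the charset only from the HTTP headers, and ISO-8859-1 is its fallback for text responses whose Content-Type header declares no charset. Set the encoding manually:

r.encoding = 'utf-8'

5. Read the page text again

print r.text

Output: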

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>河南中医药大学中文网</title>
<meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7" />
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><link rel="stylesheet" type="text/css" href="style/style.css">
<style>

The text is now decoded correctly.
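The garbled output in step 2 is what UTF-8 bytes look like when they are decoded as ISO-8859-1. A minimal offline sketch of the same effect, using only built-in string methods rather than requests itself (the variable names are illustrative, and the code is written for Python 3, where print is a function):

```python
# The page title from the example, encoded the way the server sends it (UTF-8).
title = u"河南中医药大学中文网"
raw = title.encode("utf-8")

# ISO-8859-1 maps every byte to a character, so decoding with the wrong
# codec does not fail; it silently produces garbled text.
garbled = raw.decode("iso-8859-1")

# Decoding with the right codec recovers the original text.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```

Setting r.encoding = 'utf-8' makes r.text take the second path instead of the first.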

6. Send a GET request with query parameters

url='http://www.sinopharm-henan.com/front/index/section1'
pars={"sectionId":'2'}  # query parameters
r = requests.get(url,params=pars)
print r.url

Output:

http://www.sinopharm-henan.com/front/index/section1?sectionId=2
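To see how params ends up in the URL without touching the network, one option is to build the request and prepare it without sending it. A sketch, assuming a requests version that exposes requests.Request and its prepare() method (Python 3 syntax):

```python
import requests

url = 'http://www.sinopharm-henan.com/front/index/section1'
pars = {"sectionId": '2'}

# Build the request object but do not send it.
req = requests.Request('GET', url, params=pars)
prepared = req.prepare()

# prepare() URL-encodes params and appends them as the query string.
print(prepared.url)
```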

7. You can also set request headers

For example:

url='http://www.sinopharm-henan.com/front/index/section1'
pars={"sectionId":'2'}  # query parameters
header={"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0",\
"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",\
"Content-Type":"application/x-www-form-urlencoded"
}
r = requests.get(url,params=pars,headers=header)  # the dict is only sent if passed as headers=
print r.url
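Headers are only attached when the dictionary is actually passed through the headers argument. A small offline sketch that compares a prepared request with and without it (Python 3 syntax; the names with_headers and without_headers are illustrative):

```python
import requests

url = 'http://www.sinopharm-henan.com/front/index/section1'
header = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

# Prepare both requests without sending them.
with_headers = requests.Request('GET', url, headers=header).prepare()
without_headers = requests.Request('GET', url).prepare()

# Header lookup on a prepared request is case-insensitive.
print(with_headers.headers.get("user-agent"))

# Nothing was passed here. A real requests.get call would add the library's
# own default User-Agent at the session level, not the browser string above.
print(without_headers.headers.get("User-Agent"))
```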

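8. Get the response status code

print r.status_code

This prints the numeric HTTP status code of the response. For more detail on status codes and other HTTP parameters, see the W3C documentation or the book 图解HTTP.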
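Beyond printing the code, a script usually wants to react to it. A minimal sketch, using a hand-built Response object as a stand-in for a failed request so that no network access is needed (that stand-in is an assumption for illustration; in real code the response comes from requests.get):

```python
import requests

# requests.codes maps readable names to numeric status codes.
print(requests.codes.ok)
print(requests.codes.not_found)

# A hand-built Response standing in for a failed request.
resp = requests.Response()
resp.status_code = 404

try:
    resp.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    failed = False
except requests.exceptions.HTTPError as exc:
    failed = True
    print("request failed:", exc)
```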

9. Going a little deeper: the source of the get function

def get(url, params=None, **kwargs):
    """Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)

It simply calls the request function:

def request(method, url, **kwargs):
    """
    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': ('filename', fileobj)}``) for multipart encoding upload.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    # ... (omitted)
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

The request function in turn calls Session.request, which calls Session.send. Read the source for the details.
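The first hop of that call chain can be observed without sending anything. The sketch below swaps Session.request for a stand-in using unittest.mock (Python 3 only) and records what requests.get forwards to it. The patching approach is an illustration chosen here, not something requests itself prescribes:

```python
import requests
from unittest import mock

# Replace Session.request with a stand-in so nothing goes over the network.
with mock.patch.object(requests.Session, "request") as fake_request:
    requests.get("http://www.hactcm.edu.cn", params={"sectionId": "2"})

# requests.get forwarded its arguments to Session.request as keywords.
_, forwarded = fake_request.call_args
print(forwarded["method"], forwarded["url"], forwarded["params"])
```

Only method, url and params are inspected, because the other forwarded keywords (such as allow_redirects) can differ between requests versions.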

II. POST requests

1. Send a POST request

url='http://www.sinopharm-henan.com/front/index/section1'
data={"sectionId":'2'}
header={"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0",\
"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",\
"Content-Type":"application/x-www-form-urlencoded"
}
r = requests.post(url, data=data, headers=header)
print r.url

2. Pass cookies

url='http://www.sinopharm-henan.com/front/index/section1'
cookie={'sdf':'123'}
data={"sectionId":'2'}
header={"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0",\
"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",\
"Content-Type":"application/x-www-form-urlencoded"
}
r = requests.post(url, data=data, headers=header,cookies=cookie)
print r.url

Capture the network traffic to verify that the cookie is sent.
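As a lighter alternative to a packet capture, the prepared request can be inspected before anything is sent. A sketch reusing the values from the example above (Python 3 syntax; nothing here contacts the server):

```python
import requests

url = 'http://www.sinopharm-henan.com/front/index/section1'
cookie = {'sdf': '123'}
data = {"sectionId": '2'}

# Build and prepare the POST request without sending it.
prepared = requests.Request('POST', url, data=data, cookies=cookie).prepare()

print(prepared.body)                     # form-encoded request body
print(prepared.headers['Content-Type'])  # set automatically for dict data
print(prepared.headers['Cookie'])        # cookies travel in a Cookie header
```

This shows what requests intends to send. A packet capture remains the way to confirm what actually went over the wire.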

3. Appendix: the difference between r.text and r.content

First, the source of the content property:

@property
def content(self):
    """Content of the response, in bytes."""

    if self._content is False:
        # Read the contents.
        try:
            if self._content_consumed:
                raise RuntimeError(
                    'The content for this response was already consumed')

            if self.status_code == 0:
                self._content = None
            else:
                self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

        except AttributeError:
            self._content = None

    self._content_consumed = True
    # don't need to release the connection; that's been handled by urllib3
    # since we exhausted the data.
    return self._content

Next, the source of the text property:

@property
def text(self):
    """Content of the response, in unicode.

    If Response.encoding is None, encoding will be guessed using
    ``chardet``.

    The encoding of the response content is determined based solely on HTTP
    headers, following RFC 2616 to the letter. If you can take advantage of
    non-HTTP knowledge to make a better guess at the encoding, you should
    set ``r.encoding`` appropriately before accessing this property.
    """

    # Try charset from content-type
    content = None
    encoding = self.encoding

    if not self.content:
        return str('')

    # Fallback to auto-detected encoding.
    if self.encoding is None:
        encoding = self.apparent_encoding

    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace')
    except (LookupError, TypeError):
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # A TypeError can be raised if encoding is None
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')

    return content

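Now compare the types of the two return values.

Type returned by content:
print type(r.content)
<type 'str'>

Type returned by text:
print type(r.text)
<type 'unicode'>

The docstrings say the same thing: content returns the raw response body as bytes (a str in Python 2), while text returns that body decoded to unicode using r.encoding. The str(...) calls in the text source produce unicode because requests aliases str to unicode when running on Python 2.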
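The decode step in the text property can be tried on its own. The sketch below is standalone Python 3, not requests code: raw plays the role of r.content, and the str(...) call has the same shape as the one in the source above.

```python
# Stand-in for r.content: the raw UTF-8 bytes of a response body.
raw = u"河南中医药大学".encode("utf-8")

# Same call shape as the decode step in the text property.
text = str(raw, "utf-8", errors="replace")

# With errors='replace', bytes that cannot be decoded (here a truncated
# final character) become U+FFFD instead of raising an exception.
damaged = str(raw[:-1], "utf-8", errors="replace")

print(type(raw).__name__, type(text).__name__)
print(damaged)
```

Because of errors='replace', r.text never raises on a bad encoding. A wrong r.encoding shows up as garbled or replacement characters, which is why checking r.encoding is the first step when the text looks wrong.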
