The Python urllib module
2015-11-18 22:36
urllib provides a set of functions for working with URLs.
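GET
The request module of urllib makes it easy to fetch the content of a URL, that is, to send a GET request to a given page and get back the HTTP response. This is done with the urlopen function, which takes either a URL string or a Request object and returns an HTTPResponse object.
For example, to fetch the Douban URL https://api.douban.com/v2/book/2129650 and print the response: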

    from urllib import request

    url = 'https://api.douban.com/v2/book/2129650'
    # urlopen accepts a URL string or a Request object and returns an HTTPResponse
    with request.urlopen(url) as f:
        data = f.read()
        print('Status:', f.status, f.reason)
        for k, v in f.getheaders():
            print('%s: %s' % (k, v))
        print('Data:', data.decode('utf-8'))

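The example above assumes the request succeeds. The following is a minimal sketch of how failures might be handled with the exceptions in urllib.error. The helper name `fetch` and the 10-second timeout are illustrative assumptions, not part of the original example.

```python
from urllib import error, request


def fetch(url, timeout=10):
    """Return (status, text); status is None if no HTTP response arrived.

    `fetch` is a hypothetical helper written for this sketch.
    """
    try:
        with request.urlopen(url, timeout=timeout) as f:
            return f.status, f.read().decode('utf-8')
    except error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return e.code, str(e.reason)
    except error.URLError as e:
        # No HTTP response at all: DNS failure, refused connection,
        # unsupported URL scheme, and so on.
        return None, str(e.reason)
```

A call such as `fetch('https://api.douban.com/v2/book/2129650')` then returns a pair instead of raising, so the caller can check whether the first element is 200 before using the text.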
The HTTPResponse object is described as follows:
An HTTPResponse instance wraps the HTTP response from the server. It provides access to the response headers and the entity body. The response is an iterable object and can be used in a with statement.
HTTPResponse.read([amt])
Reads and returns the response body, or up to the next amt bytes.
HTTPResponse.readinto(b)
Reads up to the next len(b) bytes of the response body into the buffer b. Returns the number of bytes read. New in version 3.3.
HTTPResponse.getheader(name, default=None)
Return the value of the header name, or default if there is no header matching name. If there is more than one header with the name name, return all of the values joined by ', '. If default is any iterable other than a single string, its elements are similarly returned joined by commas.
HTTPResponse.getheaders()
Return a list of (header, value) tuples.
HTTPResponse.fileno()
Return the fileno of the underlying socket.
HTTPResponse.msg
A http.client.HTTPMessage instance containing the response headers. http.client.HTTPMessage is a subclass of email.message.Message.
HTTPResponse.version
HTTP protocol version used by server. 10 for HTTP/1.0, 11 for HTTP/1.1.
HTTPResponse.status
Status code returned by server.
HTTPResponse.reason
Reason phrase returned by server.
HTTPResponse.debuglevel
A debugging hook. If debuglevel is greater than zero, messages will be printed to stdout as the response is read and parsed.
HTTPResponse.closed
Is True if the stream is closed.
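To see several of these members in action without depending on an external site, the sketch below starts a throwaway HTTP server on the loopback interface and inspects the response it returns. The handler class `HelloHandler`, the function `inspect_response`, and the response body are all made up for this illustration. An opener with an empty ProxyHandler is used instead of urlopen only so that a system proxy cannot intercept the local request; it returns the same kind of HTTPResponse.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always answers with a short text body."""

    def do_GET(self):
        body = b'hello, urllib'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def inspect_response():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:%d/' % server.server_address[1]
    # An empty ProxyHandler makes the request go directly to the server.
    opener = request.build_opener(request.ProxyHandler({}))
    try:
        with opener.open(url) as f:
            info = {
                'status': f.status,
                'reason': f.reason,
                'content_type': f.getheader('Content-Type'),
                'missing': f.getheader('X-Missing', 'n/a'),  # default is used
                'first5': f.read(5),  # read(amt): at most amt bytes
                'rest': f.read(),     # read(): whatever is left
            }
        info['closed'] = f.closed     # checked after the with block has exited
        return info
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    for key, value in inspect_response().items():
        print(key, '=', value)
```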
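To imitate a browser when sending a GET request, use a Request object. By adding HTTP headers to the Request, the request can be disguised as one coming from a browser. For example, to imitate Firefox when requesting the Python home page:

    from urllib import request

    url = 'https://www.python.org/'
    req = request.Request(url)
    # The header name is 'User-Agent' (hyphen, not underscore); with an
    # underscore, urllib's default User-Agent would still be sent
    req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11')
    with request.urlopen(req) as f:
        print('Status:', f.status, f.reason)
        for k, v in f.getheaders():
            print('%s: %s' % (k, v))
        print('Data:', f.read().decode('utf-8'))

Here the User-Agent header identifies the browser. For the other attributes and methods of the Request object, see the urllib.request documentation.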
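As a partial answer to the question of what a Request object offers, the sketch below inspects a few of its attributes and methods. Nothing is sent over the network until urlopen is called, so this runs offline. One detail worth knowing is that add_header stores header names through str.capitalize(), so a header added as 'User-Agent' is looked up as 'User-agent'.

```python
from urllib import request

url = 'https://www.python.org/'
req = request.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11')

print(req.full_url)      # the URL passed to the constructor
print(req.type)          # the URL scheme
print(req.host)          # the host part of the URL
print(req.get_method())  # 'GET', because no data is attached

# Header names are stored capitalized: 'User-Agent' becomes 'User-agent'.
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
print(req.header_items())
```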
Simulating a Weibo login with GET:

    from urllib import request

    print('Login to weibo.cn...')
    # The credentials travel in the query string of the URL
    url = 'https://passport.weibo.cn/sso/login?username=xxxxxx&password=xxxxxx'
    print(url)
    req = request.Request(url)
    req.add_header('Origin', 'https://passport.weibo.cn')
    req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
    req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')
    with request.urlopen(req) as f:
        print('Status:', f.status, f.reason)
        for k, v in f.getheaders():
            print('%s: %s' % (k, v))
        print('Data:', f.read().decode('utf-8'))

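Writing the query string by hand only works while the values contain no reserved characters such as & or =. The sketch below builds the same kind of URL with parse.urlencode, which percent-encodes each value. The helper name `build_login_url` and the sample credentials are placeholders invented for this illustration.

```python
from urllib import parse


def build_login_url(base, username, password):
    """Hypothetical helper: append percent-encoded credentials as a query string."""
    query = parse.urlencode([('username', username), ('password', password)])
    return base + '?' + query


url = build_login_url('https://passport.weibo.cn/sso/login',
                      'me@example.com', 'p&ss w=rd')
print(url)

# Parsing the query again recovers the original values unchanged.
print(parse.parse_qs(parse.urlsplit(url).query))
```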
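POST
To send a request with POST, simply pass the data parameter as bytes.
We simulate a Weibo login: first read the login email and password, then pass them encoded as username=xxx&password=xxx, following the format of the weibo.cn login page: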

    from urllib import request, parse

    print('Login to weibo.cn...')
    url = 'https://passport.weibo.cn/sso/login'
    email = input('Email: ')
    password = input('Password: ')
    # urlencode turns the pairs into 'username=...&password=...&...'
    login_data = parse.urlencode([
        ('username', email),
        ('password', password),
        ('entry', 'mweibo'),
        ('client_id', ''),
        ('savestate', '1'),
        ('ec', ''),
        ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
    ])
    req = request.Request(url)
    req.add_header('Origin', 'https://passport.weibo.cn')
    req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
    req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')
    # Passing data (as bytes) makes urlopen send a POST instead of a GET
    with request.urlopen(req, data=login_data.encode('utf-8')) as f:
        print('Status:', f.status, f.reason)
        for k, v in f.getheaders():
            print('%s: %s' % (k, v))
        print('Data:', f.read().decode('utf-8'))

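The effect of the data argument can be checked without contacting the server. The sketch below attaches an encoded body to a Request and inspects it. The credentials are placeholders, and only three of the form fields are kept for brevity.

```python
from urllib import parse, request

url = 'https://passport.weibo.cn/sso/login'
# Placeholder credentials, used only to show the encoding.
login_data = parse.urlencode([
    ('username', 'me@example.com'),
    ('password', 'secret'),
    ('entry', 'mweibo'),
])
body = login_data.encode('utf-8')  # str -> bytes, the form urlopen expects

get_req = request.Request(url)
post_req = request.Request(url, data=body)

print(get_req.get_method())   # no data attached
print(post_req.get_method())  # data attached
print(post_req.data)

# Decoding the body again recovers the original form fields.
print(parse.parse_qs(post_req.data.decode('utf-8')))
```

Attaching the data to the Request constructor, as here, or passing it to urlopen, as in the login example, are two ways of supplying the same body.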