POSTing large files to a server with Python
2016-03-17 17:55
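Python has several libraries for working with HTTP: urllib, urllib2, and httplib. httplib is fairly low-level; in most cases urllib and urllib2 are enough.
NOTICE
In Python 3, urllib and urllib2 were split up and merged into urllib.request, urllib.parse, and urllib.error.
httplib was renamed to http.client.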
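If a script has to run on both versions, the renames above can be absorbed with a small import shim. This is only a sketch of one common pattern, not something the scripts in this post rely on; it tries the Python 3 names first and falls back to the Python 2 ones.

```python
# Compatibility shim: prefer the Python 3 module names, fall back to Python 2.
try:
    from urllib.request import Request, urlopen      # Python 3
    from urllib.parse import urlencode
    from urllib.error import URLError
    import http.client as httplib
except ImportError:
    from urllib2 import Request, urlopen, URLError   # Python 2
    from urllib import urlencode
    import httplib

# From here on the rest of the script can use one set of names.
print(httplib.HTTPConnection.__name__)
print(urlencode({'username': 'ksc'}))
```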
Analyzing the HTTP protocol
None of these Python libraries provides a direct interface for uploading a file, so let's first look at how an ordinary browser does it. I created a simple PHP script locally: if a file was submitted it dumps the file's information, otherwise it shows an upload form.

```php
<?php
header('Content-Type: text/html; charset=utf-8');
if (!empty($_FILES)) {
    var_dump($_FILES);
    exit;
}
?>
<!DOCTYPE HTML>
<html lang="en-US">
<head>
    <meta charset="UTF-8">
    <title></title>
</head>
<body style="text-align:center;">
    <form action="" method="POST" enctype="multipart/form-data">
        <input type="text" name="username" value="ksc"/>
        <br/>
        <input type="file" name="file" />
        <input type="submit" value="submit"/>
    </form>
</body>
</html>
```
```
POST http://localhost/test/postfile.php HTTP/1.1
Host: localhost
Connection: keep-alive
Content-Length: 295
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: http://localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarynHbCm1TeYE1AHFXv
DNT: 1
Referer: http://localhost/test/postfile.php
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8,en;q=0.6

------WebKitFormBoundarynHbCm1TeYE1AHFXv
Content-Disposition: form-data; name="username"

ksc
------WebKitFormBoundarynHbCm1TeYE1AHFXv
Content-Disposition: form-data; name="file"; filename="b.txt"
Content-Type: text/plain

我是文件内容
------WebKitFormBoundarynHbCm1TeYE1AHFXv--
```
```
POST http://localhost/test/postfile.php HTTP/1.1
Host: localhost
Content-Length: 295
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarynHbCm1TeYE1AHFXv

(body data)
```
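The request line: POST http://localhost/test/postfile.php HTTP/1.1
Request headers, such as Host: localhost and Content-Length: 295 above
A blank line
The message body
NOTICE
The request line and every request header must end with \r\n, and the blank line must contain nothing else, not even a space.
In HTTP/1.1 every request header except Host is optional. (When uploading a file, though, Content-Length and Content-Type have to be set; otherwise the server does not receive the data, even though it may still respond successfully.)
Here the message body is of type multipart/form-data. The boundary is random and differs on every request. With Content-Type multipart/form-data several items can be sent at once, and the boundary is what separates them.
Each item starts with '--' + boundary + CRLF, followed by Content-Disposition: form-data; name="field name". For a file there is also a filename and a Content-Type. Then comes a blank line, then the item's data.
Finally '--' + boundary + '--' + CRLF closes the body, and with it the whole HTTP request.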
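The body layout described above can be sketched as a small Python 3 function. `encode_multipart` and its argument shapes are my own invention for illustration, not part of any library; it builds the whole body in memory, so it suits small payloads only.

```python
import uuid

CRLF = b"\r\n"


def encode_multipart(fields, files, boundary=None):
    """Build a multipart/form-data body (hypothetical helper).

    fields: dict of name -> text value
    files:  dict of name -> (filename, content_bytes, content_type)
    Returns (content_type_header_value, body_bytes).
    """
    if boundary is None:
        boundary = "----sketch" + uuid.uuid4().hex
    b = boundary.encode("ascii")
    parts = []
    for name, value in fields.items():
        disposition = 'Content-Disposition: form-data; name="%s"' % name
        parts.append(b"--" + b + CRLF)
        parts.append(disposition.encode("utf-8") + CRLF)
        parts.append(CRLF)                       # blank line before the data
        parts.append(value.encode("utf-8") + CRLF)
    for name, (filename, content, ctype) in files.items():
        disposition = ('Content-Disposition: form-data; name="%s"; filename="%s"'
                       % (name, filename))
        parts.append(b"--" + b + CRLF)
        parts.append(disposition.encode("utf-8") + CRLF)
        parts.append(("Content-Type: %s" % ctype).encode("ascii") + CRLF)
        parts.append(CRLF)
        parts.append(content + CRLF)
    parts.append(b"--" + b + b"--" + CRLF)       # closing delimiter
    return "multipart/form-data; boundary=" + boundary, b"".join(parts)


content_type, body = encode_multipart(
    {"username": "ksc"},
    {"file": ("b.txt", "我是文件内容".encode("utf-8"), "text/plain")},
    boundary="----WebKitFormBoundarynHbCm1TeYE1AHFXv",
)
print(content_type)
print(len(body))
```

Fed the browser's boundary and the same two fields, and assuming b.txt was saved as UTF-8, the body length agrees with the Content-Length: 295 in the captured request.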
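Simulating the submission
HTTP is really just strings concatenated according to agreed rules, which the server then parses to recover the data. So we could issue an HTTP request ourselves directly over a socket. With urllib2, though, we save a lot of work and only need to "assemble" the body.

```
urllib2.urlopen(url[, data][, timeout])
```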
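To make the "just strings over a socket" point concrete, here is a minimal Python 3 sketch. `build_request` and `send_raw` are hypothetical helpers written for this post; the host and path are the local test values used above, and `send_raw` is defined but not called, since it needs a server listening.

```python
import socket


def build_request(host, path, body, content_type):
    """Assemble a raw HTTP/1.1 POST request as bytes."""
    lines = [
        "POST %s HTTP/1.1" % path,          # request line
        "Host: %s" % host,                  # request headers
        "Content-Type: %s" % content_type,
        "Content-Length: %d" % len(body),
        "Connection: close",
        "",                                 # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii") + body


def send_raw(host, port, request_bytes):
    """Send the request over a plain socket and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request_bytes)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


raw = build_request("localhost", "/test/postfile.php", b"hello", "text/plain")
print(raw.decode("ascii"))
# With the PHP test page running locally, one would call:
# print(send_raw("localhost", 80, raw))
```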
```python
# coding=utf-8
# Python 2
import urllib2
import mimetypes
import os
import uuid

mimetypes.init()

url = 'http://localhost/test/postfile.php'
fileFieldName = 'myfile'
file_path = '/a.txt'

boundary = '--------ksc' + uuid.uuid4().hex
print('boundary:' + boundary)

req = urllib2.Request(url)
req.add_header("User-Agent", 'ksc')
# The boundary parameter is separated from the media type by ';', not ','
req.add_header("Content-Type", "multipart/form-data; boundary=" + boundary)


def getdata(boundary, file_path):
    file_name = os.path.basename(file_path)
    file_name = file_name.encode('utf-8')
    # Guess the MIME type from the file name; None is returned if nothing matches
    file_type = mimetypes.guess_type(file_name)[0]
    if file_type is None:
        file_type = "text/plain; charset=utf-8"
    print file_type
    print file_name

    # Reads the whole file into memory at once
    with open(file_path, 'rb') as f:
        fileData = f.read()

    CRLF = '\r\n'
    body = ''
    body += '--' + boundary + CRLF
    body += ('Content-Disposition: form-data; name="' + fileFieldName +
             '"; filename="' + file_name + '"' + CRLF)
    body += "Content-Type: " + file_type + CRLF
    body += CRLF
    body += fileData + CRLF
    body += "--" + boundary + "--" + CRLF
    print 'body size:{0}'.format(len(body))
    return body


# req.add_header('Content-Length', len(body))  # urllib2 adds this automatically
res = urllib2.urlopen(req, getdata(boundary, file_path))
print res.read().decode('utf-8')
```
```
boundary:--------ksc114eaa38291f43fdaca15ab3b4265a82
text/plain
a.txt
body size:388
array(1) {
  ["myfile"]=>
  array(5) {
    ["name"]=>
    string(5) "a.txt"
    ["type"]=>
    string(10) "text/plain"
    ["tmp_name"]=>
    string(24) "D:\xampp\tmp\php151B.tmp"
    ["error"]=>
    int(0)
    ["size"]=>
    int(197)
  }
}
```
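For readers on Python 3, where urllib2 became urllib.request, the request object might be built roughly like this. It is a sketch under assumptions: `make_request` is a hypothetical helper, the file content is a stand-in string, and nothing is actually sent. Passing the result to `urllib.request.urlopen` would fill in Content-Length for a bytes body, much as urllib2 does.

```python
import urllib.request
import uuid


def make_request(url, body, boundary):
    """Build (but do not send) a multipart POST request."""
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("User-Agent", "ksc")
    req.add_header("Content-Type", "multipart/form-data; boundary=" + boundary)
    return req


boundary = "--------ksc" + uuid.uuid4().hex
lines = [
    "--" + boundary,
    'Content-Disposition: form-data; name="myfile"; filename="a.txt"',
    "Content-Type: text/plain",
    "",
    "hello",                      # stand-in for the file's content
    "--" + boundary + "--",
    "",
]
body = "\r\n".join(lines).encode("utf-8")   # the body must be bytes in Python 3

req = make_request("http://localhost/test/postfile.php", body, boundary)
print(req.get_method(), req.full_url)
print(req.header_items())
```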
```python
fileData=open(file_path,'rb').read()
```
```python
def read_file(fpath):
    BLOCK_SIZE = 1024
    with open(fpath, 'rb') as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if block:
                yield block
            else:
                return
```
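A quick way to see what the generator buys us: a single `read()` holds the whole file in memory, while `read_file` never holds more than one block. The sketch below repeats the generator (with the block size as a parameter, my addition) and runs it over a 2500-byte temporary file; `demo` is a hypothetical helper.

```python
import os
import tempfile


def read_file(fpath, block_size=1024):
    """Yield the file's content in blocks of at most block_size bytes."""
    with open(fpath, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                return
            yield block


def demo():
    """Write a 2500-byte temp file and return the sizes of the blocks read."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(b'x' * 2500)
        return [len(block) for block in read_file(path)]
    finally:
        os.remove(path)


print(demo())
```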
```python
def send(self, data):
    """Send data to the server."""
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise NotConnected()

    if self.debuglevel > 0:
        print "send:", repr(data)
    blocksize = 8192
    if hasattr(data,'read') and not isinstance(data, array):
        if self.debuglevel > 0: print "sendIng a read()able"
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
    else:
        self.sock.sendall(data)
```
```python
def send(self, data):
    """Send data to the server."""
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise NotConnected()

    if self.debuglevel > 0:
        print "send:", repr(data)
    blocksize = 8192
    if hasattr(data,'next'):  # supports iteration
        for datablock in data:
            self.sock.sendall(datablock)
    elif hasattr(data,'read') and not isinstance(data, array):
        if self.debuglevel > 0: print "sendIng a read()able"
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
    else:
        self.sock.sendall(data)
```
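On Python 3 this patch should not be needed: http.client documents that the request body may be an iterable of bytes (since 3.2). The sketch below checks that against a throwaway local server. `EchoLength`, `chunks`, and `post_iterable` are names made up for this example. Content-Length is passed explicitly so the body goes out as-is instead of being chunk-encoded.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoLength(BaseHTTPRequestHandler):
    """Reply with the number of body bytes received."""

    def do_POST(self):
        n = int(self.headers["Content-Length"])
        received = len(self.rfile.read(n))
        payload = str(received).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet


def chunks():
    """Three 1000-byte blocks, standing in for blocks read from a file."""
    for _ in range(3):
        yield b"a" * 1000


def post_iterable():
    server = HTTPServer(("127.0.0.1", 0), EchoLength)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port,
                                          timeout=10)
        # The generator is consumed block by block while sending.
        conn.request("POST", "/", body=chunks(),
                     headers={"Content-Length": "3000"})
        reply = conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return reply


if __name__ == "__main__":
    print(post_iterable())
```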
```python
# coding=utf-8
# Python 2
import httplib
import mimetypes
import os
import uuid
import socket

mimetypes.init()

url = 'http://localhost/test/postfile.php'
fileFieldName = 'myfile'
file_path = '/temp/a.txt'
content_length = 0


class builder:
    def __init__(self):
        self.boundary = '--------ksc' + uuid.uuid4().hex
        # self.boundary = '--------kscKKJFkfo93jmjfd0ismf'  # fixed boundary, handy when debugging
        self.ItemHeaders = []
        self.CRLF = '\r\n'  # multipart lines end with CRLF

    def getBoundary(self):
        return self.boundary

    def getHeaders(self):
        '''return Request Headers'''
        headers = {
            # The boundary parameter is separated by ';', not ','
            "Content-Type": "multipart/form-data; boundary=" + self.boundary,
            "Content-Length": self.getContentLength(),
        }
        return headers

    def putItemHeader(self, fieldName, file_path):
        file_name = os.path.basename(file_path)
        print file_name
        file_name = file_name.encode('utf-8')
        file_type = mimetypes.guess_type(file_name)[0]  # guess file's mimetype
        if file_type is None:
            file_type = "text/plain; charset=utf-8"
        print file_type
        CRLF = self.CRLF
        head = ''
        head += '--' + self.boundary + CRLF
        head += ('Content-Disposition: form-data; name="' + fieldName +
                 '"; filename="' + file_name + '"' + CRLF)
        head += "Content-Type: " + file_type + CRLF
        head += CRLF
        self.ItemHeaders.append({'head': head, 'file_path': file_path})

    def getContentLength(self):
        # Computed from file sizes on disk, so no file has to be read up front
        length = 0
        for item in self.ItemHeaders:
            length += (len(item['head']) + os.path.getsize(item['file_path'])
                       + len(self.CRLF))
        return length + len("--" + self.boundary + "--" + self.CRLF)

    def getdata(self):
        blocksize = 4096
        for item in self.ItemHeaders:
            yield item['head']
            with open(item['file_path'], 'rb') as fileobj:
                while True:
                    block = fileobj.read(blocksize)
                    if block:
                        yield block
                    else:
                        yield self.CRLF
                        break
        body = "--" + self.boundary + "--" + self.CRLF
        yield body


class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, value):
        if self.sock is None:
            if self.auto_open:
                self.connect()
                print '-----------------ksc: reconnect'
            else:
                raise httplib.NotConnected()
        try:
            if hasattr(value, 'next'):
                print "--sendIng an iterable"
                for data2 in value:
                    self.sock.sendall(data2)
            else:
                print '\n--send normal str'
                self.sock.sendall(value)
        except socket.error, v:
            if v[0] == 32:  # Broken pipe
                print(v)
                self.close()
            raise
        print '--send end'


bl = builder()
bl.putItemHeader(fileFieldName, file_path)
bl.putItemHeader('bb', u'/temp/贪吃蛇.zip')
# bl.putItemHeader('zipfile', u'/temp/a.zip')
# Mind the POST size limit on the web side;
# here php.ini has post_max_size = 30M

content_length = bl.getContentLength()
print "content_length:" + str(content_length)

url = '/test/postfile.php'
# Set to True to capture the packets with Fiddler, configured as a proxy
fiddler_monitor = False
if fiddler_monitor:
    httpconnect = MyHTTPConnection('127.0.0.1', 8888)
    url = 'http://localhost' + url
else:
    httpconnect = MyHTTPConnection('localhost')

httpconnect.set_debuglevel(1)
headers = bl.getHeaders()
data = bl.getdata()
httpconnect.request('POST', url, data, headers)
httpres = httpconnect.getresponse()
print httpres.read().decode('utf-8')
```
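The part of this script that is easiest to get wrong is Content-Length: it has to be known before the first byte is sent, yet the files are never loaded whole. Below is a reduced Python 3 sketch of that idea, checking that the precomputed length equals what the generator actually yields. The function names, the fixed boundary, and the `application/octet-stream` default are assumptions made for this example.

```python
import os
import tempfile

CRLF = b"\r\n"


def part_header(boundary, field, filename, ctype="application/octet-stream"):
    """Header lines that precede one file's data."""
    disposition = ('Content-Disposition: form-data; name="%s"; filename="%s"'
                   % (field, filename))
    return (b"--" + boundary + CRLF
            + disposition.encode("utf-8") + CRLF
            + b"Content-Type: " + ctype.encode("ascii") + CRLF
            + CRLF)


def content_length(boundary, items):
    """Total body size from file sizes on disk, without reading the files."""
    total = len(b"--" + boundary + b"--" + CRLF)
    for field, path in items:
        head = part_header(boundary, field, os.path.basename(path))
        total += len(head) + os.path.getsize(path) + len(CRLF)
    return total


def stream_body(boundary, items, blocksize=4096):
    """Yield the body piece by piece, one file block at a time."""
    for field, path in items:
        yield part_header(boundary, field, os.path.basename(path))
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(blocksize), b""):
                yield block
        yield CRLF
    yield b"--" + boundary + b"--" + CRLF


def demo():
    """Return (bytes actually streamed, precomputed Content-Length)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(10000))
        boundary = b"--------sketchboundary"
        items = [("myfile", path)]
        streamed = sum(len(chunk) for chunk in stream_body(boundary, items))
        return streamed, content_length(boundary, items)
    finally:
        os.remove(path)


print(demo())
```

If the two numbers ever disagree, the server either waits for bytes that never arrive or cuts the last part short, so a check like this is worth keeping around while changing the header format.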
Once this runs stably I'll add it to kuaipan cli, which will then no longer depend on poster.