Werkzeug Source Code Reading Notes (2), Part 1

2015-06-30 22:32

The first set of notes only covered initialization, so I did not publish it.

wsgi.py, Part 1

Before analyzing this module, you should know a little about WSGI. Get a rough understanding of it first, then continue.

The get_current_url() function

As the name suggests, this function returns the current URL. The code is as follows:

def get_current_url(environ, root_only=False, strip_querystring=False,
                    host_only=False, trusted_hosts=None):
    """
    :param environ: the WSGI environment to get the current URL from.
    :param root_only: set `True` if you only want the root URL.
    :param strip_querystring: set to `True` if you don't want the querystring.
    :param host_only: set to `True` if the host URL should be returned.
    :param trusted_hosts: a list of trusted hosts, see :func:`host_is_trusted`
                          for more information.
    """
    tmp = [environ['wsgi.url_scheme'], '://', get_host(environ, trusted_hosts)]
    cat = tmp.append
    if host_only:
        return uri_to_iri(''.join(tmp) + '/')
    # From here on, tmp is extended into the root URL (what root_only returns)
    cat(url_quote(wsgi_get_bytes(environ.get('SCRIPT_NAME', ''))).rstrip('/'))
    cat('/')
    if not root_only:
        cat(url_quote(wsgi_get_bytes(environ.get('PATH_INFO', '')).lstrip(b'/')))
        if not strip_querystring:
            qs = get_query_string(environ)
            if qs:
                cat('?' + qs)
    return uri_to_iri(''.join(tmp))

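Note lines 11-12 of the function (tmp = [...] and cat = tmp.append). I did not understand that append assignment at first and could not find an explanation online, so I tried it out: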

>>> temp = [1,2,3]
>>> temp
[1, 2, 3]
>>> aa = temp.append
>>> aa(2)
>>> temp
[1, 2, 3, 2]

Clearly, after aa = temp.append, the name aa refers to the list's bound append method, so aa(2) is equivalent to temp.append(2).
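The same idiom can be shown in the shape get_current_url() uses it. This is only a small illustration; the names parts and url and the host example.com are made up for the example.

```python
# Binding a list's append method to a short name, as get_current_url() does.
parts = ['http', '://', 'example.com']
cat = parts.append            # a bound method; it remembers `parts`

cat('/')                      # same effect as parts.append('/')
cat('index.html')

url = ''.join(parts)
print(url)                    # http://example.com/index.html
print(cat.__self__ is parts)  # True
```

Binding the method once saves the attribute lookup on every call and keeps the repeated append lines short.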

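The host_only parameter means that only the host part of the URL is returned. For http://www.baidu.com/xxx, for example, the host URL is http://www.baidu.com/ (the function appends a trailing slash).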

At the end, return uri_to_iri(...) converts the URI into an IRI (an IRI may contain Unicode characters, while a URI is restricted to ASCII).
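To see how the flags change the result, here is a rough stand-in written with the standard library only. It is not werkzeug's code: current_url_sketch is a hypothetical name, it assumes HTTP_HOST is present and trusted instead of calling get_host(), it uses urllib.parse.quote in place of url_quote, and it skips the final uri_to_iri() step.

```python
from urllib.parse import quote

def current_url_sketch(environ, root_only=False, strip_querystring=False,
                       host_only=False):
    """Simplified stand-in for get_current_url(); assumptions noted above."""
    parts = [environ['wsgi.url_scheme'], '://', environ['HTTP_HOST']]
    cat = parts.append
    if host_only:
        return ''.join(parts) + '/'
    # WSGI strings on Python 3 hold raw bytes decoded as latin-1; undo that.
    script = environ.get('SCRIPT_NAME', '').encode('latin-1')
    cat(quote(script).rstrip('/'))
    cat('/')
    if not root_only:
        path = environ.get('PATH_INFO', '').encode('latin-1')
        cat(quote(path.lstrip(b'/')))
        if not strip_querystring:
            qs = environ.get('QUERY_STRING', '')
            if qs:
                cat('?' + qs)
    return ''.join(parts)

environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'www.baidu.com',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/xxx',
    'QUERY_STRING': 'a=1',
}
full = current_url_sketch(environ)
no_qs = current_url_sketch(environ, strip_querystring=True)
root = current_url_sketch(environ, root_only=True)
host = current_url_sketch(environ, host_only=True)
print(full)   # http://www.baidu.com/app/xxx?a=1
print(no_qs)  # http://www.baidu.com/app/xxx
print(root)   # http://www.baidu.com/app/
print(host)   # http://www.baidu.com/
```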

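The get_query_string() function

wsgi.py contains many similar functions, each of which extracts one component of the URL. I analyze one of them here; the others are much the same.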

def get_query_string(environ):
    qs = wsgi_get_bytes(environ.get('QUERY_STRING', ''))
    # QUERY_STRING really should be ascii safe but some browsers
    # will send us some unicode stuff (I am looking at you IE).
    # In that case we want to urllib quote it badly.
    # (My note: per urllib.parse.quote(), characters outside the safe set are
    # replaced with %XX escapes; characters listed in `safe` are left unchanged.)
    return try_coerce_native(url_quote(qs, safe=':&%=+$!*\'(),'))

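get_query_string(environ) reads QUERY_STRING from environ and converts it to bytes with wsgi_get_bytes (as far as I can tell, a latin-1 encode on Python 3). The comment in the code says that QUERY_STRING should be ASCII-safe, but some browsers send non-ASCII characters, so every byte outside the safe set is percent-encoded to give a uniform ASCII result. Latin-1 is a superset of ASCII, so ASCII input passes through unchanged.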
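The effect can be reproduced approximately with the standard library. This sketch assumes Python 3 and uses urllib.parse.quote in place of werkzeug's url_quote; the environ contents are invented for the example.

```python
from urllib.parse import quote

# On Python 3, WSGI environ values are str objects whose code points are
# raw bytes decoded as latin-1. '\xe9' here stands for the single byte 0xE9.
environ = {'QUERY_STRING': 'q=caf\xe9&lang=fr'}

raw = environ.get('QUERY_STRING', '').encode('latin-1')  # back to bytes
quoted = quote(raw, safe=":&%=+$!*'(),")

print(raw)     # b'q=caf\xe9&lang=fr'
print(quoted)  # q=caf%E9&lang=fr
```

The separators = and & survive because they are in the safe set, while the stray non-ASCII byte becomes %E9.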

Next, the return value calls url_quote. Its source is:

def url_quote(string, charset='utf-8', errors='strict', safe='/:', unsafe=''):
    """URL encode a single string with a given encoding."""

    if not isinstance(string, (text_type, bytes, bytearray)):
        string = text_type(string)
    if isinstance(string, text_type):
        string = string.encode(charset, errors)
    if isinstance(safe, text_type):
        safe = safe.encode(charset, errors)
    if isinstance(unsafe, text_type):
        unsafe = unsafe.encode(charset, errors)
    # safe plus _always_safe, minus unsafe, as a frozenset of byte values
    safe = frozenset(bytearray(safe) + _always_safe) - frozenset(bytearray(unsafe))
    rv = bytearray()
    for char in bytearray(string):
        if char in safe:
            rv.append(char)
        else:
            rv.extend(('%%%02X' % char).encode('ascii'))
    return to_native(bytes(rv))

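From the code we can see that string, safe, and unsafe, when passed in as text, are each encoded to bytes using charset, which defaults to utf-8 and can be overridden. string is then wrapped in a bytearray and written out byte by byte, depending on whether each byte is in the safe set.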

try_coerce_native is defined in the source (in the Python 3 branch, as far as I can tell) as

try_coerce_native = _identity

_identity = lambda x: x

so try_coerce_native(a) simply returns a.

One more important thing in this code: bytearray()

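According to the documentation, bytearray(source, encoding, errors) takes three parameters: the first is the content to convert, the second is the encoding, and the third is the error-handling scheme used when encoding fails.

To understand bytearray, I wrote the following code: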

>>> string = 'aaaa'
>>> temp = bytearray(string)
Traceback (most recent call last):
  File "<pyshell#50>", line 1, in <module>
    temp = bytearray(string)
TypeError: string argument without an encoding

The error says that an encoding is required, so I revised the code:

>>> string = 'aaaa'.encode('utf-8')
>>> temp = bytearray(string)
>>> print(temp)
bytearray(b'aaaa')          # note the 'b' prefix

That worked. I then did the following:

>>> for i in temp:
        print(i, end=' ')

97 97 97 97

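This is not quite what I expected. Why are four a's not printed?

The reason is that a bytearray is a sequence of integers in the range 0-255, so iterating over it yields the value of each byte rather than a character. After 'aaaa' is encoded as UTF-8, each a is the single byte 97 (0x61), which is what gets printed.
The example also shows that bytearray objects are iterable.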
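A short check of this behavior, assuming Python 3; the variable names are only for the example.

```python
data = bytearray('aaaa', 'utf-8')   # same as bytearray('aaaa'.encode('utf-8'))

values = list(data)                 # iteration yields ints, one per byte
print(values)                       # [97, 97, 97, 97]
print(chr(values[0]))               # a
print(data.decode('utf-8'))         # aaaa

# A non-ASCII character takes more than one byte in UTF-8.
accented = list(bytearray('\u00e9', 'utf-8'))
print(accented)                     # [195, 169]
```

To get the text back, decode the bytearray instead of iterating over it.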

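With that, we can understand what url_quote() does:

The function first encodes string, safe, and unsafe to bytes (UTF-8 by default) and wraps them in bytearray() so that they can be iterated byte by byte. It then checks each byte of string against the safe set. A byte that is in the set is appended unchanged; otherwise the function runs

rv.extend(('%%%02X' % char).encode('ascii'))

which appends the percent-encoded form of the byte. This is how the query_string part of the URL is converted (for the reason it is needed, see the comment in get_query_string).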
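The loop described above can be sketched as a standalone Python 3 function. This is an illustration rather than werkzeug's implementation: percent_encode is a hypothetical name, and ALWAYS_SAFE is an assumed stand-in for werkzeug's _always_safe, whose exact contents are not shown in these notes.

```python
# Assumed stand-in for werkzeug's _always_safe: ASCII letters, digits, "_.-".
ALWAYS_SAFE = (b'abcdefghijklmnopqrstuvwxyz'
               b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
               b'0123456789_.-')

def percent_encode(string, charset='utf-8', safe='/:', unsafe=''):
    """Percent-encode every byte of `string` that is not in the safe set."""
    if isinstance(string, str):
        string = string.encode(charset)
    if isinstance(safe, str):
        safe = safe.encode(charset)
    if isinstance(unsafe, str):
        unsafe = unsafe.encode(charset)
    # Set of integer byte values: safe plus always-safe, minus unsafe.
    allowed = (frozenset(bytearray(safe) + ALWAYS_SAFE)
               - frozenset(bytearray(unsafe)))
    out = bytearray()
    for byte in bytearray(string):
        if byte in allowed:
            out.append(byte)                               # keep as-is
        else:
            out.extend(('%%%02X' % byte).encode('ascii'))  # e.g. b'%20'
    return out.decode('ascii')

print(percent_encode('a b/c'))            # a%20b/c
print(percent_encode('caf\u00e9'))        # caf%C3%A9
print(percent_encode('a/b', unsafe='/'))  # a%2Fb
```

The last call shows what unsafe is for: it removes a character from the safe set, so the slash is escaped as well.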

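('%%%02X' % char): the leading %% outputs a single literal %, and the following %02X works as in C: it prints the integer in uppercase hexadecimal, at least two digits wide, padded with a leading zero if needed.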
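A few sample values make the format string concrete. The byte values are picked arbitrarily for the example.

```python
samples = (10, 32, 233)
encoded = ['%%%02X' % value for value in samples]

for value, text in zip(samples, encoded):
    print(value, '->', text)
# 10 -> %0A
# 32 -> %20
# 233 -> %E9
```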
