
4.2.4 Python-specific encodings

2015-11-14 17:19, 639 views

Python also ships with a number of codecs that are specific to Python.

4.2.4.1 Text encodings

Python provides the following codecs for encoding a string (str) to bytes and decoding bytes back to a string:

Codec	Aliases	Purpose
idna		Implements RFC 3490, see also encodings.idna. Only errors='strict' is supported.
mbcs	dbcs	Windows only: Encode operand according to the ANSI codepage (CP_ACP)
palmos		Encoding of PalmOS 3.5
punycode		Implements RFC 3492. Stateful codecs are not supported.
raw_unicode_escape		Latin-1 encoding with \uXXXX and \UXXXXXXXX for other code points. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol.
undefined		Raise an exception for all conversions, even empty strings. The error handler is ignored.
unicode_escape		Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped. Decodes from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.
unicode_internal		Return the internal representation of the operand. Stateful codecs are not supported. Deprecated since version 3.3: This representation is obsoleted by PEP 393
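
As a minimal sketch of how a few of the codecs in this table behave, the example below round-trips some sample strings through punycode, unicode_escape and raw_unicode_escape, and shows that undefined rejects even an empty string. The sample strings and variable names are illustrative assumptions, not part of the original text.

```python
# punycode (RFC 3492): non-ASCII characters are moved into an ASCII
# suffix that follows the last "-".
puny = "münchen".encode("punycode")
print(puny)                     # b'mnchen-3ya'
print(puny.decode("punycode"))  # münchen

# unicode_escape: produces the ASCII escape sequences that a Python
# string literal would use (quotes are not escaped).
escaped = "é中\n".encode("unicode_escape")
print(escaped)                  # b'\\xe9\\u4e2d\\n'

# raw_unicode_escape: Latin-1 bytes are kept as-is, other code points
# become \uXXXX; existing backslashes are not escaped.
raw = "é中".encode("raw_unicode_escape")
print(raw)                      # b'\xe9\\u4e2d'

# undefined: every conversion raises, even for an empty string.
try:
    "".encode("undefined")
    undefined_raised = False
except UnicodeError:
    undefined_raised = True
print(undefined_raised)         # True
```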

4.2.4.2 Binary transforms

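Python provides the following binary transforms, which map a bytes-like object to bytes. They are not supported by bytes.decode(), which only produces str output; use codecs.encode() and codecs.decode() instead.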

Codec	Aliases	Purpose	Encoder / decoder
base64_codec [1]	base64, base_64	Convert operand to multiline MIME base64 (the result always includes a trailing '\n'). Changed in version 3.4: accepts any bytes-like object as input for encoding and decoding.	base64.encodebytes() / base64.decodebytes()
bz2_codec	bz2	Compress the operand using bz2	bz2.compress() / bz2.decompress()
hex_codec	hex	Convert operand to hexadecimal representation, with two digits per byte	binascii.b2a_hex() / binascii.a2b_hex()
quopri_codec	quopri, quotedprintable, quoted_printable	Convert operand to MIME quoted printable	quopri.encodestring() / quopri.decodestring()
uu_codec	uu	Convert the operand using uuencode	uu.encode() / uu.decode()
zlib_codec	zip, zlib	Compress the operand using zlib (DEFLATE with a zlib header, not the gzip file format)	zlib.compress() / zlib.decompress()
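
Because these codecs map bytes to bytes, they are reached through codecs.encode() and codecs.decode() rather than the bytes methods. The following is a minimal sketch with an assumed sample payload; the variable names are illustrative only.

```python
import codecs

data = b"hello world"  # sample payload (an assumption for illustration)

# Hexadecimal: two lowercase digits per input byte.
hexed = codecs.encode(data, "hex")
print(hexed)  # b'68656c6c6f20776f726c64'

# MIME base64: the result ends with a newline.
b64 = codecs.encode(data, "base64")
print(b64)    # b'aGVsbG8gd29ybGQ=\n'

# zlib compression round-trips back to the original bytes.
packed = codecs.encode(data, "zlib")
unpacked = codecs.decode(packed, "zlib")

# bytes.decode() refuses binary transforms because it must return str.
try:
    data.decode("hex")
    decode_rejected = False
except LookupError:
    decode_rejected = True
print(decode_rejected)  # True
```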

4.2.4.3 Text transforms

The following codec provides a string-to-string (str to str) transform:

Codec	Aliases	Purpose
rot_13	rot13	Returns the Caesar-cypher encryption of the operand
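
Since rot_13 maps str to str, it is used through codecs.encode() and codecs.decode(); str.encode() rejects it because that method must return bytes. A small sketch, with an assumed sample string:

```python
import codecs

text = "Hello, World!"  # sample input (an assumption for illustration)

# Each ASCII letter is rotated by 13 places; other characters are unchanged.
rotated = codecs.encode(text, "rot_13")
print(rotated)  # Uryyb, Jbeyq!

# Applying the transform again restores the original text.
restored = codecs.decode(rotated, "rot_13")
print(restored)  # Hello, World!

# str.encode() only accepts text encodings, so it refuses rot_13.
try:
    text.encode("rot_13")
    str_encode_rejected = False
except LookupError:
    str_encode_rejected = True
```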

4.2.5 encodings.idna -- Internationalized Domain Names in Applications

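This module implements RFC 3490 (Internationalized Domain Names in Applications) and RFC 3491 (Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)). It builds on the punycode encoding and the stringprep module.
Together, these RFCs define how non-ASCII characters are represented in domain names. A domain name that contains non-ASCII characters must be converted to an ASCII-compatible encoding (ACE), because some network protocols, such as DNS queries and HTTP Host fields, do not allow arbitrary characters in domain names. The conversion can be done by hand or by the program. In a program, a Unicode domain name must be converted to its ACE form before it is sent on the wire, and converted back from ACE to Unicode before it is shown to the user.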

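Python supports this conversion in several ways. The idna codec converts between Unicode and ACE: when encoding, it splits the input string into labels at the separator characters defined in section 3.1 of RFC 3490, converts each label to ACE as required, and joins the labels back into a domain name; when decoding, it splits the input byte string into labels at the "." separator and converts any ACE labels it finds back to Unicode. In addition, the socket module transparently converts Unicode host names to ACE, so applications that pass host names to it do not need to convert them. Modules built on top of socket, such as http.client and ftplib, accept Unicode host names as well.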

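Host names received from the network are not converted to Unicode automatically. An application that wants to show such a host name to the user must decode it to Unicode itself.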
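
The two directions of conversion described above can be sketched with the idna codec as follows. The host name is an assumed example and is not taken from the original text.

```python
# Unicode host name as a user would type it (illustrative example).
unicode_host = "www.münchen.de"

# Encoding: each label is converted to ACE only if it needs it, so the
# pure-ASCII labels "www" and "de" pass through unchanged.
ace_host = unicode_host.encode("idna")
print(ace_host)  # b'www.xn--mnchen-3ya.de'

# Decoding: a name received from the network is turned back into
# Unicode before it is shown to the user.
display_host = ace_host.decode("idna")
print(display_host)  # www.münchen.de
```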

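The encodings.idna module also implements the nameprep procedure, which normalizes host names, for example by making international domain names case-insensitive. These functions can be called directly if needed.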

encodings.idna.nameprep(label)
Returns the nameprepped (normalized) version of label.

encodings.idna.ToASCII(label)
Converts label to its ASCII representation, as specified in RFC 3490.

encodings.idna.ToUnicode(label)
Converts label to its Unicode representation, as specified in RFC 3490.
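
A minimal sketch of calling these three functions on a single label (not a full dotted host name); the label used is an assumed example.

```python
from encodings.idna import nameprep, ToASCII, ToUnicode

# nameprep normalizes a label, which includes case folding.
prepped = nameprep("München")
print(prepped)  # münchen

# ToASCII returns the ACE form of one label as bytes.
ace_label = ToASCII("münchen")
print(ace_label)  # b'xn--mnchen-3ya'

# ToUnicode reverses the conversion and returns a str.
unicode_label = ToUnicode(ace_label)
print(unicode_label)  # münchen
```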

4.2.6 encodings.mbcs -- Windows ANSI codepage

This module implements the codec for the ANSI codepage (CP_ACP). It is available on Windows only.

4.2.7 encodings.utf_8_sig -- UTF-8 codec with BOM signature

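This module implements a variant of the UTF-8 codec. When encoding, a UTF-8 encoded BOM is prepended to the output; a stateful encoder writes it only once, on the first write. When decoding, an optional BOM at the start of the data is skipped before the rest is decoded.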
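
The behavior described above can be checked with a short sketch. The sample text and variable names are assumptions for illustration.

```python
import codecs

# Encoding prepends the three-byte UTF-8 BOM (EF BB BF).
with_bom = "hi".encode("utf-8-sig")
print(with_bom)  # b'\xef\xbb\xbfhi'

# Decoding skips the BOM if present, and also accepts data without one.
from_bom = with_bom.decode("utf-8-sig")
from_plain = b"hi".decode("utf-8-sig")

# The plain utf-8 codec keeps the BOM as a U+FEFF character instead.
kept_bom = with_bom.decode("utf-8")

# A stateful (incremental) encoder emits the BOM on the first write only.
encoder = codecs.getincrementalencoder("utf-8-sig")()
first_chunk = encoder.encode("a")
second_chunk = encoder.encode("b")
print(first_chunk, second_chunk)  # b'\xef\xbb\xbfa' b'b'
```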

蔡军生, QQ: 9073204, Shenzhen
