
Unicode encoding problems when processing web pages with Python

2015-07-29 21:44
While recently debugging a script that saves blog pages, I ran into the following problem:

flying-bird@flyingbird:~/Downloads/export_blog$ ./images_parser.py 2015-07-29-2/Windows平台下面的MD5算法.htm
Traceback (most recent call last):
  File "./images_parser.py", line 154, in <module>
    _test(sys.argv[1])
  File "./images_parser.py", line 146, in _test
    get_image_items(content)
  File "./images_parser.py", line 133, in get_image_items
    parser.feed(content)
  File "/usr/lib/python2.7/HTMLParser.py", line 117, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 161, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/HTMLParser.py", line 308, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/usr/lib/python2.7/HTMLParser.py", line 475, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 7: ordinal not in range(128)
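
The byte 0xe7 in the message is typical of UTF-8 encoded Chinese text: it is the lead byte of many three-byte sequences. The small sketch below reproduces the same kind of failure in isolation. The character used (U+7B97) is only an illustrative assumption; the traceback does not say which character in the page actually triggered the error.

```python
# -*- coding: utf-8 -*-
# Illustrative assumption: U+7B97 is encoded in UTF-8 as the bytes E7 AE 97.
raw = b"\xe7\xae\x97"

try:
    # The ASCII codec only accepts bytes below 0x80, so this raises.
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the codec the bytes were written in succeeds.
text = raw.decode("utf-8")
```

Here `error_message` names byte 0xe7, just like the traceback, while `text` holds a single decoded character.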


For the fix, see http://blog.sina.com.cn/s/blog_6c39196501013s5b.html

The main points are as follows:

Just add the following three lines to the file where the problem occurs:

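import sys

# Python 2 only: site.py deletes sys.setdefaultencoding at startup; reload(sys) brings it back
reload(sys)

# Make implicit str <-> unicode conversions use UTF-8 instead of ASCII
sys.setdefaultencoding('utf-8')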
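
Note that reload(sys) and sys.setdefaultencoding exist only in Python 2, and changing the default encoding affects every implicit conversion in the process. An alternative that is often suggested is to decode the page into unicode text explicitly before handing it to the parser, so that no implicit ASCII decoding happens inside HTMLParser. The following is a minimal sketch of that idea, not the original images_parser.py: the names ImageSrcCollector, read_html and collect_image_sources are hypothetical, and it assumes the saved page is UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
import io

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class ImageSrcCollector(HTMLParser):
    """Hypothetical parser: collects the src attribute of every <img> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def read_html(path, encoding="utf-8"):
    # io.open decodes while reading, so the caller gets unicode text, not bytes.
    # Assumption: the page is UTF-8; pass another encoding if it is not.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def collect_image_sources(html_text):
    # html_text must already be unicode text (decoded), never raw bytes.
    parser = ImageSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources
```

With this approach the script would call collect_image_sources(read_html(sys.argv[1])), and the three-line workaround is no longer needed.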