[Chinese character encoding error] UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in r
2017-11-29 20:51
Original code
    # -*- coding:utf-8 -*-
    import pandas as pd
    import jieba

    def cut_msg(ustr):
        # ustr = ustr.encode("raw_unicode_escape").decode("raw_unicode_escape").encode("utf8")
        return " ".join(jieba.lcut(str(ustr)))

    fp = "gray.xlsx"
    df = pd.read_excel(fp)
    df["msg"] = df["msg"].map(cut_msg)
    li = df["msg"]
    with file(fp.replace(".xlsx", ".txt"), "wb") as wf:
        for e in li:
            wf.write(e.encode("utf8")+"\n")
Error message
    ---------------------------------------------------------------------------
    UnicodeEncodeError                        Traceback (most recent call last)
    <ipython-input-28-b71365cb72a8> in <module>()
          1 fp = "gray.xlsx"
          2 df = pd.read_excel(fp)
    ----> 3 df["msg"] = df["msg"].map(cut_msg)
          4 li = df["msg"]
          5 with file(fp.replace(".xlsx", ".txt"), "wb") as wf:

    /Users/yipu.si/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in map(self, arg, na_action)
       2156         else:
       2157             # arg is a function
    -> 2158             new_values = map_f(values, arg)
       2159
       2160         return self._constructor(new_values,

    pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66440)()

    <ipython-input-26-f92cd72f89b6> in cut_msg(ustr)
          4 import jieba
          5 def cut_msg(ustr):
    ----> 6     return " ".join(jieba.lcut(str(ustr)))
          7

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
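The traceback points at `str(ustr)`. On Python 2, calling `str()` on a `unicode` object implicitly encodes it with the default ASCII codec, which cannot represent Chinese characters. The following is a minimal sketch that reproduces the same message by doing that encoding explicitly; the two-character sample string is an assumption rather than data from gray.xlsx, and the explicit `.encode("ascii")` stands in for the implicit conversion so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical two-character sample; the real data lives in gray.xlsx.
sample = u"\u4e2d\u6587"

try:
    # Python 2's str(sample) performs this ASCII encoding implicitly.
    sample.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```

The reported position range covers both characters of the sample, which matches the "position 0-1" in the traceback above.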
Solution
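Encode the string explicitly before it reaches str(), so that Python 2 no longer falls back to the ASCII codec. Add this encode/decode statement: ustr.encode("raw_unicode_escape").decode("raw_unicode_escape").encode("utf8") # the utf8 here depends on the actual encoding of your data
As follows: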
    # -*- coding:utf-8 -*-
    import pandas as pd
    import jieba

    def cut_msg(ustr):
        ustr = ustr.encode("raw_unicode_escape").decode("raw_unicode_escape").encode("utf8")  # the utf8 here depends on the actual encoding of your data
        return " ".join(jieba.lcut(str(ustr)))
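A rough check of what the added line does, assuming `ustr` is already a unicode string (the traceback suggests it is, since `str(ustr)` tried to encode it). For ordinary text the `raw_unicode_escape` round trip appears to hand back the original string, so the final `.encode("utf8")` is the step that yields bytes which `str()` passes through untouched on Python 2. The helper name `to_utf8_bytes` and the sample string are hypothetical.

```python
# -*- coding: utf-8 -*-
def to_utf8_bytes(ustr):
    """Apply the same encode/decode chain as cut_msg (hypothetical helper)."""
    round_tripped = ustr.encode("raw_unicode_escape").decode("raw_unicode_escape")
    return round_tripped.encode("utf8")

# Assumed sample text, not taken from gray.xlsx.
sample = u"\u4e2d\u6587 abc"

# For this sample the raw_unicode_escape round trip returns the same text ...
same_text = sample.encode("raw_unicode_escape").decode("raw_unicode_escape") == sample
# ... so the whole chain matches a plain UTF-8 encode.
same_bytes = to_utf8_bytes(sample) == sample.encode("utf8")

print(same_text, same_bytes)
```

Text that itself contains a literal backslash-u sequence may be altered by the round trip, so a plain `ustr.encode("utf8")` could be the simpler choice; treat this as a sketch rather than a verified replacement for the statement above.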
Afterword
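Chinese character encoding problems have long troubled people who work with text data, and I am still figuring them out myself. I am sharing this as a starting point and hope that more experienced readers passing by will offer corrections and advice.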