pandas read_csv error: pandas.parser.CParserError: Error tokenizing data. C error
2017-03-31 19:16
Today I ran into the following error when calling pandas.read_csv:

  File "/root/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1213, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)
  File "pandas/parser.pyx", line 842, in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
pandas.parser.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

The cause turned out to be a single field in the CSV file that contained \r, the carriage return character.
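Testing with carriage returns

    # -*- coding: utf-8 -*-
    import pandas as pd

    # One field whose text contains carriage returns (\r)
    a = "\r\r 汉考克在接受当地媒体采访时表示" \
        "\r\r 汉考克"
    d = {'nid': [100], 'doc': [a]}
    df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    df.to_csv('p.txt', index=False)

    # Read the file back; this is where the error appeared
    df1 = pd.read_csv('p.txt')
    print(df1.head())

This reproduces the same error as above. Removing the \r characters makes it go away.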
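Besides removing the \r characters, another option that may be worth trying is the lineterminator parameter of pandas.read_csv, which tells the C parser to break rows on one specific character only. This is not what the test above does; it is a minimal sketch under the assumption that the file contains a bare, unquoted \r inside a field, and the raw_csv text below is made up for illustration. Behaviour may differ between pandas versions.

```python
import io

import pandas as pd

# Hypothetical raw CSV text: the "doc" field holds a bare, unquoted \r.
raw_csv = "nid,doc\n100,first\rsecond\n"

# lineterminator="\n" asks the C parser to end rows on \n only,
# so the \r is kept as ordinary data inside the field.
df = pd.read_csv(io.StringIO(raw_csv), lineterminator="\n")

doc = df.loc[0, "doc"]
print(repr(doc))
```

The parameter only accepts a single character and is only supported by the C engine, so it does not help with files that use \r\n row endings.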
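Testing with line feeds

    # -*- coding: utf-8 -*-
    import pandas as pd

    # Same test, but the field contains line feeds (\n) instead
    a = "\n\n 汉考克在接受当地媒体采访时表示" \
        "\n\n 汉考克"
    d = {'nid': [100], 'doc': [a]}
    df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    df.to_csv('p.txt', index=False)

    df1 = pd.read_csv('p.txt')
    print(df1.head())

This time the error does not occur.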
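Carriage return vs. line feed

\r: carriage return, moves the cursor to the start of the line
\n: line feed, moves the cursor to the next line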
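In my tests, Linux and Mac systems do not use the carriage return \r as part of a line ending:

    echo -en '12\n34\r56\n\r78\r\n' > tmp

You can see that each \r shows up as ^M:

    12
    34^M56
    ^M78^M

Windows, on the other hand, does use \r: it moves the cursor to the start of the line, and \n then moves it to the next line.

As a result, text containing \r shows up with ^M characters on Mac and Linux, and pandas.read_csv fails on it.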
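To make the difference concrete, here is a small sketch using the same bytes as the echo test. It only illustrates how a reader that breaks lines on \n alone sees different rows from one that also treats \r as a line break; Python's str.splitlines is used as a stand-in for the latter and is not the pandas tokenizer itself.

```python
raw = "12\n34\r56\n\r78\r\n"

# Splitting on \n only: the \r characters stay inside the pieces.
by_lf = raw.split("\n")

# splitlines() also treats \r and \r\n as line boundaries.
by_any = raw.splitlines()

# Number of carriage returns hiding in the text.
cr_count = raw.count("\r")

print(by_lf)    # ['12', '34\r56', '\r78\r', '']
print(by_any)   # ['12', '34', '56', '', '78']
print(cr_count) # 3
```

A parser that sees five rows where the writer intended three will find rows with the wrong number of fields, which is consistent with the "possible malformed input file" message.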
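In short, \n alone is enough for your own data. Strings that contain \r need to be cleaned up before use on Linux and Mac, for example by splitting the string on \r with Python's split method and joining the pieces back together.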
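A minimal sketch of that cleanup, assuming Python 3 and a recent pandas. The helper name drop_carriage_returns and the sample text are made up for illustration, and an in-memory buffer stands in for the p.txt file used in the tests.

```python
import io

import pandas as pd


def drop_carriage_returns(text):
    """Remove every \\r by splitting on it and joining the pieces."""
    return "".join(text.split("\r"))


df = pd.DataFrame({"nid": [100], "doc": ["line one\r\rline two"]})

# Clean the text column before writing it out.
df["doc"] = df["doc"].map(drop_carriage_returns)
cleaned = df.loc[0, "doc"]

# Round trip through CSV using an in-memory buffer.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df1 = pd.read_csv(buf)

print(df1.head())
```

If the line structure of the text matters, replacing \r with \n instead of dropping it would preserve the breaks; which one is appropriate depends on the data.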