Uploading .gz files to HDFS
2016-01-18 15:06
459 views
Files uploaded with `hdfs dfs -copyFromLocal` are fine and can be read by MapReduce directly.
Finally found the cause: a configuration issue. The default serializer in Flume's HDFS sink appends a newline after every event it writes; appended to a .gz payload, that extra byte corrupts the gzip stream (and for plain text it leaves a blank line after each log record). Change the configuration so the newline is not added automatically:
agentb2.sinks.hdfs_sink2.serializer.appendNewline = false
OK
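For context, a minimal sink configuration might look like the sketch below. The channel name and HDFS path are illustrative assumptions, not from the original setup; only the `serializer.appendNewline = false` line is the actual fix.

```properties
# Hypothetical Flume agent config; ch1 and the hdfs.path are illustrative.
agentb2.sinks.hdfs_sink2.type = hdfs
agentb2.sinks.hdfs_sink2.channel = ch1
agentb2.sinks.hdfs_sink2.hdfs.path = /flume/logs
agentb2.sinks.hdfs_sink2.hdfs.fileType = DataStream
# The default TEXT serializer appends '\n' after every event body; for .gz
# event bodies this corrupts the gzip stream and triggers the EOFException.
agentb2.sinks.hdfs_sink2.serializer = TEXT
agentb2.sinks.hdfs_sink2.serializer.appendNewline = false
```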
But when uploaded via Flume with the DataStream file type, the MapReduce job fails on the file. Why?
Error: java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Setting hdfs.fileType to SequenceFile and switching the MR job to SequenceFile input doesn't work either:
job.setInputFormatClass(SequenceFileInputFormat.class);
Error: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text
at com.gzmrdemo.GzFileMapper.map(GzFileMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
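On the SequenceFile attempt: with the sink's default `hdfs.writeFormat = Writable`, Flume writes `LongWritable`/`BytesWritable` pairs, so a mapper declared over `Text` values fails with exactly this ClassCastException. Two plausible ways out (inferred from the error, not stated in the original post): set `hdfs.writeFormat = Text` on the sink, or declare the mapper as `Mapper<LongWritable, BytesWritable, ...>` and convert the payload yourself. A minimal, framework-free sketch of that conversion (the class and method names are mine):

```java
import java.nio.charset.StandardCharsets;

// Sketch of the value conversion a Mapper<LongWritable, BytesWritable, ...>
// would perform. BytesWritable.getBytes() returns a possibly padded backing
// array, so the valid length must be passed alongside it.
public class BytesToLine {
    public static String toLine(byte[] backing, int length) {
        String s = new String(backing, 0, length, StandardCharsets.UTF_8);
        // Drop the trailing newline the sink's serializer may have appended.
        return s.endsWith("\n") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        // Simulate a padded BytesWritable backing array: 10 valid bytes + padding.
        byte[] raw = "hello log\n\0\0\0".getBytes(StandardCharsets.UTF_8);
        System.out.println(toLine(raw, 10)); // prints "hello log"
    }
}
```

Inside the real `map()`, the call would be `toLine(value.getBytes(), value.getLength())`.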