Resolving "java.io.IOException: Bad connect ack with firstBadLink as 192.168.X.X" during file reads/writes on a DataNode
2015-08-08 21:57
Today I noticed that two jobs on the cluster had failed:
Err log:
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
java.io.IOException: Bad connect ack with firstBadLink as 192.168.44.57:50010
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1460)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
Job Submission failed with exception 'java.io.IOException(Bad connect ack with firstBadLink as 192.168.44.57:50010)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 16 Reduce: 8 Cumulative CPU: 677.62 sec HDFS Read: 1794135673 HDFS Write: 136964041 SUCCESS
Stage-Stage-5: Map: 16 Reduce: 8 Cumulative CPU: 864.95 sec HDFS Read: 1794135673 HDFS Write: 120770083 SUCCESS
Stage-Stage-4: Map: 70 Reduce: 88 Cumulative CPU: 5431.46 sec HDFS Read: 22519878178 HDFS Write: 422001541 SUCCESS
Total MapReduce CPU Time Spent: 0 days 1 hours 56 minutes 14 seconds 30 msec
task BFD_JOB_TASK_521_20150721041704 is complete.
The error roughly means: while the job was running its map phase, just as the map output was about to be written to disk, it failed with:
java.io.IOException: Bad connect ack with firstBadLink as 192.168.44.57:50010
The cause is:
When a DataNode writes data for HDFS, it actually goes through an intermediate service called xcievers to write the files to the local Linux filesystem. These xcievers are simply threads responsible for reading and writing block files between the DataNode and its local disks, so the more blocks a DataNode holds, the more of these threads it needs. The catch is that the thread count has an upper limit (4096 by default). When a DataNode carries too many blocks, some block files can no longer get a thread to handle their reads and writes, and the block write fails with the error above.
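As a quick sanity check you can read back the limit that the client-side configuration resolves to. Below is a minimal sketch, assuming the Hadoop 2.x client jars are on the classpath and hdfs-site.xml is visible to the JVM; the class name ShowTransferThreads is mine, for illustration only:

import org.apache.hadoop.conf.Configuration;

public class ShowTransferThreads {
    public static void main(String[] args) {
        // new Configuration() picks up core-site.xml and hdfs-site.xml
        // from the classpath automatically.
        Configuration conf = new Configuration();
        // 4096 is the Hadoop 2.x default used when the key is not set.
        int max = conf.getInt("dfs.datanode.max.transfer.threads", 4096);
        System.out.println("dfs.datanode.max.transfer.threads = " + max);
    }
}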
The solution is:
Add the following to hdfs-site.xml:

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>16000</value>
</property>
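Since this is a DataNode-side setting, the DataNodes have to be restarted before it takes effect. On Hadoop 2.x you should be able to confirm the resolved value with hdfs getconf -confKey dfs.datanode.max.transfer.threads; on older releases the same limit went by the since-deprecated name dfs.datanode.max.xcievers.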
Tips:
The same problem can also cause this error:
10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
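This read-side variant has the same root cause: when no xciever thread is free, the DataNode cannot serve the block, so from the client's point of view it looks as if no live node holds it. Raising dfs.datanode.max.transfer.threads as above addresses both symptoms.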