
Flume writing to Hadoop HDFS fails with "Too many open files"

2013-02-17 16:37
Symptom: the DataNode log on dtydb6 was filling with DataBlockScanner verification failures, all caused by "Too many open files":

[hadoop@dtydb6 logs]$ vi hadoop-hadoop-datanode-dtydb6.log

    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1094)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:168)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:81)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyBlock(DataBlockScanner.java:453)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyFirstBlock(DataBlockScanner.java:519)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:617)
    at java.lang.Thread.run(Thread.java:722)
2013-02-17 00:00:29,023 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Second Verification failed for blk_1408462853104263034_39617. Exception : java.io.FileNotFoundException: /hadoop/logdata/current/subdir2/subdir2/blk_1408462853104263034 (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1094)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:168)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:81)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyBlock(DataBlockScanner.java:453)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyFirstBlock(DataBlockScanner.java:519)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:617)
    at java.lang.Thread.run(Thread.java:722)
2013-02-17 00:00:29,023 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Reporting bad block blk_1408462853104263034_39617 to namenode.
2013-02-17 00:00:53,076 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: First Verification failed for blk_4328439663130931718_44579. Exception : java.io.FileNotFoundException: /hadoop/logdata/current/subdir9/subdir12/blk_4328439663130931718 (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1094)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:168)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:81)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyBlock(DataBlockScanner.java:453)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyFirstBlock(DataBlockScanner.java:519)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:617)
    at java.lang.Thread.run(Thread.java:722)
2013-02-17 00:00:53,077 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Second Verification failed for blk_4328439663130931718_44579. Exception : java.io.FileNotFoundException: /hadoop/logdata/current/subdir9/subdir12/blk_4328439663130931718 (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1094)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:168)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:81)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyBlock(DataBlockScanner.java:453)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyFirstBlock(DataBlockScanner.java:519)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:617)
    at java.lang.Thread.run(Thread.java:722)
2013-02-17 00:00:53,077 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Reporting bad block blk_4328439663130931718_44579 to namenode.
2013-02-17 00:01:10,115 WARN org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: First Verification failed for blk_2833765807455012512_10228. Exception : java.io.FileNotFoundException: /hadoop/logdata/current/subdir63/subdir25/blk_2833765807455012512 (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockInputStream(FSDataset.java:1094)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:168)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:81)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyBlock(DataBlockScanner.java:453)
    at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.verifyFirstBlock(DataBlockScanner.java:519)
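
A quick way to see how widespread the failure was is to count the error in the same DataNode log (the file opened with vi above):

grep -c "Too many open files" hadoop-hadoop-datanode-dtydb6.log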

A web search pointed to the Linux nofile limit (maximum number of open file descriptors per process) being exceeded; on this host it was still at the default of 1024:

[hadoop@dtydb6 logs]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1064960
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1064960
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
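
One thing to keep in mind: ulimit -a in a fresh shell only shows what a newly started process would get; a daemon that is already running keeps the limits it was started with. On Linux the live values can be read from /proc (a quick check, using PID 29828, the Flume agent identified below):

grep "Max open files" /proc/29828/limits    # the RLIMIT_NOFILE the running process actually has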

Yet lsof reported 2932 open entries for the Flume process, which looked odd at first sight: how can that exceed the 1024 limit? (A note on why the lsof count can legitimately be higher follows the ps output below.) jps showed the Flume agent as PID 29828 (it appears as "Application"):

12988 Jps
26903 JobTracker
29828 Application
26545 DataNode
27100 TaskTracker
26719 SecondaryNameNode
26374 NameNode

[root@dtydb6 ~]# lsof -p 29828|wc -l
2932

[root@dtydb6 ~]# ps -ef|grep 29828
root 13133 12914 0 14:05 pts/3 00:00:00 grep 29828
hadoop 29828 1 32 Jan22 ? 8-10:51:15 /usr/java/jdk1.7.0_07/bin/java -Xmx2048m -cp /monitor/flume-1.3/conf:/monitor/flume-1.3/lib/*:/hadoop/hadoop-1.0.4/libexec/../conf:/usr/java/jdk1.7.0_07/lib/tools.jar:/hadoop/hadoop-1.0.4/libexec/..:/hadoop/hadoop-1.0.4/libexec/../hadoop-core-1.0.4.jar:/hadoop/hadoop-1.0.4/libexec/
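
A side note on the 2932 figure: lsof output is not a pure file-descriptor count. It also lists memory-mapped .so libraries, the executable, the current directory and similar entries that are not charged against nofile, which is how the number can exceed 1024. Counting the entries under /proc/<pid>/fd gives only the real descriptors (a rough check, run as root):

ls /proc/29828/fd | wc -l    # actual open file descriptors held by the Flume agent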

Solution:

1. Edit /etc/security/limits.conf and raise the nofile limit manually (a sketch for verifying the change follows step 2):

vi /etc/security/limits.conf
* soft nofile 12580
* hard nofile 65536

2. Restart the Flume process (PID 29828). Because limits.conf is applied by pam_limits at login, only processes started afterwards from a fresh login session get the higher limit; after restarting the agent the errors stopped and the problem was resolved.
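
For completeness, a restart sequence might look like the sketch below. It is not a transcript of what was actually run: the Flume home /monitor/flume-1.3 comes from the ps output above, while the config file name and agent name (flume.conf, agent1) are placeholders.

# stop the old agent, which is still running with the 1024-descriptor limit
kill 29828

# log in again as hadoop so pam_limits applies the new values, and verify
su - hadoop
ulimit -Sn; ulimit -Hn          # expect 12580 and 65536

# restart the agent (config file name and agent name are placeholders)
cd /monitor/flume-1.3
nohup bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent1 &

# confirm the new process (shown as "Application" in jps) picked up the limit;
# replace <new_pid> with the PID jps reports
grep "Max open files" /proc/<new_pid>/limits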



References:

http://eryk.iteye.com/blog/1193487
http://blog.csdn.net/rzhzhz/article/details/7577122