
Fixing "Too many open files" in Hadoop MapReduce

2012-05-17 17:36
While Hive was executing a MapReduce job, the following error was reported:

java.io.IOException: Call to server/10.64.49.21:9001 failed on local exception: java.io.IOException: Too many open files
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
at org.apache.hadoop.ipc.Client.call(Client.java:1033)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1011)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1023)
at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:285)
at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:685)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:458)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:629)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:617)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.IOUtil.initPipe(Native Method)
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:343)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)

"Too many open files" means the process has exhausted its file descriptor limit. This is usually caused by the application handling resources carelessly, for example not closing sockets or database connections promptly. But it can also mean the application genuinely needs a large number of file handles and the system's configured limit is simply too low.
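One way to tell a descriptor leak from legitimately high demand is to summarize what kinds of descriptors a process is holding. A minimal sketch, assuming lsof is installed; 1234 is a placeholder PID:

# Column 5 of lsof output is the descriptor TYPE (REG, DIR, IPv4, sock, FIFO, ...).
# NR > 1 skips the header line. A large, steadily growing count of IPv4/sock
# entries usually points at unclosed sockets rather than a limit that is too low.
lsof -p 1234 | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn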

1. Run ulimit -a to see the Linux system limits, including the maximum number of open file handles:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16308
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16308
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The open files (-n) value is 1024, which is the default.

2. Run ps -ef | grep hadoop to list the Hadoop processes and note each process ID.

3. Check how many files each process has open with lsof -p PID | wc -l (steps 2 and 3 can be combined, as in the sketch below).
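A minimal sketch combining steps 2 and 3, assuming pgrep and lsof are installed; pgrep -f matches the full command line, which is usually precise enough here but may also catch unrelated processes whose arguments mention hadoop:

# Count open file descriptors for every process whose command line mentions hadoop.
for pid in $(pgrep -f hadoop); do
    # lsof prints one line per open descriptor plus a header, so the count is off by one.
    echo "PID $pid: $(lsof -p "$pid" | wc -l) open files"
done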

Raising the default file handle limit

Edit /etc/security/limits.conf and append the following lines at the end of the file:

* soft nofile 65536

* hard nofile 65536

Start a new shell (log out and back in) for the change to take effect.
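If logging out is inconvenient, the limit can also be raised just for the current shell and anything started from it. This is temporary and, for non-root users, cannot exceed the hard limit:

ulimit -n 65536   # current session only; non-root users may get "Operation not permitted"
ulimit -n         # verify: prints the new value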

On Debian, after restarting root's shell the new file handle limit does not show up for root, even though the setting already applies to other users. To make it take effect for root as well, add these two lines too:

root soft nofile 65536

root hard nofile 65536

Otherwise the change will not be visible under the root user. On CentOS these two extra lines are not needed.
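A quick way to confirm the change, assuming su is available and using a hypothetical hadoop user as the ordinary account; su - starts a fresh login shell, so pam_limits re-reads limits.conf:

su - root -c 'ulimit -n'     # should print 65536 (on Debian only with the explicit root lines)
su - hadoop -c 'ulimit -n'   # should print 65536; "hadoop" is a placeholder user name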

Separately, when modifying system configuration like this, one naturally wants to source the file rather than reboot. On Debian, sourcing it produced the following error:

-bash: iptab.sh: command not found

The file name in the error message (highlighted in red on the original page) varied with the current directory, whereas on CentOS the same source command ran fine.

The cause is presumably kernel-related; I don't fully understand it and haven't found the reason yet. If you know why, please leave a comment. Thanks.