
Hadoop startup error: NameNode fails to start with "GC overhead limit exceeded"

2016-03-22 14:51
When it happened: around 4:30 a.m.

Error log:

2016-03-22 04:30:29,075 WARN org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.10.10.43:54994 Call#7 Retry#0: error: java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:30:43,111 WARN org.apache.hadoop.ipc.Server: Error serializing call response for call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.10.10.43:55003 Call#4 Retry#0

2016-03-22 04:30:39,756 WARN org.apache.hadoop.ipc.Server: Error serializing call response for call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.10.10.43:54997 Call#4 Retry#0

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:30:34,398 FATAL org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:30:34,398 WARN org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.10.10.43:55000 Call#4 Retry#0: error: java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:30:34,398 ERROR org.mortbay.log: EXCEPTION

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:35:37,684 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

2016-03-22 04:37:27,793 ERROR org.mortbay.log: EXCEPTION

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:37:27,793 WARN org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 10.10.10.148:45637 Call#5441484 Retry#0: error: java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:37:27,793 WARN org.apache.hadoop.ipc.Server: Out of Memory in server select

java.lang.OutOfMemoryError: Java heap space

2016-03-22 04:37:27,793 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://bis-newnamenode-s-02:50070/getimage?getimage=1&txid=424042092&storageInfo=-47:1574903840:0:CID-d29c5605-82ec-474f-950a-fd106ad23daa

2016-03-22 04:37:27,804 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

I then restarted the service, and it failed again, this time with a GC error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

at java.nio.CharBuffer.allocate(CharBuffer.java:331)

at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)

at org.apache.hadoop.io.Text.decode(Text.java:405)

at org.apache.hadoop.io.Text.decode(Text.java:377)

at org.apache.hadoop.io.Text.readString(Text.java:470)

at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:90)

at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:105)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINode(FSImageFormat.java:682)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINodeWithLocalName(FSImageFormat.java:616)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadChildren(FSImageFormat.java:453)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:495)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadLocalNameINodesWithSnapshot(FSImageFormat.java:398)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:339)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:823)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:664)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:633)

at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568)

at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)

at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)

at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)

at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)

at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)

at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)

2016-03-22 06:54:29,716 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1

2016-03-22 06:54:29,758 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG

First I set mapred.child.java.opts in mapred-site.xml to 2000 and HADOOP_HEAPSIZE=2000 in hadoop-env.sh.
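For reference, the edits looked roughly like this (the values are illustrative; mapred.child.java.opts takes raw JVM flags, so 2000 MB is written as -Xmx2000m):

In mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2000m</value>
</property>

In hadoop-env.sh:

export HADOOP_HEAPSIZE=2000

Note that mapred.child.java.opts only sizes MapReduce task JVMs, not the NameNode, so on its own it could not fix this crash; HADOOP_HEAPSIZE (in MB) is what the daemon startup scripts read.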

The NameNode still would not start. The error showed it was dying while loading the FSImage, which had grown too large, so I went back through the metadata backup directory (backed up hourly) and confirmed the FSImage file had been growing steadily.
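A quick way to watch that trend, assuming a hypothetical backup layout with one timestamped copy per hour:

# list the hourly FSImage backups with sizes (path layout is illustrative)
ls -lh /backup/namenode/*/current/fsimage_*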

2016-03-22 09:07:46,848 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 5189987

2016-03-22 09:10:03,405 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join

java.lang.OutOfMemoryError: GC overhead limit exceeded

at java.nio.ByteBuffer.wrap(ByteBuffer.java:369)

at java.nio.ByteBuffer.wrap(ByteBuffer.java:392)

at org.apache.hadoop.io.Text.decode(Text.java:377)

at org.apache.hadoop.io.Text.readString(Text.java:470)

at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:91)

at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:105)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINode(FSImageFormat.java:682)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINodeWithLocalName(FSImageFormat.java:616)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadChildren(FSImageFormat.java:453)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:495)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadLocalNameINodesWithSnapshot(FSImageFormat.java:398)

at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:339)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:823)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:664)

at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:633)

at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568)

at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)

at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)

at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)

at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)

at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)

at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)

In the end I raised HADOOP_NAMENODE_INIT_HEAPSIZE in hadoop-env.sh, and the NameNode started successfully!
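The exact value format depends on the distribution: in some hadoop-env.sh templates this variable is interpolated straight into the NameNode's JVM options as a raw flag, so a sketch might look like this (the 4096m figure is an assumption, not the value we used):

# initial heap for the NameNode process only (format varies by distribution)
export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms4096m"

Unlike HADOOP_HEAPSIZE, this setting targets only the NameNode, which is the JVM that actually has to hold the FSImage.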

Talking it over with friends in a chat group, the likely culprit was too many small files. A check confirmed more than 5 million files, so I merged or deleted the small files belonging to some tables.

The FSImage file shrank from 600 MB to 330 MB, roughly half. I then added rules to our jobs to merge small files going forward.
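As a rough illustration of the cleanup (all paths and names here are hypothetical), hadoop fs -count can locate small-file hotspots, and Hadoop Archives can pack a directory of small files into a single HAR file, collapsing many namespace entries into a few:

# how many directories/files live under a suspect path?
hadoop fs -count /warehouse/logs

# pack one partition into a HAR archive, then drop the originals once verified
hadoop archive -archiveName logs_2016_03.har -p /warehouse/logs 2016_03 /warehouse/archive
hadoop fs -rm -r /warehouse/logs/2016_03

Every file and block is an object on the NameNode heap, so cutting the file count is what ultimately shrinks the FSImage and the heap needed to load it.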