hadoop2.2.0 unhealthy nodes: log-dirs turned bad
2014-03-03 21:00
Problem: a MapReduce job kept running without ever producing a result. Checking the web UI on port 8088 showed that the NodeManager had detected an unhealthy node.
Googling turned up the following information:
hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html#Monitoring_Health_of_NodeManagers
Hadoop provides a mechanism by which administrators can configure the NodeManager to run an administrator supplied script periodically to determine if a node is healthy or not.
Administrators can determine if the node is in a healthy state by performing any checks of their choice in the script. If the script detects the node to be in an unhealthy state, it must print a line to standard output beginning with the string ERROR. The NodeManager spawns the script periodically and checks its output. If the script's output contains the string ERROR, as described above, the node's status is reported as unhealthy and the node is black-listed by the ResourceManager. No further tasks will be assigned to this node. However, the NodeManager continues to run the script, so that if the node becomes healthy again, it will be removed from the blacklisted nodes on the ResourceManager automatically. The node's health along with the output of the script, if it is unhealthy, is available to the administrator in the ResourceManager web interface. The time since the node was healthy is also displayed on the web interface.
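As a sketch of such a script (the 90% disk-usage threshold and the df-based check below are our own illustration, not something the Hadoop docs prescribe): any line starting with ERROR on stdout marks the node unhealthy, while silent output means healthy.

```shell
#!/bin/sh
# Hypothetical NodeManager health script: print an ERROR line when the
# root filesystem is more than 90% full; print nothing when healthy.
check_disk() {
  # $1 = usage percentage, $2 = threshold; emit an ERROR line on breach
  if [ "$1" -gt "$2" ]; then
    echo "ERROR root filesystem is $1% full"
  fi
  return 0
}

usage=$(df -P / | awk 'NR==2 {sub(/%/, "", $5); print $5}')
check_disk "${usage:-0}" 90
```

The script must always exit cleanly; only the ERROR line on stdout, not the exit code, is what flips the node to unhealthy.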
The following parameters can be used to control the node health monitoring script in conf/yarn-site.xml.
Parameter | Value | Notes |
---|---|---|
yarn.nodemanager.health-checker.script.path | Node health script | Script to check for node's health status. |
yarn.nodemanager.health-checker.script.opts | Node health script options | Options for script to check for node's health status. |
yarn.nodemanager.health-checker.script.interval-ms | Node health script interval | Time interval for running health script. |
yarn.nodemanager.health-checker.script.timeout-ms | Node health script timeout interval | Timeout for health script execution. |
The health checker script is not supposed to give ERROR if only some of the local disks become bad. NodeManager has the ability to periodically check the health of the local disks (specifically checks nodemanager-local-dirs and nodemanager-log-dirs) and after reaching the threshold of number of bad directories based on the value set for the config property yarn.nodemanager.disk-health-checker.min-healthy-disks, the whole node is marked unhealthy and this info is sent to the resource manager also. The boot disk is either raided or a failure in the boot disk is identified by the health checker script.
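Wired into conf/yarn-site.xml, those parameters might look like this; the script path, the 10-minute interval, and the 1-minute timeout are example values of ours, not Hadoop defaults (0.25 for min-healthy-disks is, as far as we know, the shipped default):

```xml
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/etc/hadoop/health_check.sh</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.interval-ms</name>
  <value>600000</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
  <value>60000</value>
</property>
<property>
  <!-- fraction of nodemanager-local-dirs/log-dirs that must stay healthy -->
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value>
</property>
```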
Since this is a single-machine setup, only one node is available, and once that node is marked unhealthy it stops accepting tasks, so the job hangs and can never run.
www.cs.colostate.edu/~koontz/me/wiki/doku.php?id=hadoop
This happens when you don't have write access to the folder listed (in this case, /tmp/nodemanagerLog). To fix the problem, change this directory in your -site.xml file.
So the cause was apparently that I was running MapReduce on R as the root user, without write permission on the log directory, which triggered this error.
After changing the directory's permissions to 777 and restarting Hadoop, the problem was solved!
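The fix above can be sketched as follows; the path is a stand-in for whatever yarn.nodemanager.log-dirs points to on your machine, and note that 777 is the blunt instrument from the post, not a recommended production setting:

```shell
# Sketch of the permission fix; NM_LOG_DIR stands in for the real
# yarn.nodemanager.log-dirs value (the path here is just a demo).
NM_LOG_DIR=/tmp/nm-log-demo
mkdir -p "$NM_LOG_DIR"
chmod 777 "$NM_LOG_DIR"   # world-writable, as in the post
# A tighter alternative is to chown the dir to the user running the
# NodeManager and keep 755, e.g.:
#   chown yarn:hadoop "$NM_LOG_DIR" && chmod 755 "$NM_LOG_DIR"
stat -c '%a' "$NM_LOG_DIR"
```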
Ports available in the hadoop2.2.0 web interface (reposted from http://blog.csdn.net/twlkyao/article/details/17317157):
1. ResourceManager: http://localhost:8088
2. NameNode: http://localhost:50070
3. NodeManager: http://localhost:8042
4. FileSystem: http://localhost:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/&nnaddr=127.0.0.1:9000
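A quick way to see which of these UIs are actually up; the probe helper below is our own, and the ports assume the default configuration listed above:

```shell
# Hypothetical helper: report whether an HTTP endpoint answers on a port.
probe() {
  # usage: probe HOST PORT -> prints "open" or "closed"
  if curl -s -o /dev/null --max-time 2 "http://$1:$2/"; then
    echo open
  else
    echo closed
  fi
}

probe localhost 8088    # ResourceManager UI
probe localhost 50070   # NameNode UI
probe localhost 8042    # NodeManager UI
```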