
Removing and adding DataNodes in a Hadoop cluster

2013-08-06 10:12
You may want to remove DataNodes from, or add DataNodes to, your HDFS cluster at some point. In fact, removing or adding nodes in Hadoop is straightforward: it takes only a few simple operations, and they do not affect other jobs that are running. To do it safely and efficiently, however, you must pay attention to block replication and a few other points.

You should know that:


dfs.xxx ==> datanode ==> hdfs-site.xml

mapred.xxx ==> tasktracker ==> mapred-site.xml


This means that DataNodes and TaskTrackers are configured independently, so you can operate on each of them separately.


1.Removing DataNodes

Edit hdfs-site.xml on the NameNode and add properties like these:

<property>
  <name>dfs.hosts</name>
  <value>/usr/hadoop/conf/datanode-allow-list</value>
</property>

<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/hadoop/conf/datanode-deny-list</value>
</property>

The file named by dfs.hosts lists the DataNodes that are allowed to contact the NameNode. If it is empty or not set, all DataNodes may connect; otherwise only the listed DataNodes may connect. The file named by dfs.hosts.exclude lists the DataNodes to be removed from the cluster. A DataNode that appears in both files may still connect, but it will be decommissioned; this is what the steps below rely on.
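The interplay of the two lists can be sketched as a small decision function. This is only an illustration in Python, not Hadoop code: the function name and the returned strings are made up, and it assumes that a DataNode listed in both files is admitted and then decommissioned, which is what the removal steps below depend on.

```python
def datanode_admission(host, include, exclude):
    """Return what the NameNode would do with `host` (illustrative only).

    include / exclude are sets of hostnames or IPs read from the files named
    by dfs.hosts and dfs.hosts.exclude. An empty include set stands for an
    unset or empty dfs.hosts, which admits every DataNode.
    """
    on_allow_list = (not include) or (host in include)
    if not on_allow_list:
        return "rejected"       # not permitted to contact the NameNode
    if host in exclude:
        return "decommission"   # admitted, but its blocks are moved elsewhere
    return "allowed"


allow = {"10.0.0.1", "10.0.0.2"}   # hypothetical addresses
deny = {"10.0.0.2"}
for h in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    print(h, datanode_admission(h, allow, deny))
```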


(1)#touch datanode-deny-list

Then add the IP address or hostname of each DataNode you want to remove to the file datanode-deny-list, one per line.

(2)#hadoop dfsadmin -refreshNodes

This refreshes the configuration dynamically, so the NameNode does not need to be restarted.




(3)#hadoop dfsadmin -report

The report, like the web UI, shows the DataNode's state changing to "Decommissioning".

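(4)Wait a while for decommissioning to finish. The DataNode is then reported as "Decommissioned" and, once its DataNode process has been stopped, as dead.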



(5)Remove the DataNode's IP from /usr/hadoop/conf/datanode-allow-list, then run hadoop dfsadmin -refreshNodes again so that the NameNode picks up the change.
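Steps (1) and (2) are easy to script. The following is a minimal sketch in Python, not part of Hadoop: the helper names are hypothetical, and it assumes the exclude file holds one host per line. It appends hosts to the deny list without creating duplicates and then calls the same refreshNodes command shown above.

```python
import os
import subprocess


def add_to_exclude_file(path, hosts):
    """Append hosts to the exclude file, skipping entries already present.

    Returns the list of hosts that were actually added.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    seen = set(content.split())
    added = []
    for host in hosts:
        if host not in seen:
            seen.add(host)
            added.append(host)
    if added:
        with open(path, "a") as f:
            # Make sure the new entries start on their own line.
            if content and not content.endswith("\n"):
                f.write("\n")
            f.write("\n".join(added) + "\n")
    return added


def refresh_nodes():
    """Ask the NameNode to re-read its include/exclude files (step 2)."""
    subprocess.run(["hadoop", "dfsadmin", "-refreshNodes"], check=True)
```

On the NameNode, one would call add_to_exclude_file("/usr/hadoop/conf/datanode-deny-list", [...]) with the hosts to remove and, if it returns a non-empty list, refresh_nodes(). Steps (3) to (5) still need a human (or a polling loop) to watch the report.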



2.Adding DataNodes

If dfs.hosts is set and its file is not empty, add the new DataNode's IP to that file.

#hadoop dfsadmin -refreshNodes

This refreshes the configuration dynamically, so the NameNode does not need to be restarted.

#bin/hadoop-daemon.sh start datanode

This starts the DataNode process. Run it on the node that you want to add.



If dfs.hosts is empty or not set, you only need to start the DataNode.
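The two cases above can be summarized as a small checklist generator. This is a hedged sketch in Python: the function name is hypothetical, the default file path is the one used in this article's hdfs-site.xml example, and it only returns the steps as text rather than executing anything.

```python
def steps_to_add_datanode(host, include_hosts,
                          include_file="/usr/hadoop/conf/datanode-allow-list"):
    """Return the ordered steps for adding `host` as a DataNode.

    include_hosts is the set of entries currently in the dfs.hosts file;
    pass an empty set (or None) when dfs.hosts is empty or not set.
    """
    steps = []
    if include_hosts and host not in include_hosts:
        # An allow list is in force, so the new node must be added to it
        # and the NameNode told to re-read the file.
        steps.append("add %s to %s" % (host, include_file))
        steps.append("hadoop dfsadmin -refreshNodes")
    # In every case the DataNode process is started on the new node itself.
    steps.append("on %s: bin/hadoop-daemon.sh start datanode" % host)
    return steps


for step in steps_to_add_datanode("10.0.0.8", {"10.0.0.1", "10.0.0.2"}):
    print(step)
```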




Removing or adding a TaskTracker works the same way as for a DataNode. You only need to use the corresponding properties in mapred-site.xml:

<property>
  <name>mapred.hosts</name>
  <value>/usr/local/hadoop/conf/tasktracker-allow-list</value>
</property>

<property>
  <name>mapred.hosts.exclude</name>
  <value>/usr/hadoop/conf/tasktracker-deny-list</value>
</property>

#bin/hadoop-daemon.sh start tasktracker

3.Running the balancer

#bin/start-balancer.sh

A cluster is considered balanced when the utilization rates of all the DataNodes are within the range of the average utilization rate plus or minus a threshold.
This threshold is 10 percent by default. You can specify a different threshold when you start the balancer script. For example, to set the threshold to 5 percent for a more evenly distributed cluster, start the balancer with

#bin/start-balancer.sh -threshold 5
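To see what the threshold means in practice, the balance criterion can be written as a short check. This is an illustrative Python sketch, not the balancer's code: the function name and the sample numbers are made up, and it assumes the "average utilization" is the cluster-wide ratio of used space to total capacity.

```python
def unbalanced_nodes(nodes, threshold=10.0):
    """Return {host: deviation} for DataNodes outside average +/- threshold.

    nodes maps a host name to a (used_bytes, capacity_bytes) pair.
    Deviations are in percentage points relative to the cluster average.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    average = 100.0 * total_used / total_capacity
    outside = {}
    for host, (used, capacity) in nodes.items():
        utilization = 100.0 * used / capacity
        if abs(utilization - average) > threshold:
            outside[host] = utilization - average
    return outside


cluster = {"dn1": (80, 100), "dn2": (50, 100), "dn3": (50, 100)}  # made-up sizes
print(unbalanced_nodes(cluster))                 # default threshold of 10
print(unbalanced_nodes(cluster, threshold=5.0))  # stricter threshold of 5
```

With the default threshold only dn1 falls outside the range here; with a threshold of 5 all three nodes do, so the balancer would have more data to move.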

As balancing can be network intensive, we recommend doing it overnight or over a weekend when your cluster may be less busy. Alternatively, you can set the dfs.balance.bandwidthPerSec configuration parameter to limit the bandwidth devoted to balancing.

The default value of dfs.balance.bandwidthPerSec is only 1 MB/s (1048576 bytes per second), which is quite low. When no MapReduce jobs are running, you can raise the value to make balancing faster.

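For example, set it in hdfs-site.xml. The value is read by the DataNodes, so the change must reach their configuration as well, not only the NameNode's: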

<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>xxxxxxxx</value>
  <description>
    Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in terms of the number of bytes per second.
  </description>
</property>
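Because the value is given in bytes per second, it helps to compute it rather than type it by hand. The sketch below is plain arithmetic in Python with hypothetical helper names; the time estimate is only a rough lower bound for a single DataNode and ignores everything else that limits the balancer.

```python
def bandwidth_bytes(mb_per_sec):
    """Convert MB/s (1 MB = 1024 * 1024 bytes) to the bytes-per-second value."""
    return int(mb_per_sec * 1024 * 1024)


def transfer_seconds(bytes_to_move, bytes_per_sec):
    """Rough lower bound on the time one DataNode needs to move the data."""
    return bytes_to_move / bytes_per_sec


ten_gib = 10 * 1024 ** 3
print(bandwidth_bytes(1))                              # the 1 MB/s default
print(bandwidth_bytes(10))                             # a value for 10 MB/s
print(transfer_seconds(ten_gib, bandwidth_bytes(1)))   # seconds at 1 MB/s
print(transfer_seconds(ten_gib, bandwidth_bytes(10)))  # seconds at 10 MB/s
```

At the default rate, moving 10 GiB off one node takes close to three hours, which is why raising the limit on an idle cluster can be worthwhile.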