
Hadoop Learning Notes

2015-06-24 16:29
1. Adding a New DataNode

1.1 Configure all DataNode-related settings on the new node (slaves, masters, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, etc.)

1.2 Update /etc/hosts so that all nodes can resolve one another

1.3 Set /etc/hostname on the new node

1.4 Start the DataNode: bin/hadoop-daemon.sh start datanode

1.5 Start the TaskTracker: bin/hadoop-daemon.sh start tasktracker (on a YARN cluster, start the NodeManager with yarn-daemon.sh start nodemanager instead)

Check the relevant logs to confirm that the node started and registered successfully. A minimal way to verify is sketched below (adjust the log path to your installation).
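
# On the new node: check the DataNode log for errors / successful registration
tail -n 50 logs/hadoop-*-datanode-*.log

# On the NameNode: confirm the new node appears among the live datanodes
bin/hadoop dfsadmin -report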

2. Load Balancing

Run bin/start-balancer.sh -threshold 15
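
The threshold is a percentage: the balancer keeps moving blocks until each DataNode's disk utilization is within 15 percentage points of the cluster-wide average (a lower threshold gives a more even distribution but takes longer). It runs until it finishes or until you stop it:

bin/stop-balancer.sh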

3. Decommissioning a DataNode

Perform the following steps on the NameNode.

Add the following to conf/hdfs-site.xml:

<property>
<name>dfs.hosts.exclude</name>
<value>[FULL_PATH_TO_THE_EXCLUDE_FILE]</value>
<description>Names a file that contains a list of hosts that are
not permitted to connect to the namenode. The full pathname of
the file must be specified. If the value is empty, no hosts are
excluded.</description>
</property>


Run bin/hadoop dfsadmin -refreshNodes

Monitor the decommissioning progress: in the NameNode web UI or in the dfsadmin report, the node's status moves from "Decommission In Progress" to "Decommissioned", after which it can safely be taken offline.
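
A concrete sketch of the flow (dn3.example.com and the excludes path are placeholders; the exclude file simply lists one hostname per line):

# contents of the file referenced by dfs.hosts.exclude, e.g. conf/excludes
dn3.example.com

# tell the NameNode to re-read the include/exclude lists
bin/hadoop dfsadmin -refreshNodes

# watch the per-node status until it reads "Decommissioned"
bin/hadoop dfsadmin -report | grep "Decommission Status"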

4. Using multiple disks/volumes and limiting HDFS disk usage

4.1 Specifying multiple data directories

conf/hdfs-site.xml

<property>
<name>dfs.data.dir</name>
<value>/u1/hadoop/data,/u2/hadoop/data</value>
</property>
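
Each directory in the comma-separated list should normally live on a separate physical disk, since the DataNode spreads block writes across the configured volumes. The directories are usually created up front and owned by the user that runs the DataNode; a sketch, assuming the paths above and a service user named hadoop (on Hadoop 2.x the property is named dfs.datanode.data.dir, though the old name is still accepted):

mkdir -p /u1/hadoop/data /u2/hadoop/data
chown -R hadoop:hadoop /u1/hadoop/data /u2/hadoop/data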


4.2 Limiting HDFS disk usage (reserving space per volume)

<property>
<name>dfs.datanode.du.reserved</name>
<value>6000000000</value>
<description>Reserved space in bytes per volume. Always leave
this much space free for non dfs use.
</description>
</property>
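
The value is in bytes per volume, so 6000000000 keeps roughly 6 GB free on every data disk for non-HDFS use. After restarting the DataNode, the per-node capacity and remaining figures in the report should reflect the reservation:

bin/hadoop dfsadmin -report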


5. Setting HDFS block size

Method 1: Set it in conf/hdfs-site.xml (134217728 bytes = 128 MB):

<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>


Method 2: Set it per operation where needed, for example when uploading a file (dfs.blocksize is the Hadoop 2.x property name; on 1.x the equivalent is dfs.block.size):

bin/hadoop fs -Ddfs.blocksize=134217728 -put data.in /user/foo
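
To confirm which block size an uploaded file actually got, fsck can list its blocks (assuming /user/foo is a directory, so the file above lands at /user/foo/data.in):

bin/hadoop fsck /user/foo/data.in -files -blocks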

6. Setting the file replication factor

Method 1: Set the default in conf/hdfs-site.xml:

<property>
<name>dfs.replication</name>
<value>2</value>
</property>


Method 2: Set it when uploading a file:

bin/hadoop fs -D dfs.replication=1 -copyFromLocal non-critical-file.txt /user/foo

Method 3: Change the replication factor of a file already in HDFS:

bin/hadoop fs -setrep 2 non-critical-file.txt
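
setrep only affects files that already exist; adding -w makes the command wait until the new replication factor is actually reached, and the second column of hadoop fs -ls shows the current factor:

bin/hadoop fs -setrep -w 2 non-critical-file.txt
bin/hadoop fs -ls non-critical-file.txt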