
Manual installation steps for a hadoop-cdh5 distributed environment

2018-01-19 21:58
Installing CDH5 with Cloudera's CM is the easy way. But if you like to understand things from the ground up, it is worth going through a manual deployment yourself at least once. Deploying by hand also gives you finer control over the environment.

For example, to make upgrades easier I generally recommend separating configuration files from the executable modules, keeping logs in their own location, and so on; all of this can be driven by environment variables.

My deployment approach is:

Keep runtime data and logs separate from the programs and their configuration

Install the programs under /opt

Put environment variables in /etc/profile.d

Use batch copy and batch execution scripts

For example, the login welcome message in /etc/motd already sketches out this layout:

######################################################################
#                              Notice                                #
#                                                                    #
#  Hadoop environment:                                               #
#      /etc/profile.d/hadoop.sh                                      #
#      /etc/profile.d/java.sh                                        #
#      Such as HADOOP_HOME, HADOOP_CONF_DIR, HADOOP_LOG_DIR etc.     #
#                                                                    #
#  /opt:                                                             #
#      cdh/home:                                                     #
#               hadoop-x.y.z-cdh5.?.?                                #
#               zookeeper-x.y.z-cdh5.?.?                             #
#                                                                    #
#      cdh/bin:  operating scripts                                   #
#      cdh/conf: configuration files                                 #
#      cdh/logs: log files                                           #
#      cdh/data: data path                                           #
#      cdh/temp: temporary path                                      #
#                                                                    #
######################################################################
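
As a minimal sketch, the layout can be created on each node like this (the directory names are just my own convention, matching the banner above):

# Create the /opt/cdh layout shown in the login banner
mkdir -p /opt/cdh/{home,bin,conf,logs,data,temp}
mkdir -p /opt/cdh/conf/hadoop /opt/cdh/logs/hadoop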

The notes below are kept for future reference.

1. Preparation

CDH5 download address:  http://archive.cloudera.com/cdh5/cdh/5/

CentOS7, x64

Work to be done on every node:

Time synchronization
yum install ntp -y
service ntpd start
chkconfig ntpd on
chkconfig --list ntpd
service ntpd restart
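
On CentOS 7 the service/chkconfig commands above are forwarded to systemd; as a sketch, the equivalent systemctl calls are:

systemctl start ntpd
systemctl enable ntpd
systemctl status ntpd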


Set the hostname on each server
vi /etc/hostname
vi /etc/hosts
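
For example, /etc/hosts on every node could contain entries like the following (the IP addresses here are placeholders; use your own):

192.168.1.10  master1
192.168.1.11  slave1
192.168.1.12  slave2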

Unpack the JDK
Here I like to put it under /opt/jdk1.7.0_60 (the path must match the JAVA_HOME set below).

Everything related to CDH goes under /opt/cdh.

Unpack Hadoop into /opt/cdh/home, so that the extracted directory matches the HADOOP_HOME set below

tar zxf hadoop-2.5.0-cdh5.3.2.tar.gz -C /opt/cdh/home/


Environment variables

vi /etc/profile.d/hadoop.sh

#!/bin/bash

# HADOOP

export HADOOP_HOME=/opt/cdh/home/hadoop-2.5.0-cdh5.3.2
export HADOOP_LOG_DIR=/opt/cdh/logs/hadoop/hdfs.logs
export HADOOP_CONF_DIR=/opt/cdh/conf/hadoop

# YARN

export YARN_LOG_DIR=/opt/cdh/logs/hadoop/yarn.logs
export YARN_CONF_DIR=/opt/cdh/conf/hadoop


Java environment variables:

echo "export JAVA_HOME=/opt/jdk1.7.0_60" > /etc/profile.d/java.sh
ln -snf /opt/jdk1.7.0_60/bin/java /usr/bin/java
ln -snf /opt/jdk1.7.0_60/bin/jps /usr/bin/jps


Then make them take effect:
. /etc/profile
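
A quick sanity check that the variables took effect (the output will vary with your versions):

echo $JAVA_HOME $HADOOP_HOME $HADOOP_CONF_DIR
java -version
$HADOOP_HOME/bin/hadoop version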

Hadoop configuration files

vi $HADOOP_CONF_DIR/core-site.xml

<configuration>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://master1:9001</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
    <final>true</final>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/cdh/temp/hadoop</value>
  </property>
</configuration>


vi $HADOOP_CONF_DIR/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/cdh/data/hadoop/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
    <final>true</final>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/cdh/data/hadoop/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
    <final>true</final>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <!-- in Hadoop 2.x the key for this switch is dfs.permissions.enabled -->
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>

  <!-- HBASE -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>


vi $HADOOP_CONF_DIR/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.job.tracker</name>
    <value>hdfs://master1:9001</value>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>

  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>

  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>

  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1024M</value>
  </property>

  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>

  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>

  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>

  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop/mapred/local</value>
    <final>true</final>
  </property>
</configuration>

vi $HADOOP_CONF_DIR/yarn-site.xml

<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>0.0.0.0:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master1:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master1:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master1:8031</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <!-- the key must match the aux-service name "mapreduce_shuffle" above -->
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

</configuration>


Edit/create the masters file and add the node that will run the secondary namenode.
Edit the $HADOOP_CONF_DIR/slaves file and add the datanode hosts:

slave1
slave2


The dfs.datanode.data.dir and dfs.namenode.name.dir directories configured in hdfs-site.xml need to be prepared ahead of time (created, with read/write permission); see the sketch below.
If dfs.datanode.data.dir already exists, it must be emptied, otherwise its data version will not match the version written when the namenode is formatted.
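
A minimal sketch of preparing these directories (assuming everything runs as root, as in the rest of these notes):

# Create the name/data/temp directories configured above
mkdir -p /opt/cdh/data/hadoop/dfs/name
mkdir -p /opt/cdh/data/hadoop/dfs/data
mkdir -p /opt/cdh/temp/hadoop
chmod -R 755 /opt/cdh/data /opt/cdh/temp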

Good. At this point you can take an image of this host (if you are on a cloud platform), or copy /opt/cdh and /etc/profile.d/[hadoop,java].sh to the remaining nodes.
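
This is where the batch copy scripts mentioned at the top come in. A minimal sketch (the node list and the use of scp are my own choices; it assumes you can already log in to the other nodes, for example after the passwordless SSH setup in the next step):

# Push the CDH tree and the profile scripts to every other node
for host in slave1 slave2; do
    scp -r /opt/cdh root@$host:/opt/
    scp /etc/profile.d/hadoop.sh /etc/profile.d/java.sh root@$host:/etc/profile.d/
done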

2. After all nodes are prepared

Set up passwordless SSH login
ssh-keygen
ssh-copy-id master1
ssh-copy-id slave1
ssh-copy-id slave2
Remember to stop the firewall, or configure it properly
systemctl stop firewalld.service
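
To keep it from coming back after a reboot, or to open only the needed ports instead, a sketch:

systemctl disable firewalld.service
# or, instead of disabling it, open the ports used above, e.g.:
# firewall-cmd --permanent --add-port=8088/tcp
# firewall-cmd --reload
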
Format the namenode

$HADOOP_HOME/bin/hdfs namenode -format
Note:
after running the format command hdfs namenode -format,
the previous contents of dfs.datanode.data.dir must be cleared, otherwise you will get a cluster-ID mismatch between the namenode and the datanodes, with a log similar to:

WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /data1/hadoop/dfs/data: namenode clusterID = CID-338fa625-66d8-488c-b5cf-eadcf4e3a5f7; datanode clusterID = CID-128d2365-f3ec-427d-b986-0ed9d599ae39
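
In that case, a sketch of the cleanup, run on every datanode before restarting (the path is the one configured in hdfs-site.xml above; adjust it if yours differs):

# Wipe stale datanode block storage so it re-registers with the new cluster ID
rm -rf /opt/cdh/data/hadoop/dfs/data/*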


3. Startup

Start / stop:
$HADOOP_HOME/sbin/start-all.sh
$HADOOP_HOME/sbin/stop-all.sh
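
After start-all.sh, a quick check that the daemons are running (the process names are the standard Hadoop 2.x ones; the web UI ports are the defaults or the ones configured above):

# on master1 expect NameNode, SecondaryNameNode and ResourceManager;
# on the slaves expect DataNode and NodeManager
jps
# web UIs: HDFS at http://master1:50070 , YARN at http://master1:8088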

Test:

$HADOOP_HOME/bin/hadoop fs -mkdir /test
$HADOOP_HOME/bin/hadoop fs -copyFromLocal ../test.tar.gz /test/
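
As a further smoke test, list the upload and run one of the bundled MapReduce examples (the exact location of the examples jar varies between CDH tarball layouts, so adjust the path if needed):

$HADOOP_HOME/bin/hadoop fs -ls /test
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce2/hadoop-mapreduce-examples-*.jar pi 2 10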

