
Building an HA (High-Availability) Hadoop Cluster on CentOS 7

2017-07-22
One operations (development) machine: dev, a full CentOS 7 installation.

Five virtual machines to create: NameNode1 (nn1), NameNode2 (nn2), DataNode1 (dn1), DataNode2 (dn2), DataNode3 (dn3)

The planned layout is shown in the table below:

Host | OS       | IP address      | Software                  | Processes
nn1  | CentOS 7 | 192.168.203.121 | JDK 8+, Hadoop            | NameNode, DFSZKFailoverController (zkfc), ResourceManager
nn2  | CentOS 7 | 192.168.203.122 | JDK 8+, Hadoop            | NameNode, DFSZKFailoverController (zkfc), ResourceManager
dn1  | CentOS 7 | 192.168.203.123 | JDK 8+, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
dn2  | CentOS 7 | 192.168.203.124 | JDK 8+, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
dn3  | CentOS 7 | 192.168.203.125 | JDK 8+, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
Start the installation:

1. Host firewall and network settings

Create a set-net.sh script on the dev machine:

#!/bin/bash
# The ssh targets below are each machine's current DHCP-assigned address; every block
# disables SELinux and firewalld, switches ens33 to a static IP, sets the hostname,
# fills /etc/hosts, and reboots the machine.
ssh root@192.168.203.138 << setnet
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
systemctl disable firewalld
sed -i "s/BOOTPROTO=dhcp/BOOTPROTO=static/g" /etc/sysconfig/network-scripts/ifcfg-ens33
sed -i "/^UUID.*/d" /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'IPADDR=192.168.203.121' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'GATEWAY=192.168.203.2' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'DNS1=202.106.0.20' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo nn1 > /etc/hostname
echo '192.168.203.121	nn1' >> /etc/hosts
echo '192.168.203.122	nn2' >> /etc/hosts
echo '192.168.203.123	dn1' >> /etc/hosts
echo '192.168.203.124	dn2' >> /etc/hosts
echo '192.168.203.125	dn3' >> /etc/hosts
shutdown -r
exit
setnet

ssh root@192.168.203.149 << setnet
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
systemctl disable firewalld
sed -i "s/BOOTPROTO=dhcp/BOOTPROTO=static/g" /etc/sysconfig/network-scripts/ifcfg-ens33
sed -i "/^UUID.*/d" /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'IPADDR=192.168.203.122' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'GATEWAY=192.168.203.2' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'DNS1=202.106.0.20' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo nn2 > /etc/hostname
echo '192.168.203.121	nn1' >> /etc/hosts
echo '192.168.203.122	nn2' >> /etc/hosts
echo '192.168.203.123	dn1' >> /etc/hosts
echo '192.168.203.124	dn2' >> /etc/hosts
echo '192.168.203.125	dn3' >> /etc/hosts
shutdown -r
exit
setnet

ssh root@192.168.203.147 << setnet
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
systemctl disable firewalld
sed -i "s/BOOTPROTO=dhcp/BOOTPROTO=static/g" /etc/sysconfig/network-scripts/ifcfg-ens33
sed -i "/^UUID.*/d" /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'IPADDR=192.168.203.123' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'GATEWAY=192.168.203.2' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'DNS1=202.106.0.20' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo dn1 > /etc/hostname
echo '192.168.203.121	nn1' >> /etc/hosts
echo '192.168.203.122	nn2' >> /etc/hosts
echo '192.168.203.123	dn1' >> /etc/hosts
echo '192.168.203.124	dn2' >> /etc/hosts
echo '192.168.203.125	dn3' >> /etc/hosts
shutdown -r
exit
setnet

ssh root@192.168.203.150 << setnet
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
systemctl disable firewalld
sed -i "s/BOOTPROTO=dhcp/BOOTPROTO=static/g" /etc/sysconfig/network-scripts/ifcfg-ens33
sed -i "/^UUID.*/d" /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'IPADDR=192.168.203.124' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'GATEWAY=192.168.203.2' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'DNS1=202.106.0.20' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo dn2 > /etc/hostname
echo '192.168.203.121	nn1' >> /etc/hosts
echo '192.168.203.122	nn2' >> /etc/hosts
echo '192.168.203.123	dn1' >> /etc/hosts
echo '192.168.203.124	dn2' >> /etc/hosts
echo '192.168.203.125	dn3' >> /etc/hosts
shutdown -r
exit
setnet

ssh root@192.168.203.148 << setnet
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
systemctl disable firewalld
sed -i "s/BOOTPROTO=dhcp/BOOTPROTO=static/g" /etc/sysconfig/network-scripts/ifcfg-ens33
sed -i "/^UUID.*/d" /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'IPADDR=192.168.203.125' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'GATEWAY=192.168.203.2' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo 'DNS1=202.106.0.20' >> /etc/sysconfig/network-scripts/ifcfg-ens33
echo dn3 > /etc/hostname
echo '192.168.203.121	nn1' >> /etc/hosts
echo '192.168.203.122	nn2' >> /etc/hosts
echo '192.168.203.123	dn1' >> /etc/hosts
echo '192.168.203.124	dn2' >> /etc/hosts
echo '192.168.203.125	dn3' >> /etc/hosts
shutdown -r
exit
setnet

Make the script executable: chmod a+x set-net.sh

Run the script, entering the root password when prompted, and wait for the five machines to reboot.
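
After the machines come back up, the new hostnames and static addresses can be checked from dev (a sketch; it reuses the static IPs above and prompts for each root password):

#!/bin/bash
# print the hostname and the ens33 address of every rebooted machine
for ip in 192.168.203.121 192.168.203.122 192.168.203.123 192.168.203.124 192.168.203.125; do
ssh root@$ip 'hostname; ip -4 addr show ens33'
done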

2. Create a hadoop user on every machine and set a password (I used hadoop as the password)

Create an add-user.sh script on the dev machine:

#!/bin/bash
# create the hadoop user on each machine; passwd --stdin sets the password non-interactively
ssh root@192.168.203.121 << addusershell
adduser -m hadoop -G root -s /bin/bash
echo "hadoop" | passwd hadoop --stdin
exit
addusershell
ssh root@192.168.203.122 << addusershell
adduser -m hadoop -G root -s /bin/bash
echo "hadoop" | passwd hadoop --stdin
exit
addusershell
ssh root@192.168.203.123 << addusershell
adduser -m hadoop -G root -s /bin/bash
echo "hadoop" | passwd hadoop --stdin
exit
addusershell
ssh root@192.168.203.124 << addusershell
adduser -m hadoop -G root -s /bin/bash
echo "hadoop" | passwd hadoop --stdin
exit
addusershell
ssh root@192.168.203.125 << addusershell
adduser -m hadoop -G root -s /bin/bash
echo "hadoop" | passwd hadoop --stdin
exit
addusershell
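
The five blocks differ only in the target address, so the same script can be written more compactly as a loop (a sketch, using the static IPs configured above):

#!/bin/bash
# same effect as the five blocks above, written as a single loop
for ip in 192.168.203.121 192.168.203.122 192.168.203.123 192.168.203.124 192.168.203.125; do
ssh root@$ip << addusershell
adduser -m hadoop -G root -s /bin/bash
echo "hadoop" | passwd hadoop --stdin
exit
addusershell
done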

Use Xshell to log in to the hadoop user on each of the five machines.

3. Passwordless SSH login

The goal:
nn1 can SSH to nn2, dn1, dn2, and dn3 without a password
nn2 can SSH to nn1, dn1, dn2, and dn3 without a password

nn1 passwordless SSH to nn2, dn1, dn2, dn3:
Check on each virtual machine that SSH is installed and the sshd service is running:
rpm -qa | grep ssh
ps aux | grep ssh



If the output lists the openssh packages and shows a running sshd process, SSH is installed and running.

Generate the hidden .ssh directory on each virtual machine by running:
ssh localhost

Generate an SSH key pair on nn1:
cd .ssh
ssh-keygen -t rsa #press Enter at every prompt
ll #you should see id_rsa (the private key) and id_rsa.pub (the public key)
cat id_rsa.pub >> authorized_keys #append the public key to authorized_keys
chmod 600 authorized_keys #fix the permissions; this step is required

Run the command: ssh localhost
to test that nn1 can log in to itself without a password prompt. If it cannot, check whether you missed the step above.

Copy nn1's public key to the other four machines so that nn1 can log in to all of them without a password. In nn1's .ssh directory, create a keys.sh script:
#!/bin/bash
# send nn1's authorized_keys (containing nn1's public key) to the other four machines
scp authorized_keys hadoop@192.168.203.122:~/.ssh/
scp authorized_keys hadoop@192.168.203.123:~/.ssh/
scp authorized_keys hadoop@192.168.203.124:~/.ssh/
scp authorized_keys hadoop@192.168.203.125:~/.ssh/

Log in as the hadoop user on each machine and run:
cd ~/.ssh
chmod 600 authorized_keys

Test SSH from nn1 to nn2, dn1, dn2, and dn3 and check whether a password is requested. If not, it worked; otherwise, recheck the steps above. (Note: the first passwordless login asks you to type yes.)
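
As an aside, the same key distribution can be done with ssh-copy-id, which appends the key and fixes the permissions in one step (an optional alternative sketch; it relies on the hostnames added to /etc/hosts earlier and prompts once per host for the hadoop password):

# run on nn1 as the hadoop user
for host in nn2 dn1 dn2 dn3; do
ssh-copy-id hadoop@$host
done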

nn2 passwordless SSH to nn1, dn1, dn2, dn3:
Generate an SSH key pair on nn2:
cd .ssh
ssh-keygen -t rsa #press Enter at every prompt
ll #you should see id_rsa (the private key) and id_rsa.pub (the public key)
cat id_rsa.pub >> authorized_keys #append the public key to authorized_keys
chmod 600 authorized_keys #fix the permissions; this step is required

Copy nn2's public key to the other four machines so that nn2 can log in to all of them without a password. In nn2's .ssh directory, create a keys.sh script:
#!/bin/bash
# send nn2's authorized_keys (which now holds both nn1's and nn2's public keys) to the other four machines
scp authorized_keys hadoop@nn1:~/.ssh/
scp authorized_keys hadoop@dn1:~/.ssh/
scp authorized_keys hadoop@dn2:~/.ssh/
scp authorized_keys hadoop@dn3:~/.ssh/

Test SSH from nn2 to nn1, dn1, dn2, and dn3 and check whether a password is requested. If not, it worked; otherwise, recheck the steps above. (Note: the first passwordless login asks you to type yes.)

4. Install the JDK, Hadoop, and ZooKeeper

1. Log in as the hadoop user on each machine and install the JDK.

Use Xftp to upload jdk-8u121-linux-x64.rpm to every virtual machine (make sure the package is the correct one!). Then install it with:

sudo yum -y install jdk-8u121-linux-x64.rpm

If you get an error saying that hadoop is not in the sudoers file, there are two ways to fix it:

a: switch to the root user

su
yum -y install jdk-8u121-linux-x64.rpm

b: the once-and-for-all fix

su
visudo

Then jump to around line 89 (for example, type :89 and press Enter) and add the following line:

hadoop ALL=(ALL) ALL

Note: separate the fields with a Tab character, not spaces.


Save and exit (press Esc, then type :wq).

Then leave root by typing exit.

Then run again: sudo yum -y install jdk-8u121-linux-x64.rpm

Enter the hadoop user's password when prompted and the installation will start.

2. The method above is tedious, so instead I packed the already-installed jdk, hadoop, and zookeeper directories into one archive, copied it to every machine, and just pointed the environment variables at those paths.

2.1 On dev, install the JDK, Hadoop, and ZooKeeper, then pack the three installed directories into a single archive, ha-hadoop.tar.gz:

tar -zcf ha-hadoop.tar.gz hadoop-2.7.3 zookeeper-3.4.9 jdk1.8.0_121/

Create a jhz-copy-install.sh script that copies ha-hadoop.tar.gz to every machine and unpacks it there. Script contents:

#!/bin/bash
# copy the archive to every machine, then unpack it in each hadoop home directory
scp ha-hadoop.tar.gz hadoop@192.168.203.121:~/
scp ha-hadoop.tar.gz hadoop@192.168.203.122:~/
scp ha-hadoop.tar.gz hadoop@192.168.203.123:~/
scp ha-hadoop.tar.gz hadoop@192.168.203.124:~/
scp ha-hadoop.tar.gz hadoop@192.168.203.125:~/

ssh 192.168.203.121 << a
tar -xzvf ha-hadoop.tar.gz
exit
a
ssh 192.168.203.122 << a
tar -xzvf ha-hadoop.tar.gz
exit
a
ssh 192.168.203.123 << b
tar -xzvf ha-hadoop.tar.gz
exit
b
ssh 192.168.203.124 << c
tar -xzvf ha-hadoop.tar.gz
exit
c
ssh 192.168.203.125 << d
tar -xzvf ha-hadoop.tar.gz
exit
d

Enter the hadoop user's password when prompted.

Check each of the five machines for the unpacked directories. If any are missing, run the script again.
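
A quick way to check all five at once from nn1 (a sketch; it relies on the passwordless SSH and hostnames set up in section 3):

# prints the three unpacked directories on each node, or an error if they are missing
for host in nn1 nn2 dn1 dn2 dn3; do
echo "== $host =="
ssh hadoop@$host 'ls -d ~/jdk1.8.0_121 ~/hadoop-2.7.3 ~/zookeeper-3.4.9'
done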

2.2 Configure the environment variables

On nn1, cd into the jdk1.8.0_121 directory and run pwd to check the path; mine is /home/hadoop/jdk1.8.0_121.

On dev, create a file named path:

gedit path

and write the environment variables into it, paying attention to the JAVA_HOME path:

# Java Environment Variables
export JAVA_HOME=/home/hadoop/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin

# Hadoop Environment Variables
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

# Zookeeper Environment Variables
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.9
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Create an add-path.sh script that copies the path file to each of the five machines and appends its contents to ~/.bashrc to set up the environment variables. Script contents:

#!/bin/bash
# copy the path file to every machine, then append it to each hadoop user's ~/.bashrc
scp path hadoop@192.168.203.121:~/
scp path hadoop@192.168.203.122:~/
scp path hadoop@192.168.203.123:~/
scp path hadoop@192.168.203.124:~/
scp path hadoop@192.168.203.125:~/
ssh 192.168.203.121 << a
cat path >> ~/.bashrc
exit
a
ssh 192.168.203.122 << a
cat path >> ~/.bashrc
exit
a
ssh 192.168.203.123 << a
cat path >> ~/.bashrc
exit
a
ssh 192.168.203.124 << a
cat path >> ~/.bashrc
exit
a
ssh 192.168.203.125 << a
cat path >> ~/.bashrc
exit
a

Make the script executable and run it.

Load the modified settings so that they take effect immediately:

source ~/.bashrc

Test the setup by running java -version on all five machines. (I only verified the JDK environment variables.)
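
A couple of extra checks confirm the Hadoop and ZooKeeper variables as well (the expected versions are simply the ones unpacked above):

java -version          # should report 1.8.0_121
hadoop version         # should report Hadoop 2.7.3
echo $ZOOKEEPER_HOME   # should print /home/hadoop/zookeeper-3.4.9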

5. Hadoop HA configuration (on every node)

In the /home/hadoop/hadoop-2.7.3/etc/hadoop directory, edit the following five files:

1.core-site.xml

<configuration>

<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.7.3/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>dn1:2181,dn2:2181,dn3:2181</value>
</property>
</configuration>

2.hdfs-site.xml

<configuration>

<property>
<name>dfs.nameservices</name>
<value>cluster</value>
</property>

<property>
<name>dfs.ha.namenodes.cluster</name>
<value>nn1,nn2</value>
</property>

<property>
<name>dfs.namenode.rpc-address.cluster.nn1</name>
<value>nn1:9000</value>
</property>

<property>
<name>dfs.namenode.http-address.cluster.nn1</name>
<value>nn1:50070</value>
</property>

<property>
<name>dfs.namenode.rpc-address.cluster.nn2</name>
<value>nn2:9000</value>
</property>

<property>
<name>dfs.namenode.http-address.cluster.nn2</name>
<value>nn2:50070</value>
</property>

<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://dn1:8485;dn2:8485;dn3:8485/cluster</value>
</property>

<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/hadoop-2.7.3/journaldata</value>
</property>

<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.client.failover.proxy.provider.cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>

</configuration>

3.mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>nn1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>nn1:19888</value>
</property>
</configuration>

4.yarn-site.xml

<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>

<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>

<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>nn1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>nn2</value>
</property>

<property>
<name>yarn.resourcemanager.zk-address</name>
<value>dn1:2181,dn2:2181,dn3:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

5.slaves

dn1
dn2
dn3
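
Rather than editing the five files by hand on every node, one option is to edit them once on nn1 and push them to the other nodes (a sketch; it assumes the passwordless SSH from nn1 set up in section 3):

# run on nn1 as hadoop, after editing the five files there
for host in nn2 dn1 dn2 dn3; do
scp /home/hadoop/hadoop-2.7.3/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,slaves} hadoop@$host:/home/hadoop/hadoop-2.7.3/etc/hadoop/
done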


6. ZooKeeper configuration (dn1, dn2, dn3)

Create a zoo.cfg file in the conf directory of the ZooKeeper installation (/home/hadoop/zookeeper-3.4.9/conf/).

First create the zoo.cfg file on dev with the following contents:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
#dataDir=/tmp/zookeeper   (create the corresponding data directory on your host instead)
dataDir=/home/hadoop/zookeeper-3.4.9/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance #
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=dn1:2888:3888
server.2=dn2:2888:3888
server.3=dn3:2888:3888

Then create a zoo-copy.sh script that copies zoo.cfg to /home/hadoop/zookeeper-3.4.9/conf/ on dn1, dn2, and dn3.

zoo-copy.sh contents:

#!/bin/bash
scp zoo.cfg hadoop@192.168.203.123:~/zookeeper-3.4.9/conf/
scp zoo.cfg hadoop@192.168.203.124:~/zookeeper-3.4.9/conf/
scp zoo.cfg hadoop@192.168.203.125:~/zookeeper-3.4.9/conf/

On dn1, dn2, and dn3, create a file named myid under
/home/hadoop/zookeeper-3.4.9/data
containing a single number that matches the server entries in the configuration above:
server.1=dn1:2888:3888 means the myid file on dn1 must contain 1
server.2=dn2:2888:3888 means the myid file on dn2 must contain 2
server.3=dn3:2888:3888 means the myid file on dn3 must contain 3
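
The data directories and myid files can also be created from nn1 in one go (a sketch; it uses the passwordless SSH and hostnames set up earlier):

# run on nn1 as hadoop: writes myid 1, 2, 3 on dn1, dn2, dn3 respectively
id=1
for host in dn1 dn2 dn3; do
ssh hadoop@$host "mkdir -p /home/hadoop/zookeeper-3.4.9/data && echo $id > /home/hadoop/zookeeper-3.4.9/data/myid"
id=$((id + 1))
done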

7. Startup sequence

Start the ZooKeeper service first; on dn1, dn2, and dn3 run:
zkServer.sh start
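
To confirm that the quorum formed, check the status on each of the three nodes:

zkServer.sh status    # expect "Mode: leader" on one node and "Mode: follower" on the other two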


Start the JournalNodes; on dn1, dn2, and dn3 run:
hadoop-daemon.sh start journalnode
Note: only the very first startup needs this manual step; after that, starting HDFS also starts the JournalNodes.

Format HDFS; on nn1 run:
hdfs namenode -format
Note: after formatting, copy the tmp directory to nn2 (otherwise the NameNode on nn2 will not start).
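
One way to do that copy, using the hadoop.tmp.dir path set in core-site.xml (a sketch; running hdfs namenode -bootstrapStandby on nn2 once nn1's NameNode is up is the standard alternative):

# run on nn1 after formatting; copies the fresh NameNode metadata to nn2
scp -r /home/hadoop/hadoop-2.7.3/tmp hadoop@nn2:/home/hadoop/hadoop-2.7.3/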

Format the ZKFC state in ZooKeeper; on nn1 run:
hdfs zkfc -formatZK


Start HDFS; on nn1 run:
start-dfs.sh


Start YARN; on nn1 run:
start-yarn.sh


The ResourceManager on nn2 must be started manually and separately:
yarn-daemon.sh start resourcemanager


Run the following on every node:
jps
If the processes you see match the table at the top, the setup succeeded; if they do not, something failed and you should recheck the steps above.
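
As a final cross-check, the HA state of the two NameNodes can be queried using the NameNode IDs defined in hdfs-site.xml; one should report active and the other standby (the web UIs at nn1:50070 and nn2:50070 show the same thing):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2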