Aishang Training Notes | Installing Hadoop HA

2018-02-07 13:14


Host plan

Hostname   IP address       Processes
hdp-01     192.168.33.31    NameNode (nn1), DFSZKFailoverController (zkfc)
hdp-02     192.168.33.32    NameNode (nn2), DFSZKFailoverController (zkfc)
hdp-03     192.168.33.33    ResourceManager
hdp-04     192.168.33.34    ResourceManager
hdp-05     192.168.33.35    DataNode, NodeManager, ZooKeeper (QuorumPeerMain), JournalNode
hdp-06     192.168.33.36    DataNode, NodeManager, ZooKeeper (QuorumPeerMain), JournalNode
hdp-07     192.168.33.37    DataNode, NodeManager, ZooKeeper (QuorumPeerMain), JournalNode

I. Installing ZooKeeper

Configure hostname resolution on every machine:

[hadoop@hdp-05 conf]$ vi /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.33.31 hdp-01

192.168.33.32 hdp-02

192.168.33.33 hdp-03

192.168.33.34 hdp-04

192.168.33.35 hdp-05

192.168.33.36 hdp-06

192.168.33.37 hdp-07
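The seven hdp entries above follow a regular pattern, so on a fresh machine they can be generated rather than typed by hand. A minimal sketch, assuming the 192.168.33.31..37 to hdp-01..07 mapping from the host plan (append the output to /etc/hosts as root):

```shell
# Generate the cluster's /etc/hosts lines from the addressing pattern.
# Assumes the 192.168.33.3N -> hdp-0N mapping used in this article.
for n in 1 2 3 4 5 6 7; do
  echo "192.168.33.3$n hdp-0$n"
done
```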

Install ZooKeeper on hdp-05, hdp-06, and hdp-07.

1. Copy the ZooKeeper archive to hdp-05 and extract it into /home/hadoop/apps:

[hadoop@hdp-05 ~]$ tar -zxvf zookeeper-3.4.5.tar.gz -C apps/

The copied archive can then be deleted:

[hadoop@hdp-05 ~]$ rm -f zookeeper-3.4.5.tar.gz

2. Change into the ZooKeeper install directory:

cd /home/hadoop/apps/zookeeper-3.4.5

Remove some unnecessary files:

[hadoop@hdp-05 zookeeper-3.4.5]$ rm -rf *.xml *.txt docs src *.asc *.md5 *.sha1

[hadoop@hdp-05 zookeeper-3.4.5]$ rm -rf dist-maven/

The directory now looks like this:

[hadoop@hdp-05 zookeeper-3.4.5]$ ll

total 1308

drwxr-xr-x.  2 hadoop hadoop    4096 Jan 22 22:34 bin

drwxr-xr-x.  2 hadoop hadoop    4096 Jan 22 22:34 conf

drwxr-xr-x. 10 hadoop hadoop    4096 Jan 22 22:34 contrib

drwxr-xr-x.  4 hadoop hadoop    4096 Jan 22 22:34 lib

drwxr-xr-x.  5 hadoop hadoop    4096 Jan 22 22:34 recipes

-rw-r--r--.  1 hadoop hadoop 1315806 Nov  5  2012 zookeeper-3.4.5.jar

3. Change into conf/ to edit the configuration:

[hadoop@hdp-05 zookeeper-3.4.5]$ cd conf/

Rename the sample configuration file:

[hadoop@hdp-05 conf]$ mv zoo_sample.cfg zoo.cfg

4. Edit zoo.cfg:

[hadoop@hdp-05 conf]$ vi zoo.cfg

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=5

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

# the directory where zookeeper stores its data

dataDir=/home/hadoop/zkdata

# the port at which the clients will connect

clientPort=2181

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1

server.1=hdp-05:2888:3888

server.2=hdp-06:2888:3888

server.3=hdp-07:2888:3888

5. Create the /home/hadoop/zkdata directory and write this node's id, 1, into a file named myid:

[hadoop@hdp-05 ~]$ mkdir zkdata

[hadoop@hdp-05 ~]$ echo 1 > zkdata/myid

List the home directory to confirm:

[hadoop@hdp-05 ~]$ ll

total 8

drwxrwxr-x. 3 hadoop hadoop 4096 Jan 22 23:37 apps

drwxrwxr-x. 2 hadoop hadoop 4096 Jan 22 23:38 zkdata

6. Copy apps/ and zkdata/ to hdp-06 and hdp-07:

[hadoop@hdp-05 ~]$ scp -r apps/ zkdata/ hdp-06:/home/hadoop/

[hadoop@hdp-05 ~]$ scp -r apps/ zkdata/ hdp-07:/home/hadoop/

7. On hdp-06 and hdp-07, change the contents of /home/hadoop/zkdata/myid to 2 and 3 respectively.
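The rule is that the N in each server.N line of zoo.cfg must equal the number stored in that host's myid file. A quick local sketch of the resulting layout, using a temp directory here for illustration (on the real nodes the path is /home/hadoop/zkdata/myid):

```shell
# Demonstrate the myid layout locally in a temp dir. Each host gets the id
# matching its server.N line in zoo.cfg (hdp-05 -> 1, hdp-06 -> 2, hdp-07 -> 3).
tmp=$(mktemp -d)
i=1
for host in hdp-05 hdp-06 hdp-07; do
  mkdir -p "$tmp/$host/zkdata"
  echo "$i" > "$tmp/$host/zkdata/myid"
  i=$((i + 1))
done
cat "$tmp/hdp-06/zkdata/myid"   # prints 2
```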

8. Start ZooKeeper on all three machines:

[hadoop@hdp-05 ~]$ cd apps/zookeeper-3.4.5/

[hadoop@hdp-05 zookeeper-3.4.5]$ bin/zkServer.sh start

JMX enabled by default

Using config: /home/hadoop/apps/zookeeper-3.4.5/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

Check ZooKeeper's status:

[hadoop@hdp-05 zookeeper-3.4.5]$ bin/zkServer.sh status

JMX enabled by default

Using config: /home/hadoop/apps/zookeeper-3.4.5/bin/../conf/zoo.cfg

Mode: follower

Or check the process list:

[hadoop@hdp-07 zookeeper-3.4.5]$ jps

1465 QuorumPeerMain

1515 Jps

If the servers start and report a status, the installation succeeded.
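Rather than logging in to each machine, the start and status commands can be issued to all three nodes from one shell, assuming passwordless ssh between the ZooKeeper hosts and the same install path everywhere. Printed as a dry run here for review (remove the leading echo to actually execute over ssh):

```shell
# Dry run: print the per-host start/status commands; drop the leading
# echo to execute them over ssh. Assumes the same install path on every node.
ZK=/home/hadoop/apps/zookeeper-3.4.5
for host in hdp-05 hdp-06 hdp-07; do
  echo ssh "$host" "$ZK/bin/zkServer.sh start"
  echo ssh "$host" "$ZK/bin/zkServer.sh status"
done
```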

II. Installing the Hadoop cluster

1. Upload the Hadoop archive to hdp-01.

2. Create the /home/hadoop/apps directory and extract Hadoop into it:

[hadoop@hdp-01 ~]$ tar -zxvf cenos-6.5-hadoop-2.6.4.tar.gz -C apps/

Delete the archive:

[hadoop@hdp-01 ~]$ rm -rf cenos-6.5-hadoop-2.6.4.tar.gz

Change into the extracted directory, /home/hadoop/apps/hadoop-2.6.4, and remove files that are not needed:

(1) Delete all txt files:

[hadoop@hdp-01 hadoop-2.6.4]$ rm -rf *.txt

(2) Delete share/doc:

[hadoop@hdp-01 hadoop-2.6.4]$ rm -rf share/doc

3. Edit the configuration files under etc/hadoop:

[hadoop@hdp-01 hadoop-2.6.4]$ cd etc/hadoop/

[hadoop@hdp-01 hadoop]$ ll

total 152

-rw-r--r--. 1 hadoop hadoop  4436 Mar  8 2016 capacity-scheduler.xml

-rw-r--r--. 1 hadoop hadoop  1335 Mar  8 2016 configuration.xsl

-rw-r--r--. 1 hadoop hadoop   318 Mar  8 2016 container-executor.cfg

-rw-r--r--. 1 hadoop hadoop   774 Mar  8 2016 core-site.xml

-rw-r--r--. 1 hadoop hadoop  3670 Mar  8 2016 hadoop-env.cmd

-rw-r--r--. 1 hadoop hadoop  4224 Mar  8 2016 hadoop-env.sh

-rw-r--r--. 1 hadoop hadoop  2598 Mar  8 2016 hadoop-metrics2.properties

-rw-r--r--. 1 hadoop hadoop  2490 Mar  8 2016 hadoop-metrics.properties

-rw-r--r--. 1 hadoop hadoop  9683 Mar  8 2016 hadoop-policy.xml

-rw-r--r--. 1 hadoop hadoop   775 Mar  8 2016 hdfs-site.xml

-rw-r--r--. 1 hadoop hadoop  1449 Mar  8 2016 httpfs-env.sh

-rw-r--r--. 1 hadoop hadoop  1657 Mar  8 2016 httpfs-log4j.properties

-rw-r--r--. 1 hadoop hadoop    21 Mar  8 2016 httpfs-signature.secret

-rw-r--r--. 1 hadoop hadoop   620 Mar  8 2016 httpfs-site.xml

-rw-r--r--. 1 hadoop hadoop  3523 Mar  8 2016 kms-acls.xml

-rw-r--r--. 1 hadoop hadoop  1325 Mar  8 2016 kms-env.sh

-rw-r--r--. 1 hadoop hadoop  1631 Mar  8 2016 kms-log4j.properties

-rw-r--r--. 1 hadoop hadoop  5511 Mar  8 2016 kms-site.xml

-rw-r--r--. 1 hadoop hadoop 11291 Mar  8 2016 log4j.properties

-rw-r--r--. 1 hadoop hadoop   938 Mar  8 2016 mapred-env.cmd

-rw-r--r--. 1 hadoop hadoop  1383 Mar  8 2016 mapred-env.sh

-rw-r--r--. 1 hadoop hadoop  4113 Mar  8 2016 mapred-queues.xml.template

-rw-r--r--. 1 hadoop hadoop   758 Mar  8 2016 mapred-site.xml.template

-rw-r--r--. 1 hadoop hadoop    10 Mar  8 2016 slaves

-rw-r--r--. 1 hadoop hadoop  2316 Mar  8 2016 ssl-client.xml.example

-rw-r--r--. 1 hadoop hadoop  2268 Mar  8 2016 ssl-server.xml.example

-rw-r--r--. 1 hadoop hadoop  2237 Mar  8 2016 yarn-env.cmd

-rw-r--r--. 1 hadoop hadoop  4567 Mar  8 2016 yarn-env.sh

-rw-r--r--. 1 hadoop hadoop   690 Mar  8 2016 yarn-site.xml

4. Edit hadoop-env.sh to set JAVA_HOME explicitly:

[hadoop@hdp-01 hadoop]$ echo $JAVA_HOME

/usr/local/jdk1.7.0_45

[hadoop@hdp-01 hadoop]$ vi hadoop-env.sh

# (Apache license header omitted)

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are

# optional. When running a distributed configuration it is best to

# set JAVA_HOME in this file, so that it is correctly defined on

# remote nodes.

# The java implementation to use.

export JAVA_HOME=/usr/local/jdk1.7.0_45


5. Edit core-site.xml:

[hadoop@hdp-01 hadoop]$ vi core-site.xml

<configuration>

    <!-- set the HDFS nameservice id to n1 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://n1/</value>
    </property>

    <!-- hadoop temp/data directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/apps/hdpdata/</value>
    </property>

    <!-- zookeeper quorum address -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hdp-05:2181,hdp-06:2181,hdp-07:2181</value>
    </property>

</configuration>

6. Edit hdfs-site.xml:

[hadoop@hdp-01 hadoop]$ vi hdfs-site.xml

<configuration>

    <!-- nameservice id n1; must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>n1</value>
    </property>

    <!-- n1 has two NameNodes: nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.n1</name>
        <value>nn1,nn2</value>
    </property>

    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.n1.nn1</name>
        <value>hdp-01:9000</value>
    </property>

    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.n1.nn1</name>
        <value>hdp-01:50070</value>
    </property>

    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.n1.nn2</name>
        <value>hdp-02:9000</value>
    </property>

    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.n1.nn2</name>
        <value>hdp-02:50070</value>
    </property>

    <!-- where the NameNode edits are stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hdp-05:8485;hdp-06:8485;hdp-07:8485/n1</value>
    </property>

    <!-- where each JournalNode keeps its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/journaldata</value>
    </property>

    <!-- enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- failover proxy provider used by HDFS clients -->
    <property>
        <name>dfs.client.failover.proxy.provider.n1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- fencing methods; one method per line -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>

    <!-- sshfence needs passwordless ssh; point it at the private key -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- sshfence connect timeout (ms) -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

</configuration>

7. Rename mapred-site.xml.template to mapred-site.xml:

[hadoop@hdp-01 hadoop]$ mv mapred-site.xml.template mapred-site.xml

Then edit mapred-site.xml:

<configuration>

    <!-- run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

</configuration>

8. Edit yarn-site.xml:

<configuration>

    <!-- enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>

    <!-- logical ids of the two RMs -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- hosts for each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hdp-03</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hdp-04</value>
    </property>

    <!-- zookeeper quorum address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hdp-05:2181,hdp-06:2181,hdp-07:2181</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>

9. Edit slaves

The slaves file lists the worker nodes for the start scripts: on the host where you run start-dfs.sh it determines where datanodes are started, and on the host where you run start-yarn.sh it determines where nodemanagers are started. Here the workers are the same three machines in both cases:

hdp-05

hdp-06

hdp-07

10. Distribute the configured hadoop directory from hdp-01 to hdp-02 through hdp-07 (this is more convenient after setting up passwordless login in step 11):

[hadoop@hdp-01 apps]$ scp -r hadoop-2.6.4/ hdp-02:apps/

Repeat the scp for hdp-03 through hdp-07.

11. Set up passwordless ssh

(1) First configure passwordless login from hdp-01 to hdp-02 through hdp-07 (needed when starting HDFS).

Generate a key pair on hdp-01:

ssh-keygen -t rsa

Copy the public key to every node (including hdp-01 itself):

ssh-copy-id hdp-01

ssh-copy-id hdp-02

ssh-copy-id hdp-03

ssh-copy-id hdp-04

ssh-copy-id hdp-05

ssh-copy-id hdp-06

ssh-copy-id hdp-07
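The seven ssh-copy-id calls can be wrapped in one loop. Printed as a dry run here (remove the leading echo to execute; you will be prompted for the password once per host):

```shell
# Dry run: print one ssh-copy-id per node; drop the echo to execute.
for n in 1 2 3 4 5 6 7; do
  echo ssh-copy-id "hdp-0$n"
done
```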

(2) Configure passwordless login from hdp-03 to hdp-05, hdp-06, and hdp-07 (needed when starting YARN).

Generate a key pair on hdp-03:

ssh-keygen -t rsa

Copy the public key to the nodes (including hdp-03 itself):

ssh-copy-id hdp-03

ssh-copy-id hdp-05

ssh-copy-id hdp-06

ssh-copy-id hdp-07

(3) The two namenodes must be able to ssh to each other without a password. hdp-01 to hdp-02 was covered in (1); now configure hdp-02 to hdp-01.

On hdp-02, generate a key pair if one does not exist yet (ssh-keygen -t rsa), then copy it to hdp-01:

ssh-copy-id hdp-01

III. First start and initialization of the HA cluster (follow the steps below strictly in order)

1. Start ZooKeeper on hdp-05, hdp-06, and hdp-07 (see Part I; skip this if it is already running).

ZooKeeper is what the ZKFCs use to elect the active namenode, so it must be up before the rest of the cluster.

2. Add HADOOP_HOME to /etc/profile:

vi /etc/profile

export JAVA_HOME=/usr/local/jdk1.7.0_45

export HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.4

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Copy /etc/profile to the other machines (we are logged in as hadoop, so use sudo):

[hadoop@hdp-01 ~]$ sudo scp /etc/profile hdp-02:/etc/

[hadoop@hdp-01 ~]$ sudo scp /etc/profile hdp-03:/etc/

[hadoop@hdp-01 ~]$ sudo scp /etc/profile hdp-04:/etc/

[hadoop@hdp-01 ~]$ sudo scp /etc/profile hdp-05:/etc/

[hadoop@hdp-01 ~]$ sudo scp /etc/profile hdp-06:/etc/

[hadoop@hdp-01 ~]$ sudo scp /etc/profile hdp-07:/etc/

Then run source /etc/profile on every machine so that the hadoop commands are on the PATH.

3. Start the journalnodes (run this on hdp-05, hdp-06, and hdp-07):

[hadoop@hdp-05 ~]$ hadoop-daemon.sh start journalnode

starting journalnode, logging to /home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-journalnode-hdp-05.out

Run jps to verify: hdp-05, hdp-06, and hdp-07 should each now show a JournalNode process.

4. Format HDFS on hdp-01:

hdfs namenode -format

After a successful format, the HDFS metadata directory contains new files:

[hadoop@hdp-01 current]$ pwd

/home/hadoop/apps/hdpdata/dfs/name/current

[hadoop@hdp-01 current]$ ll

total 16

-rw-rw-r--. 1 hadoop hadoop 352 Jan 23 03:35 fsimage_0000000000000000000

-rw-rw-r--. 1 hadoop hadoop  62 Jan 23 03:35 fsimage_0000000000000000000.md5

-rw-rw-r--. 1 hadoop hadoop   2 Jan 23 03:35 seen_txid

-rw-rw-r--. 1 hadoop hadoop 204 Jan 23 03:35 VERSION

[hadoop@hdp-01 current]$ cat VERSION

#Tue Jan 23 03:35:56 CST 2018

namespaceID=108848608

clusterID=CID-b7ca0c20-01a6-4dda-a731-f105e71d4e41

cTime=0

storageType=NAME_NODE

blockpoolID=BP-837921991-192.168.33.31-1516649756022

layoutVersion=-60

Formatting creates this data under the hadoop.tmp.dir configured in core-site.xml, which here is /home/hadoop/apps/hdpdata/:

5. Copy /home/hadoop/apps/hdpdata/ to the same path on hdp-02. hdp-02 will be hdp-01's standby, so the two must share the same clusterID and blockpoolID.

<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/apps/hdpdata/</value>
</property>

scp -r /home/hadoop/apps/hdpdata/ hdp-02:/home/hadoop/apps/

Alternatively (and recommended), run hdfs namenode -bootstrapStandby on hdp-02 instead of copying.

6. Format ZKFC. Run this once, on hdp-01; it registers the parent znode in ZooKeeper that the active/standby election uses:

hdfs zkfc -formatZK

Output:

18/01/23 03:50:51 INFO zookeeper.ClientCnxn: Opening socket connection to server hdp-06/192.168.33.36:2181. Will not attempt to authenticate using SASL (unknown error)

18/01/23 03:50:51 INFO zookeeper.ClientCnxn: Socket connection established to hdp-06/192.168.33.36:2181, initiating session

18/01/23 03:50:51 INFO zookeeper.ClientCnxn: Session establishment complete on server

hdp-06/192.168.33.36:2181, sessionid = 0x2611f3d65970000, negotiated timeout = 5000

18/01/23 03:50:51 INFO ha.ActiveStandbyElector: Session connected.

18/01/23 03:50:51 INFO ha.ActiveStandbyElector:Successfully created /hadoop-ha/n1 in ZK.

18/01/23 03:50:51 INFO zookeeper.ClientCnxn: EventThread shut down

18/01/23 03:50:51 INFO zookeeper.ZooKeeper: Session: 0x2611f3d65970000 closed

Verify on ZooKeeper that the znode was registered; any of hdp-05, hdp-06, or hdp-07 will do.

Start a ZooKeeper client:

[hadoop@hdp-05 ~]$ cd apps/zookeeper-3.4.5/

[hadoop@hdp-05 zookeeper-3.4.5]$ bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 0] ls /

[hadoop-ha, zookeeper]

[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha

[n1]

There is no data under it yet: the cluster has not started, so no namenode is active.

[zk: localhost:2181(CONNECTED) 2] get /hadoop-ha/n1

cZxid = 0x200000003

ctime = Tue Jan 23 03:51:35 CST 2018

mZxid = 0x200000003

mtime = Tue Jan 23 03:51:35 CST 2018

pZxid = 0x200000003

cversion = 0

dataVersion = 0

aclVersion = 0

ephemeralOwner = 0x0

dataLength = 0

numChildren = 0

7. Start HDFS from hdp-01:

[hadoop@hdp-01 ~]$ start-dfs.sh

Starting namenodes on [hdp-01 hdp-02]

hdp-02: starting namenode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-namenode-hdp-02.out

hdp-01: starting namenode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-namenode-hdp-01.out

hdp-07: starting datanode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hdp-07.out

hdp-06: starting datanode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hdp-06.out

hdp-05: starting datanode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hdp-05.out

Starting journal nodes [hdp-05 hdp-06 hdp-07]

hdp-05: journalnode running as process 1781. Stop it first.

hdp-07: journalnode running as process 1761. Stop it first.

hdp-06: journalnode running as process 1768. Stop it first.

Starting ZK Failover Controllers on NN hosts [hdp-01 hdp-02]

hdp-01: starting zkfc, logging to /home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-zkfc-hdp-01.out

hdp-02: starting zkfc, logging to /home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-zkfc-hdp-02.out

8. Start YARN on hdp-02:

[hadoop@hdp-02 ~]$ start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-hdp-02.out

hdp-07: starting nodemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hdp-07.out

hdp-05: starting nodemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hdp-05.out

hdp-06: starting nodemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hdp-06.out

9. Also start the standby resourcemanager on hdp-04; it is not started automatically:

[hadoop@hdp-04 apps]$ yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-hdp-04.out

IV. Verifying high availability

1. Verify HDFS

In a browser, visit http://192.168.33.31:50070/ and you should see: NameNode 'hdp-01:9000' (active)

Visit http://192.168.33.32:50070/ and you should see: NameNode 'hdp-02:9000' (standby)
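Besides the web UI, the namenode states can be queried on the command line with hdfs haadmin, where nn1 and nn2 are the namenode ids configured in hdfs-site.xml. Printed as a dry run here, since running it requires the live cluster (drop the echo on a host with hadoop on the PATH):

```shell
# Dry run: print the state-query commands for both namenodes;
# drop the echo to execute on a cluster host.
for nn in nn1 nn2; do
  echo hdfs haadmin -getServiceState "$nn"
done
```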

2. Verify failover

(1) First put a file into HDFS:

[hadoop@hdp-01 ~]$ hadoop fs -mkdir /test

[hadoop@hdp-01 ~]$ hadoop fs -ls /

Found 1 items

drwxr-xr-x - hadoop supergroup 0 2018-01-23 12:41 /test

[hadoop@hdp-01 ~]$ hadoop fs -put /etc/profile /test

[hadoop@hdp-01 ~]$ hadoop fs -ls /test

Found 1 items

-rw-r--r-- 3 hadoop supergroup 1954 2018-01-23 12:42 /test/profile

(2) Look at the namenode process:

[hadoop@hdp-01 ~]$ jps

2128 DFSZKFailoverController

7200 Jps

2349 NameNode

(3) Kill process 2349, the NameNode:

[hadoop@hdp-01 ~]$ kill -9 2349

[hadoop@hdp-01 ~]$ jps

2128 DFSZKFailoverController

7220 Jps

(4) Visit hdp-01's web UI again at http://192.168.33.31:50070/

It is no longer reachable.

Now visit hdp-02's web UI: hdp-02's namenode has become active, and the file uploaded earlier is still there.

(5) Manually restart the killed NameNode on hdp-01:

[hadoop@hdp-01 ~]$ hadoop-daemon.sh start namenode

starting namenode, logging to /home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-namenode-hdp-01.out

Visit http://192.168.33.31:50070/ again: hdp-01's namenode has come back as standby.

3. Verify YARN HA

In a browser, visit the active resourcemanager on hdp-03 at http://192.168.33.33:8088

and the standby on hdp-04 at http://192.168.33.34:8088
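The resourcemanager states can likewise be queried from the command line with yarn rmadmin, where rm1 and rm2 are the ids configured in yarn-site.xml. Printed as a dry run here (drop the echo on a cluster host to run):

```shell
# Dry run: print the state-query commands for both resourcemanagers;
# drop the echo to execute on a cluster host.
for rm in rm1 rm2; do
  echo yarn rmadmin -getServiceState "$rm"
done
```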

V. Starting the cluster from now on

1. First start ZooKeeper on hdp-05, hdp-06, and hdp-07.

2. On hdp-01, run:

[hadoop@hdp-01 ~]$ start-dfs.sh

Starting namenodes on [hdp-01 hdp-02]

hdp-01: starting namenode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-namenode-hdp-01.out

hdp-02: starting namenode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-namenode-hdp-02.out

hdp-06: starting datanode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hdp-06.out

hdp-07: starting datanode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hdp-07.out

hdp-05: starting datanode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hdp-05.out

Starting journal nodes [hdp-05 hdp-06 hdp-07]

hdp-06: starting journalnode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-journalnode-hdp-06.out

hdp-05: starting journalnode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-journalnode-hdp-05.out

hdp-07: starting journalnode, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-journalnode-hdp-07.out

Starting ZK Failover Controllers on NN hosts [hdp-01 hdp-02]

hdp-01: starting zkfc, logging to /home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-zkfc-hdp-01.out

hdp-02: starting zkfc, logging to /home/hadoop/apps/hadoop-2.6.4/logs/hadoop-hadoop-zkfc-hdp-02.out

3. Start YARN on hdp-03:

[hadoop@hdp-03 ~]$ start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-hdp-03.out

hdp-07: starting nodemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hdp-07.out

hdp-06: starting nodemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hdp-06.out

hdp-05: starting nodemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hdp-05.out

4. hdp-04 is hdp-03's standby; start its resourcemanager manually:

[hadoop@hdp-04 ~]$ yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to

/home/hadoop/apps/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-hdp-04.out

VI. Shutting down the cluster

1. Stop YARN from hdp-03:

[hadoop@hdp-03 ~]$ stop-yarn.sh

stopping yarn daemons

stopping resourcemanager

hdp-07: stopping nodemanager

hdp-05: stopping nodemanager

hdp-06: stopping nodemanager

no proxyserver to stop

2. Manually stop the resourcemanager on hdp-04:

[hadoop@hdp-04 ~]$ yarn-daemon.sh stop resourcemanager

stopping resourcemanager

3. Stop HDFS from hdp-01:

[hadoop@hdp-01 ~]$ stop-dfs.sh

Stopping namenodes on [hdp-01 hdp-02]

hdp-01: stopping namenode

hdp-02: stopping namenode

hdp-05: stopping datanode

hdp-07: stopping datanode

hdp-06: stopping datanode

Stopping journal nodes [hdp-05 hdp-06 hdp-07]

hdp-07: stopping journalnode

hdp-06: stopping journalnode

hdp-05: stopping journalnode

Stopping ZK Failover Controllers on NN hosts [hdp-01 hdp-02]

hdp-01: stopping zkfc

hdp-02: stopping zkfc

4. Stop ZooKeeper on hdp-05, hdp-06, and hdp-07:

[hadoop@hdp-05 zookeeper-3.4.5]$ bin/zkServer.sh stop
