
Setting Up a Highly Available, Fully Distributed Hadoop Cluster

2017-12-09 21:20
Deployment plan

Host                                 Role(s)                                               IP
rm01.hadoop.com                      ResourceManager01                                     192.168.137.11
nn01.hadoop.com                      NameNode01, DFSZKFailoverController                   192.168.137.12
rm02.hadoop.com (backup RM)          ResourceManager02                                     192.168.137.13
nn02.hadoop.com (backup NameNode)    NameNode02, DFSZKFailoverController                   192.168.137.14
dn01.hadoop.com                      DataNode, NodeManager, QuorumPeerMain, JournalNode    192.168.137.21
dn02.hadoop.com                      DataNode, NodeManager, QuorumPeerMain, JournalNode    192.168.137.22
dn03.hadoop.com                      DataNode, NodeManager, QuorumPeerMain, JournalNode    192.168.137.23
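Every node must resolve these hostnames consistently. A minimal sketch of the /etc/hosts entries assumed throughout (replicate on each host, or use DNS instead):

192.168.137.11 rm01.hadoop.com
192.168.137.12 nn01.hadoop.com
192.168.137.13 rm02.hadoop.com
192.168.137.14 nn02.hadoop.com
192.168.137.21 dn01.hadoop.com
192.168.137.22 dn02.hadoop.com
192.168.137.23 dn03.hadoop.com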
Install and configure ZooKeeper, starting on dn01:

[hadoop@dn01 ~]$ tar -zxf /nfs_share/software/zookeeper-3.4.11.tar.gz -C ~
[hadoop@dn01 ~]$ vi .bashrc
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.11
export PATH=$PATH:/home/hadoop/zookeeper-3.4.11/bin
[hadoop@dn01 ~]$ source .bashrc
[hadoop@dn01 ~]$ cd zookeeper-3.4.11/conf
[hadoop@dn01 conf]$ mv zoo_sample.cfg zoo.cfg
[hadoop@dn01 conf]$ vi zoo.cfg

dataLogDir=/home/hadoop/zookeeper-3.4.11/log
dataDir=/home/hadoop/zookeeper-3.4.11/data
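# server.N=host:quorumPort:electionPort; N must match the myid file on that host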
server.1=192.168.137.21:2888:3888
server.2=192.168.137.22:2888:3888
server.3=192.168.137.23:2888:3888

[hadoop@dn01 conf]$ cd ..
[hadoop@dn01 zookeeper-3.4.11]$ mkdir data && mkdir log && cd data && echo "1">>myid
[hadoop@dn01 zookeeper-3.4.11]$ cd
[hadoop@dn01 ~]$ scp -r zookeeper-3.4.11 dn02.hadoop.com:/home/hadoop
[hadoop@dn01 ~]$ scp -r zookeeper-3.4.11 dn03.hadoop.com:/home/hadoop
[hadoop@dn01 ~]$ ssh hadoop@dn02.hadoop.com 'cd /home/hadoop/zookeeper-3.4.11/data && echo "2">myid'
[hadoop@dn01 ~]$ ssh hadoop@dn03.hadoop.com 'cd /home/hadoop/zookeeper-3.4.11/data && echo "3">myid'
[hadoop@dn01 ~]$ zkServer.sh start
[hadoop@dn02 ~]$ zkServer.sh start
[hadoop@dn03 ~]$ zkServer.sh start
[hadoop@dn01 ~]$ zkServer.sh status
[hadoop@dn02 ~]$ zkServer.sh status
[hadoop@dn03 ~]$ zkServer.sh status

One node should report "Mode: leader" and the other two "Mode: follower".
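To check the whole ensemble from a single host, a quick sketch (assumes netcat is installed; ZooKeeper 3.4.x answers the stat four-letter command by default):

[hadoop@dn01 ~]$ for h in dn01 dn02 dn03; do echo "== $h =="; echo stat | nc $h.hadoop.com 2181 | grep Mode; done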
Create the JournalNode edits directory on each datanode; it must match dfs.journalnode.edits.dir configured below:

[hadoop@dn01 ~]$ cd hadoop-2.9.0 && mkdir journal
[hadoop@dn02 ~]$ cd hadoop-2.9.0 && mkdir journal
[hadoop@dn03 ~]$ cd hadoop-2.9.0 && mkdir journal

Next, edit the Hadoop configuration on nn01:

[hadoop@nn01 ~]$ cd hadoop-2.9.0/etc/hadoop/
[hadoop@nn01 hadoop]$ vi core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-2.9.0/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <!-- HDFS HA configuration; not needed for a federation-only deployment -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>dn01.hadoop.com:2181,dn02.hadoop.com:2181,dn03.hadoop.com:2181</value>
    </property>
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>1000</value>
    </property>
</configuration>
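fs.defaultFS points at the logical nameservice hdfs://ns1 rather than a single host; clients locate the active NameNode through the failover proxy provider configured in hdfs-site.xml below. Once the configuration is distributed, a quick sanity check (a sketch; getconf should echo the nameservice and the two NameNode hosts):

[hadoop@nn01 hadoop]$ hdfs getconf -confKey fs.defaultFS
[hadoop@nn01 hadoop]$ hdfs getconf -namenodes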
[hadoop@nn01 hadoop]$ vi hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/hadoop-2.9.0/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/hadoop-2.9.0/dfs/data</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>64m</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
         <value>false</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
         <value>false</value>
    </property>
    <!-- NameNode HA Configuation-->
    <property>
        <name>dfs.nameservices</name>
         <value>ns1</value> <!-- several nameservices can be listed here to set up HDFS federation -->
    </property>
    <!-- With HA there is no need to configure a SecondaryNameNode; the standby NameNode fills that role -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
         <value>nn1,nn2</value> <!-- unique IDs of the NameNodes within ns1 -->
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
         <value>nn01.hadoop.com:8020</value>
    </property>    
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
         <value>nn02.hadoop.com:8020</value>
    </property>    
    <property>
        <name>dfs.namenode.servicerpc-address.ns1.nn1</name>
         <value>nn01.hadoop.com:53310</value>
    </property>    
    <property>
        <name>dfs.namenode.servicerpc-address.ns1.nn2</name>
         <value>nn02.hadoop.com:53310</value>
    </property>    
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
         <value>nn01.hadoop.com:50070</value>
    </property>    
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
         <value>nn02.hadoop.com:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir.ns1.nn1</name>
        <value>qjournal://dn01.hadoop.com:8485;dn02.hadoop.com:8485;dn03.hadoop.com:8485/ns1</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir.ns1.nn2</name>
        <value>qjournal://dn01.hadoop.com:8485;dn02.hadoop.com:8485;dn03.hadoop.com:8485/ns1</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled.ns1</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/hadoop-2.9.0/journal</value>
    </property>
    <property>
        <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
        <value>50000</value>
    </property>
    <property>
        <name>ipc.client.connect.timeout</name>
        <value>60000</value>
    </property>
    <property>
        <name>dfs.image.transfer.bandwidthPerSec</name>
        <value>4194304</value>
    </property>
</configuration>
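The sshfence method above only works if each NameNode can reach the other over passwordless SSH with the listed key; a failed fence blocks automatic failover. A check worth running in both directions (a sketch):

[hadoop@nn01 ~]$ ssh -i /home/hadoop/.ssh/id_rsa hadoop@nn02.hadoop.com hostname
[hadoop@nn02 ~]$ ssh -i /home/hadoop/.ssh/id_rsa hadoop@nn01.hadoop.com hostname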
[hadoop@nn01 hadoop]$ vi mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>rm01.hadoop.com:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>rm01.hadoop.com:19888</value>
    </property>
</configuration>
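The two jobhistory addresses only answer if the JobHistory Server is actually running on rm01; start-yarn.sh does not start it. Once the cluster is up, start it separately (a sketch):

[hadoop@rm01 ~]$ mr-jobhistory-daemon.sh start historyserver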
[hadoop@nn01 hadoop]$ vi yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
    </property>
    
  <!--ResourceManager Restart Configuration-->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>dn01.hadoop.com:2181,dn02.hadoop.com:2181,dn03.hadoop.com:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-state-store.parent-path</name>
    <value>/rmstore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-num-retries</name>
    <value>500</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-retry-interval-ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-timeout-ms</name>
    <value>10000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-acl</name>
    <value>world:anyone:rwcda</value>
  </property>
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>2</value>
  </property>
  
  <!--ResourceManager HA Configuration-->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rm01.hadoop.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rm02.hadoop.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  
  <!-- rm1 Configuration-->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>rm01.hadoop.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>rm01.hadoop.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>rm01.hadoop.com:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>rm01.hadoop.com:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>rm01.hadoop.com:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>rm01.hadoop.com:8090</value>
  </property>
  
  <!-- rm2 Configuration-->
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>rm02.hadoop.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>rm02.hadoop.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>rm02.hadoop.com:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>rm02.hadoop.com:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>rm02.hadoop.com:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>rm02.hadoop.com:8090</value>
  </property>
</configuration>
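With yarn.resourcemanager.recovery.enabled set, running application state is persisted under the configured parent path /rmstore in ZooKeeper, which is what lets the standby take over mid-job. After the ResourceManagers are started (below), the znode can be inspected from any ensemble member (a sketch):

[hadoop@dn01 ~]$ zkCli.sh -server dn01.hadoop.com:2181 ls /rmstore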
[hadoop@nn01 hadoop]$ vi slaves

dn01.hadoop.com
dn02.hadoop.com
dn03.hadoop.com
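Every file edited above must be identical on all seven hosts; Hadoop does not distribute configuration by itself. A minimal sketch, assuming passwordless SSH from nn01 to the other nodes and the same directory layout everywhere:

[hadoop@nn01 hadoop]$ for h in nn02 rm01 rm02 dn01 dn02 dn03; do scp core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves $h.hadoop.com:/home/hadoop/hadoop-2.9.0/etc/hadoop/; done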
Format the HA state znode in ZooKeeper (the ensemble must already be running):
[hadoop@nn01 ~]$ hdfs zkfc -formatZK
Start the JournalNodes, which keep the active and standby NameNodes' edit logs in sync:
[hadoop@dn01 ~]$ hadoop-daemon.sh start journalnode
[hadoop@dn02 ~]$ hadoop-daemon.sh start journalnode
[hadoop@dn03 ~]$ hadoop-daemon.sh start journalnode
Format HDFS and start the primary NameNode:
[hadoop@nn01 ~]$ hdfs namenode -format -clusterId c1
[hadoop@nn01 ~]$ hadoop-daemon.sh start namenode
Bootstrap the standby NameNode (this copies the primary's metadata) and start it:
[hadoop@nn02 ~]$ hdfs namenode -bootstrapStandby
[hadoop@nn02 ~]$ hadoop-daemon.sh start namenode
Start the NameNode failover controllers (ZKFC) on both NameNodes:
[hadoop@nn01 ~]$ hadoop-daemon.sh start zkfc
[hadoop@nn02 ~]$ hadoop-daemon.sh start zkfc
Start the DataNodes:
[hadoop@dn01 ~]$ hadoop-daemon.sh start datanode
[hadoop@dn02 ~]$ hadoop-daemon.sh start datanode
[hadoop@dn03 ~]$ hadoop-daemon.sh start datanode
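At this point HDFS HA can be verified from the command line; nn1 and nn2 are the IDs from dfs.ha.namenodes.ns1, and one should report active while the other reports standby:

[hadoop@nn01 ~]$ hdfs haadmin -getServiceState nn1
[hadoop@nn01 ~]$ hdfs haadmin -getServiceState nn2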
Start the primary ResourceManager (start-yarn.sh also brings up the NodeManagers listed in slaves):
[hadoop@rm01 ~]$ start-yarn.sh
Start the standby ResourceManager:
[hadoop@rm02 ~]$ yarn-daemon.sh start resourcemanager
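The same check for YARN, with rm1 and rm2 from yarn.resourcemanager.ha.rm-ids:

[hadoop@rm01 ~]$ yarn rmadmin -getServiceState rm1
[hadoop@rm01 ~]$ yarn rmadmin -getServiceState rm2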
Verify the deployment through the web UIs. One NameNode overview page reports itself as active and the other as standby; each ResourceManager's /cluster/cluster page shows its HA state:

http://nn01.hadoop.com:50070/dfshealth.html#tab-overview
http://nn02.hadoop.com:50070/dfshealth.html#tab-overview
http://rm01.hadoop.com:8088/cluster/cluster
http://rm02.hadoop.com:8088/cluster/cluster
HDFS HA failover test

[hadoop@nn01 ~]$ jps
2352 DFSZKFailoverController
2188 NameNode
3105 Jps

Kill the active NameNode process:

[hadoop@nn01 ~]$ kill -9 2188

Refresh the NameNode web pages: nn02 now reports itself as active, which shows the failover succeeded.
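To restore redundancy after the test, restart the killed NameNode; it rejoins as the standby:

[hadoop@nn01 ~]$ hadoop-daemon.sh start namenode
[hadoop@nn01 ~]$ hdfs haadmin -getServiceState nn1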

ResourceManager HA failover test

[hadoop@rm01 ~]$ jps
1599 ResourceManager
1927 Jps

Start a wordcount job (sketched below) so that an application is running during the failover.
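A minimal job for the test, using the examples jar that ships with the release; /in and /out are illustrative HDFS paths, and any input directory with text files will do:

[hadoop@rm01 ~]$ hdfs dfs -mkdir -p /in && hdfs dfs -put /etc/hosts /in
[hadoop@rm01 ~]$ yarn jar ~/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /in /out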
While the job runs, kill the active ResourceManager process:

[hadoop@rm01 ~]$ kill -9 1599

Watch the job's console output: the client retries, reconnects to the standby ResourceManager, and the job carries on, which shows the failover succeeded.
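Afterwards, restart the killed ResourceManager; it comes back as the standby:

[hadoop@rm01 ~]$ yarn-daemon.sh start resourcemanager
[hadoop@rm01 ~]$ yarn rmadmin -getServiceState rm1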