Hadoop 2.6.5 High-Availability Cluster Setup
2017-02-20 16:21
Software environment:
Linux: CentOS 6.7
Hadoop: 2.6.5
ZooKeeper: 3.4.8
Host configuration:
Five machines in total, m1 through m5; the username on every host is centos.
192.168.179.201: m1
192.168.179.202: m2
192.168.179.203: m3
192.168.179.204: m4
192.168.179.205: m5
Roles (matching the configuration files used below):
m1: NameNode, ResourceManager
m2: NameNode, ResourceManager
m3: ZooKeeper, JournalNode, DataNode, NodeManager
m4: ZooKeeper, JournalNode, DataNode, NodeManager
m5: ZooKeeper, JournalNode, DataNode, NodeManager
Preliminary preparation
1. Configure the host IP:
sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
2. Configure the hostname:
sudo vi /etc/sysconfig/network
3. Configure the hostname-to-IP mappings:
sudo vi /etc/hosts
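For example, with the host list above, /etc/hosts on every node would contain entries like the following (a sketch; substitute your actual IPs):
192.168.179.201 m1
192.168.179.202 m2
192.168.179.203 m3
192.168.179.204 m4
192.168.179.205 m5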
4. Disable the firewall
(1) Stop it for the current session:
service iptables stop
service iptables status
(2) Disable it at boot:
chkconfig iptables off
chkconfig iptables --list
Setup steps:
I. Install and configure the ZooKeeper cluster (on hosts m3, m4, and m5)
1. Extract the archive
tar -zxvf zookeeper-3.4.8.tar.gz -C /home/centos/soft/
mv /home/centos/soft/zookeeper-3.4.8 /home/centos/soft/zookeeper ## rename so the path matches ZK_HOME below
2. Configure the environment variables
vi /etc/profile
## Zookeeper
export ZK_HOME=/home/centos/soft/zookeeper
export CLASSPATH=$CLASSPATH:$ZK_HOME/lib
export PATH=$PATH:$ZK_HOME/sbin:$ZK_HOME/bin
source /etc/profile
3. Edit the configuration
(1) Configure zoo.cfg:
cd /home/centos/soft/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
## Change the dataDir setting
dataDir=/home/centos/soft/zookeeper/tmp
## Add the following three lines
server.1=m3:2888:3888
server.2=m4:2888:3888
server.3=m5:2888:3888
(2) Create the tmp directory
mkdir /home/centos/soft/zookeeper/tmp
(3) Create the myid file
touch /home/centos/soft/zookeeper/tmp/myid
echo 1 > /home/centos/soft/zookeeper/tmp/myid ## myid=1 on host m3
4. Configure where ZooKeeper stores its logs
Edit the zkEnv.sh file:
vi /home/centos/soft/zookeeper/bin/zkEnv.sh
# Edit this block:
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="/home/centos/soft/zookeeper/logs"   ## change this line
fi
Then create the logs directory:
mkdir /home/centos/soft/zookeeper/logs
5. Copy to the other hosts and update myid
(1) Copy to the other hosts:
scp -r /home/centos/soft/zookeeper/ m4:/home/centos/soft/
scp -r /home/centos/soft/zookeeper/ m5:/home/centos/soft/
(2) Update myid:
echo 2 > /home/centos/soft/zookeeper/tmp/myid ## on m4
echo 3 > /home/centos/soft/zookeeper/tmp/myid ## on m5
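To confirm that each node got the right ID, a quick check (run locally on each host, or over SSH once the passwordless login from part III is in place):
cat /home/centos/soft/zookeeper/tmp/myid ## expect 1 on m3, 2 on m4, 3 on m5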
II. Install and configure the Hadoop cluster
1. Extract the archive
tar -zxvf hadoop-2.6.5.tar.gz -C /home/centos/soft/
mv /home/centos/soft/hadoop-2.6.5 /home/centos/soft/hadoop ## rename so the path matches HADOOP_HOME below
2. Add Hadoop to the environment variables
vi /etc/profile
## Java
export JAVA_HOME=/home/centos/soft/jdk
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
## Hadoop
export HADOOP_USER_NAME=centos
export HADOOP_HOME=/home/centos/soft/hadoop
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile
3. Edit the Hadoop configuration files
(1) Edit hadoop-env.sh:
export JAVA_HOME=/home/centos/soft/jdk
(2) Edit core-site.xml:
<configuration>
    <!-- Default filesystem: the HA nameservice defined in hdfs-site.xml -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <!-- Base directory for Hadoop's temporary and metadata files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/centos/soft/hadoop/tmp</value>
    </property>
    <!-- ZooKeeper quorum used for automatic NameNode failover -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>m3:2181,m4:2181,m5:2181</value>
    </property>
    <!-- Used by Hive's hplsql feature: Hadoop proxy-user hosts and groups.
         "centos" is the user running the primary NameNode; adjust as needed. -->
    <property>
        <name>hadoop.proxyuser.centos.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.centos.groups</name>
        <value>*</value>
    </property>
</configuration>
(3) Edit hdfs-site.xml:
<configuration>
    <!-- Logical name of the HA nameservice -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- The two NameNodes in ns1 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>m1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>m1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>m2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>m2:50070</value>
    </property>
    <!-- JournalNodes that hold the shared edit log -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://m3:8485;m4:8485;m5:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/centos/soft/hadoop/journal</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/centos/soft/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/centos/soft/hadoop/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Automatic failover via ZKFC -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fence the old active NameNode via SSH; fall back to a no-op so failover never hangs -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/centos/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Dead-DataNode recheck interval, in milliseconds -->
    <property>
        <name>dfs.namenode.heartbeat.recheck-interval</name>
        <value>2000</value>
    </property>
    <!-- DataNode heartbeat interval, in seconds -->
    <property>
        <name>dfs.heartbeat.interval</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.blockreport.intervalMsec</name>
        <value>3600000</value>
        <description>Determines block reporting interval in milliseconds.</description>
    </property>
</configuration>
(4) Edit mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>0.0.0.0:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>0.0.0.0:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    <!-- Map-side sort buffer size, in MB -->
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/user/history/done_intermediate</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/user/history</value>
    </property>
</configuration>
(5) Edit yarn-site.xml:
<configuration>
    <!-- ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>m1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>m2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>m3:2181,m4:2181,m5:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/home/centos/soft/hadoop/logs</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
        <description>Whether to monitor each container's physical memory use and kill containers that exceed their allocation; default is true.</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Whether to monitor each container's virtual memory use and kill containers that exceed their allocation; default is true.</description>
    </property>
    <property>
        <name>spark.shuffle.service.port</name>
        <value>7337</value>
    </property>
</configuration>
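Note that yarn.nodemanager.aux-services above registers spark_shuffle, whose class org.apache.spark.network.yarn.YarnShuffleService ships with Spark, not Hadoop, so the NodeManagers will fail to start unless that jar is on their classpath. A sketch, assuming a Spark 1.x install at /home/centos/soft/spark (adjust the path to your layout):
cp /home/centos/soft/spark/lib/spark-*-yarn-shuffle.jar /home/centos/soft/hadoop/share/hadoop/yarn/lib/
If you are not running Spark on YARN, simply drop spark_shuffle, the spark_shuffle.class property, and spark.shuffle.service.port from this file.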
(6) Edit the slaves file
The slaves file lists the worker nodes: the DataNodes for HDFS and the NodeManagers for YARN. Adjust it to your own setup:
m3
m4
m5
III. Initialize Hadoop
1. Set up passwordless SSH between the hosts
(1) Generate a key pair on m1:
ssh-keygen -t rsa
(2) Copy the public key to the other nodes, including this host:
ssh-copy-id 127.0.0.1
ssh-copy-id localhost
ssh-copy-id m1
ssh-copy-id m2
ssh-copy-id m3
ssh-copy-id m4
ssh-copy-id m5
(3) Repeat (1) and (2) on the other hosts; a loop version is sketched below.
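A minimal sketch that does (1) and (2) in one go on each host (assumes ssh-copy-id is installed; it prompts once per host for its password):
ssh-keygen -t rsa
for h in 127.0.0.1 localhost m1 m2 m3 m4 m5; do ssh-copy-id $h; done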
2. Copy the configured Hadoop to the other nodes
scp -r /home/centos/soft/hadoop m2:/home/centos/soft/
scp -r /home/centos/soft/hadoop m3:/home/centos/soft/
scp -r /home/centos/soft/hadoop m4:/home/centos/soft/
scp -r /home/centos/soft/hadoop m5:/home/centos/soft/
Note: follow the steps below strictly in order.
3. Start the ZooKeeper cluster (start ZK on m3, m4, and m5)
Start the ZooKeeper service:
cd /home/centos/soft/zookeeper/bin/
./zkServer.sh start
Check the status: there should be one leader and two followers.
./zkServer.sh status
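On a healthy ensemble the status command reports each node's role (the exact banner text varies slightly by version):
./zkServer.sh status
## Mode: follower   (on two of the hosts)
## Mode: leader     (on exactly one host)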
4. Start the JournalNodes (run on m3, m4, and m5; this must happen before HDFS is formatted, otherwise formatting fails)
(1) Start the JournalNode service:
cd /home/centos/soft/hadoop
sbin/hadoop-daemon.sh start journalnode
(2) Run jps to verify that a JournalNode process now appears on m3, m4, and m5:
jps
5. Format HDFS (on m1 only)
(1) On m1, run:
hdfs namenode -format
(2) Formatting writes its files under the directory set by hadoop.tmp.dir in core-site.xml (here /home/centos/soft/hadoop/tmp). Copy that tmp directory from m1 into /home/centos/soft/hadoop on m2:
scp -r /home/centos/soft/hadoop/tmp/ m2:/home/centos/soft/hadoop/
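Alternatively, instead of copying tmp/ by hand, the standby can pull the formatted metadata itself. Start the freshly formatted NameNode on m1 first (sbin/hadoop-daemon.sh start namenode), then run on m2:
hdfs namenode -bootstrapStandby ## on m2; requires the JournalNodes and the NameNode on m1 to be up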
6. Format the ZKFC state in ZooKeeper (on m1):
hdfs zkfc -formatZK
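Formatting creates a /hadoop-ha znode in ZooKeeper where the ZKFCs coordinate. A quick sanity check with ZooKeeper's own CLI (any of the three ZK hosts works):
/home/centos/soft/zookeeper/bin/zkCli.sh -server m3:2181
ls /hadoop-ha ## should print [ns1]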
7. Start HDFS (on m1):
sbin/start-dfs.sh
8. Start YARN (on m1 and m2):
sbin/start-yarn.sh ## on m1
sbin/yarn-daemon.sh start resourcemanager ## on m2; start-yarn.sh only starts a ResourceManager on the host it runs on, so the standby must be started separately
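To confirm that exactly one ResourceManager is active, query the HA state using the rm-ids defined in yarn-site.xml:
yarn rmadmin -getServiceState rm1 ## expect: active
yarn rmadmin -getServiceState rm2 ## expect: standby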
At this point, the Hadoop 2.6.5 configuration is complete!
IV. Verify the cluster
0. On Windows, edit the hosts file to map the hostnames to IPs (this step is optional):
C:\Windows\System32\drivers\etc\hosts
192.168.179.201 m1
192.168.179.202 m2
192.168.179.203 m3
192.168.179.204 m4
192.168.179.205 m5
1. Check via a browser:
http://m1:50070 ## NameNode 'm1:9000' (active)
http://m2:50070 ## NameNode 'm2:9000' (standby)
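The same check is available from the command line, using the NameNode IDs defined in hdfs-site.xml:
hdfs haadmin -getServiceState nn1 ## expect: active
hdfs haadmin -getServiceState nn2 ## expect: standby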
2. Verify HDFS HA
(1) First, upload a file to HDFS:
hadoop fs -put /etc/profile /profile
(2) Check that it arrived:
hadoop fs -ls /
(3) Then kill the active NameNode:
kill -9 <pid of NN>
(4) Visit http://m2:50070 in a browser:
NameNode 'm2:9000' (active) ## the NameNode on m2 has become active
(5) Run:
hadoop fs -ls / ## check that the file uploaded from m1 earlier is still there
(6) Manually restart the killed NameNode on m1:
sbin/hadoop-daemon.sh start namenode
(7) Visit http://m1:50070 in a browser:
NameNode 'm1:9000' (standby)
3. Verify YARN:
Visit http://m1:8088 in a browser and check that the NodeManagers are running. Then run the WordCount demo that ships with Hadoop; on Linux, execute:
hadoop jar /home/centos/soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount InputParameter OutputParameter
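InputParameter and OutputParameter are HDFS paths; a minimal end-to-end run might look like this (the /wordcount paths are arbitrary examples):
hadoop fs -mkdir -p /wordcount/input
hadoop fs -put /etc/profile /wordcount/input
hadoop jar /home/centos/soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /wordcount/input /wordcount/output
hadoop fs -cat /wordcount/output/part-r-00000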
While the job runs, an application should appear at http://m1:8088; if it does, YARN is working.
Done!