
Hadoop 2.6.2 HA + Federation: A Production Cluster Setup Walkthrough, with Source-Compilation Notes


Hadoop HA + Federation Production Cluster Setup

Download and install hadoop-2.6.2 // do this on every machine

[hadoop-2.6.2 download](https://yunpan.cn/cSDHT9LnsNQFQ) (access password: 4c22)

[Sample configuration files - all seven files](https://yunpan.cn/cSDFV3RdtX4zE) (access password: a3bb)
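Once downloaded, unpack the tarball on every machine. A minimal sketch, assuming /root/hadoop as the install root used throughout this article:

---
# run on each of HTY-1..HTY-4
mkdir -p /root/hadoop && cd /root/hadoop
tar -xzf hadoop-2.6.2.tar.gz
# the seven config files below go into /root/hadoop/hadoop-2.6.2/etc/hadoop/
---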

core-site.xml

---

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Do not modify this file directly.  Instead, copy entries that you -->
<!-- wish to modify from this file into core-site.xml and change them -->
<!-- there.  If core-site.xml does not already exist, create it.      -->

<configuration>

<property>
<name>fs.defaultFS</name>
<value>hdfs://HTY-1:8020</value>
<description>The name of the default file system.  A URI whose
scheme and authority determine the FileSystem implementation.  The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class.  The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>

---
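fs.defaultFS fixes the default namespace: any path without an explicit scheme and authority resolves against hadoop-cluster1's nn1. A quick way to sanity-check the two federated namespaces (hostnames as configured above):

---
$ bin/hdfs dfs -ls /                    # resolves via fs.defaultFS, i.e. hdfs://HTY-1:8020
$ bin/hdfs dfs -ls hdfs://HTY-3:8020/   # addresses hadoop-cluster2's namenode explicitly
---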

fairscheduler.xml

---
<?xml version="1.0"?>
<allocations>

<queue name="infrastructure">
<minResources>102400 mb, 50 vcores </minResources>
<maxResources>153600 mb, 100 vcores </maxResources>
<maxRunningApps>200</maxRunningApps>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
<aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
</queue>

<queue name="tool">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>

<queue name="sentiment">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>

</allocations>

---
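With these queues defined, a job can be steered into a queue at submission time via mapreduce.job.queuename. A sketch using the stock examples jar (the input/output paths are placeholders):

---
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar \
    wordcount -Dmapreduce.job.queuename=root.infrastructure /input /wc-out
---

Note that per aclSubmitApps above, only root, yarn, search, and hdfs may submit to the infrastructure queue.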
hadoop-env.sh // add the following line

---

export JAVA_HOME=/root/hadoop/jdk1.7.0_67

---
hdfs-site.xml
---

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Do not modify this file directly.  Instead, copy entries that you -->
<!-- wish to modify from this file into hdfs-site.xml and change them -->
<!-- there.  If hdfs-site.xml does not already exist, create it.      -->

<configuration>

<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1,hadoop-cluster2</value>
<description>
Comma-separated list of nameservices.
</description>
</property>

<!--
hadoop cluster1
-->

<property>
<name>dfs.ha.namenodes.hadoop-cluster1</name>
<value>nn1,nn2</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>

<property>
<name>dfs.namenode.rpc-address.hadoop-cluster1.nn1</name>
<value>HTY-1:8020</value>
<description>
RPC address for namenode nn1 of hadoop-cluster1
</description>
</property>

<property>
<name>dfs.namenode.rpc-address.hadoop-cluster1.nn2</name>
<value>HTY-2:8020</value>
<description>
RPC address for namenode nn2 of hadoop-cluster1
</description>
</property>

<property>
<name>dfs.namenode.http-address.hadoop-cluster1.nn1</name>
<value>HTY-1:50070</value>
<description>
The address and the base port where the dfs namenode1 web ui will listen on.
</description>
</property>

<property>
<name>dfs.namenode.http-address.hadoop-cluster1.nn2</name>
<value>HTY-2:50070</value>
<description>
The address and the base port where the dfs namenode2 web ui will listen on.
</description>
</property>

<!--
hadoop cluster2
-->
<property>
<name>dfs.ha.namenodes.hadoop-cluster2</name>
<value>nn3,nn4</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>

<property>
<name>dfs.namenode.rpc-address.hadoop-cluster2.nn3</name>
<value>HTY-3:8020</value>
<description>
RPC address for namenode nn3 of hadoop-cluster2
</description>
</property>

<property>
<name>dfs.namenode.rpc-address.hadoop-cluster2.nn4</name>
<value>HTY-4:8020</value>
<description>
RPC address for namenode nn4 of hadoop-cluster2
</description>
</property>

<property>
<name>dfs.namenode.http-address.hadoop-cluster2.nn3</name>
<value>HTY-3:50070</value>
<description>
The address and the base port where the dfs namenode1 web ui will listen on.
</description>
</property>

<property>
<name>dfs.namenode.http-address.hadoop-cluster2.nn4</name>
<value>HTY-4:50070</value>
<description>
The address and the base port where the dfs namenode2 web ui will listen on.
</description>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///root/hadoop/hadoop-2.6.2/hdfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage).  If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>

<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://HTY-2:8485;HTY-3:8485;HTY-4:8485/hadoop-cluster2</value>
<description>A directory on shared storage between the multiple namenodes
in an HA cluster. This directory will be written by the active and read
by the standby in order to keep the namespaces synchronized. This directory
does not need to be listed in dfs.namenode.edits.dir above. It should be
left empty in a non-HA cluster.
</description>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///root/hadoop/hadoop-2.6.2/hdfs/data</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks.  If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>

<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>false</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>

<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/hadoop/hadoop-2.6.2/hdfs/journal/</value>
</property>

</configuration>

---
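With hdfs-site.xml in place, first-time initialization has to happen in a specific order: journalnodes before formatting, and one shared clusterId so both nameservices join the same federation. A sketch of the usual sequence; the clusterId value "hadoop-federation" is an arbitrary label of my choosing:

---
# on HTY-2, HTY-3, HTY-4: start the journalnodes first
sbin/hadoop-daemon.sh start journalnode

# on HTY-1 (nn1): format and start the first namenode of hadoop-cluster1
bin/hdfs namenode -format -clusterId hadoop-federation
sbin/hadoop-daemon.sh start namenode

# on HTY-2 (nn2): pull the formatted metadata from nn1, then start
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode

# dfs.ha.automatic-failover.enabled is false, so promote an active manually
bin/hdfs haadmin -ns hadoop-cluster1 -transitionToActive nn1
---

The same format/bootstrap/transition steps are then repeated on HTY-3/HTY-4 for hadoop-cluster2 with the same clusterId. Also note that dfs.namenode.shared.edits.dir above names the journal ID hadoop-cluster2; when this file is deployed to the hadoop-cluster1 namenodes, that journal ID normally has to match their own nameservice.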

mapred-site.xml

---
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Do not modify this file directly.  Instead, copy entries that you -->
<!-- wish to modify from this file into mapred-site.xml and change them -->
<!-- there.  If mapred-site.xml does not already exist, create it.      -->

<configuration>

<!-- MR YARN Application properties -->

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>

<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>HTY-2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>HTY-2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>

</configuration>

---
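The two jobhistory addresses point at HTY-2, so the history server daemon must be started there; it is not launched by start-dfs.sh or start-yarn.sh:

---
# on HTY-2
sbin/mr-jobhistory-daemon.sh start historyserver
# web UI afterwards at http://HTY-2:19888
---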
slaves // the three worker nodes (every node except the master)
---
HTY-2
HTY-3
HTY-4

---
yarn-site.xml
---
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Do not modify this file directly.  Instead, copy entries that you -->
<!-- wish to modify from this file into yarn-site.xml and change them -->
<!-- there.  If yarn-site.xml does not already exist, create it.      -->

<configuration>

<!-- Resource Manager Configs -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>HTY-1</value>
</property>

<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>

<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>

<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>

<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>

<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>

<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<property>
<description>fair-scheduler conf location</description>
<name>yarn.scheduler.fair.allocation.file</name>
<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
</property>

<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/root/hadoop/hadoop-2.6.2/yarn/local</value>
</property>

<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>

<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
</property>

<property>
<description>Number of CPU cores that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>

<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

</configuration>

---
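With the resource manager on HTY-1 and the node managers driven by the slaves file, YARN can be brought up from the master in one step:

---
# on HTY-1
sbin/start-yarn.sh
# RM web UI: http://HTY-1:8088 (https on 8090 as configured above)
---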


Download and install JDK 7 // do this on every machine

JDK 7 Linux 64-bit download

Edit the environment variables: vim /etc/profile

export JAVA_HOME=/root/hadoop/jdk1.7.0_67

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Apply the changes: source /etc/profile
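A quick check that the JDK is wired up correctly on each machine:

---
$ source /etc/profile
$ java -version    # should report java version "1.7.0_67"
$ echo $JAVA_HOME  # should print /root/hadoop/jdk1.7.0_67
---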

Configure SSH (passwordless login; without it, every step of the cluster start-up will prompt for a password)

How to set up mutual SSH trust, assuming three hosts host1, host2, host3:
(1) Disable the firewall and SELinux
#/sbin/service iptables stop
This stops the firewall, but it comes back after a reboot; the following command keeps it off across reboots.
#chkconfig --level 35 iptables off
Disable SELinux:
#vim /etc/selinux/config
Set SELINUX=disabled, then save and exit.
(The config file only takes effect after a reboot; setenforce 0 disables SELinux immediately.)
Run the above on host1, host2, and host3.
(2) Edit the SSH daemon configuration
#vim /etc/ssh/sshd_config
Find the following lines and remove the leading # comment marker:
RSAAuthentication yes //allow RSA authentication
PubkeyAuthentication yes //allow public-key authentication
AuthorizedKeysFile .ssh/authorized_keys //public keys are stored in .ssh/authorized_keys
Save and exit.
Restart sshd after the change:
#/etc/init.d/sshd restart //or: service sshd restart
Run the above on host1, host2, and host3.
(3) Generate a key pair
$ ssh-keygen -t rsa
Press Enter through the prompts; this creates the private key id_rsa and the public key id_rsa.pub under the default path ~/.ssh/.
Run the above on host1, host2, and host3.
(4) Build authorized_keys
Copy the public keys of host2 and host3 to host1.
On host2:
$scp /home/hadoop/.ssh/id_rsa.pub hadoop@host1:~/.ssh/id_rsa.pub.host2
On host3:
$scp /home/hadoop/.ssh/id_rsa.pub hadoop@host1:~/.ssh/id_rsa.pub.host3
These commands still prompt for the target user's password at this stage.
On host1:
$cd ~/.ssh/
$ls
Check that id_rsa.pub.host2 and id_rsa.pub.host3 have arrived.
$ cat id_rsa.pub >> authorized_keys
$ cat id_rsa.pub.host2 >> authorized_keys
$ cat id_rsa.pub.host3 >> authorized_keys
This builds authorized_keys.
Fix the permissions on authorized_keys:
#chmod 644 authorized_keys
Use scp to push the file into .ssh/ on host2 and host3:
#scp authorized_keys hadoop@host2:~/.ssh/
#scp authorized_keys hadoop@host3:~/.ssh/
(5) Verify
From host1, test:
$ssh host2
$exit
$ssh host3
$exit
None of these should prompt for a password; all three hosts should reach each other password-free.
The above covers three machines; the same procedure scales to more.
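On systems that ship the ssh-copy-id helper, steps (3)-(4) above collapse into one command per host pair; a sketch for host1 (repeat for each direction you need):

---
$ ssh-keygen -t rsa              # if not already done
$ ssh-copy-id hadoop@host2       # appends your public key to host2's authorized_keys
$ ssh-copy-id hadoop@host3
---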


Recompile the Hadoop source (a cluster run straight from the unpacked Apache Hadoop release has problems when operating on HDFS; the recompile mainly fixes the native "binary not load" issue, i.e. the "Unable to load native-hadoop library for your platform" warning)

hadoop-2.6.2 source download

protobuf 2.5.0 download (required to set up the source build environment)

[My own build of the files that need to be replaced under the hadoop-2.6.2 cluster's native folder](https://yunpan.cn/cSD3sbk7wmhYL) (access password: 8c2a)
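For reference, the replacement native libraries come out of a standard Maven native build; a sketch, assuming maven, protobuf 2.5.0, cmake, and the zlib/openssl development headers are already installed:

---
$ protoc --version    # must print libprotoc 2.5.0
$ cd hadoop-2.6.2-src
$ mvn package -Pdist,native -DskipTests -Dtar
# resulting native libs: hadoop-dist/target/hadoop-2.6.2/lib/native
# copy them over $HADOOP_HOME/lib/native on every machine
---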

If you have any questions about the details, feel free to leave a comment!
Tags: hadoop