
DayDayUP_Big Data Learning Course [1]_Setting Up a Hadoop 2.6 Fully Distributed Cluster and a Pseudo-Distributed Cluster

2015-11-08 10:17

1. Environment

OS: CentOS 6.5

Software versions: Hadoop 2.6.2, JDK 1.8.0_65

Cluster layout:

master: www 192.168.78.110

slave1: node1 192.168.78.111

slave2: node2 192.168.78.112

The /etc/hosts file on every machine:

192.168.78.110 www

192.168.78.111 node1

192.168.78.112 node2

Make sure all three machines can ping one another by hostname.
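A quick connectivity check, as a small shell loop run on each host; it assumes only the hostnames from the /etc/hosts entries above:

# run on every machine; each hostname should answer
for h in www node1 node2; do
    ping -c 1 "$h" > /dev/null && echo "$h ok" || echo "$h UNREACHABLE"
done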

2. Download Hadoop 2.6.2 and JDK 1.8

[root@www ~]# wget http://download.oracle.com/otn-pub/java/jdk/8u65-b17/jdk-8u65-linux-x64.rpm?AuthParam=1446899640_8da8d9b13f8bbe63b3bc0bc80b730f55 // after downloading, rename the file to strip the query string after .rpm
[root@www ~]# wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.2/hadoop-2.6.2.tar.gz
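Before unpacking, it is worth checking the tarball's integrity; a minimal sketch, comparing the result against the checksum published on the Apache release page:

[root@www ~]# sha256sum hadoop-2.6.2.tar.gz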

3. Configure the Java Environment

3.1 Install the JDK

# rpm -ivh jdk-8u65-linux-x64.rpm


3.2 Configure the Java environment variables

[root@www ~]# vim /etc/profile
#set java environment
export JAVA_HOME=/usr/java/jdk1.8.0_65  // adjust the path if you installed a different version
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
[root@www ~]# source !$


3.3 Verify the Java environment

[root@www ~]# java -version


java version "1.8.0_65"

Java(TM) SE Runtime Environment (build 1.8.0_65-b17)

Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)


[root@www ~]# javac -version


javac 1.8.0_65


4. Install Hadoop

4.1 Unpack and install

[root@www opt]# tar -xzvf hadoop-2.6.2.tar.gz
[root@www opt]# mkdir /opt/hadoop
[root@www opt]# mv hadoop-2.6.2 /opt/hadoop
[root@www opt]# cd /opt/hadoop/hadoop-2.6.2
[root@www hadoop-2.6.2]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share


4.2 Add a hadoop user

[root@www hadoop-2.6.2]# useradd hadoop
[root@www hadoop-2.6.2]# passwd hadoop
[root@www hadoop-2.6.2]# chown -R hadoop:hadoop /opt/hadoop


4.3 Edit the Hadoop configuration files

[root@www hadoop-2.6.2]# su - hadoop    // switch to the hadoop user
[hadoop@www ~]$ mkdir -p ~/hadoop/tmp ~/dfs/data ~/dfs/name    // these directories are referenced by the config files below
[hadoop@www ~]$ ls
dfs  hadoop
[hadoop@www ~]$ cd /opt/hadoop/hadoop-2.6.2/


4.3.1 Configure hadoop-env.sh -> set JAVA_HOME

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_65


4.3.2 Configure yarn-env.sh -> set JAVA_HOME

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/yarn-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_65


4.3.3 Configure slaves -> list the slave nodes

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/slaves
node1
node2


4.3.4 Configure core-site.xml -> core Hadoop settings (the HDFS NameNode listens on port 9000; hadoop.tmp.dir points to /home/hadoop/hadoop/tmp)

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://www:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop/tmp</value>
<description>Abasefor other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
</configuration>
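To confirm the settings are picked up, hdfs getconf can read a key back without starting any daemon; with the file above, the expected output is hdfs://www:9000:

[hadoop@www hadoop-2.6.2]$ ./bin/hdfs getconf -confKey fs.defaultFS
hdfs://www:9000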


4.3.5 Configure hdfs-site.xml -> HDFS settings (NameNode and DataNode addresses and directory locations)

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/hdfs-site.xml
<configuration>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>www:9001</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>

<property>
<name>dfs.replication</name>
<!-- this cluster has only two DataNodes, so a replication factor of 3 would leave every block under-replicated -->
<value>2</value>
</property>

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///home/hadoop/hadoop/hdfs/namesecondary</value>
</property>

</configuration>


4.3.6 Configure mapred-site.xml -> MapReduce settings (run on the YARN framework; JobHistory RPC and web addresses)

[hadoop@www hadoop-2.6.2]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>www:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>www:19888</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/home/hadoop/hadoop</value>
</property>
</configuration>


4.3.7 Configure yarn-site.xml -> enable YARN

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>www:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>www:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>www:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>www:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>www:8088</value>
</property>

</configuration>


4.3.8 Copy everything (the Hadoop 2.6.2 tree and /etc/hosts) to node1 and node2, as sketched below.
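The original gives no commands for this step; a minimal sketch using scp, assuming root access to the slaves and that the hadoop user from 4.2 already exists on them:

# run as root on www
for h in node1 node2; do
    scp /etc/hosts root@$h:/etc/hosts
    scp -r /opt/hadoop root@$h:/opt/
    ssh root@$h "chown -R hadoop:hadoop /opt/hadoop"
done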

4.4.1 Set up passwordless SSH login

Run on each of the three servers:

[hadoop@www ~]$ ssh-keygen -t rsa    // press Enter at every prompt; leave the passphrase empty
[hadoop@node2 ~]$ ssh-copy-id -i  ~/.ssh/id_rsa.pub hadoop@192.168.78.110
[hadoop@node2 ~]$ ssh-copy-id -i  ~/.ssh/id_rsa.pub hadoop@192.168.78.111
[hadoop@node2 ~]$ ssh-copy-id -i  ~/.ssh/id_rsa.pub hadoop@192.168.78.112


4.4.2 Test passwordless SSH login

Run on each of the three servers:

[hadoop@node2 ~]$ ssh www
[hadoop@node2 ~]$ ssh node1
[hadoop@node2 ~]$ ssh node2
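A compact version of the same test; each hop should print the remote hostname without asking for a password (the very first connection to a host will still ask you to confirm its key):

[hadoop@www ~]$ for h in www node1 node2; do ssh "$h" hostname; done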


5. Verify Hadoop

5.1 Format the NameNode

Run this only on the master (www); the slaves run DataNodes and must not be formatted:

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop namenode -format


5.2 Start Hadoop

Start all daemons:

[hadoop@www hadoop-2.6.2]$ ./sbin/start-all.sh    // run this on the master (www)
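start-all.sh still works in 2.6 but is deprecated in favor of starting HDFS and YARN separately. Note also that the JobHistory Server configured in 4.3.6 is not launched by either variant and needs its own command:

[hadoop@www hadoop-2.6.2]$ ./sbin/start-dfs.sh
[hadoop@www hadoop-2.6.2]$ ./sbin/start-yarn.sh
[hadoop@www hadoop-2.6.2]$ ./sbin/mr-jobhistory-daemon.sh start historyserver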


The expected processes:

master:

[hadoop@www hadoop-2.6.2]$ jps
7136 ResourceManager
6993 SecondaryNameNode
6819 NameNode
7399 Jps


slave:

[hadoop@node1 hadoop-2.6.2]$ jps
3186 Jps
3064 NodeManager
2974 DataNode
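Beyond jps, you can confirm that both DataNodes actually registered with the NameNode; the report should list node1 and node2 as live nodes (the NameNode web UI at http://www:50070 and the ResourceManager UI at http://www:8088 show the same picture):

[hadoop@www hadoop-2.6.2]$ ./bin/hdfs dfsadmin -report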


6. Run the WordCount Example

6.1 Create a local test directory and file

[hadoop@www hadoop-2.6.2]$ mkdir input
[hadoop@www hadoop-2.6.2]$ echo "hello world hello hadoop" > input/test.log
[hadoop@www hadoop-2.6.2]$ cat input/test.log
hello world hello hadoop


6.2 Create the /input directory in HDFS

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -mkdir /input


6.3 Copy test.log into the HDFS /input directory

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -put input/test.log /input


6.4 Check that test.log is in HDFS

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs  -ls /input


15/11/08 17:59:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 1 items

-rw-r--r--   2 hadoop supergroup         25 2015-11-08 17:59 /input/test.log


6.5 Run the WordCount program

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /input /output


6.6 View the results

[hadoop@www hadoop-2.6.2]$  ./bin/hadoop fs -cat /output/part-r-00000


15/11/08 18:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

hadoop  1

hello   2

world   1
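MapReduce refuses to write into an existing output directory, so /output has to be removed before the job is run again:

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -rm -r /output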


7. Pseudo-Distributed Setup

Only two files on the NameNode need to change:

7.1 etc/hadoop/hdfs-site.xml

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/hdfs-site.xml
<configuration>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>www:9001</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/dfs/data</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/dfs/name</value>
</property>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///home/hadoop/hadoop/hdfs/namesecondary</value>
</property>

</configuration>


7.2 etc/hadoop/slaves

For the pseudo-distributed setup the DataNode runs locally, so the file lists only this host:

[hadoop@www hadoop-2.6.2]$ vim etc/hadoop/slaves
www


7.3 Format the NameNode
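Because the NameNode was already formatted for the cluster in section 5.1, the name directory still holds the old filesystem; clearing the old state first keeps the reformat clean and avoids clusterID mismatches between daemons. A sketch, assuming the directories created in 4.3:

[hadoop@www hadoop-2.6.2]$ rm -rf ~/dfs/name/* ~/dfs/data/* ~/hadoop/tmp/*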

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop namenode -format


7.4 Start

[hadoop@www hadoop-2.6.2]$ ./sbin/start-all.sh


7.5 Check the processes

[hadoop@www hadoop-2.6.2]$ jps
4048 NameNode
4545 NodeManager
4130 DataNode
4459 ResourceManager
5469 Jps
4286 SecondaryNameNode


7.6 Upload the test file

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs  -put input/ /


7.7 Run WordCount

[hadoop@www hadoop-2.6.2]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /input /output


7.8 View the results

[hadoop@www hadoop-2.6.2]$  ./bin/hadoop fs -cat /output/part-r-00000