Installing and Configuring hadoop-2.2.0 on a 64-bit Ubuntu 14.04 Cluster
2014-06-14 13:46
1 Prerequisites
Make sure all required software is installed on every node in the cluster: the Sun JDK and ssh. Java 1.7.x must be installed, and the Sun release of Java is recommended.
ssh must be installed and sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons.
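A quick way to check both prerequisites on a node (a minimal sketch; the exact version strings will vary):
$ java -version       # should report a 1.7.x JVM
$ ssh -V              # confirms the ssh client is installed
$ ps -e | grep sshd   # confirms the sshd daemon is running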
2 实验环境搭建
2.1 准备工作
Operating system: Ubuntu. Deployment: VMware.
After installing one Ubuntu virtual machine in VMware, you can export or clone it to produce the other two virtual machines.
Notes:
Make sure the virtual machines' IPs are in the same IP range as the host's IP so the virtual machines and the host can reach each other.
To keep the virtual machines in the same IP range as the host, set the virtual machines' network connection to bridged mode.
Prepare the machines: one master and several slaves. Configure /etc/hosts on every machine so that the machines can reach each other by hostname, for example:
192.168.0.107 cloud001 (master)
192.168.0.108 cloud002 (slave1)
192.168.0.109 cloud003 (slave2)
To keep the environments consistent, install the JDK and ssh first.
Edit the hosts file:
$ sudo vi /etc/hosts
127.0.0.1 localhost
192.168.0.107 cloud001
#127.0.1.1 cloud001
192.168.0.108 cloud002
192.168.0.109 cloud003
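To confirm that hostname resolution works after editing hosts, a quick check (a minimal sketch using the entries above):
$ ping -c 1 cloud002   # should resolve to 192.168.0.108 and get a reply
$ ping -c 1 cloud003   # should resolve to 192.168.0.109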
2.2 Installing the JDK
Omitted...
2.3 Creating the User
$ sudo useradd -m hadoop
$ cd /home/hadoop
Create the same directory on all machines, or better, create the same user on all of them and use that user's home path as the Hadoop installation path.
For example, the installation path on every machine is /home/hadoop/hadoop-2.2.0. There is no need to mkdir it; it is created automatically when the hadoop package is extracted under /home/hadoop/.
(It is best not to install as root, because root access between the machines is not recommended.)
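The new account also needs a password before you can su to it, and a normal login shell is convenient; a minimal sketch (these two steps are assumptions beyond the original instructions):
$ sudo passwd hadoop                 # set the hadoop user's password
$ sudo usermod -s /bin/bash hadoop   # give the user a normal login shell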
2.4 Installing and Configuring ssh
1) Installation: sudo apt-get install ssh. Once this finishes, the ssh command can be used directly.
Run $ netstat -nat to check whether port 22 is open.
Test: ssh localhost.
Enter the current user's password and press Enter. Logging in confirms the installation succeeded; at this point ssh login still requires a password.
(With this default installation, the configuration files are under the /etc/ssh/ directory; the sshd configuration file is /etc/ssh/sshd_config.)
Note: ssh must be installed on all machines.
2) Configuration:
Once Hadoop is running, the Namenode starts and stops the daemons on each datanode over SSH (Secure Shell), so commands executed between nodes must not prompt for a password. We therefore configure SSH to use passwordless public-key authentication.
Taking the three machines in this article as an example: cloud001 is the master node and needs to connect to cloud002 and cloud003. Make sure ssh is installed on every machine and that the sshd service is running on the datanode machines.
(Explanation: [hadoop@hadoop ~]$ ssh-keygen -t rsa
This command generates a key pair for the user hadoop. Press Enter when asked for the save path to accept the default, and press Enter when prompted for a passphrase, i.e. leave it empty. The generated key pair, id_rsa and id_rsa.pub, is stored under /home/hadoop/.ssh by default. Then copy the contents of id_rsa.pub into the /home/hadoop/.ssh/authorized_keys file on every machine, including this one. If a machine already has an authorized_keys file, append the contents of id_rsa.pub to the end of that file; if it does not, simply copy the file over.)
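The append described above comes down to one command per machine (a minimal sketch, run as the hadoop user):
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # appends the key; creates the file if it does not exist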
3) First, set up passwordless, automatic ssh login for the namenode.
Switch to the hadoop user (the hadoop user must be able to log in without a password, because the Hadoop installation we set up later is owned by the hadoop user).
$ su hadoop
$ cd /home/hadoop
$ ssh-keygen -t rsa
Then press Enter at every prompt.
When this finishes, a hidden .ssh folder appears under the home directory.
$ cd .ssh
Run ls to list the files, then:
$ cp id_rsa.pub authorized_keys
Test:
$ ssh localhost
or:
$ ssh cloud001
The first ssh connection prints a prompt like:
The authenticity of host 'cloud001 (192.168.0.107)' can't be established.
RSA key fingerprint is 03:e0:30:cb:6e:13:a8:70:c9:7e:cf:ff:33:2a:67:30.
Are you sure you want to continue connecting (yes/no)?
Type yes to continue. This adds the server to your list of known hosts.
The connection should succeed, with no password required.
4) Copy authorized_keys to cloud002 and cloud003
So that cloud001 can log in to cloud002 and cloud003 automatically without a password, first run the following on cloud002 and cloud003:
$ su hadoop
$ cd /home/hadoop
$ ssh-keygen -t rsa
Press Enter at every prompt.
Then go back to cloud001 and copy authorized_keys to cloud002 and cloud003:
[hadoop@cloud001 .ssh]$ scp authorized_keys cloud002:/home/hadoop/.ssh/
[hadoop@cloud001 .ssh]$ scp authorized_keys cloud003:/home/hadoop/.ssh/
You will be prompted for a password here; the hadoop account's password is enough.
Change the permissions of your authorized_keys file:
[hadoop@cloud001 .ssh]$ chmod 644 authorized_keys
Test: ssh cloud002 or ssh cloud003 (type yes the first time).
If no password is required, the configuration succeeded; if a password is still requested, recheck whether the configuration above is correct.
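As an alternative to copying authorized_keys by hand, the standard ssh-copy-id utility appends the key and sets the permissions in one step (a sketch, run on cloud001; it prompts for the hadoop password once per host):
$ ssh-copy-id hadoop@cloud002
$ ssh-copy-id hadoop@cloud003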
2.5 Installing Hadoop
Since the configuration on every machine in a hadoop cluster is essentially the same, we first configure and deploy on the namenode and then copy to the other nodes, so the installation process here effectively has to be carried out on every machine.
1. Extract the files
Extract the compiled hadoop-2.2.0.tar.gz under the /home/hadoop path (or place the result of a build done on a 64-bit machine there). To save space, you can then delete the archive or store it elsewhere as a backup.
Note: the installation path must be the same on every machine!
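For example (a minimal sketch, assuming the archive is already in /home/hadoop):
$ cd /home/hadoop
$ tar -xzf hadoop-2.2.0.tar.gz   # creates /home/hadoop/hadoop-2.2.0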
2. Hadoop configuration
Before configuring, create the following folders on cloud001's local filesystem (see the one-liner after this list):
~/dfs/name
~/dfs/data
~/tmp
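A single command creates all three (run as the hadoop user so that ~ is /home/hadoop):
$ mkdir -p ~/dfs/name ~/dfs/data ~/tmp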
Seven configuration files are involved here:
~/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
~/hadoop-2.2.0/etc/hadoop/yarn-env.sh
~/hadoop-2.2.0/etc/hadoop/slaves
~/hadoop-2.2.0/etc/hadoop/core-site.xml
~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
~/hadoop-2.2.0/etc/hadoop/mapred-site.xml
~/hadoop-2.2.0/etc/hadoop/yarn-site.xml
Some of these files do not exist by default; you can obtain them by copying the corresponding template file, as shown below.
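For example, mapred-site.xml usually ships only as a template in hadoop-2.2.0 (a sketch, run from ~/hadoop-2.2.0):
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml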
Config file 1: hadoop-env.sh
Change the JAVA_HOME value (export JAVA_HOME=/usr/java/jdk1.7.0_55)
Config file 2: yarn-env.sh
Change the JAVA_HOME value (export JAVA_HOME=/usr/java/jdk1.7.0_55)
Config file 3: slaves (this file lists all the slave nodes)
Write the following into it:
cloud002
cloud003
Config file 4: core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cloud001:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
</configuration>
Config file 5: hdfs-site.xml (note that dfs.replication is set to 3 below; with only two datanodes in this cluster, blocks will stay under-replicated, so 2 would match this setup)
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>cloud001:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
</configuration>
Config file 6: mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>cloud001:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>cloud001:19888</value>
</property>
</configuration>
Config file 7: yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>cloud001:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>cloud001:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>cloud001:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>cloud001:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>cloud001:8088</value>
</property>
</configuration>
3. Copy to the other nodes
You can write a shell script for this (convenient when there are many nodes).
cp2slave.sh:
#!/bin/bash
scp -r /home/hadoop/hadoop-2.2.0 hadoop@cloud002:/home/hadoop/
scp -r /home/hadoop/hadoop-2.2.0 hadoop@cloud003:/home/hadoop/
Alternatively, copy over just the relevant configuration files to replace them:
cp2slave2.sh:
#!/bin/bash
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/slaves hadoop@cloud002:~/hadoop-2.2.0/etc/hadoop/slaves
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/slaves hadoop@cloud003:~/hadoop-2.2.0/etc/hadoop/slaves
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml hadoop@cloud002:~/hadoop-2.2.0/etc/hadoop/core-site.xml
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml hadoop@cloud003:~/hadoop-2.2.0/etc/hadoop/core-site.xml
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml hadoop@cloud002:~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml hadoop@cloud003:~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml hadoop@cloud002:~/hadoop-2.2.0/etc/hadoop/mapred-site.xml
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml hadoop@cloud003:~/hadoop-2.2.0/etc/hadoop/mapred-site.xml
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-site.xml hadoop@cloud002:~/hadoop-2.2.0/etc/hadoop/yarn-site.xml
scp /home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-site.xml hadoop@cloud003:~/hadoop-2.2.0/etc/hadoop/yarn-site.xml
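With many nodes, a loop over the slaves file keeps the script short; a minimal sketch with the same effect as cp2slave2.sh (the paths and file names are the ones used above):
#!/bin/bash
# push the shared configuration files to every host listed in the slaves file
CONF=/home/hadoop/hadoop-2.2.0/etc/hadoop
for host in $(cat "$CONF/slaves"); do
  for f in slaves core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    scp "$CONF/$f" hadoop@"$host":"$CONF/$f"
  done
done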
4. Starting and Verifying
4.1 Start hadoop
Enter the installation directory: cd ~/hadoop-2.2.0/
Format the namenode: ./bin/hdfs namenode -format
Start hdfs: ./sbin/start-dfs.sh
At this point the processes running on 001 are: namenode, secondarynamenode;
on 002 and 003: datanode.
Start yarn: ./sbin/start-yarn.sh
Now the processes running on 001 are: namenode, secondarynamenode, resourcemanager;
on 002 and 003: datanode, nodemanager.
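The jps tool that ships with the JDK lists the running Java processes and is a quick way to confirm the daemons on each node (the PIDs will differ):
$ jps   # on cloud001: NameNode, SecondaryNameNode, ResourceManager
$ jps   # on cloud002/cloud003: DataNode, NodeManager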
Check the cluster status: ./bin/hdfs dfsadmin -report
Inspect the file blocks: ./bin/hdfs fsck / -files -blocks
View HDFS: http://192.168.0.107:50070
View the RM: http://192.168.0.107:8088
4.2 Running an example program:
After setting up the Hadoop 2.2.0 cluster environment, use an example program to verify that hadoop2's mapreduce works.
First, in a local folder, create two files, file01.txt and file02.txt, with the following contents (a sketch that creates them follows the listings):
file01.txt
hello
hehe
hey
haha
miaomiao
file02.txt
hello world
hehe
haha
miaomiao
heihei
h
hh
hey
houhou
hadoop
hbase
hawk
pengfei
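A minimal sketch that creates both files exactly as listed:
$ cat > file01.txt <<'EOF'
hello
hehe
hey
haha
miaomiao
EOF
$ cat > file02.txt <<'EOF'
hello world
hehe
haha
miaomiao
heihei
h
hh
hey
houhou
hadoop
hbase
hawk
pengfei
EOF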
A quick tour of the hadoop shell commands:
./bin/hadoop fs -ls /                      # list the hdfs root directory
./bin/hadoop fs -mkdir -p /input           # the -p flag must be included; hadoop2 differs from earlier versions here
./bin/hadoop fs -put file*.txt /input      # put the two files just created into the hadoop filesystem
./bin/hadoop fs -cat /input/file01.txt     # view the file contents
./bin/hadoop fs -rm -r /input/file02.txt   # delete a file
First create a folder on hdfs and upload the files:
$ ./bin/hdfs dfs -mkdir -p /input
$ ./bin/hadoop fs -put file*.txt /input
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
Excerpt from the run:
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
14/06/14 17:08:41 INFO client.RMProxy: Connecting to ResourceManager at cloud001/192.168.0.107:8032
14/06/14 17:08:43 INFO input.FileInputFormat: Total input paths to process : 2
14/06/14 17:08:43 INFO mapreduce.JobSubmitter: number of splits:2
14/06/14 17:08:43 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/06/14 17:08:43 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/06/14 17:08:43 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/06/14 17:08:43 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/06/14 17:08:43 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/06/14 17:08:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402718202415_0002
14/06/14 17:08:44 INFO impl.YarnClientImpl: Submitted application application_1402718202415_0002 to ResourceManager at cloud001/192.168.0.107:8032
14/06/14 17:08:44 INFO mapreduce.Job: The url to track the job: http://cloud001:8088/proxy/application_1402718202415_0002/
14/06/14 17:08:44 INFO mapreduce.Job: Running job: job_1402718202415_0002
14/06/14 17:08:54 INFO mapreduce.Job: Job job_1402718202415_0002 running in uber mode : false
14/06/14 17:08:54 INFO mapreduce.Job: map 0% reduce 0%
14/06/14 17:09:12 INFO mapreduce.Job: map 50% reduce 0%
14/06/14 17:09:13 INFO mapreduce.Job: map 100% reduce 0%
14/06/14 17:10:00 INFO mapreduce.Job: map 100% reduce 100%
14/06/14 17:10:01 INFO mapreduce.Job: Job job_1402718202415_0002 completed successfully
14/06/14 17:10:02 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=229
FILE: Number of bytes written=238142
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=313
HDFS: Number of bytes written=108
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=30879
Total time spent by all reduces in occupied slots (ms)=43860
Map-Reduce Framework
Map input records=18
Map output records=19
Map output bytes=185
Map output materialized bytes=235
Input split bytes=204
Combine input records=19
Combine output records=19
Reduce input groups=14
Reduce shuffle bytes=235
Reduce input records=19
Reduce output records=14
Spilled Records=38
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=666
CPU time spent (ms)=5390
Physical memory (bytes) snapshot=483500032
Virtual memory (bytes) snapshot=1987952640
Total committed heap usage (bytes)=257171456
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=109
File Output Format Counters
Bytes Written=108
The results of the run:
$./bin/hadoop fs -ls /output
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2014-06-14 17:09 /output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 108 2014-06-14 17:09 /output/part-r-00000
$ ./bin/hadoop fs -cat /output/part-r-00000
h 1
hadoop 1
haha 2
hawk 1
hbase 1
hehe 2
heihei 1
hello 2
hey 2
hh 1
houhou 1
miaomiao 2
pengfei 1
world 1