
Hadoop Installation and Configuration

2013-12-20 11:30

1. System Initialization

I) Software preparation
1) VMware-workstation-full-8.0.4-744019.exe
2) ubuntu-11.10-server-i386.iso
3) hadoop-1.0.3.tar.gz (version 1.0.3)
4) SSH Secure Shell (client)

II) Install VMware

III) Install Red Hat

IV) Install SSH Secure Shell
This lets you log in to the virtual machine from the host machine.

V) Set the hostname and the hosts file
vi /etc/sysconfig/network
Change: HOSTNAME=redhat1
$> hostname redhat1   (takes effect without a reboot)
$> hostname   (shows the new hostname)

vi /etc/hosts
Change:
127.0.0.1 localhost
192.168.229.128 redhat1
#192.168.229.129 redhat2
#192.168.229.130 redhat3

VI) Install SSH and set up passwordless SSH
1) sudo apt-get install ssh   (Ubuntu)
   yum install openssh-server   (CentOS)

2) Generate a key pair: ssh-keygen -t rsa
Press Enter at every prompt; the key files are saved in /root/.ssh
3) Go into the .ssh directory and run:
cp id_rsa.pub authorized_keys
ssh localhost
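
If ssh localhost still asks for a password, the usual cause is permissions: sshd silently ignores an authorized_keys file that is group- or world-writable. A minimal sketch of tightening them (assuming the root account and paths used above):

chmod 700 /root/.ssh                    # the .ssh directory must be private
chmod 600 /root/.ssh/authorized_keys    # the key file must be readable only by its owner
ssh localhost                           # should now log in without a password prompt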

VII) Install and configure the JDK
1. ./jdk-6u26-linux-i586.bin
Run java, javac, and java -version; version information should be displayed.
2. Install vim to make editing files easier: apt-get install vim
(strongly recommended, since plain vi is awkward to use)
3. Configure the Java environment:
1) vim /etc/profile to edit the profile file
2) Append the following at the end of the file:

export JAVA_HOME=/usr/java/jdk1.6.0_26

export JRE_HOME=/usr/java/jdk1.6.0_26/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

3) When finished, save and exit with :wq
4) source /etc/profile
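
A quick sanity check that the environment actually took effect in the current shell (assuming the JDK was unpacked to /usr/java/jdk1.6.0_26 as above):

source /etc/profile      # reload the profile in the current shell
echo $JAVA_HOME          # should print /usr/java/jdk1.6.0_26
java -version            # should report version 1.6.0_26
javac -version           # confirms the compiler is on the PATH as well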

VIII) Install Hadoop
1. tar -zvxf hadoop-1.0.3.tar.gz
   mv hadoop-1.0.3 hadoop

2. Edit the conf/hadoop-env.sh file under the Hadoop directory:
vim conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_26

3. reboot to restart the machine
Run hadoop version; if version information appears, the installation is complete.
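
For the bare hadoop command above to be found, Hadoop's bin directory has to be on the PATH. A sketch of the extra /etc/profile lines, assuming the tarball was unpacked and renamed to /usr/hadoop (adjust the path to wherever you put it in step 1):

export HADOOP_HOME=/usr/hadoop          # assumed install location after the mv in step 1
export PATH=$HADOOP_HOME/bin:$PATH
# then reload and verify:
source /etc/profile
hadoop version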

2. The Three Operating Modes

• Standalone (local) mode

Standalone mode is Hadoop's default mode. When the Hadoop package is first unpacked, Hadoop knows nothing about the hardware environment, so it conservatively chooses a minimal configuration. In this default state, all three XML configuration files are empty.

With empty configuration files, Hadoop runs entirely locally. Since it does not need to interact with other nodes, standalone mode does not use HDFS and does not start any Hadoop daemons (there is no need to run any of the .sh scripts such as start-all.sh). This mode is mainly used for developing and debugging the application logic of MapReduce programs without touching the daemons, which avoids the extra complexity they introduce.

With no configuration at all (you do not even need to set the JRE in hadoop-env.sh), running bin/hadoop jar hadoop-examples-x.y.z.jar wordcount input output is the standalone way of running a job; both its input and output are local files.

[root@localhost hadoop]# mkdir input
[root@localhost hadoop]# cd input
[root@localhost input]# ls
[root@localhost input]# echo "hello lsr" > test1.txt
[root@localhost input]# ls
test1.txt
[root@localhost input]# echo "hello hadoop" > test2.txt
[root@localhost input]# ls
test1.txt  test2.txt
[root@localhost input]# cd ../
[root@localhost hadoop]# bin/hadoop jar hadoop-examples-1.0.3.jar wordcount input output
12/10/23 10:34:03 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 10:34:03 INFO input.FileInputFormat: Total input paths to process : 2
12/10/23 10:34:03 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/23 10:34:03 INFO mapred.JobClient: Running job: job_local_0001
12/10/23 10:34:04 INFO util.ProcessTree: setsid exited with exit code 0
12/10/23 10:34:04 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@cd5f8b
12/10/23 10:34:04 INFO mapred.MapTask: io.sort.mb = 100
12/10/23 10:34:04 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 10:34:09 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/23 10:34:09 INFO mapred.MapTask: record buffer = 262144/327680
12/10/23 10:34:09 INFO mapred.MapTask: Starting flush of map output
12/10/23 10:34:09 INFO mapred.MapTask: Finished spill 0
12/10/23 10:34:09 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/23 10:34:10 INFO mapred.LocalJobRunner:
12/10/23 10:34:10 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/10/23 10:34:10 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c4fe76
12/10/23 10:34:10 INFO mapred.MapTask: io.sort.mb = 100
12/10/23 10:34:10 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/23 10:34:10 INFO mapred.MapTask: record buffer = 262144/327680
12/10/23 10:34:10 INFO mapred.MapTask: Starting flush of map output
12/10/23 10:34:10 INFO mapred.MapTask: Finished spill 0
12/10/23 10:34:10 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/10/23 10:34:10 INFO mapred.JobClient:  map 100% reduce 0%
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/10/23 10:34:13 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e28b9
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Merger: Merging 2 sorted segments
12/10/23 10:34:13 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 51 bytes
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/10/23 10:34:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
12/10/23 10:34:16 INFO mapred.LocalJobRunner: reduce > reduce
12/10/23 10:34:16 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/10/23 10:34:16 INFO mapred.JobClient:  map 100% reduce 100%
12/10/23 10:34:16 INFO mapred.JobClient: Job complete: job_local_0001
12/10/23 10:34:16 INFO mapred.JobClient: Counters: 20
12/10/23 10:34:16 INFO mapred.JobClient:   File Output Format Counters
12/10/23 10:34:16 INFO mapred.JobClient:     Bytes Written=35
12/10/23 10:34:16 INFO mapred.JobClient:   FileSystemCounters
12/10/23 10:34:16 INFO mapred.JobClient:     FILE_BYTES_READ=428713
12/10/23 10:34:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=528398
12/10/23 10:34:16 INFO mapred.JobClient:   File Input Format Counters
12/10/23 10:34:16 INFO mapred.JobClient:     Bytes Read=23
12/10/23 10:34:16 INFO mapred.JobClient:   Map-Reduce Framework
12/10/23 10:34:16 INFO mapred.JobClient:     Map output materialized bytes=59
12/10/23 10:34:16 INFO mapred.JobClient:     Map input records=2
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/10/23 10:34:16 INFO mapred.JobClient:     Spilled Records=8
12/10/23 10:34:16 INFO mapred.JobClient:     Map output bytes=39
12/10/23 10:34:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=548130816
12/10/23 10:34:16 INFO mapred.JobClient:     CPU time spent (ms)=0
12/10/23 10:34:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=200
12/10/23 10:34:16 INFO mapred.JobClient:     Combine input records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce input records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce input groups=3
12/10/23 10:34:16 INFO mapred.JobClient:     Combine output records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce output records=3
12/10/23 10:34:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/10/23 10:34:16 INFO mapred.JobClient:     Map output records=4
[root@localhost hadoop]# cd output
[root@localhost output]# ls
part-r-00000  _SUCCESS
[root@localhost output]# cat part-r-00000
hadoop  1
hello   2
lsr     1
[root@localhost output]#

• Pseudo-distributed mode

1) Unpack the archive

tar xzvf hadoop-1.0.3.tar.gz

2) Edit the configuration files

1. hadoop-env.sh

Add:

export JAVA_HOME=/soft/java/jdk1.6.0_13

2. core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ubuntu:9000</value>
  </property>
</configuration>

3. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadoopdata/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/hadoopdata/fs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/hadoopdata/fs/data</value>
  </property>
</configuration>

4. mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ubuntu:9001</value>
  </property>
</configuration>
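
Hadoop will normally create the local directories referenced in hdfs-site.xml when it formats or starts, but it is safest to make sure the parent paths exist and are writable by the user running Hadoop before formatting. A minimal sketch using the exact paths configured above:

mkdir -p /hadoop/hadoopdata/tmp       # hadoop.tmp.dir
mkdir -p /hadoop/hadoopdata/fs/name   # dfs.name.dir
mkdir -p /hadoop/hadoopdata/fs/data   # dfs.data.dir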

3) Format the filesystem

bin/hadoop namenode -format

4) Start the daemons

root@ubuntu:/_nosql/hadoop# bin/start-all.sh

root@ubuntu:/_nosql/hadoop# jps

9117 JobTracker

8927 DataNode

9228 TaskTracker

9266 Jps

9037 SecondaryNameNode

8814 NameNode

5) Verify the installation
http://ubuntu:50030 (MapReduce page)
http://ubuntu:50070 (HDFS page)
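
Beyond the two web pages, a small HDFS round trip confirms that the daemons are really serving requests (a sketch; the directory name /test is arbitrary):

bin/hadoop fs -mkdir /test                     # create a directory in HDFS
bin/hadoop fs -put conf/core-site.xml /test    # upload a local file into it
bin/hadoop fs -ls /test                        # the uploaded file should be listed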

• Fully distributed (cluster) mode

Node roles:

Node type           Node IP         Hostname
master node         192.168.40.4    master
slave nodes         192.168.40.5    slave1
                    192.168.40.6    slave2
                    192.168.40.7    slave3
SecondaryNameNode   192.168.40.4    master

Configuration steps:

I. Install one virtual machine following the system-initialization steps in Section 1 of this document
II. Use VMware (the clone function in the manager) to clone three more virtual machines
III. Edit the hosts file and hostname on each virtual machine
vi /etc/sysconfig/network

Change: HOSTNAME=master

$> hostname master   (takes effect without a reboot)

$> hostname   (shows the new hostname)

vi /etc/hosts

Change:

127.0.0.1 localhost

192.168.40.4 master

192.168.40.5 slave1

192.168.40.6 slave2

192.168.40.7 slave3

IV. Configure and test SSH on the four virtual machines
(1) Generate keys and configure passwordless SSH login (on the master host)

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

(2) Copy the authorized_keys file to each slave host (a loop covering all three slaves is sketched after step (3) below)

scp authorized_keys slave1:~/.ssh/

scp authorized_keys slave2:~/.ssh/

(3) Check that you can log in to the slaves from the master without a password

ssh slave1   (run on the master; a successful login means the configuration works; type exit to leave slave1 and return to the master)
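
A sketch of pushing the key to all three slaves in one pass (assuming the hostnames from the hosts file above, and that ~/.ssh already exists on the clones, which it does if they were cloned after Section 1):

for host in slave1 slave2 slave3; do
  scp ~/.ssh/authorized_keys $host:~/.ssh/   # asks for the password once per host
  ssh $host hostname                         # afterwards this should print the name with no password prompt
done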

V. Turn off the firewall on all four virtual machines
Stop the firewall:

Shell:

service iptables stop

After the machine reboots, the firewall will come back up.

Stopping/starting the Red Hat firewall:

/* stop the firewall */

service iptables stop

/* start the firewall */

service iptables start

/* keep the firewall off by default across reboots */

chkconfig iptables off

VI. Configure Hadoop
1. hadoop-env.sh

Add:

export JAVA_HOME=/_work/jdk

2. core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

3. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/_work/hadoop/hadoopdata/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/_work/hadoop/hadoopdata/fs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/_work/hadoop/hadoopdata/fs/data</value>
  </property>
</configuration>

4. mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
5. masters

master

6. slaves

slave1
slave2
slave3
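
The files above are edited on the master; for the cluster to start consistently, every slave needs the same configuration. A sketch of pushing the conf directory out from the master (assuming Hadoop lives at /_work/hadoop on every node, in line with the paths used in hdfs-site.xml above):

for host in slave1 slave2 slave3; do
  scp -r /_work/hadoop/conf $host:/_work/hadoop/   # overwrite each slave's conf with the master's
done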
VII. Start Hadoop
hadoop namenode -format

start-all.sh, or run start-dfs.sh first and then start-mapred.sh

VIII. Test
On the master node:

[root@master ~]# jps

25429 SecondaryNameNode

25500 JobTracker

25201 NameNode

18474 Jps

On a slave node:

[root@slave1 ~]# jps

4469 TaskTracker

4388 DataNode

29622 Jps

hadoop fs -ls /

hadoop fs -mkdir /newDir

[root@slave1 hadoop-0.20.2]# hadoop jar hadoop-0.20.2-examples.jar pi 4 2
Number of Maps = 4
Samples per Map = 2
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
12/05/20 09:45:19 INFO mapred.FileInputFormat: Total input paths to process : 4
12/05/20 09:45:19 INFO mapred.JobClient: Running job: job_201205190417_0005
12/05/20 09:45:20 INFO mapred.JobClient:  map 0% reduce 0%
12/05/20 09:45:30 INFO mapred.JobClient:  map 50% reduce 0%
12/05/20 09:45:31 INFO mapred.JobClient:  map 100% reduce 0%
12/05/20 09:45:45 INFO mapred.JobClient:  map 100% reduce 100%
12/05/20 09:45:47 INFO mapred.JobClient: Job complete: job_201205190417_0005
12/05/20 09:45:47 INFO mapred.JobClient: Counters: 18
12/05/20 09:45:47 INFO mapred.JobClient:   Job Counters
12/05/20 09:45:47 INFO mapred.JobClient:     Launched reduce tasks=1
12/05/20 09:45:47 INFO mapred.JobClient:     Launched map tasks=4
12/05/20 09:45:47 INFO mapred.JobClient:     Data-local map tasks=4
12/05/20 09:45:47 INFO mapred.JobClient:   FileSystemCounters
12/05/20 09:45:47 INFO mapred.JobClient:     FILE_BYTES_READ=94
12/05/20 09:45:47 INFO mapred.JobClient:     HDFS_BYTES_READ=472
12/05/20 09:45:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=334
12/05/20 09:45:47 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
12/05/20 09:45:47 INFO mapred.JobClient:   Map-Reduce Framework
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce input groups=8
12/05/20 09:45:47 INFO mapred.JobClient:     Combine output records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Map input records=4
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce shuffle bytes=112
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce output records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Spilled Records=16
12/05/20 09:45:47 INFO mapred.JobClient:     Map output bytes=72
12/05/20 09:45:47 INFO mapred.JobClient:     Map input bytes=96
12/05/20 09:45:47 INFO mapred.JobClient:     Combine input records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Map output records=8
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce input records=8
Job Finished in 28.952 seconds
Estimated value of Pi is 3.50000000000000000000

Notes:

How to deploy the NameNode and the SecondaryNameNode on separate machines:

1. The masters file:

In the masters file, write the hostname or IP of the machine on which the SecondaryNameNode should be deployed.

Note: the masters file does not decide which node is the NameNode; it decides where the SecondaryNameNode runs (what decides the NameNode is the fs.default.name parameter in core-site.xml).
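
For example, if the SecondaryNameNode should run on slave1 instead of the master (an arbitrary choice, purely for illustration), the masters file would contain just that one line:

slave1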

2. Add one parameter to hdfs-site.xml:

<property>
  <name>dfs.http.address</name>
  <value>(hostname or IP of the NameNode):50070</value>
</property>

3. Add two parameters to core-site.xml (the defaults are usually fine):

fs.checkpoint.period sets how often an HDFS checkpoint (image) is recorded; the default is one hour.

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>

fs.checkpoint.size sets how large the recorded data may grow before a checkpoint is forced; the default is 64 MB.

<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>

4. Verification

Run jps on the target machine to check whether the SecondaryNameNode process has started.

Go to the fs.checkpoint.dir configured in hdfs-site.xml (by default ${hadoop.tmp.dir}/dfs/namesecondary), list the files with ll, then cd current and use ll to list the files again.
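
A sketch of that check, assuming fs.checkpoint.dir was left at its default under the hadoop.tmp.dir configured earlier (/_work/hadoop/hadoopdata/tmp):

jps                                                 # SecondaryNameNode should be listed on the target machine
cd /_work/hadoop/hadoopdata/tmp/dfs/namesecondary   # default checkpoint directory under hadoop.tmp.dir
ll                                                  # shows the checkpoint layout
cd current
ll                                                  # fsimage and edits files appear here after a checkpoint runs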