
Hadoop installation and configuration: using Cloudera

2012-01-13 17:14
This post installs Hadoop from Cloudera's RPM repository (CDH3).

The environment:

192.168.255.132 test01.linuxjcq.com => master

192.168.255.133 test02.linuxjcq.com => slave01

192.168.255.134 test03.linuxjcq.com => slave02

Each host has these entries in its /etc/hosts file and a basic Java environment set up; the Java package used is OpenJDK.
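
For reference, a minimal /etc/hosts matching the layout above; only the three hosts listed are assumed, and the localhost line is the stock default:

# /etc/hosts on every node
127.0.0.1 localhost
192.168.255.132 test01.linuxjcq.com
192.168.255.133 test02.linuxjcq.com
192.168.255.134 test03.linuxjcq.com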

1. Install the Cloudera repository

wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm -P /usr/local/src

yum localinstall --nogpgcheck /usr/local/src/cdh3-repository-1.0-1.noarch.rpm

rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
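
To confirm the repository registered correctly before installing any packages:

# The CDH3 repository should now show up in yum's repo list
yum repolist | grep -i cloudera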

2. Install the Hadoop packages

yum install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-secondarynamenode hadoop-0.20-datanode hadoop-0.20-jobtracker hadoop-0.20-tasktracker hadoop-0.20-source

The Hadoop packages are split up by role:

source: hadoop-0.20-source

base: hadoop-0.20

namenode: hadoop-0.20-namenode

secondarynamenode: hadoop-0.20-secondarynamenode

datanode: hadoop-0.20-datanode

jobtracker: hadoop-0.20-jobtracker

tasktracker: hadoop-0.20-tasktracker

Installation also adds two users and one group by default:

the hdfs user, for operating the HDFS filesystem

the mapred user, for MapReduce jobs

Both users belong to the hadoop group; no hadoop user is created.

Steps 1 and 2 must be carried out on every node.
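
To verify on each node that the package scripts created the expected accounts:

# Check the users and group added by the packages
id hdfs
id mapred
getent group hadoop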

3. Configure the master node

a. Create the configuration

Cloudera configurations can be managed with the alternatives tool.

cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.my_cluster

Copy the configuration files.

alternatives --display hadoop-0.20-conf

alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50

Inspect the existing configuration, then install the new one.

alternatives --display hadoop-0.20-conf

hadoop-0.20-conf - status is auto.

link currently points to /etc/hadoop-0.20/conf.my_cluster

/etc/hadoop-0.20/conf.empty - priority 10

/etc/hadoop-0.20/conf.my_cluster - priority 50

Current `best' version is /etc/hadoop-0.20/conf.my_cluster.

Confirm that the new configuration is in use.

b. Set the Java home directory

vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64

JAVA_HOME is the Java installation's home directory; OpenJDK works fine here.
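
If the OpenJDK path differs on your machine, locate it first so JAVA_HOME does not point at a missing directory:

# Find the installed OpenJDK home before setting JAVA_HOME
ls -d /usr/lib/jvm/java-1.6.0-openjdk*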

c. Configure core-site.xml

vi core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://test01.linuxjcq.com:9000/</value>
  </property>
</configuration>

Clients use this URI to access the HDFS filesystem.
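
Once the daemons are running (step 6 below), every fs command resolves paths against fs.default.name; a simple smoke test:

# Paths here resolve to hdfs://test01.linuxjcq.com:9000/
sudo -u hdfs hadoop fs -ls /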

d. Configure hdfs-site.xml

vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/hdfs/data</value>
  </property>
</configuration>

e. Configure mapred-site.xml

vi mapred-site.xml
<configuration>
  <property>
    <name>mapred.system.dir</name>
    <value>/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>test01.linuxjcq.com:9001</value>
  </property>
</configuration>

f. Configure the secondarynamenode and datanodes

secondarynamenode

vi masters
test02.linuxjcq.com

datanodes (the slaves file lists the datanode hosts)

vi slaves
test02.linuxjcq.com
test03.linuxjcq.com

g. Create the corresponding directories

Create dfs.name.dir and dfs.data.dir:

mkdir -p /data/hadoop/hdfs/{name,data}

Create mapred.local.dir:

mkdir -p /data/hadoop/mapred/local

Set the owner of dfs.name.dir and dfs.data.dir to hdfs, the group to hadoop, and the permissions to 0700:

chown -R hdfs:hadoop /data/hadoop/hdfs/{name,data}

chmod -R 0700 /data/hadoop/hdfs/{name,data}

Set the owner of mapred.local.dir to mapred, the group to hadoop, and the permissions to 0755:

chown -R mapred:hadoop /data/hadoop/mapred/local

chmod -R 0755 /data/hadoop/mapred/local
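
A quick check that the owners and modes came out as intended:

# Verify owners and permissions on the new directories
ls -ld /data/hadoop/hdfs/name /data/hadoop/hdfs/data /data/hadoop/mapred/local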

4. Configure the secondarynamenode and datanode nodes

Repeat steps a-f from section 3 on each of these nodes, and create the directories from step g there as well; a shortcut is sketched below.
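
Rather than re-editing every file by hand, one option is to push the prepared configuration from the master. A minimal sketch, assuming rsync is installed and password-less root SSH works between the nodes (neither is part of the original setup):

# Copy the prepared config directory to the other nodes
for h in test02.linuxjcq.com test03.linuxjcq.com; do
  rsync -a /etc/hadoop-0.20/conf.my_cluster/ $h:/etc/hadoop-0.20/conf.my_cluster/
  # Register the copied directory with alternatives on the remote node
  ssh $h alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50
done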

5. Format the namenode on the master node

sudo -u hdfs hadoop namenode -format

6. Start the daemons

Start the namenode on the master:

service hadoop-0.20-namenode start

Start the secondarynamenode:

service hadoop-0.20-secondarynamenode start

Start the datanode on each data node:

service hadoop-0.20-datanode start
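
To confirm a daemon actually came up, the init scripts accept the usual status action (assuming standard Red Hat init-script behavior); for example, on the master:

# Check the namenode daemon's status
service hadoop-0.20-namenode status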

7. Create the HDFS /tmp directory and mapred.system.dir

sudo -u hdfs hadoop fs -mkdir /mapred/system

sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system

sudo -u hdfs hadoop fs -chmod 700 /mapred/system

mapred.system.dir must exist before the jobtracker starts.

sudo -u hdfs hadoop fs -mkdir /tmp

sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
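
A recursive listing shows whether the owners and permission bits landed as intended; -lsr is the recursive listing in Hadoop 0.20:

# Inspect HDFS paths, owners, and modes
sudo -u hdfs hadoop fs -lsr /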

8. Start MapReduce

On the datanode nodes, run:

service hadoop-0.20-tasktracker start

Start the jobtracker on the namenode node:

service hadoop-0.20-jobtracker start
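
Both web UIs should now answer on the stock 0.20 ports, assuming the defaults were not changed:

# JobTracker UI defaults to port 50030, NameNode UI to 50070
curl -s -o /dev/null -w "%{http_code}\n" http://test01.linuxjcq.com:50030/
curl -s -o /dev/null -w "%{http_code}\n" http://test01.linuxjcq.com:50070/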

9. Enable services at boot

namenode node: only the namenode and jobtracker should start; turn the other services off

chkconfig hadoop-0.20-namenode on

chkconfig hadoop-0.20-jobtracker on

chkconfig hadoop-0.20-secondarynamenode off

chkconfig hadoop-0.20-tasktracker off

chkconfig hadoop-0.20-datanode off

datanode nodes: the datanode and tasktracker need to start

chkconfig hadoop-0.20-namenode off

chkconfig hadoop-0.20-jobtracker off

chkconfig hadoop-0.20-secondarynamenode off

chkconfig hadoop-0.20-tasktracker on

chkconfig hadoop-0.20-datanode on

secondarynamenode node: the secondarynamenode needs to start

chkconfig hadoop-0.20-secondarynamenode on
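
To double-check the boot-time wiring on any node:

# List the runlevel state of all Hadoop services
chkconfig --list | grep hadoop-0.20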

Notes:

These Hadoop packages run as independent services, so SSH between nodes is not required; you can still set up SSH and manage the daemons with start-all.sh and stop-all.sh.

This article is from the "linuxjcq" blog; please retain the attribution: http://linuxjcq.blog.51cto.com/3042600/763472