
Deploying Hadoop on Docker

2016-03-31 17:00
I. Build the Docker image

1. mkdir hadoop
2. Copy hadoop-2.6.2.tar.gz into the hadoop directory.
3. vim Dockerfile

FROM ubuntu
MAINTAINER Docker tianlei <393743083@qq.com>
ADD ./hadoop-2.6.2.tar.gz /usr/local/
Build the image from inside the hadoop directory (both the Dockerfile and the tarball must be in the build context):

docker build -t "ubuntu:base" .
Run the image to create a container:

docker run -d -it --name hadoop ubuntu:base
Attach to the container to work inside it:

docker exec -i -t hadoop /bin/bash
1. Install Java in the container

sudo apt-get update
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Set the environment variable:

vim ~/.bashrc
Add this line:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64


source ~/.bashrc
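To confirm the JDK is installed and the variable took effect (the exact build string will vary):

java -version
echo $JAVA_HOME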


2. Install Hadoop in the container
The tarball was already extracted into /usr/local/ by the Dockerfile's ADD instruction.
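Note that ADD unpacks the archive to /usr/local/hadoop-2.6.2, while the variables below assume /usr/local/hadoop, so rename (or symlink) the directory first:

mv /usr/local/hadoop-2.6.2 /usr/local/hadoop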

vim ~/.bashrc
Add:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Reload the file:

source ~/.bashrc
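A quick sanity check that the new PATH entries work (this should print the Hadoop 2.6.2 version banner):

hadoop version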
Point Hadoop's own environment file at the JDK:

cd /usr/local/hadoop/etc/hadoop/
vim hadoop-env.sh
and set:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Under the hadoop directory, create tmp, namenode, and datanode.

These three directories will be referenced by the configuration below:
tmp: Hadoop's temporary directory
namenode: storage directory for the NameNode
datanode: storage directory for the DataNode
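For example (HADOOP_HOME is /usr/local/hadoop as set above):

cd /usr/local/hadoop
mkdir tmp namenode datanode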
Then edit three XML files under /usr/local/hadoop/etc/hadoop/:
1) core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
<final>true</final>
<description>The name of the default file system.  A URI whose
scheme and authority determine the FileSystem implementation.  The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class.  The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>


Note:
hadoop.tmp.dir is set to the temporary directory created earlier.
fs.default.name is set to hdfs://master:9000, which points at the master node's hostname (we will configure that node when we build the cluster later; it is written here in advance).
2) hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/namenode</value>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/datanode</value>
<final>true</final>
</property>
</configuration>


Note:
The cluster we build later will have one master node and two slave nodes, so dfs.replication is set to 2.
dfs.namenode.name.dir and dfs.datanode.data.dir are set to the NameNode and DataNode directories created earlier.
3) mapred-site.xml
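In Hadoop 2.6 this file ships only as a template, so create it first:

cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml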

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
<description>The host and port that the MapReduce job tracker runs
at.  If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>

There is only one property here, mapred.job.tracker, and it points at the master node.

Format the NameNode:

hadoop namenode -format


3. Install SSH

sudo apt-get install ssh
Add to ~/.bashrc so that sshd starts automatically:

#autorun
/usr/sbin/sshd
Generate a key pair:

cd ~/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cd .ssh
cat id_rsa.pub >> authorized_keys
Note: sshd sometimes complains that /var/run/sshd cannot be found; just create an sshd directory under /var/run.

In /etc/ssh/ssh_config, add:

StrictHostKeyChecking no
UserKnownHostsFile /dev/null
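To verify that passwordless login works (it should drop into a shell without prompting for a password):

ssh localhost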
4. Commit the container as a new image

docker commit -m "hadoop install" hadoop ubuntu:hadoop
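The new tag should now appear in the local image list:

docker images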


II. Deploy the Hadoop distributed cluster


Start the master container:

docker run -d -ti -h master ubuntu:hadoop


Start the slave1 container:

docker run -d -ti -h slave1 ubuntu:hadoop


Start the slave2 container:

docker run -d -ti -h slave2 ubuntu:hadoop
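These commands set each container's hostname with -h but assign no container name; adding --name as well (purely a convenience, not required) makes the containers easier to attach to:

docker run -d -ti -h master --name master ubuntu:hadoop
docker exec -it master /bin/bash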

Add to /etc/hosts in each container (the addresses below are examples; use the ones Docker actually assigned):

10.0.0.5        master
10.0.0.6        slave1
10.0.0.7        slave2
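The assigned addresses can be read from the host, for example:

docker inspect -f '{{ .NetworkSettings.IPAddress }}' <container-id>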

In the file /usr/local/hadoop/etc/hadoop/slaves on master, add:

slave1
slave2
Note: because the virtual machine is short on memory, add the following to mapred-site.xml:

<property>
<name>mapreduce.map.memory.mb</name>
<value>500</value>
</property>
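With all three containers configured, the cluster can be started from the master container. A minimal sketch (start-all.sh is deprecated in Hadoop 2.x but still works; jps should then list NameNode on master and DataNode on the slaves):

cd /usr/local/hadoop
sbin/start-all.sh
jps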


   