
Deploying Hadoop on Docker

2016-03-31 17:00
I. Build the Docker image

1. mkdir hadoop
2. Copy hadoop-2.6.2.tar.gz into the hadoop directory.
3. vim Dockerfile

FROM ubuntu
MAINTAINER Docker tianlei <393743083@qq.com>
ADD ./hadoop-2.6.2.tar.gz /usr/local/
Build the image from inside the hadoop directory (both the Dockerfile and the tarball must be in the build context):

docker build -t "ubuntu:base" .
Run the image to create a container:

docker run -d -it --name hadoop ubuntu:base
Attach to the container to work inside it:

docker exec -i -t hadoop /bin/bash
1. Install Java in the container

sudo apt-get update
sudo apt-get install openjdk-7-jre openjdk-7-jdk
Set the environment variable:

vim ~/.bashrc
Add this line:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64


source ~/.bashrc
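To confirm the JDK is installed and the variable took effect (the exact build string will vary):

java -version
echo $JAVA_HOME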


2. Install Hadoop in the container
The tarball was already extracted into /usr/local/ by the Dockerfile's ADD instruction.
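Note that ADD unpacks the archive to /usr/local/hadoop-2.6.2, while the variables below assume /usr/local/hadoop, so rename (or symlink) the directory first:

mv /usr/local/hadoop-2.6.2 /usr/local/hadoop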

vim ~/.bashrc
Add:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Reload the file:

source ~/.bashrc
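A quick sanity check that the new PATH entries work (this should print the Hadoop 2.6.2 version banner):

hadoop version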
Point Hadoop's own environment file at the JDK:

cd /usr/local/hadoop/etc/hadoop/
vim hadoop-env.sh
and set:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Under the hadoop directory, create tmp, namenode, and datanode.

These three directories will be referenced by the configuration below:
tmp: Hadoop's temporary directory
namenode: storage directory for the NameNode
datanode: storage directory for the DataNode
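For example (HADOOP_HOME is /usr/local/hadoop as set above):

cd /usr/local/hadoop
mkdir tmp namenode datanode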
Then edit three XML files under /usr/local/hadoop/etc/hadoop/:
1) core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
<final>true</final>
<description>The name of the default file system.  A URI whose
scheme and authority determine the FileSystem implementation.  The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class.  The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>


Note:
hadoop.tmp.dir is set to the temporary directory created earlier.
fs.default.name is set to hdfs://master:9000, which points at the master node's hostname (we will configure that node when we build the cluster later; it is written here in advance).
2) hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/namenode</value>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/datanode</value>
<final>true</final>
</property>
</configuration>


Note:
The cluster we build later will have one master node and two slave nodes, so dfs.replication is set to 2.
dfs.namenode.name.dir and dfs.datanode.data.dir are set to the NameNode and DataNode directories created earlier.
3) mapred-site.xml
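In Hadoop 2.6 this file ships only as a template, so create it first:

cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml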

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
<description>The host and port that the MapReduce job tracker runs
at.  If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>

There is only one property here, mapred.job.tracker, and it points at the master node.

Format the NameNode:

hadoop namenode -format


3. Install SSH

sudo apt-get install ssh
Add to ~/.bashrc so that sshd starts automatically:

#autorun
/usr/sbin/sshd
Generate a key pair:

cd ~/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cd .ssh
cat id_rsa.pub >> authorized_keys
Note: sshd sometimes complains that /var/run/sshd cannot be found; just create an sshd directory under /var/run.

In /etc/ssh/ssh_config, add:

StrictHostKeyChecking no
UserKnownHostsFile /dev/null
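To verify that passwordless login works (it should drop into a shell without prompting for a password):

ssh localhost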
4. Commit the container as a new image

docker commit -m "hadoop install" hadoop ubuntu:hadoop
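The new tag should now appear in the local image list:

docker images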


II. Deploy the Hadoop distributed cluster


Start the master container:

docker run -d -ti -h master ubuntu:hadoop


Start the slave1 container:

docker run -d -ti -h slave1 ubuntu:hadoop


Start the slave2 container:

docker run -d -ti -h slave2 ubuntu:hadoop
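These commands set each container's hostname with -h but assign no container name; adding --name as well (purely a convenience, not required) makes the containers easier to attach to:

docker run -d -ti -h master --name master ubuntu:hadoop
docker exec -it master /bin/bash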

Add to /etc/hosts in each container (the addresses below are examples; use the ones Docker actually assigned):

10.0.0.5        master
10.0.0.6        slave1
10.0.0.7        slave2
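The assigned addresses can be read from the host, for example:

docker inspect -f '{{ .NetworkSettings.IPAddress }}' <container-id>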

In the file /usr/local/hadoop/etc/hadoop/slaves on master, add:

slave1
slave2
Note: because the virtual machine is short on memory, add the following to mapred-site.xml:

<property>
<name>mapreduce.map.memory.mb</name>
<value>500</value>
</property>
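With all three containers configured, the cluster can be started from the master container. A minimal sketch (start-all.sh is deprecated in Hadoop 2.x but still works; jps should then list NameNode on master and DataNode on the slaves):

cd /usr/local/hadoop
sbin/start-all.sh
jps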


   