
Building a Docker Image from a Dockerfile: Pseudo-Distributed Hadoop

2016-10-25 20:27
1. The Dockerfile

FROM ubuntu:14.04
MAINTAINER SequenceIQ

USER root

# install dev tools
RUN apt-get update
RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync

# passwordless ssh
RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /home/hadoop/.ssh/id_rsa
RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
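# optional sanity check (an addition, not part of the original build): with
# the keys above in place, ssh from root to localhost should log in without
# prompting for a password once sshd is running
#RUN service ssh start && ssh -o StrictHostKeyChecking=no localhost 'echo passwordless ssh ok'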

# java
#RUN mkdir -p /usr/java/default && \
# curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/7u51-b13/jdk-7u51-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
# tar --strip-components=1 -xz -C /usr/java/default/

RUN mkdir -p /usr/local/
COPY jdk-8u92-linux-x64.tar.gz /usr/local
RUN tar -zxvf /usr/local/jdk-8u92-linux-x64.tar.gz -C /usr/local/
RUN rm -rf /usr/local/jdk-8u92-linux-x64.tar.gz

ENV JAVA_HOME /usr/local/jdk1.8.0_92
ENV PATH $PATH:$JAVA_HOME/bin
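# optional sanity check (an addition): confirm the JDK unpacked where
# JAVA_HOME points
#RUN $JAVA_HOME/bin/java -version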

# hadoop
#RUN curl -s http://www.eu.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz | tar -xz -C /usr/local/
#RUN cd /usr/local && ln -s ./hadoop-2.6.0 hadoop

COPY hadoop-2.7.3.tar.gz /usr/local/
RUN tar -zxvf /usr/local/hadoop-2.7.3.tar.gz -C /usr/local/
RUN rm -rf /usr/local/hadoop-2.7.3.tar.gz
ENV HADOOP_PREFIX /usr/local/hadoop-2.7.3
RUN sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/local/jdk1.8.0_92:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
RUN sed -i '/^export HADOOP_CONF_DIR/ s:.*:export HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop/:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
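# optional check (an addition): the two sed edits above should leave exactly
# one JAVA_HOME line and one HADOOP_CONF_DIR line in hadoop-env.sh
#RUN grep -E '^export (JAVA_HOME|HADOOP_CONF_DIR)' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh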

RUN mkdir $HADOOP_PREFIX/input
RUN cp $HADOOP_PREFIX/etc/hadoop/*.xml $HADOOP_PREFIX/input

#pseudo distributed
ADD core-site.xml.template $HADOOP_PREFIX/etc/hadoop/core-site.xml.template
RUN sed s/HOSTNAME/localhost/ /usr/local/hadoop-2.7.3/etc/hadoop/core-site.xml.template > /usr/local/hadoop-2.7.3/etc/hadoop/core-site.xml
ADD hdfs-site.xml $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml

ADD mapred-site.xml $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
ADD yarn-site.xml $HADOOP_PREFIX/etc/hadoop/yarn-site.xml

RUN $HADOOP_PREFIX/bin/hdfs namenode -format

# fixing the libhadoop.so like a boss
RUN rm /usr/local/hadoop-2.7.3/lib/native/*
RUN curl -Ls http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.6.0.tar|tar -x -C /usr/local/hadoop-2.7.3/lib/native/
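# note (an addition): the archive above was built for Hadoop 2.6.0 but is
# generally usable with 2.7.x; the native loaders can be verified with
#RUN $HADOOP_PREFIX/bin/hadoop checknative -a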

ADD ssh_config /root/.ssh/config
RUN chmod 600 /root/.ssh/config
RUN chown root:root /root/.ssh/config

# # installing supervisord
# RUN yum install -y python-setuptools
# RUN easy_install pip
# RUN curl https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -o - | python
# RUN pip install supervisor
# ADD supervisord.conf /etc/supervisord.conf

ADD bootstrap.sh /etc/bootstrap.sh
RUN chown root:root /etc/bootstrap.sh
RUN chmod 700 /etc/bootstrap.sh
ENV BOOTSTRAP /etc/bootstrap.sh

# working around a docker.io build error
RUN ls -la /usr/local/hadoop-2.7.3/etc/hadoop/*-env.sh
RUN chmod +x /usr/local/hadoop-2.7.3/etc/hadoop/*-env.sh
RUN ls -la /usr/local/hadoop-2.7.3/etc/hadoop/*-env.sh

# fix the 254 error code
RUN sed -i "/^[^#]*UsePAM/ s/.*/#&/" /etc/ssh/sshd_config
RUN echo "UsePAM no" >> /etc/ssh/sshd_config
RUN echo "Port 2122" >> /etc/ssh/sshd_config

#RUN service ssh start && $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh && $HADOOP_PREFIX/sbin/start-dfs.sh && $HADOOP_PREFIX/bin/hdfs dfs -mkdir -p /user/root
#RUN service ssh start && $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh && $HADOOP_PREFIX/sbin/start-dfs.sh && $HADOOP_PREFIX/bin/hdfs dfs -put $HADOOP_PREFIX/etc/hadoop/ input

CMD ["/etc/bootstrap.sh", "-d"]

EXPOSE 50020 50090 50070 50010 50075 8031 8032 8033 8040 8042 49707 22 8088 8030
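
Note that EXPOSE only documents these ports; to reach the Hadoop web UIs from the host you still have to publish them when the container starts. A minimal sketch (the image and container names follow the examples later in this article):

docker run --name master -d -p 50070:50070 -p 8088:8088 xuguokun/hello

Port 50070 is the NameNode web UI and 8088 the ResourceManager web UI; once the daemons are up they are reachable at http://localhost:50070 and http://localhost:8088 on the Docker host.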


2. Notes
(1) The files hadoop-2.7.3.tar.gz and jdk-8u92-linux-x64.tar.gz must be in the same directory as the Dockerfile.

(2) The same directory must also contain a bootstrap.sh file with the following content:

#!/bin/bash

: ${HADOOP_PREFIX:=/usr/local/hadoop-2.7.3}

# source (not execute) hadoop-env.sh so its exports apply to this shell
. $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

rm /tmp/*.pid

# installing libraries if any - (resource urls added comma separated to the ACP system variable)
cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do echo == $cp; curl -LO $cp ; done; cd -

# altering the core-site configuration
sed s/HOSTNAME/$HOSTNAME/ /usr/local/hadoop-2.7.3/etc/hadoop/core-site.xml.template > /usr/local/hadoop-2.7.3/etc/hadoop/core-site.xml

service ssh start
#$HADOOP_PREFIX/sbin/start-dfs.sh
#$HADOOP_PREFIX/sbin/start-yarn.sh
$HADOOP_PREFIX/sbin/start-all.sh

if [[ $1 == "-d" ]]; then
  while true; do sleep 1000; done
fi

if [[ $1 == "-bash" ]]; then
  /bin/bash
fi
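
As the comment in the script notes, bootstrap.sh downloads any extra libraries whose URLs are listed, comma-separated, in the ACP environment variable into $HADOOP_PREFIX/share/hadoop/common before the daemons start. A sketch of how that would be used (the jar URL below is a placeholder, not a real artifact):

docker run --name master -it -e ACP="http://example.com/extra-lib.jar" xuguokun/hello /etc/bootstrap.sh -bash
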
(3) The same directory must contain an hdfs-site.xml file with the following content:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>

  <property>
    <name>dfs.namenode.servicerpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
</configuration>
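
Setting dfs.replication to 1 is what makes this a single-node (pseudo-distributed) HDFS, and the two bind-host settings make the NameNode listen on all interfaces rather than only the address its hostname resolves to. To confirm the effective value inside a running container (an added check, not from the original article):

/usr/local/hadoop-2.7.3/bin/hdfs getconf -confKey dfs.replication
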
(4) The same directory must contain a mapred-site.xml file with the following content:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


(5) The same directory must contain a yarn-site.xml file with the following content:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

(6) The same directory must contain a core-site.xml.template file with the following content (bootstrap.sh substitutes the container's hostname for the HOSTNAME placeholder at startup):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://HOSTNAME:9000</value>
  </property>
</configuration>

(7) The same directory must contain an ssh_config file with the following content:

Host *
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  LogLevel quiet
  Port 2122


3. Building the Image
Run the following command in the directory containing the Dockerfile:

docker build -t xuguokun/hello .

4. Starting the Container

docker run --name master -it xuguokun/hello /etc/bootstrap.sh -bash
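
Since the Dockerfile's CMD already runs /etc/bootstrap.sh -d, an alternative (a sketch, not from the original article) is to start the container detached and open a shell with docker exec:

docker run --name master -d xuguokun/hello
docker exec -it master /bin/bash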

5. Results

docker@dockertest2:~/nopublicimage/hadoop$ docker run --name master -it xuguokun/hello /etc/bootstrap.sh -bash
rm: cannot remove '/tmp/*.pid': No such file or directory
/
* Starting OpenBSD Secure Shell server sshd [ OK ]
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [ddd1aa1724d6]
ddd1aa1724d6: starting namenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-root-namenode-ddd1aa1724d6.out
localhost: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-root-datanode-ddd1aa1724d6.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-ddd1aa1724d6.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn--resourcemanager-ddd1aa1724d6.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-root-nodemanager-ddd1aa1724d6.out
root@ddd1aa1724d6:/# jps
688 Jps
641 NodeManager
145 NameNode
228 DataNode
553 ResourceManager
377 SecondaryNameNode
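
With all five daemons showing in jps, a quick way to exercise HDFS, YARN, and MapReduce together is the example jar that ships with Hadoop (an added smoke test, run inside the container):

cd /usr/local/hadoop-2.7.3
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10

A pi estimate printed at the end of the job output confirms the pseudo-distributed cluster works end to end.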