
Building an Elastic Hadoop Cluster with Docker

2016-08-01 14:26
I. Ubuntu environment (Ubuntu Server 16.04):
1. Install Docker: apt install docker.io
2. Set up an Ubuntu environment in a Docker container:
2.1 Optional: edit /etc/default/docker and append the following two lines (only needed if Docker has to go through an HTTP proxy):
export http_proxy="http://127.0.0.1:3128/"
export https_proxy="https://127.0.0.1:3128/"
2.2 Pull the base image: docker pull ubuntu
2.3 On the host, put the installation files to be mounted into /root/docker/config:
root@spark:~/docker/config# ls
apache-hive-2.1.0-bin.tar.gz  hadoop-2.7.2.tar.gz     hosts           jdk-8u91-linux-x64.tar.gz           spark-2.0.0-bin-hadoop2.7.tgz
authorized_keys
2.4 Run a container with that directory mounted: docker run -v /root/docker/config:/software -it ubuntu
2.5 Inside the container, update the system:
2.5.1 apt-get update
2.5.2 apt-get upgrade
2.5.3 Install vim: apt-get install vim
2.5.4 Install ssh: apt-get install ssh
2.5.5 Make sshd start automatically:
2.5.5.1 Edit ~/.bashrc and append: /usr/sbin/sshd
2.5.5.2 Edit /etc/rc.local and add: /usr/sbin/sshd
2.5.5.3 Allow remote root login, typically by setting PermitRootLogin yes in /etc/ssh/sshd_config
2.5.6 Start ssh: /etc/init.d/ssh start
2.5.7 Verify: ssh localhost date (a key-setup sketch for passwordless SSH follows this step)
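Passwordless SSH between the future cluster nodes is needed later by start-dfs.sh, start-yarn.sh and the sshfence method configured in hdfs-site.xml. A minimal sketch of the key setup inside this container; reusing the authorized_keys file mounted at /software is an assumption based on the file listing in 2.3:

# Generate a passphrase-less key pair and authorize it, so every container
# cloned from this image can ssh into the others as root
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Alternatively (assumption): reuse the pre-built key list mounted at /software
# cp /software/authorized_keys ~/.ssh/authorized_keys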
3. Install the JDK:
3.1 Create the directory: mkdir -p /usr/local/jdk
3.2 Extract: tar zxvf /software/jdk-8u91-linux-x64.tar.gz -C /usr/local/jdk/
3.3 Configure the environment (JAVA_HOME must point at the JDK root, not its bin directory; see the consolidated ~/.bashrc sketch after this step):
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH
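A minimal sketch of the full environment block added to ~/.bashrc inside the container; the ZooKeeper, Hadoop and Spark variables anticipate the installs in steps 4-6 and use the paths chosen in this walkthrough:

# ~/.bashrc additions: consolidated environment for the whole stack
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_91
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.8
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.2
export SPARK_HOME=/usr/local/spark/spark-2.0.0-bin-hadoop2.7
export PATH=$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

After editing, run source ~/.bashrc so that commands such as zkServer.sh, start-dfs.sh and start-all.sh are on the PATH in the steps below.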
4. Install ZooKeeper:
4.1 Create the directory: mkdir -p /usr/local/zookeeper
4.2 Extract: tar zxvf /software/zookeeper-3.4.8.tar.gz -C /usr/local/zookeeper/
4.3 Configure the environment:
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.8
export PATH=$ZOOKEEPER_HOME/bin:$PATH
4.4 cp ./conf/zoo_sample.cfg ./conf/zoo.cfg
4.5 vim ./conf/zoo.cfg
4.5.1 Change the data directory:
dataDir=/usr/local/zookeeper/zookeeper-3.4.8/tmp
4.5.2 Add the server list:
server.1=cloud4:2888:3888
server.2=cloud5:2888:3888
server.3=cloud6:2888:3888
server.4=cloud7:2888:3888
server.5=cloud8:2888:3888
4.6 mkdir -p tmp (run inside /usr/local/zookeeper/zookeeper-3.4.8, matching the dataDir above)
4.7 echo 1 > ./tmp/myid
5. Install Hadoop
5.1 Create the directory: mkdir -p /usr/local/hadoop
5.2 Extract: tar -zxvf /software/hadoop-2.7.2.tar.gz -C /usr/local/hadoop/
5.3 Configure environment variables (a single HADOOP_HOME, with bin and sbin added to the PATH):
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
5.4 Edit the Hadoop configuration files: ~# cd /usr/local/hadoop/hadoop-2.7.2/etc/hadoop/
5.4.1 hadoop# vim hadoop-env.sh  
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_91
5.4.2 hadoop# vim core-site.xml 
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://ns1</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/local/hadoop/hadoop-2.7.2/tmp</value>
        </property>
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>cloud4:2181,cloud5:2181,cloud6:2181</value>
        </property>
</configuration>
5.4.3 hadoop# vim hdfs-site.xml
<configuration>
        <property>
                <name>dfs.nameservices</name>
                <value>ns1</value>
        </property>
        <property>
                <name>dfs.ha.namenodes.ns1</name>
                <value>nn1,nn2</value>
        </property>
        <property>
                <name>dfs.namenode.rpc-address.ns1.nn1</name>
                <value>cloud1:9000</value>
        </property>
        <property>
                <name>dfs.namenode.http-address.ns1.nn1</name>
                <value>cloud1:50070</value>
        </property>
        <property>
                <name>dfs.namenode.rpc-address.ns1.nn2</name>
                <value>cloud2:9000</value>
        </property>
        <property>
                <name>dfs.namenode.http-address.ns1.nn2</name>
                <value>cloud2:50070</value>
        </property>
        <property>
                <name>dfs.namenode.shared.edits.dir</name>
                <value>qjournal://cloud4:8485;cloud5:8485;cloud6:8485/ns1</value>
        </property>
        <property>
                <name>dfs.journalnode.edits.dir</name>
                <value>/usr/local/hadoop/hadoop-2.7.2/journal</value>
        </property>
        <property>
                <name>dfs.ha.automatic-failover.enabled</name>
                <value>true</value>
        </property>
        <property>
                <name>dfs.client.failover.proxy.provider.ns1</name>
                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
                <name>dfs.ha.fencing.methods</name>
                <value>
                sshfence
                shell(/bin/true)
                </value>
        </property>
        <property>
                <name>dfs.ha.fencing.ssh.private-key-files</name>
                <value>/root/.ssh/id_rsa</value>
        </property>
        <property>
                <name>dfs.ha.fencing.ssh.connect-timeout</name>
                <value>30000</value>
        </property>
</configuration>
5.4.4 hadoop# cp mapred-site.xml.template mapred-site.xml
5.4.4.1 hadoop# vim mapred-site.xml
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>
5.4.5 hadoop# vim yarn-site.xml 
<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>cloud3</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
5.4.6 hadoop# vim slaves
cloud1
cloud2
cloud3
cloud4
cloud5
cloud6
cloud7
cloud8
6. Install Spark
6.1 Create the directory: mkdir /usr/local/spark
6.2 Extract: # tar -zxvf /software/spark-2.0.0-bin-hadoop2.7.tgz -C /usr/local/spark/
6.3 Configure environment variables (again a single SPARK_HOME, with bin and sbin on the PATH):
export SPARK_HOME=/usr/local/spark/spark-2.0.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
6.4 Edit the configuration files:
6.4.1 spark-2.0.0-bin-hadoop2.7# cp ./conf/slaves.template ./conf/slaves
6.4.1.1 spark-2.0.0-bin-hadoop2.7# vim ./conf/slaves
cloud1
cloud2
cloud3
cloud4
cloud5
cloud6
cloud7
cloud8
6.4.2 spark-2.0.0-bin-hadoop2.7# cp ./conf/spark-env.sh.template ./conf/spark-env.sh
6.4.2.1 spark-2.0.0-bin-hadoop2.7# vim ./conf/spark-env.sh
export SPARK_MASTER_IP=cloud1
export SPARK_WORKER_MEMORY=128m
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_91
export SCALA_HOME=/usr/local/scala/scala-2.10.6
export SPARK_HOME=/usr/local/spark/spark-2.0.0-bin-hadoop2.7
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.7.2/etc/hadoop
export SPARK_LIBRARY_PATH=$SPARK_HOME/lib
export SCALA_LIBRARY_PATH=$SPARK_LIBRARY_PATH
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
6.4.3 spark-2.0.0-bin-hadoop2.7# cp ./conf/spark-defaults.conf.template ./conf/spark-defaults.conf
6.4.3.1 spark-2.0.0-bin-hadoop2.7# vim ./conf/spark-defaults.conf
spark.executor.extraJavaOptions         -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled                  true
spark.eventLog.dir                      hdfs://cloud1:9000/historyServerforSpark
spark.yarn.historyServer.address        cloud1:18080
spark.history.fs.logDirectory           hdfs://cloud1:9000/historyServerforSpark
#spark.default.parallelism              100
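Since spark.eventLog.dir and spark.history.fs.logDirectory point at an HDFS path, that directory has to exist before jobs log to it. A minimal sketch of the extra commands, run once HDFS is up (step 9.7), assuming the paths above:

# Create the event-log directory on HDFS (path taken from spark-defaults.conf)
hdfs dfs -mkdir -p /historyServerforSpark
# Start the history server that serves cloud1:18080
/usr/local/spark/spark-2.0.0-bin-hadoop2.7/sbin/start-history-server.sh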
7. On the host, commit the container as an image:
7.1 root@spark:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
11ade06a603a        ubuntu              "/bin/bash"         26 hours ago        Up 51 minutes                           ha
7.2 docker commit <containerId>, e.g.: docker commit 11ade06a603a
7.3 This prints the id of the new image: root@spark:~# docker commit 11ade06a603a

sha256:7af39a0bd16940cd1ebac52ace40048b223686ccef618584004b82002eb7cb80
7.4 Tag the new image: docker tag 7af39a0bd16940cd1ebac52ace40048b223686ccef618584004b82002eb7cb80 bigdata/ha1.0
7.5 List the images: root@spark:~# docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
bigdata/ha1.0          latest              7af39a0bd169        3 minutes ago       1.997 GB
8. Use the image built above to run eight containers, cloud1 through cloud8 (see the loop sketch after these commands):
root@spark:~# docker run --name cloud1 -h cloud1 -it bigdata/ha1.0
....
root@spark:~# docker run --name cloud8 -h cloud8 -it bigdata/ha1.0
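Starting them one by one works, but a shell loop on the host is more compact; a sketch assuming the containers should run detached (-d) rather than each holding an interactive terminal:

# Start cloud1..cloud8 detached; -h sets the hostname Hadoop/ZooKeeper will see
for i in $(seq 1 8); do
    docker run -dit --name cloud$i -h cloud$i bigdata/ha1.0
done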
9. Start the services
9.1 In cloud5 through cloud8, set the ZooKeeper myid (see the docker exec sketch after this list):
cloud5: zookeeper-3.4.8# echo 2 > ./tmp/myid 
cloud6: zookeeper-3.4.8# echo 3 > ./tmp/myid
cloud7: zookeeper-3.4.8# echo 4 > ./tmp/myid
cloud8: zookeeper-3.4.8# echo 5 > ./tmp/myid
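These edits can also be made from the host without attaching to each container; a minimal sketch, assuming the ZooKeeper path used above (cloud4 keeps the myid of 1 baked into the image):

# Set myid 2..5 on cloud5..cloud8 from the host
for i in 5 6 7 8; do
    docker exec cloud$i bash -c "echo $((i-3)) > /usr/local/zookeeper/zookeeper-3.4.8/tmp/myid"
done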
9.2 In cloud1, edit the hosts file:
172.17.0.1 cloud1
172.17.0.2 cloud2
172.17.0.3 cloud3
172.17.0.4 cloud4
172.17.0.5 cloud5
172.17.0.6 cloud6
172.17.0.7 cloud7
172.17.0.8 cloud8
9.2.1 Sync the hosts file to the other containers (see the sketch after these commands):
scp /etc/hosts cloud1:/etc/
...
scp /etc/hosts cloud8:/etc/
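The IPs in 9.2 are assumptions; on many Docker setups 172.17.0.1 is the docker0 bridge gateway and the first container actually gets 172.17.0.2, so it is safer to read the real addresses from the host and then loop the scp. A sketch:

# On the host: print each container's actual IP in /etc/hosts format
for i in $(seq 1 8); do
    docker inspect -f "{{.NetworkSettings.IPAddress}} cloud$i" cloud$i
done
# Inside cloud1, once its /etc/hosts is correct: push it to the other containers
for i in $(seq 2 8); do
    scp /etc/hosts cloud$i:/etc/
done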
9.3 Start the ZooKeeper ensemble:
9.3.1 In each of cloud4 through cloud8, run: zkServer.sh start
9.3.2 Check the status: zkServer.sh status
9.4 On cloud1, start the JournalNodes (hadoop-daemons.sh runs the command on the hosts listed in slaves): hadoop-2.7.2# hadoop-daemons.sh start journalnode
9.5 On cloud1, format HDFS: hadoop-2.7.2# hdfs namenode -format
9.6 On cloud1, format the HA state in ZooKeeper: hadoop-2.7.2# hdfs zkfc -formatZK
9.7 On cloud1, start HDFS: hadoop-2.7.2# start-dfs.sh (see the standby note after this step)
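One step that HA deployments usually need and that is easy to miss here: only nn1 on cloud1 was formatted, so if the standby NameNode on cloud2 fails to come up, its metadata has to be synced first. A hedged sketch of the usual fix, run on cloud2 while the NameNode on cloud1 is running:

# On cloud2: copy the formatted namespace from the active NameNode, then start the daemon
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode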
9.8 On cloud3, start YARN: hadoop-2.7.2# start-yarn.sh
9.9 On cloud1, start the Spark cluster (SPARK_MASTER_IP is cloud1; the workers in conf/slaves are started over SSH): spark-2.0.0-bin-hadoop2.7/sbin# start-all.sh (a jps check sketch follows)
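A quick way to confirm that every daemon came up is to run jps in each container from the host; a minimal sketch, using the full jps path so it does not depend on the container's login environment:

# Each container should show the Java processes for its role
# (NameNode/ZKFC, ResourceManager, QuorumPeerMain, JournalNode, DataNode, NodeManager, Worker, ...)
for i in $(seq 1 8); do
    echo "== cloud$i =="
    docker exec cloud$i /usr/local/jdk/jdk1.8.0_91/bin/jps
done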
10. Once the services are up, open the web UIs in a browser:
HDFS: cloud1:50070
YARN: cloud3:8088 (the ResourceManager runs on cloud3 per yarn-site.xml)
Spark history server: cloud1:18080 (the Spark master UI normally listens on cloud1:8080)
11. To reach these from the host's browser, add the hostname-to-IP mappings to the host's hosts file (see the sketch below).
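A minimal sketch of that host-side step, reusing the inspect loop from 9.2.1 so the entries match the real container IPs:

# On the host (as root): append the container IP/name pairs to /etc/hosts
for i in $(seq 1 8); do
    docker inspect -f "{{.NetworkSettings.IPAddress}} cloud$i" cloud$i >> /etc/hosts
done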
12. Commit a new image:
12.1 From any one of cloud4 through cloud8, copy the journal directory under the Hadoop directory to the corresponding directory on cloud1:
hadoop-2.7.2# scp -r journal/ cloud1:/usr/local/hadoop/hadoop-2.7.2/
12.2 Commit the fully configured cloud1 container as a new image:
12.3 docker commit cloud1
12.4 docker tag <imageId> bigdata/ha2.0 (or combine 12.3 and 12.4 as in the sketch below)
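docker commit also accepts a repository:tag argument, so the commit and the tag can be done in one step; a sketch:

# Commit and tag in a single command (the -m commit message is optional)
docker commit -m "HA ZooKeeper/Hadoop/Spark node" cloud1 bigdata/ha2.0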
13. Once this is done, containers started from this image already have ZooKeeper, Hadoop and Spark installed and configured.

2.5.8 (addendum) Verify passwordless SSH between the containers: ssh cloud1 ... ssh cloud8
