Deploying a Spark Cluster with Docker to Train a CNN (with a Python Example)
2015-11-07 17:05
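This blog is only the author's personal notes, so some details are bound to be wrong.
Please bear with me; criticism and corrections are welcome.
The post may be thin, but it still took real effort.
If you repost it, please include a link to this article. Much appreciated!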
/article/2236339.html
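Our lab has 4 powerful servers, each with 8 Tesla GPUs, yet our experiments normally use only one of those GPUs, which is a terrible waste!
So I wanted to use Spark to put all of these GPUs to work. I had heard that Docker is a great tool for deploying environments, so I decided to install and deploy a Spark cluster with Docker to train a CNN. Setting up the environment is simple in principle, pure grunt work, but anyone who has done it knows how many pitfalls it hides.
This post is a summary of the pitfalls I hit, written through tears, in the hope that it helps you avoid them.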
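Docker
What is Docker
Docker is an open-source project born in early 2013, originally a side project inside the company dotCloud. Intuitively, Docker can be thought of as a lightweight virtual machine. It differs from traditional virtualization in that Docker virtualizes at the operating-system level and directly reuses the host's operating system (strictly speaking, its kernel), whereas traditional virtualization is implemented at the hardware level.
A diagram in the original post illustrates the difference between the two approaches.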
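Why use Docker
As an emerging approach to virtualization, Docker has many advantages over traditional virtualization. A Docker container starts in seconds, far faster than a traditional virtual machine.
Docker uses system resources very efficiently: a single host can run thousands of Docker containers at the same time.
Beyond the application running inside it, a container consumes almost no extra system resources, so application performance stays high and system overhead stays low. (With traditional virtual machines, running 10 different applications means starting 10 virtual machines, whereas Docker only needs to start 10 isolated applications.)
Create or configure once, and it runs correctly anywhere.
Docker containers run on almost any platform, including physical machines, virtual machines, public clouds, private clouds, personal computers, and servers. This compatibility lets users move an application directly from one platform to another.
A brief summary: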
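Feature | Docker | Virtual machine |
---|---|---|
Startup time | Seconds | Minutes |
Disk usage | Typically MB | Typically GB |
Performance | Near native | Weaker than native |
Instances per host | Up to thousands of containers on one machine | Typically a few dozen |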
Spark
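Spark is a general-purpose parallel framework, similar to Hadoop MapReduce, open-sourced by the UC Berkeley AMPLab. Spark has the advantages of Hadoop MapReduce;
but unlike MapReduce, a job's intermediate results can be kept in memory, so they no longer have to be written to and read back from HDFS between steps.
This makes Spark better suited to iterative algorithms such as those used in data mining and machine learning.
I won't go into Spark's internals and applications here; I'll cover them in a separate post another day. For now, all you need to know is that it gives you a way to run your program in a distributed fashion.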
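To make the in-memory point concrete, here is a minimal plain-Python sketch (no Spark involved) that counts how often an iterative job reads its input. `DiskDataset`, `refine`, `run_reloading`, and `run_cached` are hypothetical names invented for this illustration; in PySpark the caching itself would be requested on an RDD, for example with `rdd.cache()`.

```python
class DiskDataset(object):
    """Stand-in for input stored on HDFS; counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def refine(estimate, data):
    """One iteration: move the estimate halfway toward the data mean."""
    mean = sum(data) / float(len(data))
    return estimate + 0.5 * (mean - estimate)


def run_reloading(dataset, iterations):
    """MapReduce style: every iteration reads the input again."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, dataset.load())
    return estimate


def run_cached(dataset, iterations):
    """Spark style: read once, keep the data in memory, then iterate."""
    data = dataset.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, data)
    return estimate
```

Both functions return the same estimate, but the reloading version performs one read per iteration while the cached version reads the input only once.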
Elephas (a deep learning library with Spark support)
First, Keras: it is a deep learning library built on Theano. Anyone who has used Theano knows that Theano programs are not particularly easy to write. Keras is a high-level wrapper around Theano that makes the code much more convenient to write. Here is a piece of Keras code for a CNN model:

```python
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D

# nb_filters, nb_conv, nb_pool, nb_classes, img_rows, img_cols, batch_size,
# nb_epoch and the train/test arrays are defined elsewhere in the script.
model = Sequential()

# Two convolution layers followed by max pooling and dropout
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                        border_mode='full',
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(0.25))

# Fully connected classifier on top
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adadelta')

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=1, validation_data=(X_test, Y_test))
```
Isn't that even simpler than a Caffe configuration file?
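To see what the layers of such a model do to the image size, the sketch below traces the spatial dimensions through the network. It assumes 28x28 MNIST inputs and the hyperparameters nb_filters=32, nb_conv=3, nb_pool=2, none of which are stated in the snippet; `conv_out` and `trace_shapes` are hypothetical helpers, not part of Keras. With border_mode='full' a stride-1 convolution grows each side by nb_conv - 1, while the default 'valid' mode shrinks it by the same amount.

```python
def conv_out(size, kernel, border_mode="valid"):
    """Output length of one spatial dimension after a stride-1 convolution."""
    if border_mode == "full":
        return size + kernel - 1
    if border_mode == "valid":
        return size - kernel + 1
    raise ValueError("unsupported border_mode: %r" % border_mode)


def trace_shapes(img_rows=28, img_cols=28, nb_filters=32, nb_conv=3, nb_pool=2):
    """Return (layer name, shape) pairs for each shape-changing layer."""
    shapes = [("input", (1, img_rows, img_cols))]

    # First convolution uses border_mode='full'
    rows = conv_out(img_rows, nb_conv, "full")
    cols = conv_out(img_cols, nb_conv, "full")
    shapes.append(("conv1 (full)", (nb_filters, rows, cols)))

    # Second convolution uses the default 'valid' mode
    rows = conv_out(rows, nb_conv, "valid")
    cols = conv_out(cols, nb_conv, "valid")
    shapes.append(("conv2 (valid)", (nb_filters, rows, cols)))

    # Max pooling divides each side by the pool size (rounding down)
    rows, cols = rows // nb_pool, cols // nb_pool
    shapes.append(("maxpool", (nb_filters, rows, cols)))

    # Flatten turns the feature maps into one vector for the Dense layer
    shapes.append(("flatten", (nb_filters * rows * cols,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print("%s %s" % (name, shape))
```

Under these assumptions the Dense(128) layer receives a vector of 32 * 14 * 14 = 6272 values.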
elephas makes it possible to run Keras programs on Spark, with almost no changes to the Keras code.
Here is a piece of elephas code (model is the same model as above):
```python
from pyspark import SparkContext, SparkConf
from elephas.utils.rdd_utils import to_simple_rdd
from elephas.spark_model import SparkModel

# Create Spark context
conf = SparkConf().setAppName('Mnist_Spark_MLP').setMaster('local[8]')
sc = SparkContext(conf=conf)

# Build RDD from numpy features and labels
rdd = to_simple_rdd(sc, X_train, Y_train)

# Initialize SparkModel from Keras model and Spark context
spark_model = SparkModel(sc, model)

# Train Spark model
spark_model.train(rdd, nb_epoch=nb_epoch, batch_size=batch_size, verbose=0,
                  validation_split=0.1, num_workers=8)
```
To run it on Spark, just execute the following command:
spark-submit --driver-memory 1G ./your_script.py
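Conceptually, running a model this way is data-parallel training: each worker trains on its own slice of the data and the results are combined. The sketch below shows one common scheme, synchronous parameter averaging, on a one-parameter linear model in plain Python. It is an illustration under that assumption, not elephas's actual implementation, and `local_train` and `train_data_parallel` are hypothetical names.

```python
def local_train(w, partition, lr=0.01, epochs=1):
    """Run plain SGD for the model y = w * x on one worker's partition."""
    for _ in range(epochs):
        for x, y in partition:
            grad = 2.0 * (w * x - y) * x  # derivative of the squared error
            w -= lr * grad
    return w


def train_data_parallel(data, num_workers=4, rounds=20, lr=0.01):
    """Synchronous data parallelism: train per partition, then average."""
    # Split the data round-robin, one partition per worker.
    partitions = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0
    for _ in range(rounds):
        # On a cluster, each partition would be processed by its own worker.
        local_weights = [local_train(w, part, lr) for part in partitions]
        # The driver averages the workers' weights and starts the next round.
        w = sum(local_weights) / len(local_weights)
    return w


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3.0
    print(train_data_parallel(data))
```

With the toy data in the `__main__` block, the averaged weight converges toward the true value of 3.0 within a few rounds.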
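That covers the introductions. Next, I'll walk you step by step through installing and deploying a Spark GPU cluster with Docker for distributed CNN training.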
Installing Spark on Docker
Installing Docker online
Ubuntu 14.04 already ships a Docker package, so it can be installed directly.

```bash
$ sudo apt-get update
$ sudo apt-get install -y docker.io
$ sudo ln -sf /usr/bin/docker.io /usr/local/bin/docker
$ sudo sed -i '$acomplete -F _docker docker' /etc/bash_completion.d/docker.io
```
On older Ubuntu releases, the kernel must be updated first.

```bash
$ sudo apt-get update
$ sudo apt-get install linux-image-generic-lts-raring linux-headers-generic-lts-raring
$ sudo reboot
```

Then repeat the steps above.
After installation, start the Docker service.

```bash
$ sudo service docker start
```
Installing Docker offline
If your machine cannot reach the internet (like my servers), you can still install Docker from an offline package, which can be downloaded here: https://get.daocloud.io/docker/builds/Linux/x86_64/docker-latest

```bash
chmod +x docker-latest
sudo mv docker-latest /usr/local/bin/docker
# Then start docker in daemon mode:
sudo docker daemon &
```
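Whichever installation route you take, it is worth confirming that a `docker` binary is actually reachable before moving on. The following is a small sketch for Python 3 on the host; `docker_path` and `docker_version` are hypothetical helper names, and the only Docker command used is `docker --version`.

```python
import shutil
import subprocess


def docker_path():
    """Return the full path of the docker binary on PATH, or None."""
    return shutil.which("docker")


def docker_version():
    """Return the output of `docker --version`, or None if docker is missing."""
    path = docker_path()
    if path is None:
        return None
    result = subprocess.run([path, "--version"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    if version is None:
        print("docker was not found on PATH")
    else:
        print(version)
```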
Installing Spark on Docker
SequenceIQ provides a Docker image with Spark already installed, so all you need to do is pull it from Docker Hub:
docker pull sequenceiq/spark:1.5.1
Run it with the following command:
sudo docker run -it sequenceiq/spark:1.5.1 bash
Now test that Spark works.
First get the container's IP address with ifconfig (mine is 172.17.0.109), then:
bash-4.1# cd /usr/local/spark
bash-4.1# cp conf/spark-env.sh.template conf/spark-env.sh
bash-4.1# vi conf/spark-env.sh
Add these two lines:

```bash
export SPARK_LOCAL_IP=172.17.0.109
export SPARK_MASTER_IP=172.17.0.109
```
Then start the master and the slave:
bash-4.1# ./sbin/start-master.sh
bash-4.1# ./sbin/start-slave.sh spark://172.17.0.109:7077
Open http://<your-ip>:8080 in a browser to see the status of each Spark node.
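The spark-env.sh entries, the master URL passed to start-slave.sh, and the web UI address all derive from the same container IP. As a small sketch, the hypothetical helpers below build the three strings from one value, which avoids typos such as dropping the `//` in `spark://host:port`. Ports 7077 and 8080 are the values used in this walkthrough.

```python
def spark_env_lines(ip):
    """Lines to add to conf/spark-env.sh for a single-host setup."""
    return [
        "export SPARK_LOCAL_IP=%s" % ip,
        "export SPARK_MASTER_IP=%s" % ip,
    ]


def master_url(ip, port=7077):
    """URL that start-slave.sh (and spark-submit --master) expects."""
    return "spark://%s:%d" % (ip, port)


def web_ui_url(ip, port=8080):
    """Address of the master's status page."""
    return "http://%s:%d" % (ip, port)


if __name__ == "__main__":
    ip = "172.17.0.109"  # replace with the address reported by ifconfig
    print("\n".join(spark_env_lines(ip)))
    print(master_url(ip))
    print(web_ui_url(ip))
```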
Submit an application with spark-submit to try it out:
bash-4.1# ./bin/spark-submit examples/src/main/python/pi.py
This produces the following output:
15/11/05 02:11:23 INFO scheduler.DAGScheduler: Job 0 finished: reduce at /usr/local/spark-1.5.1-bin-hadoop2.6/examples/src/main/python/pi.py:39, took 1.095643 s
Pi is roughly 3.148900
Congratulations, you have just run a Spark application!
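For reference, the pi.py example estimates pi by random sampling, which is why the printed value is only roughly 3.14 and changes from run to run. The plain-Python sketch below follows the same map-then-reduce idea without Spark; `sample_inside` and `estimate_pi` are names chosen for this illustration, and the seed parameter is an addition that makes runs repeatable.

```python
import random


def sample_inside(rng):
    """Draw a point in the square [-1, 1] x [-1, 1]; 1 if inside the unit circle."""
    x = rng.random() * 2 - 1
    y = rng.random() * 2 - 1
    return 1 if x * x + y * y < 1 else 0


def estimate_pi(n, seed=None):
    """Estimate pi from n random points (the 'map' is sampling, the 'reduce' is sum)."""
    rng = random.Random(seed)
    hits = sum(sample_inside(rng) for _ in range(n))
    # The circle covers pi/4 of the square, so scale the hit ratio by 4.
    return 4.0 * hits / n


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(200000))
```

On Spark, the sampling step is distributed across partitions and the sum is the reduce step that the log line above reports as finished.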
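Does everything seem smooth so far? A small spoiler: the difficulties are only just beginning. Fortunately I have already stepped into every pit, so although it is still a bit of a hassle, you at least get to walk around the deepest ones...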
Installing the libraries
elephas requires Python 2.7, but the Python bundled in the Docker image we just pulled is version 2.6, so we first upgrade Python.
Upgrading Python on CentOS
A friendly reminder: be sure to install openssl and openssl-devel before compiling Python. Don't ask me how I know.

```bash
yum install -y zlib-devel bzip2-devel openssl openssl-devel xz-libs wget
```
Installation details:

```bash
wget http://www.python.org/ftp/python/2.7.8/Python-2.7.8.tar.xz
xz -d Python-2.7.8.tar.xz
tar -xvf Python-2.7.8.tar

# Enter the source directory:
cd Python-2.7.8

# Run configure (be sure to add -fPIC; don't ask me how I know):
./configure --prefix=/usr/local CFLAGS=-fPIC

# Build and install:
make
make altinstall
```
Setting PATH

```bash
mv /usr/bin/python /usr/bin/python2.6
export PATH="/usr/local/bin:$PATH"
# or:
ln -s /usr/local/bin/python2.7 /usr/bin/python

# Check the Python version:
python -V
```
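Beyond `python -V`, it helps to confirm that the freshly built interpreter actually includes the modules that depend on the -devel packages installed earlier: if openssl-devel, zlib-devel, or bzip2-devel was missing at compile time, the `ssl`, `zlib`, or `bz2` module is silently left out. The sketch below is a hypothetical check script that runs under both Python 2.7 and Python 3.

```python
import importlib
import sys

# Modules that are only built when the matching -devel package is present.
REQUIRED_MODULES = ("ssl", "zlib", "bz2")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    missing = missing_modules()
    if missing:
        print("missing modules: %s" % ", ".join(missing))
    else:
        print("all required modules are available")
```

If `ssl` shows up as missing, install openssl-devel and rebuild Python before continuing, since the pip and setuptools steps download over HTTPS.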
Installing setuptools

```bash
# Download the package:
wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-1.4.2.tar.gz

# Extract it:
tar -xvf setuptools-1.4.2.tar.gz
cd setuptools-1.4.2

# Install setuptools with Python 2.7.8:
python setup.py install
```
Installing pip

```bash
curl https://raw.githubusercontent.com/pypa/pip/master/contrib/get-pip.py | python -
```
Repairing yum

```bash
vi /usr/bin/yum
# yum still needs the old interpreter, so change its first line from
#   #!/usr/bin/python
# to
#   #!/usr/bin/python2.6
# After that, yum works again.
```
Installing theano, keras, and elephas

```bash
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
pip install keras
pip install elephas
```
Skills unlocked
A quick recap of what we have done so far:
- Installed Docker
- Pulled the Spark-on-Docker image
- Upgraded Python inside the Spark-on-Docker image
- Installed Theano, Keras, and Elephas
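What we can already do at this point:
√ If your machine has multiple CPUs (say 24):
You only need to start a single Docker container, then use Spark together with Elephas to train the CNN in parallel across all 24 CPUs.
√ If your machine has multiple GPUs (say 4):
You can start 4 Docker containers and edit ~/.theanorc inside each one to select a specific GPU, so that the 4 GPUs compute in parallel. (You need to install CUDA yourself.)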
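To make the multi-GPU idea a bit more concrete, here is a rough sketch of what each container's ~/.theanorc could contain. It assumes Theano's older CUDA backend, where devices are named `gpu0`, `gpu1`, and so on; the helper `theanorc_for_gpu` is a hypothetical name used only for this illustration, and you still have to write the text into each container yourself.

```python
def theanorc_for_gpu(gpu_index, float_type="float32"):
    """Return the text of a .theanorc that pins Theano to one GPU.

    Assumption: the old Theano CUDA backend, which names devices gpu0, gpu1, ...
    """
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    lines = [
        "[global]",
        "device = gpu{}".format(gpu_index),
        "floatX = {}".format(float_type),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One config per container, for the 4-GPU case described above.
    for index in range(4):
        print("# ~/.theanorc for container {}".format(index))
        print(theanorc_for_gpu(index))
```

Generating the file per container keeps the images identical; only the small config differs from one container to the next.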
Example: training a CNN in parallel on a single machine with multiple CPUs
Let's train about the simplest possible network for MNIST handwritten digit recognition. The code below runs as-is on the Python 2 / Keras 0.x stack installed above (download mnist.pkl.gz beforehand):
<code class="hljs python">from __future__ import absolute_import
from __future__ import print_function

import gzip
import cPickle  # Python 2 only

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils import np_utils

from elephas.spark_model import SparkModel
from elephas.utils.rdd_utils import to_simple_rdd

from pyspark import SparkContext, SparkConf

APP_NAME = "mnist"
MASTER_IP = 'local[24]'  # Spark master URL: run locally with 24 worker threads

# Define basic parameters
batch_size = 128
nb_classes = 10
nb_epoch = 5

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
nb_pool = 2
# convolution kernel size
nb_conv = 3

# Load data (mnist.pkl.gz must be in the working directory)
with gzip.open("./mnist.pkl.gz", "rb") as f:
    (X_train, y_train), (X_test, y_test) = cPickle.load(f)

# Reshape to (samples, channels, rows, cols) and scale pixels to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# Two convolution layers, max pooling, then a small fully connected classifier
model = Sequential()
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                        border_mode='full',
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adadelta')

## spark
conf = SparkConf().setAppName(APP_NAME).setMaster(MASTER_IP)
sc = SparkContext(conf=conf)

# Build RDD from numpy features and labels
rdd = to_simple_rdd(sc, X_train, Y_train)

# Initialize SparkModel from Keras model and Spark context
spark_model = SparkModel(sc, model)

# Train Spark model
spark_model.train(rdd, nb_epoch=nb_epoch, batch_size=batch_size, verbose=0,
                  validation_split=0.1, num_workers=24)

# Evaluate Spark model by evaluating the underlying model
score = spark_model.get_network().evaluate(X_test, Y_test,
                                           show_accuracy=True, verbose=2)
print('Test accuracy:', score[1])</code>
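If you wonder how many features reach the first Dense layer in the network above, the arithmetic can be sketched as follows. This is just an illustration, assuming stride-1 convolutions, a 'valid' border mode for the second convolution (which sets no border_mode), and non-overlapping pooling; the function names are invented for this example.

```python
def conv_output_size(input_size, kernel_size, border_mode):
    """Output size of a stride-1 convolution along one spatial dimension."""
    if border_mode == "full":
        return input_size + kernel_size - 1
    if border_mode == "valid":
        return input_size - kernel_size + 1
    raise ValueError("unsupported border_mode: " + border_mode)


def flattened_features(img_size=28, nb_filters=32, nb_conv=3, nb_pool=2):
    """Number of values produced by Flatten() for the MNIST network."""
    size = conv_output_size(img_size, nb_conv, "full")   # 28 -> 30
    size = conv_output_size(size, nb_conv, "valid")      # 30 -> 28
    size = size // nb_pool                               # 28 -> 14
    return nb_filters * size * size


if __name__ == "__main__":
    print(flattened_features())  # 32 * 14 * 14 = 6272
```

So each image is turned into 6272 features before the 128-unit Dense layer, which is where most of the model's weights live.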
Run it with the following command:
/usr/local/spark/bin/spark-submit mnist_cnn_spark.py
With 24 slaves (workers) training in parallel for 5 epochs, the accuracy and running time were:
Test accuracy: 95.68%
took: 1135s
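Without Spark, a rough measurement showed that a single epoch alone takes about 1800 s, so the Spark run is still roughly 7 to 8 times faster.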
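The 7 to 8 times figure follows from simple arithmetic on the numbers above, under the assumption that the non-Spark run would need the same 5 epochs at about 1800 s each. A tiny sketch (the function name is made up for illustration):

```python
def estimated_speedup(serial_seconds_per_epoch, epochs, parallel_total_seconds):
    """Ratio of estimated serial time to measured parallel time."""
    serial_total = serial_seconds_per_epoch * epochs
    return serial_total / float(parallel_total_seconds)


if __name__ == "__main__":
    # 5 epochs * 1800 s = 9000 s without Spark, versus 1135 s with Spark.
    speedup = estimated_speedup(1800, 5, 1135)
    print("Estimated speedup: {:.1f}x".format(speedup))
```

Keep in mind this is a rough timing comparison only; it says nothing about whether the two runs reach the same accuracy.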
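Example: training a CNN in parallel on a multi-GPU cluster
I have hit far too many pitfalls over the past few days and I am worn out! As for configuring single-machine multi-GPU and multi-machine multi-GPU clusters, please bear with me for a few more days. Once I have recovered, I will dive right back into the pitfalls without hesitation...
For the Chiyan Army, I will be back!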