
Deploying a Spark cluster with Docker to train CNNs (with Python examples)

2015-11-07 17:05
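This blog is just my personal notebook, so some details are bound to be wrong.
I hope readers will bear with me; criticism and corrections are welcome.
The blog may be thin, but it still represents the author's hard work.
If you repost this, please include a link to this article. Many thanks!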

/article/2236339.html

Our lab has four beast servers, each with eight Tesla GPUs, yet our day-to-day experiments only ever used one of those GPUs. What a waste!

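So I wanted to use Spark to put all of these GPUs to work. I had heard that Docker is a great tool for deploying environments, so I decided to use Docker to install and deploy a Spark cluster for training CNNs. Setting up the environment is simple in principle, just grunt work, but anyone who has done it knows there are plenty of pitfalls along the way.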

This post is my tearful summary of every pitfall I fell into; I hope it serves as a warning that helps you avoid them.

docker

What is Docker

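Docker is an open-source project born in early 2013; it started as an internal side project at dotCloud. Intuitively, you can think of Docker as a lightweight virtual machine. Docker differs from traditional virtualization as follows: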

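Docker virtualizes at the operating-system level and directly reuses the host's operating system kernel, whereas traditional virtualization works at the hardware level.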

A figure shows the difference between the two approaches more intuitively:





Why use Docker

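As a newer form of virtualization, Docker has many advantages over traditional virtualization:

Docker containers start in seconds, much faster than traditional virtual machines.
Docker uses system resources very efficiently; a single host can run thousands of Docker containers at the same time.
Apart from running the application inside it, a container consumes almost no extra system resources, so applications perform well and system overhead stays as low as possible. (With traditional virtual machines, running 10 different applications means starting 10 virtual machines, whereas Docker only needs to start 10 isolated applications.)
Build or configure once, and it runs correctly anywhere.
Docker containers run on almost any platform, including physical machines, virtual machines, public clouds, private clouds, PCs, and servers. This compatibility lets users move an application directly from one platform to another.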

In short:
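| Feature | Docker | Virtual machine |
| --- | --- | --- |
| Startup time | Seconds | Minutes |
| Disk usage | Usually MB | Usually GB |
| Performance | Close to native | Below native |
| Instances per host | Thousands of containers | Usually dozens |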

Spark

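Spark is a general-purpose parallel computing framework in the style of Hadoop MapReduce, open-sourced by UC Berkeley's AMP Lab.

Spark has the strengths of Hadoop MapReduce, but unlike MapReduce, a job's intermediate output can be kept in memory, so it no longer has to be written to and read back from HDFS between steps.

This makes Spark better suited to iterative MapReduce-style algorithms, such as those common in data mining and machine learning.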
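To make the in-memory point concrete, here is a toy sketch in plain Python (not Spark code): an iterative algorithm that either re-reads its input from disk on every pass, as a chain of MapReduce jobs effectively does, or loads it once and keeps it in memory, roughly what Spark's caching enables. The file layout and the helper names (`write_data`, `read_data`, `fit_mean`) are hypothetical and only serve the illustration.

```python
from __future__ import print_function

import os
import tempfile


def write_data(path, values):
    """Write one number per line; stands in for a dataset stored on HDFS."""
    with open(path, "w") as f:
        for v in values:
            f.write("%f\n" % v)


def read_data(path):
    """Read the whole dataset back from disk."""
    with open(path) as f:
        return [float(line) for line in f]


def fit_mean(path, n_iter=50, lr=0.5, cache=True):
    """Estimate the mean of the data by gradient descent on squared error.

    cache=True loads the data once and keeps it in memory (Spark-like);
    cache=False re-reads it on every iteration (chained-MapReduce-like).
    Returns (estimate, number_of_disk_reads).
    """
    reads = 0
    data = None
    if cache:
        data = read_data(path)
        reads += 1
    mu = 0.0
    for _ in range(n_iter):
        if not cache:
            data = read_data(path)
            reads += 1
        grad = sum(mu - x for x in data) / len(data)
        mu -= lr * grad
    return mu, reads


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    write_data(path, [1.0, 2.0, 3.0, 4.0])
    print(fit_mean(path, cache=True))   # same estimate, 1 disk read
    print(fit_mean(path, cache=False))  # same estimate, 50 disk reads
    os.remove(path)
```

Both variants compute exactly the same estimate; only the I/O differs, and that I/O is what grows with the number of iterations in training loops.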

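I won't go into Spark's internals and applications here; I'll write a separate post about them another day. For now, all you need to know is that it gives you a way to run your program in a distributed fashion.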


Elephas (a deep learning library with Spark support)

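First, Keras: a deep learning library built on Theano. If you have used Theano, you probably know that Theano programs are not particularly easy to write. Keras is a high-level wrapper around Theano that makes the code much more convenient to write. Here is a Keras CNN model: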
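<code class="hljs oxygene has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D

# nb_filters, nb_conv, nb_pool, img_rows, img_cols, nb_classes, batch_size, nb_epoch
# and the data X_train, Y_train, X_test, Y_test are assumed to be defined beforehand
model = Sequential()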

model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Convolution2D(nb_filters, nb_conv, nb_conv,
border_mode=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'full'</span>,
input_shape=(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, img_rows, img_cols)))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'relu'</span>))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Convolution2D(nb_filters, nb_conv, nb_conv))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'relu'</span>))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Dropout(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.25</span>))

model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Flatten())
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Dense(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">128</span>))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'relu'</span>))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Dropout(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.5</span>))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Dense(nb_classes))
model.<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">add</span>(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'softmax'</span>))

model.compile(loss=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'categorical_crossentropy'</span>, optimizer=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'adadelta'</span>)

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">True</span>, verbose=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, validation_data=(X_test, Y_test))
</code>

Isn't that even simpler than a Caffe configuration file?
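If you want to sanity-check the layer sizes in the model above, the arithmetic is simple: a stride-1 'full' convolution grows each spatial dimension by `nb_conv - 1`, a 'valid' one (the Keras default border mode) shrinks it by `nb_conv - 1`, and max pooling divides it by `nb_pool`, dropping any remainder. The sketch below assumes illustrative values `nb_filters = 32`, `nb_conv = 3`, `nb_pool = 2` and 28×28 inputs; the snippet above does not state them, and the helper names are hypothetical.

```python
from __future__ import print_function


def conv_out(size, kernel, border_mode="valid"):
    """Spatial output size of a stride-1 2D convolution."""
    if border_mode == "full":
        return size + kernel - 1
    if border_mode == "valid":
        return size - kernel + 1
    raise ValueError("unsupported border_mode: %r" % border_mode)


def pool_out(size, pool):
    """Output size of non-overlapping max pooling, remainder dropped."""
    return size // pool


def flatten_size(img_rows, img_cols, nb_filters, nb_conv, nb_pool):
    """Number of features that Flatten() feeds into Dense(128) in the model above."""
    rows, cols = img_rows, img_cols
    # Convolution2D(..., border_mode='full')
    rows, cols = conv_out(rows, nb_conv, "full"), conv_out(cols, nb_conv, "full")
    # Convolution2D(...) with the default 'valid' border mode
    rows, cols = conv_out(rows, nb_conv), conv_out(cols, nb_conv)
    # MaxPooling2D(pool_size=(nb_pool, nb_pool))
    rows, cols = pool_out(rows, nb_pool), pool_out(cols, nb_pool)
    return nb_filters * rows * cols


if __name__ == "__main__":
    # Assumed hyperparameters: 28 -> 30 -> 28 -> 14, so 32 * 14 * 14 features
    print(flatten_size(28, 28, nb_filters=32, nb_conv=3, nb_pool=2))  # 6272
```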
elephas lets Keras programs run on Spark: you can move a program onto Spark with essentially no changes to the Keras code.

Here is some elephas code (the model is the same one as above):
<code class="hljs vala has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">from pyspark import SparkContext, SparkConf
from elephas.spark_model import SparkModel
from elephas.utils.rdd_utils import to_simple_rdd

<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Create Spark context</span>
conf = SparkConf().setAppName(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'Mnist_Spark_MLP'</span>).setMaster(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'local[8]'</span>)
sc = SparkContext(conf=conf)

<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Build RDD from numpy features and labels</span>
rdd = to_simple_rdd(sc, X_train, Y_train)

<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Initialize SparkModel from Keras model and Spark context</span>
spark_model = SparkModel(sc,model)

<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Train Spark model</span>
spark_model.train(rdd, nb_epoch=nb_epoch, batch_size=batch_size, verbose=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, validation_split=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.1</span>, num_workers=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">8</span>)</code>

To run it on Spark, just execute:

spark-submit --driver-memory 1G ./your_script.py
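Conceptually, elephas trains data-parallel: the RDD is split into partitions, each worker fits its own copy of the Keras model on its partition, and the driver merges the results into the master model. elephas supports more than one way of merging (synchronous and asynchronous modes). The plain-Python sketch below only illustrates the synchronous idea under simplifying assumptions: a one-variable linear model stands in for the CNN, the "workers" run one after another in a loop, and the driver simply averages their weights. It is not elephas's implementation, and all names in it are hypothetical.

```python
from __future__ import print_function

import random


def sgd_epoch(w, b, data, lr):
    """One pass of plain SGD for y ~ w * x + b over a list of (x, y) pairs."""
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


def partition(data, num_workers):
    """Split data round-robin into num_workers slices (like RDD partitions)."""
    return [data[i::num_workers] for i in range(num_workers)]


def train_sync(data, num_workers=4, nb_epoch=20, lr=0.05):
    """Synchronous data-parallel training with parameter averaging."""
    w, b = 0.0, 0.0
    parts = partition(data, num_workers)
    for _ in range(nb_epoch):
        # Every "worker" starts from the current global weights ...
        results = [sgd_epoch(w, b, part, lr) for part in parts]
        # ... and the "driver" averages the weights they send back.
        w = sum(r[0] for r in results) / num_workers
        b = sum(r[1] for r in results) / num_workers
    return w, b


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1, 1) for _ in range(400)]
    data = [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]
    print(train_sync(data))  # should be close to (3.0, 1.0)
```

In the real setup, `num_workers` plays the role of the `num_workers=8` argument above, and each partition's work runs on a separate Spark executor core instead of in a loop.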

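That covers the introductions. Next, I'll walk you step by step through using Docker to install and deploy a Spark GPU cluster for distributed CNN training.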


Installing Spark on Docker

Installing Docker online

Ubuntu 14.04 already ships a Docker package (docker.io), so you can install it directly.
<code class="hljs lasso has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">$ sudo apt<span class="hljs-attribute" style="box-sizing: border-box;">-get</span> update
$ sudo apt<span class="hljs-attribute" style="box-sizing: border-box;">-get</span> install <span class="hljs-attribute" style="box-sizing: border-box;">-y</span> docker<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">.</span>io
$ sudo ln <span class="hljs-attribute" style="box-sizing: border-box;">-sf</span> /usr/bin/docker<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">.</span>io /usr/<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">local</span>/bin/docker
$ sudo sed <span class="hljs-attribute" style="box-sizing: border-box;">-i</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'$acomplete -F _docker docker'</span> /etc/bash_completion<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">.</span>d/docker<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">.</span>io</code>

On older Ubuntu releases, you need to upgrade the kernel first.
<code class="hljs lasso has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">$ sudo apt<span class="hljs-attribute" style="box-sizing: border-box;">-get</span> update
$ sudo apt<span class="hljs-attribute" style="box-sizing: border-box;">-get</span> install linux<span class="hljs-attribute" style="box-sizing: border-box;">-image</span><span class="hljs-attribute" style="box-sizing: border-box;">-generic</span><span class="hljs-attribute" style="box-sizing: border-box;">-lts</span><span class="hljs-attribute" style="box-sizing: border-box;">-raring</span> linux<span class="hljs-attribute" style="box-sizing: border-box;">-headers</span><span class="hljs-attribute" style="box-sizing: border-box;">-generic</span><span class="hljs-attribute" style="box-sizing: border-box;">-lts</span><span class="hljs-attribute" style="box-sizing: border-box;">-raring</span>
$ sudo reboot</code>

Then repeat the steps above.
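Before installing, you can check whether the running kernel is recent enough. The sketch below parses `platform.release()` (the same string `uname -r` prints) and compares it with a minimum version. The default minimum of 3.10 is an assumption taken from the requirement documented for later Docker releases; the `lts-raring` kernel installed above is a 3.8 kernel, which older Docker releases accepted, so set `minimum` to whatever your Docker version requires. The helper names are hypothetical.

```python
from __future__ import print_function

import platform
import re


def kernel_version(release=None):
    """Parse (major, minor) from a kernel release string like '3.13.0-24-generic'."""
    if release is None:
        release = platform.release()  # same string as `uname -r` on Linux
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("cannot parse kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release=None, minimum=(3, 10)):
    """True if the kernel is at least `minimum` (an assumed threshold)."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    print(platform.release(), kernel_ok())
```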

After installation, start the Docker service.
<code class="hljs bash has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">$ <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">sudo</span> service docker start</code>

Installing Docker offline

If your machine cannot reach the Internet (like my servers), you can install Docker from an offline binary instead.

You can download the offline binary here: https://get.daocloud.io/docker/builds/Linux/x86_64/docker-latest
<code class="hljs bash has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">chmod +x docker-latest
<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">sudo</span> mv docker-latest /usr/local/bin/docker
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Then start docker in daemon mode:</span>
<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">sudo</span> docker daemon &</code>


Running the Spark image

SequenceIQ provides a Docker image with Spark preinstalled; you only need to pull it from Docker Hub.

docker pull sequenceiq/spark:1.5.1

Run it with the following command:

sudo docker run -it sequenceiq/spark:1.5.1 bash

Now let's test Spark:

First, get the container's IP address with ifconfig (mine is 172.17.0.109), then:

bash-4.1# cd /usr/local/spark

bash-4.1# cp conf/spark-env.sh.template conf/spark-env.sh

bash-4.1# vi conf/spark-env.sh

Add these two lines:
<code class="hljs bash has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">export</span> SPARK_LOCAL_IP=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">172.17</span>.<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.109</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">export</span> SPARK_MASTER_IP=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">172.17</span>.<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.109</span></code>

Then start the master and a slave:

bash-4.1# ./sbin/start-master.sh

bash-4.1# ./sbin/start-slave.sh spark://172.17.0.109:7077
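The argument to `start-slave.sh` is the master URL, which has the form `spark://HOST:PORT`; 7077 is the default port of a standalone master. When you set up several containers by script, a helper like the one below keeps the `spark-env.sh` lines and the master URL consistent. It is a hypothetical sketch that only builds strings; on a worker in another container, `local_ip` would be that container's own IP while `master_ip` stays the master's.

```python
from __future__ import print_function


def spark_env_lines(local_ip, master_ip):
    """The two lines added to conf/spark-env.sh above."""
    return [
        "export SPARK_LOCAL_IP=%s" % local_ip,
        "export SPARK_MASTER_IP=%s" % master_ip,
    ]


def master_url(master_ip, port=7077):
    """Standalone master URL, as passed to start-slave.sh."""
    return "spark://%s:%d" % (master_ip, port)


if __name__ == "__main__":
    ip = "172.17.0.109"  # the container IP reported by ifconfig
    print("\n".join(spark_env_lines(ip, ip)))
    print("./sbin/start-slave.sh " + master_url(ip))
```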

Open (your IP):8080 in a browser to see the status of each Spark node, as shown below.
Submit an application with spark-submit to try it out:

bash-4.1# ./bin/spark-submit examples/src/main/python/pi.py

You should get output like this:

15/11/05 02:11:23 INFO scheduler.DAGScheduler: Job 0 finished: reduce at /usr/local/spark-1.5.1-bin-hadoop2.6/examples/src/main/python/pi.py:39, took 1.095643 s
Pi is roughly 3.148900

Congratulations, you have just run a Spark application!
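For reference, the bundled `pi.py` estimates π by Monte Carlo sampling: it draws random points in the square [-1, 1] × [-1, 1] and counts how many land inside the unit circle, whose share of the square's area is π/4. Below is a serial plain-Python version of the same idea (the function name is hypothetical). In `pi.py` the sampling is spread over partitions with `parallelize(...).map(...).reduce(add)`, which is why the log above reports a `reduce` job.

```python
from __future__ import print_function

import random


def estimate_pi(n, seed=None):
    """Monte Carlo estimate of pi from n random points in [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y < 1:
            inside += 1
    # (points inside the circle) / (all points) approximates pi / 4
    return 4.0 * inside / n


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(200000, seed=42))
```

With 200,000 samples the standard error is roughly 0.004, so an estimate such as 3.1489 is within normal sampling noise.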

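Does everything seem to have gone smoothly so far? Spoiler: the hard part is only just beginning. Luckily I have already stepped into every pit, so although it is still a bit of a hassle, at least you get to skip some of the deepest ones...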

Installing the libraries

elephas needs Python 2.7, but the Python bundled with the Docker image we just installed is 2.6, so let's upgrade Python first.


Upgrading Python on CentOS

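A friendly tip: be sure to install openssl and openssl-devel before compiling Python; otherwise the ssl module is not built and pip cannot download over HTTPS. Don't ask me how I know.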
<code class="hljs lasso has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">yum install <span class="hljs-attribute" style="box-sizing: border-box;">-y</span> zlib<span class="hljs-attribute" style="box-sizing: border-box;">-devel</span> bzip2<span class="hljs-attribute" style="box-sizing: border-box;">-devel</span> openssl openssl<span class="hljs-attribute" style="box-sizing: border-box;">-devel</span> xz<span class="hljs-attribute" style="box-sizing: border-box;">-libs</span> wget</code>

Build and install:
<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">wget http://www<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.python</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.org</span>/ftp/python/<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2.7</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.8</span>/Python-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2.7</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.8</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.tar</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.xz</span>
xz -d Python-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2.7</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.8</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.tar</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.xz</span>
tar -xvf Python-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2.7</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.8</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.tar</span>

<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Enter the source directory:</span>
cd Python-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2.7</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.8</span>
<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Run configure (-fPIC is a must; don't ask me how I know):</span>
./configure --prefix=/usr/local CFLAGS=-fPIC
<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Build and install (altinstall installs python2.7 without replacing the system python):</span>
make
make altinstall</code>
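Once `make altinstall` finishes, it is worth checking that the new interpreter really built the modules the later steps rely on before you touch `/usr/bin/python`: `ssl` (pip downloads over HTTPS), plus `zlib` and `bz2` (their -devel packages were installed above). A minimal sketch, assuming you run it with `/usr/local/bin/python2.7`; the module list and the helper names are my own assumptions.

```python
from __future__ import print_function

import importlib
import sys

# Modules that are silently skipped if their -devel headers were missing
# at build time (assumed list; extend as needed).
REQUIRED_MODULES = ["ssl", "zlib", "bz2"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def check_build(min_version=(2, 7)):
    """True if the interpreter is new enough and has all required modules."""
    ok = tuple(sys.version_info[:2]) >= min_version
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    missing = missing_modules()
    if missing:
        print("missing modules:", ", ".join(missing))
    return ok and not missing


if __name__ == "__main__":
    print("OK" if check_build() else "rebuild needed")
```

If `ssl` shows up as missing, install openssl-devel and rerun `./configure`, `make`, and `make altinstall`.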

Set up PATH
<code class="hljs bash has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">mv /usr/bin/python /usr/bin/python2.<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">6</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Option 1: put /usr/local/bin first on PATH (make altinstall only created python2.7 there, so also link python to it)</span>
ln <span class="hljs-operator" style="box-sizing: border-box;">-s</span> /usr/local/bin/python2.<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">7</span> /usr/local/bin/python
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">export</span> PATH=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/usr/local/bin:<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$PATH</span>"</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Option 2: or link python2.7 directly as /usr/bin/python</span>
ln <span class="hljs-operator" style="box-sizing: border-box;">-s</span> /usr/local/bin/python2.<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">7</span>  /usr/bin/python
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Check the Python version:</span>
python -V</code>

Install setuptools
<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Download the package</span>
wget --no-check-certificate https://pypi<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.python</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.org</span>/packages/source/s/setuptools/setuptools-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.4</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.2</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.tar</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.gz</span>
<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Extract:</span>
tar -xvf setuptools-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.4</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.2</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.tar</span><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.gz</span>
cd setuptools-<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1.4</span><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">.2</span>
<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;"># Install setuptools with Python 2.7.8</span>
python setup<span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">.py</span> install</code>

Install pip
<code class="hljs ruby has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">curl <span class="hljs-symbol" style="color: rgb(0, 102, 102); box-sizing: border-box;">https:</span>/<span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">/raw.githubusercontent.com/pypa</span><span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">/pip/master</span><span class="hljs-regexp" style="color: rgb(0, 136, 0); box-sizing: border-box;">/contrib/get</span>-pip.py | python -</code>

Fix yum
<code class="hljs bash has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">vi /usr/bin/yum

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Point yum back at the system Python 2.6:</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># change the first line #!/usr/bin/python</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># to                    #!/usr/bin/python2.6</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># After that, yum works again.</span></code>
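If you would rather not edit `/usr/bin/yum` by hand (for example, when scripting the image setup), the same one-line change can be made in Python. `fix_shebang` is a hypothetical helper, and keeping a `.bak` copy is my own precaution rather than part of the original steps. Call it as root with `fix_shebang("/usr/bin/yum")`; it leaves the file untouched if the first line is not exactly `#!/usr/bin/python`.

```python
import shutil


def fix_shebang(path, old="#!/usr/bin/python", new="#!/usr/bin/python2.6"):
    """Replace the interpreter on the first line of `path` if it is exactly `old`.

    Returns True if the file was changed; a copy is kept at path + '.bak'.
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines or lines[0].rstrip() != old:
        return False  # already fixed, or not the expected shebang
    shutil.copy2(path, path + ".bak")
    lines[0] = new + "\n"
    # Rewriting in place keeps the file's existing (executable) permissions.
    with open(path, "w") as f:
        f.writelines(lines)
    return True
```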


Installing Theano, Keras, and elephas

<code class="hljs cmake has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">pip <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">install</span> --upgrade --no-deps git+https://github.com/Theano/Theano.git

pip <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">install</span> keras

pip <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">install</span> elephas</code>
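A quick way to confirm the three installs is to import each package and print its version. A small sketch (the helper name is hypothetical): a package that fails to import is reported as `None`, and one without a `__version__` attribute as `'unknown'`.

```python
from __future__ import print_function

import importlib


def installed_versions(names=("theano", "keras", "elephas")):
    """Map each package name to its version string, or None if it fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or an error raised while the package initializes
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions().items()):
        print(name, version)
```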

Achievements unlocked

Let's quickly recap what we have done so far:

Installed Docker
Pulled the Spark-on-Docker image
Upgraded Python inside the Spark-on-Docker image
Installed Theano, Keras, and elephas

What we can already do now:

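√ If your machine has multiple CPUs (say 24):

You can run a single Docker container and simply use Spark with elephas to train the CNN in parallel on all 24 CPUs.

√ If your machine has multiple GPUs (say 4):

You can run 4 Docker containers and edit ~/.theanorc in each one to select a specific GPU, so that all 4 GPUs compute in parallel. (You need to install CUDA yourself.)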
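For the multi-GPU case, each container's `~/.theanorc` only has to point Theano at a different device. The sketch below generates one config per GPU index. It assumes the device names of Theano's old CUDA backend (`gpu0`, `gpu1`, ...) used at the time, plus `floatX = float32`, which that backend needed to run computations on the GPU; the function names are hypothetical.

```python
from __future__ import print_function

import os


def theanorc_for_gpu(gpu_index):
    """Contents of a ~/.theanorc that pins Theano to one GPU (old CUDA backend)."""
    return (
        "[global]\n"
        "device = gpu%d\n"
        "floatX = float32\n"
    ) % gpu_index


def write_theanorc(gpu_index, home=None):
    """Write the config into `home` (default: the current user's home directory)."""
    home = home or os.path.expanduser("~")
    path = os.path.join(home, ".theanorc")
    with open(path, "w") as f:
        f.write(theanorc_for_gpu(gpu_index))
    return path


if __name__ == "__main__":
    # One container per GPU: print the config that container i should use.
    for i in range(4):
        print("# container %d" % i)
        print(theanorc_for_gpu(i))
```

Inside container `i`, calling `write_theanorc(i)` would then write that container's config.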


Example: parallel CNN training on a single multi-CPU machine

Let's train the simplest possible network for MNIST handwritten-digit recognition. Here is code that runs as-is (download mnist.pkl.gz beforehand):
<code class="hljs python has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> __future__ <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> absolute_import
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> __future__ <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> print_function
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> numpy <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> np

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> keras.datasets <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> mnist
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> keras.models <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> Sequential
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> keras.layers.core <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> Dense, Dropout, Activation, Flatten
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> keras.optimizers <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> SGD, Adam, RMSprop
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> keras.layers.convolutional <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> Convolution2D, MaxPooling2D
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> keras.utils <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> np_utils

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> elephas.spark_model <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> SparkModel
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> elephas.utils.rdd_utils <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> to_simple_rdd

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> pyspark <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> SparkContext, SparkConf

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> gzip
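<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> cPickle  <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Python 2; on Python 3 import pickle and call pickle.load(f, encoding='latin1')</span>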

APP_NAME = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"mnist"</span>
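<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Spark master URL: 'local[24]' runs Spark locally with 24 worker threads;</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># use 'spark://HOST:7077' to submit to a standalone cluster master instead</span>
MASTER_IP = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'local[24]'</span>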

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Define basic parameters</span>
batch_size = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">128</span>
nb_classes = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">10</span>
nb_epoch = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">5</span>

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># input image dimensions</span>
img_rows, img_cols = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">28</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># number of convolutional filters to use</span>
nb_filters = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">32</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># size of pooling area for max pooling</span>
nb_pool = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>
<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># convolution kernel size</span>
nb_conv = <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>

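<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Load data: a Keras-format MNIST pickle holding ((X_train, y_train), (X_test, y_test))</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">with</span> gzip.open(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"./mnist.pkl.gz"</span>, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"rb"</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> f:
    (X_train, y_train), (X_test, y_test) = cPickle.load(f)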

X_train = X_train.reshape(X_train.shape[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>], <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, img_rows, img_cols)
X_test = X_test.reshape(X_test.shape[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>], <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, img_rows, img_cols)

X_train = X_train.astype(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"float32"</span>)
X_test = X_test.astype(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"float32"</span>)
X_train /= <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">255</span>
X_test /= <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">255</span>

print(X_train.shape[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>], <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'train samples'</span>)
print(X_test.shape[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>], <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'test samples'</span>)

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Convert class vectors to binary class matrices</span>
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
border_mode=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'full'</span>,
input_shape=(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, img_rows, img_cols)))
model.add(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'relu'</span>))
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
model.add(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'relu'</span>))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.25</span>))

model.add(Flatten())
model.add(Dense(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">128</span>))
model.add(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'relu'</span>))
model.add(Dropout(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.5</span>))
model.add(Dense(nb_classes))
model.add(Activation(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'softmax'</span>))

model.compile(loss=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'categorical_crossentropy'</span>, optimizer=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'adadelta'</span>)

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Create the Spark context</span>
conf = SparkConf().setAppName(APP_NAME).setMaster(MASTER_IP)
sc = SparkContext(conf=conf)

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Build RDD from numpy features and labels</span>
rdd = to_simple_rdd(sc, X_train, Y_train)

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Initialize SparkModel from Keras model and Spark context</span>
spark_model = SparkModel(sc,model)

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Train Spark model</span>
spark_model.train(rdd, nb_epoch=nb_epoch, batch_size=batch_size, verbose=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, validation_split=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0.1</span>, num_workers=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">24</span>)

<span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;"># Evaluate Spark model by evaluating the underlying model</span>
score = spark_model.get_network().evaluate(X_test, Y_test, show_accuracy=<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">True</span>, verbose=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>)
print(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'Test accuracy:'</span>, score[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>])</code>
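How does `SparkModel` spread training across the 24 workers? Roughly speaking, the RDD is split into partitions, a copy of the Keras model is trained on each partition, and the resulting updates are merged back into a single model. The elephas documentation describes several modes for this merge (synchronous, asynchronous, hogwild). The sketch below is not elephas's code. It is a minimal NumPy illustration of the synchronous variant, *parameter averaging*, using a toy logistic-regression model in place of the CNN. The function names, learning rate, shard count, and number of rounds are all illustrative assumptions.

```python
import numpy as np


def train_shard(w, X, y, lr=0.5, steps=20):
    """Full-batch gradient descent for binary logistic regression on one shard."""
    w = w.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)    # step along the mean log-loss gradient
    return w


def parameter_averaging(X, y, num_workers=4, rounds=10, seed=0):
    """Synchronous parameter averaging (illustrative, not elephas's implementation)."""
    rng = np.random.default_rng(seed)
    shards = np.array_split(rng.permutation(len(y)), num_workers)  # like RDD partitions
    w = np.zeros(X.shape[1])                                       # shared model weights
    for _ in range(rounds):
        # Each hypothetical "worker" starts from the current shared weights and trains
        # on its own shard; on Spark these calls would run in parallel on executors.
        local = [train_shard(w, X[s], y[s]) for s in shards]
        w = np.mean(local, axis=0)  # the driver averages the workers' weights
    return w


# Toy, linearly separable data: label is 1 when x0 + x1 > 0; last column is a bias term.
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(400, 2)), np.ones((400, 1))])
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = parameter_averaging(X, y)
accuracy = np.mean((X @ w > 0) == (y == 1))
print("weights:", w, "train accuracy:", accuracy)
```

In the script above, `num_workers=24` in `spark_model.train` plays the role of `num_workers` here. The real CNN weights are just many more arrays being merged in the same way.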

Run it with the following command:

/usr/local/spark/bin/spark-submit mnist_cnn_spark.py

With 24 workers training in parallel for 5 epochs, I got the following test accuracy and running time:

Test accuracy: 95.68%

took: 1135s

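Without Spark, a rough measurement showed that a single epoch alone takes about 1800 s, so the Spark run is still roughly 7 to 8 times faster.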
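The 7 to 8x figure is a back-of-the-envelope estimate. Assuming the single-process run would need about 1800 s for each of the 5 epochs (only one epoch was actually timed), the arithmetic works out as follows. The helper name is purely illustrative.

```python
def estimated_speedup(epochs: int, single_secs_per_epoch: float, spark_total_secs: float) -> float:
    """Estimated single-process time for all epochs divided by the measured Spark time."""
    return epochs * single_secs_per_epoch / spark_total_secs


# Figures reported above: 5 epochs, ~1800 s per epoch without Spark, 1135 s in total with Spark.
speedup = estimated_speedup(epochs=5, single_secs_per_epoch=1800, spark_total_secs=1135)
print(f"estimated speedup: {speedup:.2f}x")  # prints "estimated speedup: 7.93x" (9000 s / 1135 s)
```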

Example: training a CNN in parallel on a multi-GPU cluster

I have hit far too many pitfalls over the past few days, and I am truly worn out!

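As for configuring a single-machine multi-GPU cluster and a multi-machine multi-GPU cluster, please give me a few more days. Once I have recovered my strength, I will fearlessly go back to stepping into pitfalls...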

For the Chiyan Army, I will be back!