
Training LeNet on MNIST with Caffe


http://www.bubuko.com/infodetail-850169.html

We will assume that you have Caffe successfully compiled. If not, please refer to the Installation page. In this tutorial, we will assume that your Caffe installation is located at CAFFE_ROOT.

Preparing the Dataset

Download the data from the MNIST website and convert it to the required format with the following commands:

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh


A common problem at this step is that wget or gunzip is not installed; simply install them and rerun the scripts.

After the scripts finish, two datasets are generated under ./examples/mnist/: mnist_train_lmdb and mnist_test_lmdb.


./data/mnist/get_mnist.sh
This script downloads the dataset in gzipped form into ./data/mnist/ and unpacks it into t10k-images-idx3-ubyte, t10k-labels-idx1-ubyte, train-images-idx3-ubyte, and train-labels-idx1-ubyte.

./examples/mnist/create_mnist.sh
This script uses the convert_mnist_data.bin tool under caffe-master/build/examples/mnist/ to convert the raw MNIST data into LMDB format, and places the two newly generated directories, mnist_train_lmdb and mnist_test_lmdb, in the same directory as create_mnist.sh. (The convert_mnist_data.bin binary lives under ./build/examples/mnist/, and its source convert_mnist_data.cpp under ./examples/mnist/.)

LeNet: The MNIST Classification Model

Before we run the training program, let's explain what will happen. We will use the LeNet network, which is known to work well on digit classification tasks. We use a slightly different version from the original LeNet implementation, replacing the sigmoid activations in the neurons with Rectified Linear Unit (ReLU) activations.

The design of LeNet contains the essence of CNNs that is still used in larger models such as the ones for ImageNet. In general, it consists of a convolutional layer followed by a pooling layer, another convolutional layer followed by a pooling layer, and then two fully connected layers similar to a conventional multilayer perceptron. These layers are defined in $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt.

Defining the MNIST Network

This section explains how the lenet_train_test.prototxt model definition file specifies the LeNet model for MNIST handwritten digit classification. We assume that you are familiar with Google Protobuf and that you have read the protobuf definitions used by Caffe, which can be found at $CAFFE_ROOT/src/caffe/proto/caffe.proto. Specifically, we will write a caffe::NetParameter (or in Python, caffe.proto.caffe_pb2.NetParameter) protobuf. We start by giving the network a name:

name: "LeNet"

Writing the Data Layer

MNIST data is read from the lmdb we created earlier, which is defined by a data layer:

# note: this differs slightly from the layer in the current lenet_train_test.prototxt
layer {
  name: "mnist"
  type: "Data"
  data_param {
    source: "mnist_train_lmdb"   # there is one data layer each for the train and test phases
    backend: LMDB
    batch_size: 64
    scale: 0.00390625
  }
  top: "data"
  top: "label"
}

Specifically, this layer has the name mnist and the type Data, and it reads the data from the given lmdb source. We use a batch size of 64, and scale the incoming pixels so that they fall in the range [0, 1). Why 0.00390625? It is 1 divided by 256. Finally, this layer produces two blobs: the data blob and the label blob.
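Note that in the current $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt the pixel scaling sits in a transform_param block rather than in data_param, and there is a second data layer, included only in the TEST phase, that reads from mnist_test_lmdb with a batch size of 100. A sketch of that test-phase layer, following the stock example file, looks like this:

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}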

Writing the Convolution Layer

Let's define the first convolution layer:

layer {
  name: "conv1"
  type: "Convolution"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "data"
  top: "conv1"
}

This layer takes the data blob (provided by the data layer) and produces the conv1 blob. It produces outputs of 20 channels, with a convolutional kernel size of 5 and a stride of 1. The fillers allow us to randomly initialize the weights and the bias. For the weight filler, we use the xavier algorithm, which automatically determines the scale of initialization based on the number of input and output neurons. For the bias filler, we simply initialize it as constant, with the default filling value 0.
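As a quick check on the shapes: for a 28x28 MNIST input, a 5x5 kernel with stride 1 and no padding yields feature maps of size (28 - 5)/1 + 1 = 24, so conv1 produces a 20x24x24 blob per image - this matches the "Top shape: 20 24 24" line in the training log shown later in this tutorial.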

The lr_mults are the learning rate adjustments for the layer's learnable parameters. In this case, we will set the weight learning rate to be the same as the learning rate given by the solver during runtime, and the bias learning rate to be twice as large as that - this usually leads to better convergence rates.

Writing the Pooling Layer

Phew. Pooling layers are actually much easier to define:

layer {
  name: "pool1"
  type: "Pooling"
  pooling_param {
    kernel_size: 2
    stride: 2
    pool: MAX
  }
  bottom: "conv1"
  top: "pool1"
}

This says we will perform max pooling with a pool kernel size 2 and a stride of 2 (so no overlapping between neighboring pooling regions).
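With a 2x2 kernel and stride 2, each pooling window covers a distinct 2x2 patch, so pool1 simply halves each spatial dimension: the 20x24x24 conv1 output becomes a 20x12x12 blob.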

Similarly, you can write up the second convolution and pooling layers. Check $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt for details.
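For reference, the second pair follows the same pattern; in the stock example conv2 produces 50 output channels (again with a 5x5 kernel and stride 1) and pool2 repeats the 2x2 max pooling. A sketch, with values as shipped in lenet_train_test.prototxt:

layer {
  name: "conv2"
  type: "Convolution"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool1"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  pooling_param {
    kernel_size: 2
    stride: 2
    pool: MAX
  }
  bottom: "conv2"
  top: "pool2"
}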

Writing the Fully Connected Layer

Writing a fully connected layer is also simple:

layer {
  name: "ip1"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool2"
  top: "ip1"
}

This defines a fully connected layer (known in Caffe as an InnerProduct layer) with 500 outputs. All other lines look familiar, right?

Writing the ReLU Layer

A ReLU Layer is also simple:

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}

Since ReLU is an element-wise operation, we can do in-place operations to save some memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!

After the ReLU layer, we will write another innerproduct layer:

layer {
  name: "ip2"
  type: "InnerProduct"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "ip1"
  top: "ip2"
}

Writing the Loss Layer

Finally, we will write the loss!

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
}

The softmax_loss layer implements both the softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs, the first one being the prediction and the second one being the label provided by the data layer (remember it?). It does not produce any outputs - all it does is compute the loss function value, report it when backpropagation starts, and initiate the gradient with respect to ip2. This is where all the magic starts.
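As an aside: at deployment time there are no labels, so the combined loss layer is typically replaced by a plain Softmax layer that just outputs class probabilities. The stock deploy definition, $CAFFE_ROOT/examples/mnist/lenet.prototxt, ends with a layer along these lines:

layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}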

Additional Notes: Writing Layer Rules

Layer definitions can include rules for whether and when they are included in the network definition, like the one below:

layer {
  # ...layer definition...
  include: { phase: TRAIN }
}

Layer rules control whether a layer is included in the network, based on the current network state. You can refer to $CAFFE_ROOT/src/caffe/proto/caffe.proto for more information about layer rules and the model schema. In the example above, this layer will be included only in the TRAIN phase. If we change TRAIN to TEST, the layer will be used only in the TEST phase. By default, that is, without layer rules, a layer is always included in the network.

Thus, lenet_train_test.prototxt defines two Data layers, one for the training phase and one for the testing phase. There is also an Accuracy layer, included only in the TEST phase, that reports the accuracy of the model each time the network is tested; how often that happens is controlled by the test settings in lenet_solver.prototxt.
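For reference, that accuracy layer as it appears in the stock lenet_train_test.prototxt looks roughly like this:

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}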

Defining the MNIST Solver

Check out the comments explaining each line in $CAFFE_ROOT/examples/mnist/lenet_solver.prototxt:

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# Snapshot intermediate results every 5000 iterations, using this filename prefix
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
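For the "inv" learning rate policy, Caffe computes the effective learning rate at iteration iter as base_lr * (1 + gamma * iter)^(-power). With the settings above this gives 0.01 * (1 + 0.0001 * 100)^(-0.75) ≈ 0.00992565 at iteration 100, which is exactly the lr value printed in the training log below.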

Training and Testing the Model

Training the model is simple after you have written the network definition protobuf and solver protobuf files. Simply run train_lenet.sh:

cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh

train_lenet.sh is a simple script whose content is:

#!/usr/bin/env sh

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
The main tool for training is caffe (build/tools/caffe); its action here is train, and its argument is the solver protobuf text file.

When you run the command, you will see lots of messages fly by like this:

I1203 net.cpp:66] Creating Layer conv1
I1203 net.cpp:76] conv1 <- data
I1203 net.cpp:101] conv1 -> conv1
I1203 net.cpp:116] Top shape: 20 24 24
I1203 net.cpp:127] conv1 needs backward computation.

These messages tell you the details about each layer, its connections, and its output shape, which can be helpful for debugging. After the initialization, training will start:

I1203 net.cpp:142] Network initialization done.
I1203 solver.cpp:36] Solver scaffolding done.
I1203 solver.cpp:44] Solving LeNet

Based on the solver settings, the training loss function is printed every 100 iterations and the network is tested every 500 iterations. You will see messages like this:

I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
I1203 solver.cpp:66] Iteration 100, loss = 0.26044
...
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9785
I1203 solver.cpp:111] Test score #1: 0.0606671

For each training iteration reported, lr is the learning rate of that iteration, and loss is the training loss. For the output of the testing phase, score 0 is the accuracy and score 1 is the testing loss function.

And after a few minutes, you are done!

I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9897
I1203 solver.cpp:111] Test score #1: 0.0324599
I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
I1203 solver.cpp:78] Optimization Done.

The final model is stored as a binary protobuf file, lenet_iter_10000.




After the run, four files are generated:

lenet_iter_10000.caffemodel

lenet_iter_10000.solverstate

lenet_iter_5000.caffemodel

lenet_iter_5000.solverstate

If you train on a dataset from a real-world application, this file can be deployed as the trained model in your application.
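To deploy the .caffemodel, you pair it with a deploy-time network definition that declares an input blob instead of a Data layer; for this example that is $CAFFE_ROOT/examples/mnist/lenet.prototxt. In recent Caffe versions its input declaration looks roughly like this (older versions use input:/input_dim: fields instead):

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 64 dim: 1 dim: 28 dim: 28 } }
}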

Um… How about GPU training?

You just did! All the training was carried out on the GPU. In fact, if you would like to do the training on the CPU, all you need to do is change one line in lenet_solver.prototxt:


# solver mode: CPU or GPU
solver_mode: CPU

and you will be using the CPU for training. Isn't that easy?

MNIST is a small dataset, so training with the GPU does not bring much benefit because of communication overheads. On larger datasets with more complex models, such as ImageNet, the difference in computation speed will be much more significant.

How to reduce the learning rate at fixed steps?

Look at lenet_multistep_solver.prototxt
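The multistep policy keeps the learning rate constant and then multiplies it by gamma each time one of the listed stepvalue iterations is reached. The relevant lines in the stock lenet_multistep_solver.prototxt look roughly like this (values as shipped with Caffe; adjust them for your own schedule):

# The learning rate policy: drop the rate at the listed iterations
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500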

Other Examples

The example above runs train_lenet.sh; there are other example scripts as well, such as:

./examples/mnist/train_mnist_autoencoder.sh