Caffe study notes (3): solver configuration
2016-05-20 10:41
The solver file (*_solver.prototxt) is used together with train_val.prototxt.
The solver defines how the model is optimized.
The Caffe solvers are:
1. Stochastic Gradient Descent (type: “SGD”)
2. AdaDelta (type: “AdaDelta”)
3. Adaptive Gradient (type: “AdaGrad”)
4. Adam (type: “Adam”)
5. Nesterov’s Accelerated Gradient (type: “Nesterov”)
6. RMSprop (type: “RMSProp”)
The solver:
1. scaffolds the optimization bookkeeping and creates the training network for learning and test network(s) for evaluation.
2. iteratively optimizes by calling forward / backward and updating parameters
3. (periodically) evaluates the test networks
4. snapshots the model and solver state throughout the optimization
where each iteration
1. calls network forward to compute the output and loss
2. calls network backward to compute the gradients
3. incorporates the gradients into parameter updates according to the solver method
4. updates the solver state according to learning rate, history, and method
to take the weights all the way from initialization to learned model.
Like Caffe models, Caffe solvers run in CPU / GPU modes.
Let's look at an example first:
net: "examples/mnist/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.01
momentum: 0.9
type: SGD
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 20000
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU
test_iter: 100
The model has a training phase and a testing phase; this parameter sets the number of iterations run in the testing phase and works together with batch_size (unless the data layers in train_val.prototxt specify otherwise, training and testing use the same batch_size). For example, if the test set contains 100,000 samples and batch_size = 1000, then test_iter = 100.
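The arithmetic in this example can be sketched in a few lines (the numbers are the hypothetical ones from the text, not from a real dataset):

```python
# test_iter tells the solver how many forward passes make up one test phase.
# Each pass consumes batch_size samples, so covering the full test set once needs:
test_set_size = 100000  # size of the test set (example value from the text)
batch_size = 1000       # batch_size of the test-phase data layer

test_iter = test_set_size // batch_size
print(test_iter)  # -> 100
```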
test_interval: 500
The test interval: a full test pass is run after every 500 training iterations.
base_lr: 0.01
base_lr sets the base learning rate, which can be adjusted as the iterations proceed. If loss = 87.3365 shows up during training (the softmax loss has overflowed), reduce base_lr or batch_size.
snapshot: 5000 snapshot_prefix: "./mynet"
Snapshotting saves the trained model and the solver state. snapshot sets how many training iterations pass between saves; the default is 0, meaning nothing is saved. snapshot_prefix sets the output path and filename prefix.
In this example the model is saved at every multiple of 5000 iterations, e.g.:
mynet_5000.caffemodel
mynet_5000.solverstate
mynet_10000.caffemodel
mynet_10000.solverstate
mynet_15000.caffemodel
…..
You can also set snapshot_diff to choose whether gradient values are saved; the default is false (not saved).
snapshot_format selects the output format; the two options are HDF5 and BINARYPROTO, with BINARYPROTO the default.
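Putting the snapshot parameters together, a minimal sketch of which files such a run would write (assuming max_iter: 20000 as in the example solver above; "mynet" is the example prefix, not a real path):

```python
snapshot = 5000   # save every 5000 training iterations
max_iter = 20000  # total training iterations (from the example solver)
prefix = "mynet"  # snapshot_prefix: output path plus filename prefix

# Caffe writes a .caffemodel / .solverstate pair at each snapshot point.
snapshot_iters = list(range(snapshot, max_iter + 1, snapshot))
for it in snapshot_iters:
    print(f"{prefix}_{it}.caffemodel")
    print(f"{prefix}_{it}.solverstate")
```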
solver_mode: CPU
Selects CPU or GPU.
base_lr: 0.01
lr_policy: "inv"
gamma: 0.0001
power: 0.75
These four lines should be read together: they configure the learning rate. Any optimization based on gradient descent has a learning rate, also called the step size. base_lr sets the base learning rate, which can be adjusted as iterations proceed; the adjustment strategy is set by lr_policy.
lr_policy can take the following values; the corresponding learning rate is computed as:
- fixed: keep base_lr unchanged.
- step: also requires a stepsize; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration count
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration count
- inv: also requires a power; returns base_lr * (1 + gamma * iter) ^ (-power)
- multistep: also requires a stepvalue. Similar to step, but step decays at uniform, evenly spaced intervals while multistep decays at the iterations listed in stepvalue
- poly: polynomial decay; returns base_lr * (1 - iter/max_iter) ^ power
- sigmoid: sigmoid decay; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
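These formulas can be checked with a small stand-alone sketch (a plain-Python reimplementation for illustration, not Caffe's own code):

```python
import math

def caffe_lr(policy, iter, base_lr, gamma=0.0, power=0.0,
             stepsize=1, max_iter=1, stepvalues=()):
    """Illustrative reimplementation of Caffe's learning-rate policies."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (iter // stepsize)
    if policy == "exp":
        return base_lr * gamma ** iter
    if policy == "inv":
        return base_lr * (1 + gamma * iter) ** (-power)
    if policy == "multistep":
        # current step = how many entries of stepvalue have been passed
        current_step = sum(1 for v in stepvalues if iter >= v)
        return base_lr * gamma ** current_step
    if policy == "poly":
        return base_lr * (1 - iter / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1 / (1 + math.exp(-gamma * (iter - stepsize))))
    raise ValueError(f"unknown lr_policy: {policy}")

# "inv" with the example solver's settings starts exactly at base_lr:
print(caffe_lr("inv", 0, 0.01, gamma=0.0001, power=0.75))        # -> 0.01
# "step" drops by a factor of gamma after each stepsize iterations:
print(caffe_lr("step", 10000, 0.01, gamma=0.1, stepsize=10000))  # -> 0.001
```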
Open ./caffe-master/src/caffe/proto/caffe.proto and you will find the following description:
// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr * (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmod decay
//      return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.
MNIST model:
lr_policy: "inv"
gamma: 0.0001
power: 0.75
CIFAR model:
lr_policy: "fixed"
ImageNet model:
lr_policy: "step"
gamma: 0.1
stepsize: 10000
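With the ImageNet-style settings above (base_lr is assumed to be 0.01 here purely for illustration; the text does not give it), the step policy cuts the rate by a factor of 10 every stepsize iterations:

```python
base_lr, gamma, stepsize = 0.01, 0.1, 10000  # base_lr is an assumed value

def step_lr(it):
    # step policy: base_lr * gamma ^ floor(iter / stepsize)
    return base_lr * gamma ** (it // stepsize)

# The rate stays flat within each interval and drops 10x at each boundary:
for it in (0, 9999, 10000, 20000):
    print(it, step_lr(it))
```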