Training SSD on Your Own Data with the Caffe Framework

2017-12-06 19:17

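Note: this article is based solely on my own experiments; please contact me before reposting. Before trying anything described here, readers are advised to first train the example from the official paper by following the SSD tutorial, or to follow my post http://blog.csdn.net/xunan003/article/details/78427446 to configure SSD and run it end to end once.

If a link in this article does not open when clicked, copy it into the browser's address bar instead.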

References: http://blog.csdn.net/u010167269/article/details/52851667
http://blog.csdn.net/u014696921/article/details/53353896
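I. Overview

The SSD training workflow is driven largely by Python scripts on top of Caffe and uses VGGNet as the base network. A complete SSD run requires the following steps:

1. Annotate the data. Label the training, validation and test images, decide which object classes to detect, and generate the XML files that supply the box locations during training and testing.

2. Convert the dataset. As with classification networks, SSD training needs the image data converted to LMDB format. Before converting, generate the corresponding trainval.txt and test.txt files from the annotations.

3. Modify the training script. Adapt the SSD training script ssd_pascal.py to our own data.

4. Train the model.

5. Test with the trained model.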

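II. Data Annotation

This step is done with the annotation tool labelImg; for details on using it, see my post: http://blog.csdn.net/xunan003/article/details/78720189

Our data combines the person_and_bike images from the train and test parts of the INRIA dataset, 159 images in total. The first 120 are used for trainval and the last 39 for test.

1. Under the ssd-caffe directory, create a folder image, and inside it a folder INRIA_TRAIN_part to hold our data. Its layout resembles the VOC data format, with three required subfolders: Annotations, ImageSets and PNGImages, which hold the XML files, the txt files and the original images respectively.

2. Put the 159 original images in the PNGImages folder. It is best to also create two subfolders there, trainval and test, and copy the first 120 images into trainval and the last 39 into test. This takes extra disk space, but it lets a short Python script generate the txt files later and avoids having to rework create_list.sh. See Figure 1.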

Figure 1. My directory after all steps have been completed.
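
The copy in step 2 can also be scripted. This is a minimal sketch rather than the script used for this post: the function name `split_images` and its arguments are hypothetical, and it assumes that "the first 120 images" means the first 120 file names in sorted order.

```python
import os
import shutil


def split_images(src_dir, n_trainval, ext=".png"):
    """Copy the first n_trainval images (sorted by name) to src_dir/trainval
    and the remaining ones to src_dir/test. Returns (trainval, test) name lists."""
    names = sorted(f for f in os.listdir(src_dir)
                   if f.lower().endswith(ext)
                   and os.path.isfile(os.path.join(src_dir, f)))
    subsets = {"trainval": names[:n_trainval], "test": names[n_trainval:]}
    for subset, files in subsets.items():
        out_dir = os.path.join(src_dir, subset)
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        for f in files:
            # The originals stay in src_dir; the subfolders only hold copies.
            shutil.copy(os.path.join(src_dir, f), os.path.join(out_dir, f))
    return subsets["trainval"], subsets["test"]
```

For the layout in this post the call would be `split_images("/home/xn/caffe/image/INRIA_TRAIN_part/PNGImages", 120)`.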

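3. Annotate the objects with labelImg. We detect two classes, person and bike. Open labelImg, use its shortcuts to select the PNGImages folder as the image source and the Annotations folder as the XML output directory, then annotate all 159 images. The result is shown in Figure 2.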

Figure 2. Using the annotation tool, and the XML files it generates.
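
To check what labelImg wrote, the XML files can be read back with the standard library. The sketch below assumes labelImg was left on its Pascal VOC output format (an `object` element with `name` and `bndbox` children); the helper `read_voc_objects` and the sample annotation are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_voc_objects(xml_text):
    """Return [(name, xmin, ymin, xmax, ymax), ...] from a Pascal VOC style annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.find(tag).text))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.find("name").text,) + tuple(coords))
    return objects


# Made-up annotation, used only to show the structure.
SAMPLE = """<annotation>
  <filename>example.png</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object><name>person</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>300</ymax></bndbox>
  </object>
  <object><name>bike</name>
    <bndbox><xmin>50</xmin><ymin>200</ymin><xmax>400</xmax><ymax>460</ymax></bndbox>
  </object>
</annotation>"""

for entry in read_voc_objects(SAMPLE):
    print(entry)
```

Running the same function over every file in Annotations is a quick way to spot class names that are misspelled or missing from the label map.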

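III. Converting the Dataset

Before building the database files, the txt files have to be generated. The method in http://blog.csdn.net/u014696921/article/details/53353896 needs two groups of txt files. The first group holds bare image names, without path or extension, and lives in the ImageSets directory. The second group is derived from the first and pairs each image path (with extension) with the path of its XML file. Our method does not need the first group and never uses it during training, so readers can skip it; the script that generates it is explained here only for completeness.

1. Generating the first group of txt files (optional; can be skipped)

For convenience we use MATLAB. If MATLAB is already installed for Caffe's MATLAB interface, the .m file can be run there; otherwise any MATLAB installation on Windows or Ubuntu will do. I used MATLAB R2012b on a second machine running Windows.

First put the 159 images in a separate folder (I named mine pos) and run the script in MATLAB, adjusting the paths in the script to your own folders. The script is below; on Ubuntu only the paths need to change.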

%% Reset the workspace
clear all; close all; clc;

%% Collect the image files
Dataset = 'E:\目标检测\练习\INRIA\img';       % folder holding the 159 images
Folder = dir(fullfile(Dataset, '*.png'));    % struct array with one entry per .png file
NumCls = length(Folder);                     % number of images found

%% Write the bare image names (no path, no extension), one per line
% The output location is up to you. For test.txt, change the file name
% and loop over 121:NumCls instead.
fid = fopen('E:\目标检测\练习\MIT\trainval.txt', 'w');
for iCls = 1:120                             % the first 120 images go into trainval.txt
    ClsName = Folder(iCls).name;             % file name of the iCls-th image
    [~, newClsName] = fileparts(ClsName);    % strip the extension, whatever its length
    fprintf(fid, '%s\n', newClsName);        % write the bare name
end
fclose(fid);

The generated txt file is shown in Figure 3.

Figure 3. Copy the generated trainval.txt and test.txt into the ImageSets directory.
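
For readers without MATLAB, the same name lists can be produced in Python. This is a sketch under assumptions, not a port checked against the MATLAB output: `write_name_list` is a hypothetical helper, and it assumes that sorted file-name order matches the order in which MATLAB's `dir` lists the images.

```python
import os


def write_name_list(img_dir, list_path, start, stop, ext=".png"):
    """Write the extension-less names of images[start:stop] (sorted) to list_path."""
    names = sorted(os.path.splitext(f)[0] for f in os.listdir(img_dir)
                   if f.lower().endswith(ext))
    selected = names[start:stop]
    with open(list_path, "w") as fd:
        for name in selected:
            fd.write(name + "\n")
    return selected
```

With the 159 images in one folder, `write_name_list(folder, "trainval.txt", 0, 120)` and `write_name_list(folder, "test.txt", 120, None)` would reproduce the 120/39 split.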

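2. Generating the second group of txt files (required; they are used when building the LMDB files)

This step produces three txt files: trainval.txt, test.txt and test_name_size.txt. Note that this trainval.txt and test.txt are not the same files as those in the first group, so they must not be saved in the same location; we save them in the image root directory. test_name_size.txt records the name and size of each of the 39 test images. The referenced post and the stock SSD framework both generate these three files with create_list.sh. After a long time adapting that script, I could only get it to produce trainval.txt and test.txt; the image-size file never came out, and errors kept appearing. The script is hard to read, and it only works when all files (images, XML files and so on) follow exactly the directory layout used in the VOC experiment, not a custom layout like ours. In addition, in my experience the get_image_size tool used by the script works well for jpg images but is unreliable for png and other formats: the reported sizes could be wildly wrong and differ from run to run. For these reasons we do this step with Python scripts, which are short and easy to read.

1) Script that generates trainval.txt and test.txt. It requires the 39 test images and the 120 trainval images to be stored in separate folders, which is why we made the extra copies in Section II.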

#!/usr/bin/python
# Build trainval.txt and test.txt: one "image_path annotation_path" pair per line.

import glob
import os

trainval_dir = "/home/xn/caffe/image/INRIA_TRAIN_part/PNGImages/trainval"  # trainval images
test_dir = "/home/xn/caffe/image/INRIA_TRAIN_part/PNGImages/test"          # test images

# Paths written into the txt files, relative to the data root (/home/xn/caffe/image).
# PNGImages itself holds all 159 images; its trainval/ and test/ subfolders exist
# only so that this script can tell the two subsets apart.
dist_img_dir = "INRIA_TRAIN_part/PNGImages"
dist_anno_dir = "INRIA_TRAIN_part/Annotations"


def image_names(img_dir):
    """Return the base names (no extension) of all .png files in img_dir."""
    paths = glob.glob(os.path.join(img_dir, '*.png'))
    return [os.path.splitext(os.path.basename(p))[0] for p in paths]


def write_list(list_path, names):
    """Write one '<image>.png <annotation>.xml' line per image name."""
    with open(list_path, 'w') as fd:
        for name in names:
            fd.write(dist_img_dir + '/' + name + '.png' + ' ' +
                     dist_anno_dir + '/' + name + '.xml\n')


write_list("/home/xn/caffe/image/trainval.txt", image_names(trainval_dir))
write_list("/home/xn/caffe/image/test.txt", image_names(test_dir))

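The generated txt files are shown in Figure 4. As set in the script, they are saved in the image directory, the parent of INRIA_TRAIN_part.

Figure 4. The generated trainval.txt and test.txt.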
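
Because the LMDB conversion reads these list files line by line, it is worth confirming that every listed image and XML file exists before converting. The following is a small sketch; `check_list_file` is a hypothetical helper, and it assumes each line holds exactly two space-separated paths relative to the data root, as in the files above.

```python
import os


def check_list_file(list_path, data_root):
    """Return a list of problems found in an 'image_path annotation_path' list file."""
    problems = []
    with open(list_path) as fd:
        for lineno, line in enumerate(fd, 1):
            parts = line.split()
            if len(parts) != 2:
                problems.append("line %d: expected 2 fields, got %d" % (lineno, len(parts)))
                continue
            for rel_path in parts:
                if not os.path.isfile(os.path.join(data_root, rel_path)):
                    problems.append("line %d: missing %s" % (lineno, rel_path))
    return problems
```

An empty result from `check_list_file("/home/xn/caffe/image/trainval.txt", "/home/xn/caffe/image")` means every image has a matching annotation on disk.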

2) Generating test_name_size.txt, with the following Python script.

#!/usr/bin/python
# Build test_name_size.txt: one "name height width" line per test image.

import glob
import os

from PIL import Image

img_dir = "/home/xn/caffe/image/INRIA_TRAIN_part/PNGImages/test"  # the 39 test images

img_lists = glob.glob(os.path.join(img_dir, '*.png'))

with open('/home/xn/caffe/image/test_name_size.txt', 'w') as test_name_size:
    for item in img_lists:
        img = Image.open(item)
        width, height = img.size      # PIL reports (width, height)
        name = os.path.splitext(os.path.basename(item))[0]
        test_name_size.write(name + ' ' + str(height) + ' ' + str(width) + '\n')

The generated txt file is shown in Figure 5.

Figure 5. test_name_size.txt holds the image names and sizes; the columns from left to right are: name, height, width.
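
Since the image sizes were the part that went wrong with get_image_size, they can be cross-checked without any imaging library: a PNG file stores its width and height in the IHDR chunk right after the 8-byte signature. The sketch below reads them directly; `png_size` and `png_file_size` are hypothetical helper names, and the code handles PNG files only.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (height, width) read from the IHDR chunk at the start of PNG data."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Bytes 16..23 hold width then height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return height, width


def png_file_size(path):
    """Read only the first 24 bytes of a file and return (height, width)."""
    with open(path, "rb") as fd:
        return png_size(fd.read(24))
```

The return order is (height, width) to match the column order of test_name_size.txt.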

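3. Generating the LMDB files

With the txt files in place, convert the training data to LMDB format.

First copy create_data.sh from ssd-caffe/data/VOC0712 into the image directory and rename it create_INRIA_data.sh. Then edit its contents, mainly the paths, as follows: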

cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
root_dir=$cur_dir/../..

cd $root_dir

redo=1
data_root_dir="$HOME/caffe/image"
dataset_name="INRIA"
mapfile="$HOME/caffe/image/labelmap_INRIA.prototxt"   # created in step 4 below
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=png --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in test trainval
do
  # Adjust the paths below to your own setup; these are the ones we used.
  python $HOME/caffe/scripts/create_annoset.py --anno-type=$anno_type \
    --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim \
    --resize-width=$width --resize-height=$height --check-label $extra_cmd \
    $data_root_dir $HOME/caffe/image/$subset.txt \
    $data_root_dir/INRIA_TRAIN_part/$db/$dataset_name"_"$subset"_"$db \
    /home/xn/caffe/examples/INRIA_TRAIN_part
done

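The script above writes the test and trainval LMDB databases under INRIA_TRAIN_part/lmdb and makes each of them available under examples/INRIA_TRAIN_part as well; create_annoset.py does this by creating a symbolic link there rather than a second copy.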
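
To see exactly what the loop in the shell script expands to, the argument list can be rebuilt in Python. This sketch only assembles and prints the commands, using the flags and paths from the script above; `build_annoset_command` is a hypothetical helper, and nothing here calls Caffe.

```python
def build_annoset_command(subset, caffe_root="/home/xn/caffe",
                          dataset_name="INRIA", db="lmdb"):
    """Return the create_annoset.py argument list that the shell loop runs for one subset."""
    data_root = caffe_root + "/image"
    return [
        "python", caffe_root + "/scripts/create_annoset.py",
        "--anno-type=detection",
        "--label-map-file=" + data_root + "/labelmap_INRIA.prototxt",
        "--min-dim=0", "--max-dim=0", "--resize-width=0", "--resize-height=0",
        "--check-label", "--encode-type=png", "--encoded", "--redo",
        data_root,                              # root prepended to the paths in the list file
        data_root + "/" + subset + ".txt",      # list file for this subset
        "%s/INRIA_TRAIN_part/%s/%s_%s_%s" % (data_root, db, dataset_name, subset, db),
        caffe_root + "/examples/INRIA_TRAIN_part",
    ]


for subset in ("test", "trainval"):
    print(" ".join(build_annoset_command(subset)))
```

The second-to-last argument is the database that gets written; its base name (INRIA_test_lmdb, INRIA_trainval_lmdb) is what the training script later refers to under examples/INRIA_TRAIN_part.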

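4. The labelmap_INRIA.prototxt file needed by the script in step 3 is based on the label map in ssd-caffe/data/VOC0712 (labelmap_voc.prototxt in the stock repository). Copy it into the image directory and rename it labelmap_INRIA.prototxt.

Edit its contents to match your own classes (person and bike in our case), leaving the first item unchanged: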

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "person"
  label: 1
  display_name: "person"
}
item {
  name: "bike"
  label: 2
  display_name: "bike"
}

Then save the file, as shown in the figure below.
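
With more than a handful of classes, writing the label map by hand gets error-prone. As a sketch, the same text can be generated from a class list; `make_labelmap` is a hypothetical helper, and it assumes the background entry always keeps label 0 as in the file above.

```python
def make_labelmap(class_names):
    """Return label map text: background as label 0, then one item per class."""
    entries = [("none_of_the_above", 0, "background")]
    entries += [(name, i, name) for i, name in enumerate(class_names, 1)]
    item = 'item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}\n'
    return "".join(item % entry for entry in entries)


classes = ["person", "bike"]
print(make_labelmap(classes))
# The training script needs the class count including background.
print("num_classes = %d" % (len(classes) + 1))
```

The class names must match the `name` values in the XML annotations exactly, and the printed class count is the value to use for num_classes in the training script.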

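IV. Modifying the Training Script

Training starts from the pretrained VGGNet model provided with the SSD demo: VGG_ILSVRC_16_layers_fc_reduced.caffemodel

Save this model under $CAFFE_ROOT/models/VGGNet.

The SSD training script is ssd_pascal.py, located in ssd-caffe/examples/ssd. It is an all-in-one script: it generates train.prototxt, test.prototxt and solver.prototxt and also issues the final training command, so anyone who wants to use SSD fluently should read it closely. We copy it into the image directory and edit it. The modified lines are marked with comments; the modified script follows.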

from __future__ import print_function
import caffe
from caffe.model_libs import *
from google.protobuf import text_format

import math
import os
import shutil
import stat
import subprocess
import sys

# Add extra layers on top of a "base" network (e.g. VGGNet or Inception).
def AddExtraLayers(net, use_batchnorm=True, lr_mult=1):
    use_relu = True

    # Add additional convolutional layers.
    # 19 x 19
    from_layer = net.keys()[-1]

    # TODO(weiliu89): Construct the name using the last layer to avoid duplication.
    # 10 x 10
    out_layer = "conv6_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv6_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 512, 3, 1, 2,
        lr_mult=lr_mult)

    # 5 x 5
    from_layer = out_layer
    out_layer = "conv7_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv7_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 1, 2,
        lr_mult=lr_mult)

    # 3 x 3
    from_layer = out_layer
    out_layer = "conv8_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv8_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
        lr_mult=lr_mult)

    # 1 x 1
    from_layer = out_layer
    out_layer = "conv9_1"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 128, 1, 0, 1,
        lr_mult=lr_mult)

    from_layer = out_layer
    out_layer = "conv9_2"
    ConvBNLayer(net, from_layer, out_layer, use_batchnorm, use_relu, 256, 3, 0, 1,
        lr_mult=lr_mult)

    return net

### Modify the following parameters accordingly ###
# The directory which contains the caffe code.
# We assume you are running the script at the CAFFE_ROOT.
caffe_root = os.getcwd()

# Set true if you want to start training right after generating all files.
run_soon = True
# Set true if you want to load from most recently saved snapshot.
# Otherwise, we will load from the pretrain_model defined below.
resume_training = True
# If true, Remove old model files.
remove_old_models = False

# The database file for training data. Created by data/VOC0712/create_data.sh
train_data = "examples/INRIA_TRAIN_part/INRIA_trainval_lmdb"    # changed: path to our own LMDB
# The database file for testing data. Created by data/VOC0712/create_data.sh
test_data = "examples/INRIA_TRAIN_part/INRIA_test_lmdb"         # changed: path to our own LMDB
# Specify the batch sampler.
resize_width = 300           # the input size can be changed
resize_height = 300
resize = "{}x{}".format(resize_width, resize_height)
batch_sampler = [
    {
        'sampler': {
        },
        'max_trials': 1,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.1,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.3,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.5,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.7,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'min_jaccard_overlap': 0.9,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
    {
        'sampler': {
            'min_scale': 0.3,
            'max_scale': 1.0,
            'min_aspect_ratio': 0.5,
            'max_aspect_ratio': 2.0,
        },
        'sample_constraint': {
            'max_jaccard_overlap': 1.0,
        },
        'max_trials': 50,
        'max_sample': 1,
    },
]
train_transform_param = {
    'mirror': True,
    # The mean can be recomputed for your own data, the same way as for classification tasks.
    'mean_value': [104, 117, 123],
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [
            P.Resize.LINEAR,
            P.Resize.AREA,
            P.Resize.NEAREST,
            P.Resize.CUBIC,
            P.Resize.LANCZOS4,
        ],
    },
    'distort_param': {
        'brightness_prob': 0.5,
        'brightness_delta': 32,
        'contrast_prob': 0.5,
        'contrast_lower': 0.5,
        'contrast_upper': 1.5,
        'hue_prob': 0.5,
        'hue_delta': 18,
        'saturation_prob': 0.5,
        'saturation_lower': 0.5,
        'saturation_upper': 1.5,
        'random_order_prob': 0.0,
    },
    'expand_param': {
        'prob': 0.5,
        'max_expand_ratio': 4.0,
    },
    'emit_constraint': {
        'emit_type': caffe_pb2.EmitConstraint.CENTER,
    }
}
test_transform_param = {
    'mean_value': [104, 117, 123],
    'resize_param': {
        'prob': 1,
        'resize_mode': P.Resize.WARP,
        'height': resize_height,
        'width': resize_width,
        'interp_mode': [P.Resize.LINEAR],
    },
}

# If true, use batch norm for all newly added layers.
# Currently only the non batch norm version has been tested.
use_batchnorm = False
lr_mult = 1
# Use different initial learning rate.
if use_batchnorm:
    base_lr = 0.0004
else:
    # A learning rate for batch_size = 1, num_gpus = 1.
    # Changed: training on our data diverged with the original 0.00004, so it is
    # reduced 10x. The base_lr written to solver.prototxt becomes 0.0001
    # (it would be 0.001 without this change).
    base_lr = 0.000004

# Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_INRIA_{}".format(job_name)   # changed: our model name

# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/INRIA/{}".format(job_name)        # changed: where the generated prototxt files go
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/INRIA/{}".format(job_name)    # changed: where training snapshots are saved
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/INRIA/{}".format(job_name)           # changed: our own path
# Directory which stores the detection results.
output_result_dir = "{}/xn/image/results/INRIA/{}/Main".format(os.environ['HOME'], job_name)    # changed: output path for results

# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "xn/image/test_name_size.txt"          # changed: path to our test_name_size.txt
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"    # location of the downloaded pretrained model used for fine-tuning
# Stores LabelMapItem.
label_map_file = "xn/image/labelmap_INRIA.prototxt"     # changed: path to our labelmap_INRIA.prototxt

# MultiBoxLoss parameters.
# Changed: number of our own classes + 1 for background.
# We train on person and bike (2 classes), so this is 3.
num_classes = 3
share_location = True
background_label_id=0
train_on_diff_gt = True
normalization_mode = P.Loss.VALID
code_type = P.PriorBox.CENTER_SIZE
ignore_cross_boundary_bbox = False
mining_type = P.MultiBoxLoss.MAX_NEGATIVE
neg_pos_ratio = 3.
loc_weight = (neg_pos_ratio + 1.) / 4.
multibox_loss_param = {
    'loc_loss_type': P.MultiBoxLoss.SMOOTH_L1,
    'conf_loss_type': P.MultiBoxLoss.SOFTMAX,
    'loc_weight': loc_weight,
    'num_classes': num_classes,
    'share_location': share_location,
    'match_type': P.MultiBoxLoss.PER_PREDICTION,
    'overlap_threshold': 0.5,
    'use_prior_for_matching': True,
    'background_label_id': background_label_id,
    'use_difficult_gt': train_on_diff_gt,
    'mining_type': mining_type,
    'neg_pos_ratio': neg_pos_ratio,
    'neg_overlap': 0.5,
    'code_type': code_type,
    'ignore_cross_boundary_bbox': ignore_cross_boundary_bbox,
}
loss_param = {
    'normalization': normalization_mode,
}

# parameters for generating priors.
# minimum dimension of input image
min_dim = 300
# conv4_3 ==> 38 x 38
# fc7 ==> 19 x 19
# conv6_2 ==> 10 x 10
# conv7_2 ==> 5 x 5
# conv8_2 ==> 3 x 3
# conv9_2 ==> 1 x 1
mbox_source_layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']
# in percent %
min_ratio = 20
max_ratio = 90
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):
    min_sizes.append(min_dim * ratio / 100.)
    max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [min_dim * 20 / 100.] + max_sizes
steps = [8, 16, 32, 64, 100, 300]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
# L2 normalize conv4_3.
normalizations = [20, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
    prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
    prior_variance = [0.1]
flip = True
clip = False

# Solver parameters.
# Defining which GPUs to use.
gpus = "0"
gpulist = gpus.split(",")
num_gpus = len(gpulist)

# Divide the mini-batch to different GPUs.
# Changed: set the batch size to fit the memory of your GPU and machine;
# if it is too large, memory runs out and training cannot start.
batch_size = 16
accum_batch_size = 16
# Number of mini-batches accumulated per weight update, derived from the two
# batch sizes; no need to change it (it is recomputed below when GPUs are used).
iter_size = accum_batch_size / batch_size
solver_mode = P.Solver.CPU
device_id = 0
batch_size_per_device = batch_size
if num_gpus > 0:
    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) / (batch_size_per_device * num_gpus)))
    solver_mode = P.Solver.GPU
    device_id = int(gpulist[0])

# Scale the base_lr set near the top of the script into the base_lr
# that is actually used for training.
if normalization_mode == P.Loss.NONE:
    base_lr /= batch_size_per_device
elif normalization_mode == P.Loss.VALID:
    base_lr *= 25. / loc_weight
elif normalization_mode == P.Loss.FULL:
    # Roughly there are 2000 prior bboxes per image.
    # TODO(weiliu89): Estimate the exact # of priors.
    base_lr *= 2000.

# Evaluate on whole test set.
num_test_image = 39   # changed: number of our own test images
test_batch_size = 8   # test batch size; reduce it if memory is short
# Ideally test_batch_size should be divisible by num_test_image,
# otherwise mAP will be slightly off the true value.
# Number of iterations needed for one pass over the test set.
test_iter = int(math.ceil(float(num_test_image) / test_batch_size))
solver_param = {
    # Train parameters
    'base_lr': base_lr,
    'weight_decay': 0.0005,
    'lr_policy': "multistep",
    'stepvalue': [20000, 40000, 60000],   # iterations at which the learning rate decays; adjust to the size of your data
    'gamma': 0.1,
    'momentum': 0.9,
    'iter_size': iter_size,
    'max_iter': 80000,    # adjust to the size of your data
    'snapshot': 80000,    # adjust to the size of your data
    'display': 20,
    'average_loss': 20,
    'type': "SGD",
    'solver_mode': solver_mode,
    'device_id': device_id,
    'debug_info': False,
    'snapshot_after_train': True,
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 200,   # adjust to your own needs
    'eval_type': "detection",
    'ap_version': "11point",
    'test_initialization': False,
}
# parameters for generating detection output.
det_out_param = {
    'num_classes': num_classes,
    'share_location': share_location,
    'background_label_id': background_label_id,
    'nms_param': {'nms_threshold': 0.45, 'top_k': 400},
    'save_output_param': {
        'output_directory': output_result_dir,
        'output_name_prefix': "comp4_det_test_",
        'output_format': "INRIA",
        'label_map_file': label_map_file,
        'name_size_file': name_size_file,
        'num_test_image': num_test_image,
    },
    'keep_top_k': 200,
    'confidence_threshold': 0.01,
    'code_type': code_type,
}

# parameters for evaluating detection results.
det_eval_param = {
    'num_classes': num_classes,
    'background_label_id': background_label_id,
    'overlap_threshold': 0.5,
    'evaluate_difficult_gt': False,
    'name_size_file': name_size_file,
}

### Hopefully you don't need to change the following ###
# Check file.
check_if_exist(train_data)
check_if_exist(test_data)
check_if_exist(label_map_file)
check_if_exist(pretrain_model)
make_if_not_exist(save_dir)
make_if_not_exist(job_dir)
make_if_not_exist(snapshot_dir)

# Create train net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(train_data, batch_size=batch_size_per_device,
        train=True, output_label=True, label_map_file=label_map_file,
        transform_param=train_transform_param, batch_sampler=batch_sampler)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True, dropout=False)

AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)

# Create the MultiBoxLossLayer.
name = "mbox_loss"
mbox_layers.append(net.label)
net[name] = L.MultiBoxLoss(*mbox_layers, multibox_loss_param=multibox_loss_param,
        loss_param=loss_param, include=dict(phase=caffe_pb2.Phase.Value('TRAIN')),
        propagate_down=[True, True, False, False])
with open(train_net_file, 'w') as f:
    print('name: "{}_train"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(train_net_file, job_dir)

# Create test net.
net = caffe.NetSpec()
net.data, net.label = CreateAnnotatedDataLayer(test_data, batch_size=test_batch_size,
        train=False, output_label=True, label_map_file=label_map_file,
        transform_param=test_transform_param)

VGGNetBody(net, from_layer='data', fully_conv=True, reduced=True, dilated=True,
    dropout=False)

AddExtraLayers(net, use_batchnorm, lr_mult=lr_mult)

mbox_layers = CreateMultiBoxHead(net, data_layer='data', from_layers=mbox_source_layers,
        use_batchnorm=use_batchnorm, min_sizes=min_sizes, max_sizes=max_sizes,
        aspect_ratios=aspect_ratios, steps=steps, normalizations=normalizations,
        num_classes=num_classes, share_location=share_location, flip=flip, clip=clip,
        prior_variance=prior_variance, kernel_size=3, pad=1, lr_mult=lr_mult)

conf_name = "mbox_conf"
if multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.SOFTMAX:
    reshape_name = "{}_reshape".format(conf_name)
    net[reshape_name] = L.Reshape(net[conf_name], shape=dict(dim=[0, -1, num_classes]))
    softmax_name = "{}_softmax".format(conf_name)
    net[softmax_name] = L.Softmax(net[reshape_name], axis=2)
    flatten_name = "{}_flatten".format(conf_name)
    net[flatten_name] = L.Flatten(net[softmax_name], axis=1)
    mbox_layers[1] = net[flatten_name]
elif multibox_loss_param["conf_loss_type"] == P.MultiBoxLoss.LOGISTIC:
    sigmoid_name = "{}_sigmoid".format(conf_name)
    net[sigmoid_name] = L.Sigmoid(net[conf_name])
    mbox_layers[1] = net[sigmoid_name]

net.detection_out = L.DetectionOutput(*mbox_layers,
    detection_output_param=det_out_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))
net.detection_eval = L.DetectionEvaluate(net.detection_out, net.label,
    detection_evaluate_param=det_eval_param,
    include=dict(phase=caffe_pb2.Phase.Value('TEST')))

with open(test_net_file, 'w') as f:
    print('name: "{}_test"'.format(model_name), file=f)
    print(net.to_proto(), file=f)
shutil.copy(test_net_file, job_dir)

# Create deploy net.
# Remove the first and last layer from test net.
deploy_net = net
with open(deploy_net_file, 'w') as f:
    net_param = deploy_net.to_proto()
    # Remove the first (AnnotatedData) and last (DetectionEvaluate) layer from test net.
    del net_param.layer[0]
    del net_param.layer[-1]
    net_param.name = '{}_deploy'.format(model_name)
    net_param.input.extend(['data'])
    net_param.input_shape.extend([
        caffe_pb2.BlobShape(dim=[1, 3, resize_height, resize_width])])
    print(net_param, file=f)
shutil.copy(deploy_net_file, job_dir)

# Create solver.
solver = caffe_pb2.SolverParameter(
        train_net=train_net_file,
        test_net=[test_net_file],
        snapshot_prefix=snapshot_prefix,
        **solver_param)

with open(solver_file, 'w') as f:
    print(solver, file=f)
shutil.copy(solver_file, job_dir)

max_iter = 0
# Find most recent snapshot.
for file in os.listdir(snapshot_dir):
    if file.endswith(".solverstate"):
        basename = os.path.splitext(file)[0]
        iter = int(basename.split("{}_iter_".format(model_name))[1])
        if iter > max_iter:
            max_iter = iter

train_src_param = '--weights="{}" \\\n'.format(pretrain_model)
if resume_training:
    if max_iter > 0:
        train_src_param = '--snapshot="{}_iter_{}.solverstate" \\\n'.format(snapshot_prefix, max_iter)

if remove_old_models:
    # Remove any snapshots smaller than max_iter.
    for file in os.listdir(snapshot_dir):
        if file.endswith(".solverstate"):
            basename = os.path.splitext(file)[0]
            iter = int(basename.split("{}_iter_".format(model_name))[1])
            if max_iter > iter:
                os.remove("{}/{}".format(snapshot_dir, file))
        if file.endswith(".caffemodel"):
            basename = os.path.splitext(file)[0]
            iter = int(basename.split("{}_iter_".format(model_name))[1])
            if max_iter > iter:
                os.remove("{}/{}".format(snapshot_dir, file))

# Create job file.
with open(job_file, 'w') as f:
    f.write('cd {}\n'.format(caffe_root))
    f.write('./build/tools/caffe train \\\n')
    f.write('--solver="{}" \\\n'.format(solver_file))
    f.write(train_src_param)
    if solver_param['solver_mode'] == P.Solver.GPU:
        # The training log is written to job_dir/<model_name>.log.
        f.write('--gpu {} 2>&1 | tee {}/{}.log\n'.format(gpus, job_dir, model_name))
    else:
        f.write('2>&1 | tee {}/{}.log\n'.format(job_dir, model_name))

# Copy the python script to job_dir.
py_file = os.path.abspath(__file__)
shutil.copy(py_file, job_dir)

# Run the job.
os.chmod(job_file, stat.S_IRWXU)
if run_soon:
    subprocess.call(job_file, shell=True)
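
The prior-box sizes are the least obvious numbers in the script, so here is the same arithmetic pulled out into a standalone sketch that runs without Caffe. The function name `prior_box_sizes` is hypothetical; the formula and default values are copied from the script, with `range` in place of `xrange` so it also runs on Python 3.

```python
import math


def prior_box_sizes(min_dim=300, min_ratio=20, max_ratio=90, num_layers=6):
    """Reproduce the min_sizes / max_sizes schedule of the script as plain lists."""
    step = int(math.floor((max_ratio - min_ratio) / (num_layers - 2)))
    min_sizes, max_sizes = [], []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(min_dim * ratio / 100.)
        max_sizes.append(min_dim * (ratio + step) / 100.)
    # conv4_3 gets its own, smaller scale: 10% to 20% of the input size.
    min_sizes = [min_dim * 10 / 100.] + min_sizes
    max_sizes = [min_dim * 20 / 100.] + max_sizes
    return min_sizes, max_sizes


min_sizes, max_sizes = prior_box_sizes()
layers = ['conv4_3', 'fc7', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']
for layer, lo, hi in zip(layers, min_sizes, max_sizes):
    print("%-8s min_size=%5.1f max_size=%5.1f" % (layer, lo, hi))
```

For a 300 x 300 input the smallest prior box is 30 pixels on a side, so if the objects in your own data are much smaller than that, min_ratio and the conv4_3 scale are the values to revisit.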

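Once the changes are made, start training with the command: python ssd_pascal_INRIA.py. Run it from the Caffe root directory, because the script takes the current working directory as caffe_root and resolves its relative paths from there.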
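
Before launching, the derived solver values can be checked by hand. The sketch below repeats the script's arithmetic for the normalization_mode == P.Loss.VALID case with the settings used in this post; `solver_values` is a hypothetical helper and does not import Caffe.

```python
import math


def solver_values(base_lr=0.000004, neg_pos_ratio=3., batch_size=16,
                  accum_batch_size=16, num_gpus=1,
                  num_test_image=39, test_batch_size=8):
    """Recompute the derived solver values for normalization_mode == P.Loss.VALID."""
    loc_weight = (neg_pos_ratio + 1.) / 4.
    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
    iter_size = int(math.ceil(float(accum_batch_size) /
                              (batch_size_per_device * num_gpus)))
    return {
        "base_lr": base_lr * 25. / loc_weight,   # value written to solver.prototxt
        "iter_size": iter_size,                  # mini-batches accumulated per update
        "test_iter": int(math.ceil(float(num_test_image) / test_batch_size)),
    }


for key, value in sorted(solver_values().items()):
    print(key, value)
```

With these settings the effective base_lr comes out at about 0.0001, iter_size at 1 and test_iter at 5. Five test iterations of 8 images cover 40 slots for 39 images, which is the situation the script's own comment warns about; a test_batch_size that divides 39 evenly, such as 3 or 13, avoids it.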

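VI. Testing the Model

Follow the methods for testing on an image set or on video described in my post on configuring caffe-ssd, http://blog.csdn.net/xunan003/article/details/78427446; the only change needed is to point the model paths in the corresponding files at our own model.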
