
Caffe Official Tutorial Translation (7): Fine-tuning for Style Recognition

2018-03-05 19:35

Preface

I've recently been relearning Caffe with the official tutorials and translating the official documentation along the way. I've added some notes of my own, marked in italics, plus occasional problems I ran into and my own run results. Feedback and corrections are welcome!

Original tutorial: http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/02-fine-tuning.ipynb

Fine-tuning a Pretrained Network for Style Recognition

In this example, we explore a technique commonly used in real-world applications: taking a pretrained model and fine-tuning its parameters on a custom dataset.

The advantage of this approach is that, since the pretrained network was already trained on a very large dataset, its intermediate layers capture most of the "semantics" of general visual appearance. Treat "semantics" as a black box here: think of it as a very powerful, general-purpose visual feature. Best of all, a relatively small dataset is then enough to achieve good results on the target task.

First, we need to prepare the dataset. This involves the following steps: (1) fetch the ImageNet ilsvrc pretrained model with the provided shell script; (2) download a subset of the full Flickr style dataset; (3) compile the downloaded Flickr dataset into a format Caffe can use.

# Point to the caffe root directory
caffe_root = '/home/xhb/caffe/caffe/'  # this file should be run from {caffe_root}/examples (otherwise change this line)

# Import caffe
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

# Note: I ran this on a laptop with no GPU, so I switched to CPU mode
# caffe.set_device(0)
# caffe.set_mode_gpu()
caffe.set_mode_cpu()

import numpy as np
from pylab import *
%matplotlib inline
import tempfile

# Helper function for deprocessing preprocessed images, e.g., for display.
def deprocess_net_image(image):
    image = image.copy()              # don't modify destructively
    image = image[::-1]               # BGR -> RGB
    image = image.transpose(1, 2, 0)  # CHW -> HWC
    image += [123, 117, 104]          # (approximately) undo mean subtraction

    # clamp values in [0, 255]
    image[image < 0], image[image > 255] = 0, 255

    # round and cast from float32 to uint8
    image = np.round(image)
    image = np.require(image, dtype=np.uint8)

    return image
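As a quick sanity check, the same deprocessing steps (BGR to RGB, CHW to HWC, re-adding the mean, clamping, and casting) can be exercised standalone on a dummy array. This is a self-contained numpy sketch of the function above, not Caffe code; the dummy input values are made up for illustration.

```python
import numpy as np

def deprocess(image):
    image = image.copy()
    image = image[::-1]               # BGR -> RGB
    image = image.transpose(1, 2, 0)  # CHW -> HWC
    image += [123, 117, 104]          # undo mean subtraction
    image[image < 0], image[image > 255] = 0, 255  # clamp to [0, 255]
    return np.require(np.round(image), dtype=np.uint8)

# Dummy 3-channel (BGR) 2x2 image, all zeros except the B channel,
# which is pushed far negative so it clamps to 0 after the mean is re-added.
dummy = np.zeros((3, 2, 2), dtype=np.float32)
dummy[0] = -200.0
out = deprocess(dummy)
print(out.shape)   # (2, 2, 3): channels moved to the last axis
print(out[0, 0])   # [123 117 0]: R=0+123, G=0+117, B=-200+104 clamped to 0
```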


1. Preparing the dataset

Download the data required for this exercise.

get_ilsvrc_aux.sh: downloads the ImageNet mean file, label files, etc.

download_model_binary.py: downloads the pretrained model.

finetune_flickr_style/assemble_data.py: downloads the training and testing data for the image style recognition task (the "style" dataset from here on).

In the exercise below, we download only a small subset of the full dataset: 2000 of the 80,000 images, and 5 of the 20 style labels. (To download the entire dataset, set full_dataset = True in the code below.)

# Download just a small subset of the data for this exercise.
# (2000 of 80K images, 5 of 20 labels.)
# To download the entire dataset, set `full_dataset = True`.
full_dataset = False
if full_dataset:
    NUM_STYLE_IMAGES = NUM_STYLE_LABELS = -1
else:
    NUM_STYLE_IMAGES = 2000
    NUM_STYLE_LABELS = 5

# This downloads the ilsvrc auxiliary data (mean file, etc),
# and a subset of 2000 images for the style recognition task.
# Note: I commented out the block below because I had already run these
# scripts from the command line and downloaded the required files.
'''
import os
os.chdir(caffe_root)  # run scripts from caffe root
!data/ilsvrc12/get_ilsvrc_aux.sh
!scripts/download_model_binary.py models/bvlc_reference_caffenet
!python examples/finetune_flickr_style/assemble_data.py \
--workers=-1  --seed=1701 \
--images=$NUM_STYLE_IMAGES  --label=$NUM_STYLE_LABELS
# back to examples
os.chdir('examples')
'''




Define weights, the path to the ImageNet-pretrained weights we just downloaded, and make sure the file exists.

import os
weights = os.path.join(caffe_root, 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
assert os.path.exists(weights)


Load the 1000 ImageNet labels from ilsvrc12/synset_words.txt, and the 5 style labels from finetune_flickr_style/style_names.txt.

# Load ImageNet labels to imagenet_labels
imagenet_label_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
imagenet_labels = list(np.loadtxt(imagenet_label_file, str, delimiter='\t'))
assert len(imagenet_labels) == 1000
print 'Loaded ImageNet labels:\n', '\n'.join(imagenet_labels[:10] + ['...'])

# Load style labels to style_labels
style_label_file = caffe_root + 'examples/finetune_flickr_style/style_names.txt'
style_labels = list(np.loadtxt(style_label_file, str, delimiter='\n'))
if NUM_STYLE_LABELS > 0:
    style_labels = style_labels[:NUM_STYLE_LABELS]
print '\nLoaded style labels:\n', ', '.join(style_labels)


Loaded ImageNet labels:
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
n01491361 tiger shark, Galeocerdo cuvieri
n01494475 hammerhead, hammerhead shark
n01496331 electric ray, crampfish, numbfish, torpedo
n01498041 stingray
n01514668 cock
n01514859 hen
n01518878 ostrich, Struthio camelus
...

Loaded style labels:
Detailed, Pastel, Melancholy, Noir, HDR


2. Defining and running the networks

We start by defining caffenet(), a function that initializes the CaffeNet architecture (a minor variant on AlexNet), taking arguments that specify the data and the number of output classes.

from caffe import layers as L
from caffe import params as P

weight_param = dict(lr_mult=1, decay_mult=1)
bias_param   = dict(lr_mult=2, decay_mult=0)
learned_param = [weight_param, bias_param]

frozen_param = [dict(lr_mult=0)] * 2

# Convolution layer followed by a ReLU unit
def conv_relu(bottom, ks, nout, stride=1, pad=0, group=1,
              param=learned_param,
              weight_filler=dict(type='gaussian', std=0.01),
              bias_filler=dict(type='constant', value=0.1)):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, group=group,
                         param=param, weight_filler=weight_filler,
                         bias_filler=bias_filler)
    return conv, L.ReLU(conv, in_place=True)

# Fully connected layer followed by a ReLU unit
def fc_relu(bottom, nout, param=learned_param,
            weight_filler=dict(type='gaussian', std=0.005),
            bias_filler=dict(type='constant', value=0.1)):
    fc = L.InnerProduct(bottom, num_output=nout, param=param,
                        weight_filler=weight_filler,
                        bias_filler=bias_filler)
    return fc, L.ReLU(fc, in_place=True)

# Max pooling
def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)

# The CaffeNet architecture
def caffenet(data, label=None, train=True, num_classes=1000,
             classifier_name='fc8', learn_all=False):
    """Returns a NetSpec specifying CaffeNet, following the original proto text
    specification (./models/bvlc_reference_caffenet/train_val.prototxt)."""
    n = caffe.NetSpec()
    # Stack the layers one after another, following the standard recipe
    n.data = data
    param = learned_param if learn_all else frozen_param
    n.conv1, n.relu1 = conv_relu(n.data, 11, 96, stride=4, param=param)
    n.pool1 = max_pool(n.relu1, 3, stride=2)
    n.norm1 = L.LRN(n.pool1, local_size=5, alpha=1e-4, beta=0.75)
    n.conv2, n.relu2 = conv_relu(n.norm1, 5, 256, pad=2, group=2, param=param)
    n.pool2 = max_pool(n.relu2, 3, stride=2)
    n.norm2 = L.LRN(n.pool2, local_size=5, alpha=1e-4, beta=0.75)
    n.conv3, n.relu3 = conv_relu(n.norm2, 3, 384, pad=1, param=param)
    n.conv4, n.relu4 = conv_relu(n.relu3, 3, 384, pad=1, group=2, param=param)
    n.conv5, n.relu5 = conv_relu(n.relu4, 3, 256, pad=1, group=2, param=param)
    n.pool5 = max_pool(n.relu5, 3, stride=2)
    n.fc6, n.relu6 = fc_relu(n.pool5, 4096, param=param)
    # Add Dropout at training time only (not at test time), to prevent overfitting
    if train:
        n.drop6 = fc7input = L.Dropout(n.relu6, in_place=True)
    else:
        fc7input = n.relu6
    n.fc7, n.relu7 = fc_relu(fc7input, 4096, param=param)
    # Again, Dropout at training time only
    if train:
        n.drop7 = fc8input = L.Dropout(n.relu7, in_place=True)
    else:
        fc8input = n.relu7
    # always learn fc8 (param=learned_param)
    fc8 = L.InnerProduct(fc8input, num_output=num_classes, param=learned_param)
    # give fc8 the name specified by argument `classifier_name`
    n.__setattr__(classifier_name, fc8)
    # At test time, follow fc8 with a softmax to output confidences
    if not train:
        n.probs = L.Softmax(fc8)
    # If a label is given, add loss and accuracy layers
    if label is not None:
        n.label = label
        n.loss = L.SoftmaxWithLoss(fc8, n.label)
        n.acc = L.Accuracy(fc8, n.label)
    # write the net to a temporary file and return its filename
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(n.to_proto()))
        return f.name


Now, let's create a CaffeNet that takes unlabeled "dummy data" as input. This lets us set its input images from outside the network and see what ImageNet classes it predicts.

dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
imagenet_net_filename = caffenet(data=dummy_data, train=False)
imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)


Define a style_net function which calls the caffenet function above on data from the Flickr style dataset.

The new network also has the CaffeNet architecture, with differences in the input and output:

the input is the Flickr style data we downloaded, read in by an ImageData layer

the output is a distribution over 20 classes rather than the original 1000 ImageNet classes

the classification layer is renamed from fc8 to fc8_flickr, to tell Caffe not to load the original fc8 classifier's weights from the ImageNet-pretrained model


def style_net(train=True, learn_all=False, subset=None):
    if subset is None:
        subset = 'train' if train else 'test'
    source = caffe_root + 'data/flickr_style/%s.txt' % subset
    transform_param = dict(mirror=train, crop_size=227,
        mean_file=caffe_root + 'data/ilsvrc12/imagenet_mean.binaryproto')
    style_data, style_label = L.ImageData(
        transform_param=transform_param, source=source,
        batch_size=50, new_height=256, new_width=256, ntop=2)
    return caffenet(data=style_data, label=style_label, train=train,
                    num_classes=NUM_STYLE_LABELS,
                    classifier_name='fc8_flickr',
                    learn_all=learn_all)


Use the style_net function to initialize untrained_style_net, a CaffeNet whose input images come from the style dataset and whose weights come from the pretrained ImageNet model.

Call forward on untrained_style_net to get a batch of style data.

untrained_style_net = caffe.Net(style_net(train=False, subset='train'),
weights, caffe.TEST)
untrained_style_net.forward()
style_data_batch = untrained_style_net.blobs['data'].data.copy()
style_label_batch = np.array(untrained_style_net.blobs['label'].data, dtype=np.int32)


Pick one of the 50 images in the batch to feed into the style net (the network defined by the style_net() function above; "style net" from here on). Here we arbitrarily choose the image at index 8 in the batch. Display it, then run it through imagenet_net, the ImageNet-pretrained network, and show the top 5 predictions among the 1000 ImageNet classes.

Below we chose an image of a beach; since "sandbar" and "seashore" are both ImageNet-1000 categories, the network's predictions for this image are fairly reasonable. For other images the predictions are not as good, sometimes because the network fails to detect the objects in the image, and also because not every image contains objects from the 1000 ImageNet classes. Modify the batch_index variable (default 8) to any value from 0 to 49 (a batch holds only 50 examples) to see predictions for other images. (To go beyond this batch of 50 images, rerun the cell above to load a fresh batch into style_net.)


def disp_preds(net, image, labels, k=5, name='ImageNet'):
    input_blob = net.blobs['data']
    net.blobs['data'].data[0, ...] = image
    probs = net.forward(start='conv1')['probs'][0]
    top_k = (-probs).argsort()[:k]
    print 'top %d predicted %s labels =' % (k, name)
    print '\n'.join('\t(%d) %5.2f%% %s' % (i+1, 100*probs[p], labels[p])
                    for i, p in enumerate(top_k))

def disp_imagenet_preds(net, image):
    disp_preds(net, image, imagenet_labels, name='ImageNet')

def disp_style_preds(net, image):
    disp_preds(net, image, style_labels, name='style')


batch_index = 8
image = style_data_batch[batch_index]
plt.imshow(deprocess_net_image(image))
print 'actual label =', style_labels[style_label_batch[batch_index]]


actual label = Melancholy




disp_imagenet_preds(imagenet_net, image)


top 5 predicted ImageNet labels =
(1) 69.89% n09421951 sandbar, sand bar
(2) 21.75% n09428293 seashore, coast, seacoast, sea-coast
(3)  3.22% n02894605 breakwater, groin, groyne, mole, bulwark, seawall, jetty
(4)  1.89% n04592741 wing
(5)  1.23% n09332890 lakeside, lakeshore


disp_style_preds(untrained_style_net, image)


top 5 predicted style labels =
(1) 20.00% Detailed
(2) 20.00% Pastel
(3) 20.00% Melancholy
(4) 20.00% Noir
(5) 20.00% HDR
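The perfectly uniform 20% per label above is expected: the fc8_flickr layer has not been trained (its weights start at their default initialization, effectively zero), so all 5 class scores are equal, and the softmax of equal logits is a uniform distribution. A standalone numpy sketch of that effect (not Caffe code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# With an untrained classifier all 5 logits are equal, so the softmax
# spreads probability mass uniformly: 1/5 = 20% per class.
logits = np.zeros(5)
print(softmax(logits))  # [0.2 0.2 0.2 0.2 0.2]
```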


Since both models use the same pretrained weights in layers conv1 through fc7, we can also verify that the activations at fc7, immediately before the classification layer, match those of the ImageNet-pretrained model.

diff = untrained_style_net.blobs['fc7'].data[0] - imagenet_net.blobs['fc7'].data[0]
error = (diff ** 2).sum()
assert error < 1e-8


Delete untrained_style_net to save memory. (Hang on to imagenet_net; we'll use it again later.)

del untrained_style_net


3. Training the style classifier

Now we'll define a function solver to create the Caffe solvers used to train the network. In this function we set initial values for the various parameters that control training, display, and "snapshotting"; see the inline comments for what each parameter does. You can also try modifying some of them yourself to see if you can get better results!

from caffe.proto import caffe_pb2

def solver(train_net_path, test_net_path=None, base_lr=0.001):
    s = caffe_pb2.SolverParameter()

    # Specify locations of the train and (maybe) test networks.
    s.train_net = train_net_path
    if test_net_path is not None:
        s.test_net.append(test_net_path)
        s.test_interval = 1000  # Test after every 1000 training iterations.
        s.test_iter.append(100) # Test on 100 batches each time we test.

    # The number of iterations over which to average the gradient.
    # Effectively boosts the training batch size by the given factor, without
    # affecting memory utilization.
    s.iter_size = 1

    # Maximum number of training iterations (net updates).
    s.max_iter = 100000

    # Solve using the stochastic gradient descent (SGD) algorithm.
    # Other choices include 'Adam' and 'RMSProp'.
    s.type = 'SGD'

    # Set the initial learning rate for SGD.
    s.base_lr = base_lr

    # Set `lr_policy` to define how the learning rate changes during training.
    # Here, we 'step' the learning rate by multiplying it by a factor `gamma`
    # every `stepsize` iterations.
    s.lr_policy = 'step'
    s.gamma = 0.1
    s.stepsize = 20000

    # Set other SGD hyperparameters. Setting a non-zero `momentum` takes a
    # weighted average of the current gradient and previous gradients to make
    # learning more stable. L2 weight decay regularizes learning, to help prevent
    # the model from overfitting.
    s.momentum = 0.9
    s.weight_decay = 5e-4

    # Display the current training loss and accuracy every 1000 iterations.
    s.display = 1000

    # Snapshots are files used to store networks we've trained.  Here, we'll
    # snapshot every 10K iterations -- ten times during training.
    s.snapshot = 10000
    s.snapshot_prefix = caffe_root + 'models/finetune_flickr_style/finetune_flickr_style'

    # Train on the GPU.  Using the CPU to train large networks is very slow.
    # (Note: this should normally be GPU mode; I switched to CPU mode because
    # I'm running on a laptop without a GPU.)
    # s.solver_mode = caffe_pb2.SolverParameter.GPU
    s.solver_mode = caffe_pb2.SolverParameter.CPU

    # Write the solver to a temporary file and return its filename.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(str(s))
        return f.name


Now we'll invoke the solver defined above to train the style net's classification layer.

For the record, you can also train the network from the command line, like this:

build/tools/caffe train \
    -solver models/finetune_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
    -gpu 0

Note: if you're not using GPU mode, simply omit -gpu 0.

But in this example we'll train using Python.

First, define run_solvers, a function that loops over the entries of a solvers list, stepping each one iteration by iteration and recording the loss and accuracy at every step. At the end, the learned weights are saved to a file.

def run_solvers(niter, solvers, disp_interval=10):
    """Run solvers for niter iterations,
       returning the loss and accuracy recorded each iteration.
       `solvers` is a list of (name, solver) tuples."""
    blobs = ('loss', 'acc')
    loss, acc = ({name: np.zeros(niter) for name, _ in solvers}
                 for _ in blobs)
    for it in range(niter):
        for name, s in solvers:
            s.step(1)  # run a single SGD step in Caffe
            loss[name][it], acc[name][it] = (s.net.blobs[b].data.copy()
                                             for b in blobs)
        if it % disp_interval == 0 or it + 1 == niter:
            loss_disp = '; '.join('%s: loss=%.3f, acc=%2d%%' %
                                  (n, loss[n][it], np.round(100*acc[n][it]))
                                  for n, _ in solvers)
            print '%3d) %s' % (it, loss_disp)
    # Save the learned weights from both nets.
    weight_dir = tempfile.mkdtemp()
    weights = {}
    for name, s in solvers:
        filename = 'weights.%s.caffemodel' % name
        weights[name] = os.path.join(weight_dir, filename)
        s.net.save(weights[name])
    return loss, acc, weights


Next, run the solvers we created to train the style recognition network. We'll create two networks: one (style_solver) initialized with the ImageNet-pretrained weights, and another (scratch_style_solver) initialized with random weights.

During training, we should see that the network initialized with ImageNet-pretrained weights learns faster and reaches higher accuracy than the randomly initialized network.

niter = 200  # number of iterations to train

# Reset style_solver as before.
style_solver_filename = solver(style_net(train=True))
style_solver = caffe.get_solver(style_solver_filename)
style_solver.net.copy_from(weights)

# For reference, we also create a solver that isn't initialized from
# the pretrained ImageNet weights.
scratch_style_solver_filename = solver(style_net(train=True))
scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)

print 'Running solvers for %d iterations...' % niter
solvers = [('pretrained', style_solver),
           ('scratch', scratch_style_solver)]
loss, acc, weights = run_solvers(niter, solvers)
print 'Done.'

train_loss, scratch_train_loss = loss['pretrained'], loss['scratch']
train_acc, scratch_train_acc = acc['pretrained'], acc['scratch']
style_weights, scratch_style_weights = weights['pretrained'], weights['scratch']

# Delete solvers to save memory.
del style_solver, scratch_style_solver, solvers


Running solvers for 200 iterations...
0) pretrained: loss=1.609, acc= 0%; scratch: loss=1.609, acc= 0%
10) pretrained: loss=1.371, acc=46%; scratch: loss=1.625, acc=14%
20) pretrained: loss=1.082, acc=58%; scratch: loss=1.641, acc=12%
30) pretrained: loss=0.994, acc=58%; scratch: loss=1.612, acc=22%
40) pretrained: loss=0.893, acc=58%; scratch: loss=1.593, acc=24%
50) pretrained: loss=1.240, acc=52%; scratch: loss=1.611, acc=30%
60) pretrained: loss=1.096, acc=54%; scratch: loss=1.621, acc=16%
70) pretrained: loss=0.989, acc=50%; scratch: loss=1.591, acc=28%
80) pretrained: loss=0.962, acc=68%; scratch: loss=1.593, acc=34%
90) pretrained: loss=1.172, acc=56%; scratch: loss=1.606, acc=24%
100) pretrained: loss=0.849, acc=64%; scratch: loss=1.587, acc=30%
110) pretrained: loss=1.005, acc=52%; scratch: loss=1.587, acc=30%
120) pretrained: loss=0.870, acc=64%; scratch: loss=1.595, acc=24%
130) pretrained: loss=0.970, acc=62%; scratch: loss=1.590, acc=28%
140) pretrained: loss=0.908, acc=58%; scratch: loss=1.603, acc=18%
150) pretrained: loss=0.608, acc=76%; scratch: loss=1.614, acc=20%
160) pretrained: loss=0.816, acc=70%; scratch: loss=1.598, acc=26%
170) pretrained: loss=1.281, acc=52%; scratch: loss=1.622, acc=16%
180) pretrained: loss=0.870, acc=72%; scratch: loss=1.630, acc=12%
190) pretrained: loss=0.909, acc=66%; scratch: loss=1.609, acc=20%
199) pretrained: loss=1.086, acc=62%; scratch: loss=1.616, acc=18%
Done.


Let's compare the training loss and accuracy of the two networks. Notice how quickly the loss of the ImageNet-pretrained network drops, while the randomly initialized network trains much more slowly.

plot(np.vstack([train_loss, scratch_train_loss]).T)
xlabel('Iteration #')
ylabel('Loss')


Text(0,0.5,u'Loss')




plot(np.vstack([train_acc, scratch_train_acc]).T)
xlabel('Iteration #')
ylabel('Accuracy')


Text(0,0.5,u'Accuracy')




Now let's look at the test accuracy after 200 iterations. With only 5 classes to predict, random guessing would give about 20% accuracy. We certainly expect both networks to beat that 20% chance level, and we also expect the ImageNet-pretrained network to do far better than the randomly initialized one. Let's see!

def eval_style_net(weights, test_iters=10):
    test_net = caffe.Net(style_net(train=False), weights, caffe.TEST)
    accuracy = 0
    for it in xrange(test_iters):
        accuracy += test_net.forward()['acc']
    accuracy /= test_iters
    return test_net, accuracy


test_net, accuracy = eval_style_net(style_weights)
print 'Accuracy, trained from ImageNet initialization: %3.1f%%' % (100*accuracy, )
scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights)
print 'Accuracy, trained from   random initialization: %3.1f%%' % (100*scratch_accuracy, )


Accuracy, trained from ImageNet initialization: 51.4%
Accuracy, trained from   random initialization: 23.6%


End-to-end fine-tuning for style

Finally, we'll train both networks again, continuing from the weights we just learned. The only difference this time is that we'll train "end-to-end": all layers of the network are trained, starting from the input conv1 layer where the images enter. We pass learn_all=True to the style_net function defined earlier, which gives every parameter in the network a non-zero lr_mult. Under the default learn_all=False, all pretrained layers (conv1 through fc7) are frozen (lr_mult=0), and only the classification layer fc8_flickr is trained.

Note that both networks start out roughly at the accuracy achieved at the end of the previous training run. To be somewhat scientific, we also repeat the same training procedure without the end-to-end structure, to confirm that the improved results aren't simply due to training for twice as long.

end_to_end_net = style_net(train=True, learn_all=True)

# Set base_lr to 1e-3, the same as last time when learning only the classifier.
# You may want to play around with different values of this or other
# optimization parameters when fine-tuning.  For example, if learning diverges
# (e.g., the loss gets very large or goes to infinity/NaN), you should try
# decreasing base_lr (e.g., to 1e-4, then 1e-5, etc., until you find a value
# for which learning does not diverge).
base_lr = 0.001

style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
style_solver = caffe.get_solver(style_solver_filename)
style_solver.net.copy_from(style_weights)

scratch_style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)
scratch_style_solver.net.copy_from(scratch_style_weights)

print 'Running solvers for %d iterations...' % niter
solvers = [('pretrained, end-to-end', style_solver),
           ('scratch, end-to-end', scratch_style_solver)]
_, _, finetuned_weights = run_solvers(niter, solvers)
print 'Done.'

style_weights_ft = finetuned_weights['pretrained, end-to-end']
scratch_style_weights_ft = finetuned_weights['scratch, end-to-end']

# Delete solvers to save memory.
del style_solver, scratch_style_solver, solvers


Running solvers for 200 iterations...
0) pretrained, end-to-end: loss=0.851, acc=68%; scratch, end-to-end: loss=1.584, acc=28%
10) pretrained, end-to-end: loss=1.312, acc=56%; scratch, end-to-end: loss=1.637, acc=14%
20) pretrained, end-to-end: loss=0.802, acc=70%; scratch, end-to-end: loss=1.627, acc=16%
30) pretrained, end-to-end: loss=0.786, acc=66%; scratch, end-to-end: loss=1.595, acc=22%
40) pretrained, end-to-end: loss=0.748, acc=74%; scratch, end-to-end: loss=1.575, acc=24%
50) pretrained, end-to-end: loss=0.818, acc=72%; scratch, end-to-end: loss=1.595, acc=34%
60) pretrained, end-to-end: loss=0.773, acc=68%; scratch, end-to-end: loss=1.560, acc=26%
70) pretrained, end-to-end: loss=0.617, acc=84%; scratch, end-to-end: loss=1.540, acc=28%
80) pretrained, end-to-end: loss=0.561, acc=76%; scratch, end-to-end: loss=1.494, acc=46%
90) pretrained, end-to-end: loss=0.824, acc=62%; scratch, end-to-end: loss=1.521, acc=30%
100) pretrained, end-to-end: loss=0.624, acc=80%; scratch, end-to-end: loss=1.482, acc=30%
110) pretrained, end-to-end: loss=0.586, acc=76%; scratch, end-to-end: loss=1.566, acc=32%
120) pretrained, end-to-end: loss=0.633, acc=72%; scratch, end-to-end: loss=1.547, acc=26%
130) pretrained, end-to-end: loss=0.547, acc=82%; scratch, end-to-end: loss=1.458, acc=28%
140) pretrained, end-to-end: loss=0.431, acc=80%; scratch, end-to-end: loss=1.469, acc=28%
150) pretrained, end-to-end: loss=0.514, acc=78%; scratch, end-to-end: loss=1.508, acc=32%
160) pretrained, end-to-end: loss=0.475, acc=82%; scratch, end-to-end: loss=1.440, acc=28%
170) pretrained, end-to-end: loss=0.490, acc=78%; scratch, end-to-end: loss=1.554, acc=40%
180) pretrained, end-to-end: loss=0.449, acc=80%; scratch, end-to-end: loss=1.470, acc=32%
190) pretrained, end-to-end: loss=0.367, acc=84%; scratch, end-to-end: loss=1.463, acc=34%
199) pretrained, end-to-end: loss=0.492, acc=82%; scratch, end-to-end: loss=1.364, acc=52%
Done.


Let's now test the end-to-end fine-tuned models. Since all layers of the network participated in training, we expect better results than before, when only the classification layer was trained.

test_net, accuracy = eval_style_net(style_weights_ft)
print 'Accuracy, finetuned from ImageNet initialization: %3.1f%%' % (100*accuracy, )
scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights_ft)
print 'Accuracy, finetuned from   random initialization: %3.1f%%' % (100*scratch_accuracy, )


Accuracy, finetuned from ImageNet initialization: 54.4%
Accuracy, finetuned from   random initialization: 40.2%


First, let's look at the input image again, along with its prediction from the end-to-end model.

plt.imshow(deprocess_net_image(image))
disp_style_preds(test_net, image)


top 5 predicted style labels =
(1) 87.82% Melancholy
(2)  6.10% Pastel
(3)  5.66% HDR
(4)  0.41% Detailed
(5)  0.01% Noir




Whew! The prediction is much better than before. But note that this image comes from the training set, so the network saw its label during training.

Next, let's take an image from the test set and see how the end-to-end model's predictions hold up.

batch_index = 1
image = test_net.blobs['data'].data[batch_index]
plt.imshow(deprocess_net_image(image))
print 'actual label =', style_labels[int(test_net.blobs['label'].data[batch_index])]


actual label = Pastel




disp_style_preds(test_net, image)


top 5 predicted style labels =
(1) 99.48% Pastel
(2)  0.47% Detailed
(3)  0.05% HDR
(4)  0.00% Melancholy
(5)  0.00% Noir


We can also look at the scratch network's predictions for this image. It also gets the correct answer, though with lower confidence than the network initialized from pretrained weights.

disp_style_preds(scratch_test_net, image)


top 5 predicted style labels =
(1) 46.02% Pastel
(2) 23.50% Melancholy
(3) 16.43% Detailed
(4) 11.64% HDR
(5)  2.40% Noir


Of course, we can also look at the ImageNet model's predictions for this image:

disp_imagenet_preds(imagenet_net, image)


top 5 predicted ImageNet labels =
(1) 34.90% n07579787 plate
(2) 21.63% n04263257 soup bowl
(3) 17.75% n07875152 potpie
(4)  5.72% n07711569 mashed potato
(5)  5.27% n07584110 consomme