
Course 4 - Convolutional Neural Networks - Week 2 Assignment 2 (Hand-Sign Classification with Residual Networks)

2018-01-07 17:55

0- Background

This post introduces very deep convolutional neural networks built with Residual Networks (ResNets).

In theory, the more layers a neural network has, the more complex the functions it can represent. A CNN extracts low-, mid- and high-level features, so more layers mean a richer set of features at different levels; moreover, the features extracted by deeper layers are more abstract and carry more semantic information.

In practice, however, very deep networks are hard to train. Simply adding layers leads to vanishing or exploding gradients (both refer to the backward pass), which makes gradient descent slower and slower.
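As a toy illustration (a made-up numeric sketch, not from the assignment): backpropagation multiplies one factor per layer, and when those factors are consistently smaller than 1 the product collapses toward zero.

import numpy as np

np.random.seed(0)
per_layer_factors = np.random.uniform(0.1, 0.9, size=50)  # hypothetical per-layer gradient scaling factors, all < 1
print(np.prod(per_layer_factors))  # a vanishingly small number: the early layers barely receive any gradient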

Accuracy on the training set first rises, then saturates, and can even degrade.

As the figure below shows, as the number of iterations grows, the gradient magnitude of the early layers drops rapidly toward zero.



Figure 1 : Vanishing gradient
The speed of learning decreases very rapidly for the early layers as the network trains

The idea behind residual learning is to strip away the shared main signal so that small changes stand out, somewhat like a differential amplifier.

1- Import dependencies:

import numpy as np
import tensorflow as tf
from keras import layers
from keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from keras.models import Model, load_model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
import pydot
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from resnets_utils import *
from keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow
%matplotlib inline

import keras.backend as K
K.set_image_data_format('channels_last')  # images are (height, width, channels)
K.set_learning_phase(1)  # 1 = training phase (affects BatchNorm and Dropout behavior)


2- Building the Residual Network

In a residual network, during forward propagation a later layer receives the output of an earlier layer directly, as part of the input to its activation; during backpropagation, gradients of later layers can flow straight back to earlier layers through these skip connections.
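Written out (a sketch of the standard residual formulation, with g(·) the ReLU activation):

a[l+2] = g(z[l+2] + a[l]) = g(W[l+2] a[l+1] + b[l+2] + a[l])

If the weights W[l+2] and bias b[l+2] of the skipped layers shrink toward zero, the block reduces to a[l+2] = g(a[l]) = a[l] (ReLU acting on already non-negative activations), i.e. the identity.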



Stacking residual blocks makes it possible to build very deep network models.

Because residual blocks make it easy for the network to learn the identity function, stacking them has almost no adverse effect on training performance.
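In Keras's functional API the skip connection is literally a single Add() call. A minimal generic sketch (the 32x32x64 input shape and filter counts are made up; the graded blocks below follow this same pattern):

from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from keras.models import Model

x_in = Input(shape=(32, 32, 64))              # hypothetical feature map
x = Conv2D(64, (3, 3), padding='same')(x_in)  # main path: two conv layers
x = BatchNormalization(axis=3)(x)
x = Activation('relu')(x)
x = Conv2D(64, (3, 3), padding='same')(x)
x = BatchNormalization(axis=3)(x)
x = Add()([x, x_in])                          # shortcut: add the block's input back in
x = Activation('relu')(x)                     # ReLU comes after the addition
toy_block = Model(inputs=x_in, outputs=x)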

Depending on whether the input and output dimensions are the same, residual blocks come in two types: the identity block and the convolutional block.

2.1 - identity block

In the first type, the input and output dimensions are the same:

That is, the dimension of the input a[l] equals the dimension of the output a[l+2]:



The upper arc is called the shortcut path, and the lower path is the main path.

Note that the two are added together before the ReLU of the next layer.

The BatchNorm layers are there to speed up training.

The version above "skips over" 2 layers; the version below "skips over" 3 layers:



First component of the main path:

The first CONV2D layer has F1 filters of shape (1,1), stride (1,1), and "valid" padding. It is named conv_name_base + '2a', and seed=0 is used for the random initialization of its parameters.

The first BatchNorm normalizes along the channels axis and is named bn_name_base + '2a'.

The ReLU activation needs no name and has no hyperparameters.

Second component of the main path:

The second CONV2D layer has F2 filters of shape (f,f), stride (1,1), and "same" padding. It is named conv_name_base + '2b', and seed=0 is used for the random initialization.

The second BatchNorm normalizes along the channels axis and is named bn_name_base + '2b'.

The ReLU activation needs no name and has no hyperparameters.

Third component of the main path:

The third CONV2D layer has F3 filters of shape (1,1), stride (1,1), and "valid" padding. It is named conv_name_base + '2c', and again seed=0 is used for the random initialization.

The third BatchNorm normalizes along the channels axis and is named bn_name_base + '2c'. Note that there is no ReLU activation after this step.

Final step:

The shortcut value is added to the main-path output, and the sum is passed through a ReLU activation, which again has no name and no hyperparameters.

The implementation is as follows:

# GRADED FUNCTION: identity_block

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network

    Returns:
    X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X


Testing the identity block:

tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = identity_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))


The output is as follows:

out = [ 0.94822985  0.          1.16101444  2.747859    0.          1.36677003]


2.2 - The convolutional block

The convolutional block is the other type of residual block, used when the input and output dimensions do not match. It differs from the identity block in that its shortcut path contains an extra convolutional layer.



The CONV2D on the shortcut path reshapes the input x so that it can be added to the main-path output. For example, to halve the height and width we can use a 1x1 convolution with a stride of 2. The CONV2D on the shortcut path does not use any non-linear activation function, because its only job is to learn a linear function that resizes the input so the dimensions match for the later addition.
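As a quick shape sanity check (a minimal sketch, not part of the assignment; the 4x4x6 input and 16 filters are made up), a 1x1 convolution with stride 2 halves the spatial dimensions while letting us pick the output channel count:

from keras.layers import Input, Conv2D
from keras.models import Model

x_in = Input(shape=(4, 4, 6))                                      # hypothetical 4x4 feature map with 6 channels
x_out = Conv2D(16, (1, 1), strides=(2, 2), padding='valid')(x_in)
print(Model(x_in, x_out).output_shape)                             # (None, 2, 2, 16): height and width halved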

First component of the main path:

The first CONV2D layer has F1 filters of shape (1,1), stride (s,s), and "valid" padding. It is named conv_name_base + '2a'.

The first BatchNorm normalizes along the channels axis and is named bn_name_base + '2a'.

The result is then passed through a ReLU activation, which needs no name and has no hyperparameters.

Second component of the main path:

The second CONV2D layer has F2 filters of shape (f,f), stride (1,1), and "same" padding. It is named conv_name_base + '2b'.

The second BatchNorm normalizes along the channels axis and is named bn_name_base + '2b'.

The result is then passed through a ReLU activation, which needs no name and has no hyperparameters.

Third component of the main path:

The third CONV2D layer has F3 filters of shape (1,1), stride (1,1), and "valid" padding. It is named conv_name_base + '2c'.

The third BatchNorm normalizes along the channels axis and is named bn_name_base + '2c'. Note that there is no ReLU activation after this step.

Shortcut path:

This CONV2D layer has F3 filters of shape (1,1), stride (s,s), and "valid" padding. It is named conv_name_base + '1'.

The BatchNorm normalizes along the channels axis and is named bn_name_base + '1'.

Final step:

The shortcut output and the main-path output are added together, and the sum is passed through a ReLU activation.

The code is as follows:

# GRADED FUNCTION: convolutional_block

def convolutional_block(X, f, filters, stage, block, s = 2):
    """
    Implementation of the convolutional block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    s -- Integer, specifying the stride to be used

    Returns:
    X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value
    X_shortcut = X

    ##### MAIN PATH #####
    # First component of main path
    X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(F2, (f, f), strides = (1, 1), name = conv_name_base + '2b', padding='same', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(F3, (1, 1), strides = (1, 1), name = conv_name_base + '2c', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)

    ##### SHORTCUT PATH #### (≈2 lines)
    X_shortcut = Conv2D(F3, (1, 1), strides = (s, s), name = conv_name_base + '1', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis = 3, name = bn_name_base + '1')(X_shortcut)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X


Testing the convolutional block:

tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = convolutional_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))


The output is as follows:

out = [ 0.09018463  1.23489773  0.46822017  0.0367176   0.          0.65516603]


3- Building the full residual network (ResNet-50 model)

The network architecture is shown below:



The details of the ResNet-50 model are as follows:

Zero-padding pads the input with a pad of (3,3)

Stage 1:

The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is “conv1”.

BatchNorm is applied to the channels axis of the input.

MaxPooling uses a (3,3) window and a (2,2) stride.

Stage 2:

The convolutional block uses three sets of filters of size [64,64,256], "f" is 3, "s" is 1 and the block is "a".

The 2 identity blocks use three sets of filters of size [64,64,256], "f" is 3 and the blocks are "b" and "c".

Stage 3:

The convolutional block uses three sets of filters of size [128,128,512], "f" is 3, "s" is 2 and the block is "a".

The 3 identity blocks use three sets of filters of size [128,128,512], "f" is 3 and the blocks are "b", "c" and "d".

Stage 4:

The convolutional block uses three sets of filters of size [256, 256, 1024], "f" is 3, "s" is 2 and the block is "a".

The 5 identity blocks use three sets of filters of size [256, 256, 1024], "f" is 3 and the blocks are "b", "c", "d", "e" and "f".

Stage 5:

The convolutional block uses three sets of filters of size [512, 512, 2048], "f" is 3, "s" is 2 and the block is "a".

The 2 identity blocks use three sets of filters of size [256, 256, 2048], "f" is 3 and the blocks are "b" and "c".

The 2D Average Pooling uses a window of shape (2,2) and its name is “avg_pool”.

The flatten doesn’t have any hyperparameters or name.

The Fully Connected (Dense) layer reduces its input to the number of classes using a softmax activation. Its name should be 'fc' + str(classes).

The code is as follows:

# GRADED FUNCTION: ResNet50

def ResNet50(input_shape = (64, 64, 3), classes = 6):
    """
    Implementation of the popular ResNet50 with the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """

    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    ### START CODE HERE ###

    # Stage 3 (≈4 lines)
    # The convolutional block uses three sets of filters of size [128,128,512], "f" is 3, "s" is 2 and the block is "a".
    # The 3 identity blocks use three sets of filters of size [128,128,512], "f" is 3 and the blocks are "b", "c" and "d".
    X = convolutional_block(X, f = 3, filters=[128,128,512], stage = 3, block='a', s = 2)
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='b')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='c')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='d')

    # Stage 4 (≈6 lines)
    # The convolutional block uses three sets of filters of size [256, 256, 1024], "f" is 3, "s" is 2 and the block is "a".
    # The 5 identity blocks use three sets of filters of size [256, 256, 1024], "f" is 3 and the blocks are "b", "c", "d", "e" and "f".
    X = convolutional_block(X, f = 3, filters=[256, 256, 1024], block='a', stage=4, s = 2)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='b', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='c', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='d', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='e', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='f', stage=4)

    # Stage 5 (≈3 lines)
    # The convolutional block uses three sets of filters of size [512, 512, 2048], "f" is 3, "s" is 2 and the block is "a".
    # The 2 identity blocks use three sets of filters of size [256, 256, 2048], "f" is 3 and the blocks are "b" and "c".
    X = convolutional_block(X, f = 3, filters=[512, 512, 2048], stage=5, block='a', s = 2)

    # The instructions list [256, 256, 2048] here, but that fails the grader; use [512, 512, 2048] to pass the grading.
    X = identity_block(X, f = 3, filters=[256, 256, 2048], stage=5, block='b')
    X = identity_block(X, f = 3, filters=[256, 256, 2048], stage=5, block='c')

    # AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
    # The 2D Average Pooling uses a window of shape (2,2) and its name is "avg_pool".
    X = AveragePooling2D(pool_size=(2,2))(X)

    ### END CODE HERE ###

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = glorot_uniform(seed=0))(X)

    # Create model
    model = Model(inputs = X_input, outputs = X, name='ResNet50')

    return model


Create the model:

model = ResNet50(input_shape = (64, 64, 3), classes = 6)


Compile the model:

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
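If you want explicit control over the learning rate, you could instead configure the optimizer object yourself (an optional variation, not required by the assignment; 0.001 is simply Adam's default):

from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])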


Before training, let's take a look at the training data:



Load the training data:

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# Normalize image vectors
X_train = X_train_orig/255.
X_test = X_test_orig/255.

# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T

print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))


The output:

number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)
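For reference, a minimal sketch of what the convert_to_one_hot helper from resnets_utils presumably does (assuming Y holds integer class labels):

def convert_to_one_hot(Y, C):
    # Build a C x m one-hot matrix from a vector of integer labels;
    # the caller transposes it to shape (m, C).
    return np.eye(C)[Y.reshape(-1)].T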


Train the model:

model.fit(X_train, Y_train, epochs = 20, batch_size = 32)
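Training ResNet-50 for twenty epochs takes a while, especially on a CPU, so it is worth saving the trained model for later reuse (the file name 'ResNet50.h5' simply matches the one loaded further below):

model.save('ResNet50.h5')  # stores architecture, weights and optimizer state in one HDF5 file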


You can try different values of epochs and compare the results. For example, with epochs = 2 the model's performance on the test set is:

preds = model.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))


The output is as follows:

120/120 [==============================] - 11s 93ms/step
Loss = 1.98986601035
Test Accuracy = 0.166666666667


The accuracy is quite low.

With epochs = 20, the output is as follows:

Epoch 1/20
1080/1080 [==============================] - 199s 185ms/step - loss: 2.7034 - acc: 0.2833
Epoch 2/20
1080/1080 [==============================] - 201s 186ms/step - loss: 1.6690 - acc: 0.4796
Epoch 3/20
1080/1080 [==============================] - 200s 185ms/step - loss: 2.0615 - acc: 0.4963
Epoch 4/20
1080/1080 [==============================] - 199s 184ms/step - loss: 1.5976 - acc: 0.5741
Epoch 5/20
1080/1080 [==============================] - 2012s 2s/step - loss: 1.4613 - acc: 0.5991
Epoch 6/20
1080/1080 [==============================] - 209s 193ms/step - loss: 1.8959 - acc: 0.5333
Epoch 7/20
1080/1080 [==============================] - 234s 216ms/step - loss: 1.7408 - acc: 0.5602
Epoch 8/20
1080/1080 [==============================] - 234s 217ms/step - loss: 1.3857 - acc: 0.6269
Epoch 9/20
1080/1080 [==============================] - 249s 231ms/step - loss: 0.9670 - acc: 0.7111
Epoch 10/20
1080/1080 [==============================] - 252s 234ms/step - loss: 1.0224 - acc: 0.7546
Epoch 11/20
1080/1080 [==============================] - 245s 227ms/step - loss: 1.1032 - acc: 0.6907
Epoch 12/20
1080/1080 [==============================] - 255s 237ms/step - loss: 1.1375 - acc: 0.6926
Epoch 13/20
1080/1080 [==============================] - 265s 245ms/step - loss: 1.8522 - acc: 0.5130
Epoch 14/20
1080/1080 [==============================] - 266s 246ms/step - loss: 1.3047 - acc: 0.6167
Epoch 15/20
1080/1080 [==============================] - 253s 235ms/step - loss: 1.0577 - acc: 0.6565
Epoch 16/20
1080/1080 [==============================] - 245s 227ms/step - loss: 0.7025 - acc: 0.8009
Epoch 17/20
1080/1080 [==============================] - 238s 220ms/step - loss: 1.0304 - acc: 0.7380
Epoch 18/20
1080/1080 [==============================] - 238s 220ms/step - loss: 1.1430 - acc: 0.7241
Epoch 19/20
1080/1080 [==============================] - 246s 228ms/step - loss: 0.6503 - acc: 0.8037
Epoch 20/20
1080/1080 [==============================] - 244s 226ms/step - loss: 0.6583 - acc: 0.8417


The training accuracy improves noticeably.

Results on the test set:

120/120 [==============================] - 7s 62ms/step
Loss = 3.91959365209
Test Accuracy = 0.54166667064


The test accuracy is also higher than before.

Alternatively, we can load a model that someone else has already trained and test it:

model = load_model("F:/Jupyter_project/deeplearing_4/ResNets/datasets/train_signs.h5")#加载GPU训练的模型


Or, for example, load a pretrained model:

model = load_model('ResNet50.h5')


Evaluate on the test set:

preds = model.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))


The results:

120/120 [==============================] - 8s 66ms/step
Loss = 0.5301783164342244
Test Accuracy = 0.8666666626930237


Using models trained by others, especially ones downloaded from the internet, can lead to all kinds of unexpected errors; be prepared to track them down one by one.

4- Testing on a real image:

img_path = 'images/my_image.jpg'
img = image.load_img(img_path, target_size=(64, 64))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)
my_image = scipy.misc.imread(img_path)
imshow(my_image)
print("class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ")
print(model.predict(x))


The output is as follows:
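To turn the printed probability vector into a single predicted digit, you can take the argmax (a small add-on, not part of the original notebook):

pred_class = np.argmax(model.predict(x))
print("predicted class: " + str(pred_class))  # index of the most probable class, 0-5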
