Course 4 - Convolutional Neural Networks - Week 2, Assignment 2 (Hand-Sign Classification with Residual Networks)
2018-01-07 17:55
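0- Background
This post covers deep convolutional neural networks built from residual blocks, i.e. Residual Networks (ResNets). In theory, the more layers a neural network has, the more complex the functions it can represent. A CNN extracts low-, mid- and high-level features, and a deeper network can extract features at a richer range of levels. Features from deeper layers are also more abstract and carry more semantic information.
In practice, however, deep networks are hard to train. Simply stacking more layers leads to vanishing or exploding gradients (both arise during backpropagation), which makes gradient descent progressively slower.
Accuracy on the training set first rises, then saturates, and may even fall afterwards.
As the figure below shows, the gradient magnitude in the earliest layers drops quickly toward 0 as the number of iterations grows.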
Figure 1 : Vanishing gradient
The speed of learning decreases very rapidly for the early layers as the network trains
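To make the effect concrete, here is a minimal sketch of a vanishing gradient. It is a toy chain of scalar sigmoid layers, not the network from this assignment, and it shows the decay across depth rather than across iterations; the function names and the values `weight=1.0` and `x=0.5` are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_magnitudes(num_layers, weight=1.0, x=0.5):
    """Return |d(output)/d(a_k)| for k = 0 (input) .. num_layers (output)."""
    # Forward pass through a chain of scalar layers: a_k = sigmoid(weight * a_{k-1}).
    activations = [x]
    for _ in range(num_layers):
        activations.append(sigmoid(weight * activations[-1]))

    # Backward pass: each layer multiplies the gradient by weight * sigmoid'(z),
    # where sigmoid'(z) = a * (1 - a) is never larger than 0.25.
    grads = [1.0]  # gradient at the output
    for a in reversed(activations[1:]):
        grads.append(grads[-1] * weight * a * (1.0 - a))
    grads.reverse()  # grads[0] is now the gradient at the input
    return [abs(g) for g in grads]

mags = gradient_magnitudes(num_layers=20)
print("gradient at the output:", mags[-1])
print("gradient at the input :", mags[0])
```

Every layer contributes a factor of at most 0.25 here, so after 20 layers the gradient reaching the input is many orders of magnitude smaller than the gradient at the output.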
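The idea behind a residual is to strip away the common main component so that small changes stand out, somewhat like a differential amplifier.
1- Import dependencies: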
import numpy as np
import tensorflow as tf
from keras import layers
from keras.layers import (Input, Add, Dense, Activation, ZeroPadding2D,
                          BatchNormalization, Flatten, Conv2D, AveragePooling2D,
                          MaxPooling2D, GlobalMaxPooling2D)
from keras.models import Model, load_model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
import pydot
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from resnets_utils import *
from keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow
%matplotlib inline

import keras.backend as K
K.set_image_data_format('channels_last')
K.set_learning_phase(1)
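2- Building a Residual Network
In a residual network, during forward propagation a later layer receives the output of an earlier layer directly, as part of the input to its activation function; during backpropagation, the gradient of the later layer can flow straight back to the earlier layer through the same skip connection. Stacking such residual blocks yields a deep network.
Because of the skip connection, a residual block can easily learn the identity function, so stacking additional blocks carries very little risk of hurting training performance.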
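The claim that a residual block easily represents the identity function can be checked with a minimal scalar sketch. The two functions below are hypothetical stand-ins for a pair of layers, not the Keras blocks built later in this post, and the identity only holds for non-negative inputs (which is what a preceding ReLU produces).

```python
def relu(z):
    return max(0.0, z)

def plain_block(x, w1, w2):
    """Two stacked scalar layers without a shortcut."""
    return relu(w2 * relu(w1 * x))

def residual_block(x, w1, w2):
    """The same two layers, but the input is added back before the final ReLU."""
    return relu(w2 * relu(w1 * x) + x)

x = 1.5
# With all weights at zero the plain block wipes out its input ...
print(plain_block(x, 0.0, 0.0))     # 0.0
# ... while the residual block passes it through unchanged.
print(residual_block(x, 0.0, 0.0))  # 1.5

# Stacking many such residual blocks still leaves the signal intact.
y = x
for _ in range(50):
    y = residual_block(y, 0.0, 0.0)
print(y)  # 1.5
```

In other words, the weights only have to shrink toward zero for the block to behave like an identity mapping, whereas a plain block would have to learn that mapping explicitly.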
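Depending on whether the input and output dimensions match, a residual network uses two kinds of blocks: the identity block and the convolutional block.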
2.1 - identity block
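The first kind has matching input and output dimensions, i.e. the input $a^{[l]}$ has the same dimensions as the output $a^{[l+2]}$:
The upper arc is called the shortcut path, and the lower route is the main path.
Note that the two are added before the ReLU of the following layer.
The BatchNorm layers are there to speed up training.
The version above skips over 2 layers; the version below skips over 3 layers: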
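First component of the main path:
The first CONV2D has $F_1$ filters of shape (1,1), stride (1,1) and padding "valid". It is named
conv_name_base + '2a', and uses seed=0 for the random weight initialization.
The first BatchNorm normalizes along the channel axis and is named
bn_name_base + '2a'.
The ReLU activation needs no name and has no hyperparameters.
Second component of the main path:
The second CONV2D has $F_2$ filters of shape (f,f), stride (1,1) and padding "same". It is named
conv_name_base + '2b', and uses seed=0 for the random weight initialization.
The second BatchNorm normalizes along the channel axis and is named
bn_name_base + '2b'.
The ReLU activation needs no name and has no hyperparameters.
Third component of the main path:
The third CONV2D has $F_3$ filters of shape (1,1), stride (1,1) and padding "valid". It is named
conv_name_base + '2c', and again uses seed=0 for the random weight initialization.
The third BatchNorm normalizes along the channel axis and is named
bn_name_base + '2c'. Note that no ReLU activation follows this component.
Final step:
Add the shortcut to the output of the main path and pass the sum through a ReLU activation, which again has no name and no hyperparameters.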
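As a side note on what "normalizing along the channel axis" means, the following is a minimal sketch in plain Python. It assumes the batch, height and width axes have been flattened into a list of pixels, and it leaves out the learned scale and shift as well as the moving averages that a real BatchNorm layer keeps; the function name and the `eps` value are illustrative.

```python
import math

def batch_norm_channels(pixels, eps=1e-3):
    """Normalize each channel to roughly zero mean and unit variance.

    pixels -- list of pixels, each a list of C channel values
    eps    -- small constant that keeps the division numerically stable
    """
    num_channels = len(pixels[0])
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(num_channels)]
    variances = [sum((p[c] - means[c]) ** 2 for p in pixels) / n
                 for c in range(num_channels)]
    return [[(p[c] - means[c]) / math.sqrt(variances[c] + eps)
             for c in range(num_channels)]
            for p in pixels]

# Two channels with very different scales.
pixels = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]]
normalized = batch_norm_channels(pixels)
for c in range(2):
    channel = [p[c] for p in normalized]
    print("channel", c, "mean =", round(sum(channel) / len(channel), 6))
```

Each channel gets its own mean and variance, so channels on very different scales end up comparable after normalization.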
The implementation is as follows:
# GRADED FUNCTION: identity_block

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network

    Returns:
    X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name = bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X
Testing the identity block:
tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = identity_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))
The output is as follows:
out = [ 0.94822985 0. 1.16101444 2.747859 0. 1.36677003]
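2.2- Convolutional block
The convolutional block is the other kind of block in a residual network. It is used when the input and output dimensions do not match. It differs from the identity block in that the shortcut path contains a CONV2D layer, whose job is to resize the input $x$ so that it can be added to the output of the main path. For example, to halve the height and width, we can use a 1x1 convolution with a stride of 2. The CONV2D layer on the shortcut path uses no non-linear activation function, because its only role is to learn a linear function that reduces the dimensions of the input to match the output of the main path.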
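The halving effect of a 1x1 convolution with stride 2 follows from the usual output-size formulas for "valid" and "same" padding. Below is a minimal sketch of those formulas; the helper name `conv_output_size` is made up for this illustration and is not part of Keras.

```python
import math

def conv_output_size(n, f, s, padding):
    """Spatial output size of a convolution over an input of size n.

    f -- filter size, s -- stride, padding -- "valid" or "same"
    """
    if padding == "valid":
        return (n - f) // s + 1
    if padding == "same":
        return math.ceil(n / s)
    raise ValueError("padding must be 'valid' or 'same'")

# Shortcut path of a convolutional block: 1x1 filter, stride 2, "valid" padding.
for n in (64, 32, 15, 4):
    print(n, "->", conv_output_size(n, f=1, s=2, padding="valid"))

# The middle (f,f) convolution with "same" padding and stride 1 keeps the size.
print(conv_output_size(15, f=3, s=1, padding="same"))
```

Because the first convolution of the main path uses the same 1x1 filter and stride, both paths end up with the same height and width and can be added.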
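First component of the main path:
The first CONV2D has $F_1$ filters of shape (1,1), stride (s,s) and padding "valid". It is named
conv_name_base + '2a'.
The first BatchNorm normalizes along the channel axis and is named
bn_name_base + '2a'.
It is followed by a ReLU activation, which needs no name and has no hyperparameters.
Second component of the main path:
The second CONV2D has $F_2$ filters of shape (f,f), stride (1,1) and padding "same". It is named
conv_name_base + '2b'.
The second BatchNorm normalizes along the channel axis and is named
bn_name_base + '2b'.
It is followed by a ReLU activation, which needs no name and has no hyperparameters.
Third component of the main path:
The third CONV2D has $F_3$ filters of shape (1,1), stride (1,1) and padding "valid". It is named
conv_name_base + '2c'.
The third BatchNorm normalizes along the channel axis and is named
bn_name_base + '2c'. Note that no ReLU activation follows this component.
Shortcut path:
The CONV2D has $F_3$ filters of shape (1,1), stride (s,s) and padding "valid". It is named
conv_name_base + '1'.
The BatchNorm normalizes along the channel axis and is named
bn_name_base + '1'.
Final step:
Add the shortcut to the output of the main path, then pass the sum through a ReLU activation.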
The code is as follows:
# GRADED FUNCTION: convolutional_block

def convolutional_block(X, f, filters, stage, block, s = 2):
    """
    Implementation of the convolutional block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    s -- Integer, specifying the stride to be used

    Returns:
    X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value
    X_shortcut = X

    ##### MAIN PATH #####
    # First component of main path
    X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(F2, (f, f), strides = (1, 1), name = conv_name_base + '2b', padding='same', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(F3, (1, 1), strides = (1, 1), name = conv_name_base + '2c', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)

    ##### SHORTCUT PATH #### (≈2 lines)
    X_shortcut = Conv2D(F3, (1, 1), strides = (s, s), name = conv_name_base + '1', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis = 3, name = bn_name_base + '1')(X_shortcut)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = layers.add([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X
Testing the convolutional block:
tf.reset_default_graph()

with tf.Session() as test:
    np.random.seed(1)
    A_prev = tf.placeholder("float", [3, 4, 4, 6])
    X = np.random.randn(3, 4, 4, 6)
    A = convolutional_block(A_prev, f = 2, filters = [2, 4, 6], stage = 1, block = 'a')
    test.run(tf.global_variables_initializer())
    out = test.run([A], feed_dict={A_prev: X, K.learning_phase(): 0})
    print("out = " + str(out[0][1][1][0]))
The test output is as follows:
out = [ 0.09018463 1.23489773 0.46822017 0.0367176 0. 0.65516603]
3- Building the complete residual network (ResNet-50 model)
The network structure is shown below. The details of the ResNet-50 model are:
Zero-padding pads the input with a pad of (3,3).
Stage 1:
The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is "conv1".
BatchNorm is applied to the channels axis of the input.
MaxPooling uses a (3,3) window and a (2,2) stride.
Stage 2:
The convolutional block uses three sets of filters of size [64,64,256], "f" is 3, "s" is 1 and the block is "a".
The 2 identity blocks use three sets of filters of size [64,64,256], "f" is 3 and the blocks are "b" and "c".
Stage 3:
The convolutional block uses three sets of filters of size [128,128,512], "f" is 3, "s" is 2 and the block is "a".
The 3 identity blocks use three sets of filters of size [128,128,512], "f" is 3 and the blocks are "b", "c" and "d".
Stage 4:
The convolutional block uses three sets of filters of size [256, 256, 1024], "f" is 3, "s" is 2 and the block is "a".
The 5 identity blocks use three sets of filters of size [256, 256, 1024], "f" is 3 and the blocks are "b", "c", "d", "e" and "f".
Stage 5:
The convolutional block uses three sets of filters of size [512, 512, 2048], "f" is 3, "s" is 2 and the block is "a".
The 2 identity blocks use three sets of filters of size [256, 256, 2048], "f" is 3 and the blocks are "b" and "c".
The 2D Average Pooling uses a window of shape (2,2) and its name is "avg_pool".
The flatten doesn't have any hyperparameters or name.
The Fully Connected (Dense) layer reduces its input to the number of classes using a softmax activation. Its name should be 'fc' + str(classes).
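The specification above can be sanity-checked by tracing the spatial size and channel count through the network for a 64x64x3 input. The sketch below is plain Python, not Keras; the helper names are hypothetical, and it assumes that conv1 and both pooling layers use "valid" padding and that the average pooling stride equals its window size.

```python
def valid_size(n, f, s):
    """Output size of a "valid" convolution or pooling window."""
    return (n - f) // s + 1

def resnet50_shapes(height=64, channels=3, classes=6):
    """Return (name, spatial size, channels) after each step listed above."""
    trace = [("input", height, channels)]
    n = height + 2 * 3                    # zero-padding of (3,3)
    n = valid_size(n, f=7, s=2)           # conv1: 7x7 filters, stride 2
    trace.append(("conv1", n, 64))
    n = valid_size(n, f=3, s=2)           # max pooling: 3x3 window, stride 2
    trace.append(("max_pool", n, 64))
    # (stage, stride s of its convolutional block, F3 of the stage)
    for stage, s, f3 in [(2, 1, 256), (3, 2, 512), (4, 2, 1024), (5, 2, 2048)]:
        n = valid_size(n, f=1, s=s)       # only the convolutional block resizes
        trace.append(("stage%d" % stage, n, f3))
    n = valid_size(n, f=2, s=2)           # average pooling: 2x2 window
    trace.append(("avg_pool", n, 2048))
    trace.append(("fc%d" % classes, 1, classes))
    return trace

for name, size, depth in resnet50_shapes():
    print("%-9s %2d x %2d x %d" % (name, size, size, depth))

# Counting the layers with weights on the main path:
# conv1 + 3 convolutions per block + the final dense layer.
blocks_per_stage = [1 + 2, 1 + 3, 1 + 5, 1 + 2]
print(1 + 3 * sum(blocks_per_stage) + 1)  # 50
```

The trace shrinks from 64 to 32, 15, 15, 8, 4, 2 and finally 1, so the flatten step yields 2048 values per image, and the layer count explains the "50" in the model's name.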
The code is as follows:
# GRADED FUNCTION: ResNet50

def ResNet50(input_shape = (64, 64, 3), classes = 6):
    """
    Implementation of the popular ResNet50 the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """

    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), name = 'conv1', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], stage = 2, block='a', s = 1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    ### START CODE HERE ###

    # Stage 3 (≈4 lines)
    # The convolutional block uses three sets of filters of size [128,128,512], "f" is 3, "s" is 2 and the block is "a".
    # The 3 identity blocks use three sets of filters of size [128,128,512], "f" is 3 and the blocks are "b", "c" and "d".
    X = convolutional_block(X, f = 3, filters=[128,128,512], stage = 3, block='a', s = 2)
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='b')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='c')
    X = identity_block(X, f = 3, filters=[128,128,512], stage= 3, block='d')

    # Stage 4 (≈6 lines)
    # The convolutional block uses three sets of filters of size [256, 256, 1024], "f" is 3, "s" is 2 and the block is "a".
    # The 5 identity blocks use three sets of filters of size [256, 256, 1024], "f" is 3 and the blocks are "b", "c", "d", "e" and "f".
    X = convolutional_block(X, f = 3, filters=[256, 256, 1024], block='a', stage=4, s = 2)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='b', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='c', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='d', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='e', stage=4)
    X = identity_block(X, f = 3, filters=[256, 256, 1024], block='f', stage=4)

    # Stage 5 (≈3 lines)
    # The convolutional block uses three sets of filters of size [512, 512, 2048], "f" is 3, "s" is 2 and the block is "a".
    # The 2 identity blocks use three sets of filters of size [256, 256, 2048], "f" is 3 and the blocks are "b" and "c".
    X = convolutional_block(X, f = 3, filters=[512, 512, 2048], stage=5, block='a', s = 2)

    # The filters should be [256, 256, 2048], but that fails the grader; use [512, 512, 2048] to pass grading.
    X = identity_block(X, f = 3, filters=[256, 256, 2048], stage=5, block='b')
    X = identity_block(X, f = 3, filters=[256, 256, 2048], stage=5, block='c')

    # AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
    # The 2D Average Pooling uses a window of shape (2,2) and its name is "avg_pool".
    X = AveragePooling2D(pool_size=(2,2), name='avg_pool')(X)

    ### END CODE HERE ###

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer = glorot_uniform(seed=0))(X)

    # Create model
    model = Model(inputs = X_input, outputs = X, name='ResNet50')

    return model
Create the model:
model = ResNet50(input_shape = (64, 64, 3), classes = 6)
Compile the model:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
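For reference, the 'categorical_crossentropy' loss selected above compares a one-hot label with the predicted softmax distribution. Here is a minimal per-example sketch in plain Python; the function below is a hand-written illustration rather than the Keras implementation, and the probability vectors are made up.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    # Clip each probability away from zero so that log() stays finite.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1, 0, 0, 0]                       # the true class is 2
confident = [0.01, 0.01, 0.95, 0.01, 0.01, 0.01]  # made-up prediction
uniform = [1 / 6] * 6                             # an uninformed guess

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, uniform))  # ln(6), about 1.79
```

A loss near ln(6) ≈ 1.79 therefore corresponds to a model that has not yet learned anything about the six classes.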
Before training, let's load the training data and take a look at it:
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# Normalize image vectors
X_train = X_train_orig/255.
X_test = X_test_orig/255.

# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T

print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
Output:
number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)
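The label shapes (1080, 6) and (120, 6) come from one-hot encoding: one row per example and one column per class. The helper `convert_to_one_hot` lives in `resnets_utils` and is not shown in this post, so the following is only a sketch of the idea in plain Python with a hypothetical function name, not that helper's actual code.

```python
def one_hot_rows(labels, num_classes):
    """Return one row per label, with a 1 in the column of its class."""
    return [[1 if c == label else 0 for c in range(num_classes)]
            for label in labels]

labels = [0, 5, 2]  # three made-up class labels
for row in one_hot_rows(labels, 6):
    print(row)
```

Three labels and six classes give a 3 x 6 matrix, in the same way that 1080 training labels give the (1080, 6) array printed above.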
Train the model:
model.fit(X_train, Y_train, epochs = 20, batch_size = 32)
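You can try different values of epochs and compare the results. For example, with epochs=2 the model performs as follows on the test set:
preds = model.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))
Output:
120/120 [==============================] - 11s 93ms/step
Loss = 1.98986601035
Test Accuracy = 0.166666666667
The accuracy is quite low: 0.1667 is 1/6, which is chance level for six classes.
With epochs=20, the training log is as follows: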
Epoch 1/20
1080/1080 [==============================] - 199s 185ms/step - loss: 2.7034 - acc: 0.2833
Epoch 2/20
1080/1080 [==============================] - 201s 186ms/step - loss: 1.6690 - acc: 0.4796
Epoch 3/20
1080/1080 [==============================] - 200s 185ms/step - loss: 2.0615 - acc: 0.4963
Epoch 4/20
1080/1080 [==============================] - 199s 184ms/step - loss: 1.5976 - acc: 0.5741
Epoch 5/20
1080/1080 [==============================] - 2012s 2s/step - loss: 1.4613 - acc: 0.5991
Epoch 6/20
1080/1080 [==============================] - 209s 193ms/step - loss: 1.8959 - acc: 0.5333
Epoch 7/20
1080/1080 [==============================] - 234s 216ms/step - loss: 1.7408 - acc: 0.5602
Epoch 8/20
1080/1080 [==============================] - 234s 217ms/step - loss: 1.3857 - acc: 0.6269
Epoch 9/20
1080/1080 [==============================] - 249s 231ms/step - loss: 0.9670 - acc: 0.7111
Epoch 10/20
1080/1080 [==============================] - 252s 234ms/step - loss: 1.0224 - acc: 0.7546
Epoch 11/20
1080/1080 [==============================] - 245s 227ms/step - loss: 1.1032 - acc: 0.6907
Epoch 12/20
1080/1080 [==============================] - 255s 237ms/step - loss: 1.1375 - acc: 0.6926
Epoch 13/20
1080/1080 [==============================] - 265s 245ms/step - loss: 1.8522 - acc: 0.5130
Epoch 14/20
1080/1080 [==============================] - 266s 246ms/step - loss: 1.3047 - acc: 0.6167
Epoch 15/20
1080/1080 [==============================] - 253s 235ms/step - loss: 1.0577 - acc: 0.6565
Epoch 16/20
1080/1080 [==============================] - 245s 227ms/step - loss: 0.7025 - acc: 0.8009
Epoch 17/20
1080/1080 [==============================] - 238s 220ms/step - loss: 1.0304 - acc: 0.7380
Epoch 18/20
1080/1080 [==============================] - 238s 220ms/step - loss: 1.1430 - acc: 0.7241
Epoch 19/20
1080/1080 [==============================] - 246s 228ms/step - loss: 0.6503 - acc: 0.8037
Epoch 20/20
1080/1080 [==============================] - 244s 226ms/step - loss: 0.6583 - acc: 0.8417
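The training accuracy has improved markedly.
Results on the test set:
120/120 [==============================] - 7s 62ms/step
Loss = 3.91959365209
Test Accuracy = 0.54166667064
The test accuracy is also higher than before, although it remains well below the training accuracy of 0.84.
We can also load a model that someone else has already trained and evaluate it: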
model = load_model("F:/Jupyter_project/deeplearing_4/ResNets/datasets/train_signs.h5")  # load a model trained on a GPU
Or, for example, load the pre-trained model:
model = load_model('ResNet50.h5')
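Evaluate it on the test set:
preds = model.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))
Output:
120/120 [==============================] - 8s 66ms/step
Loss = 0.5301783164342244
Test Accuracy = 0.8666666626930237
Models trained by other people, especially ones downloaded from the internet, can fail in all sorts of unexpected ways; be prepared to track down the cause patiently.
4- Testing on a real image: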
img_path = 'images/my_image.jpg'
img = image.load_img(img_path, target_size=(64, 64))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)
my_image = scipy.misc.imread(img_path)
imshow(my_image)
print("class prediction vector [p(0), p(1), p(2), p(3), p(4), p(5)] = ")
print(model.predict(x))
The output is as follows:
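The printed vector holds one probability per class. As a minimal sketch of turning such a vector into a predicted label, the snippet below picks the index of the largest entry; the probabilities are made-up values, not the author's output, and `predicted_class` is a hypothetical helper.

```python
def predicted_class(probabilities):
    """Index of the largest entry in a prediction vector."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

example_vector = [0.02, 0.05, 0.80, 0.03, 0.06, 0.04]  # made-up values
print("predicted class:", predicted_class(example_vector))  # predicted class: 2
```

With the real model, `model.predict(x)` returns one such row per input image, so the vector for the single image above would be `model.predict(x)[0]`.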