
Course 4 - Convolutional Neural Networks - Week 1 Assignment 2 (Hand-Gesture Classification with a Convolutional Neural Network)

2018-01-08 21:45

0 - Background

This post walks through building a convolutional neural network in TensorFlow and applying it to a concrete classification example.

1 - Data Processing

1-1 Data Import

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
import tensorflow as tf
from tensorflow.python.framework import ops
from cnn_utils import *

%matplotlib inline
np.random.seed(1)

# Loading the data (signs)
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()


The dataset used here is the SIGNS dataset, which we have already worked with in an earlier course.
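
The load_dataset helper comes from cnn_utils.py. A minimal sketch of what it does, assuming the usual h5 layout (datasets/train_signs.h5 and datasets/test_signs.h5 with keys such as train_set_x / train_set_y) used in these courses:

def load_dataset():
    # hypothetical sketch of the cnn_utils helper; file names and key names
    # assume the standard SIGNS dataset layout from the course
    train = h5py.File('datasets/train_signs.h5', 'r')
    test = h5py.File('datasets/test_signs.h5', 'r')
    X_train_orig = np.array(train['train_set_x'][:])                  # (1080, 64, 64, 3) images
    Y_train_orig = np.array(train['train_set_y'][:]).reshape(1, -1)   # (1, 1080) labels 0..5
    X_test_orig = np.array(test['test_set_x'][:])                     # (120, 64, 64, 3)
    Y_test_orig = np.array(test['test_set_y'][:]).reshape(1, -1)      # (1, 120)
    classes = np.array(test['list_classes'][:])                       # the 6 class labels
    return X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes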



1-2 Inspecting the Data

# Example of a picture
index = 6
print("X_train_orig shape=",X_train_orig.shape)
print("Y_train_orig shape=",Y_train_orig.shape)
print("X_test_orig shape=",X_test_orig.shape)
print("Y_test_orig shape=",Y_test_orig.shape)
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))


This gives us some basic information about the data:

X_train_orig shape= (1080, 64, 64, 3)
Y_train_orig shape= (1, 1080)
X_test_orig shape= (120, 64, 64, 3)
Y_test_orig shape= (1, 120)
y = 2




1-3 Data Preprocessing

Before training, the image pixel values are normalized and the labels are converted to one-hot encoding.

# Normalize the pixel values
X_train = X_train_orig/255.
X_test = X_test_orig/255.
# Convert Y into one-hot form with shapes (1080, 6) and (120, 6)
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
conv_layers = {}


A few supplementary notes on the one-hot encoding:

# How reshape works
print("How reshape works")
Y=np.array([[[1,2,3]],[[4,5,6]]])
print(Y.reshape(-1))
Y=np.array([[1,2,3],[4,5,6]])
print(Y.reshape(3,-1))  # the other dimension is inferred automatically
print(Y.reshape(-1))
print(Y_train_orig[:, 6:10])  # a peek at a few raw labels from the training set
Y_train = convert_to_one_hot(Y_train_orig, 6).T
# with a single -1 and no other dimension given, the array is flattened to 1-D

# How np.eye works
print("How np.eye works")
C=6
print (np.eye(C))

# Building a one-hot matrix
print("One-hot encoding:")
Y1=np.array([[3,1,2,5,4,2],[2,1,2,3,5,4]])
print(Y1.reshape(-1))
Y = np.eye(C)[Y1.reshape(-1)]  # the index array gives, for each row, the column where the 1 goes, so its values must stay within range
# The expression stacks rows of the identity matrix: the position of the 1 in each row comes from
# Y1.reshape(-1), and the number of rows of Y equals the length of Y1.reshape(-1).
print (Y.shape)
print(Y)
# For comparison, the transposed version
Y = np.eye(C)[Y1.reshape(-1)].T
print (Y.shape)
print(Y)


The output is:

How reshape works
[1 2 3 4 5 6]
[[1 2]
 [3 4]
 [5 6]]
[1 2 3 4 5 6]
[[2 1 1 4]]
How np.eye works
[[ 1.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  1.]]
One-hot encoding:
[3 1 2 5 4 2 2 1 2 3 5 4]
(12, 6)
[[ 0.  0.  0.  1.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  1.  0.]
 [ 0.  0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  1.  0.]]
(6, 12)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  1.  1.  0.  1.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  1.  0.]]
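
Putting these pieces together, the convert_to_one_hot helper used above can be sketched as follows (a sketch consistent with the np.eye demonstration, not copied verbatim from cnn_utils.py):

def convert_to_one_hot(Y, C):
    # each label in Y selects one row of the CxC identity matrix;
    # this returns a (C, number of examples) matrix, which is transposed
    # back with .T when building Y_train and Y_test above
    return np.eye(C)[Y.reshape(-1)].T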


2 - The TensorFlow Model

2-1 Creating Placeholders

In TensorFlow, data is fed into the model through placeholders. Since the number of examples is not fixed, we set the batch dimension to None, which also makes it easy to choose the batch size later. X therefore has shape [None, n_H0, n_W0, n_C0] and Y has shape [None, n_y].

The implementation is as follows:

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(name='X', shape=(None, n_H0, n_W0, n_C0), dtype=tf.float32)
    Y = tf.placeholder(name='Y', shape=(None, n_y), dtype=tf.float32)
    ### END CODE HERE ###

    return X, Y


Test code for the placeholders:

X, Y = create_placeholders(64, 64, 3, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))


Test output:

X = Tensor("X:0", shape=(?, 64, 64, 3), dtype=float32)
Y = Tensor("Y:0", shape=(?, 6), dtype=float32)


2-2 Parameter Initialization

The parameters here are the weight matrices, i.e. the filter coefficients that are convolved with the image.

The initialization code is as follows:

# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
    W1 : [4, 4, 3, 8]
    W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)                              # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable(name='W1', dtype=tf.float32, shape=(4, 4, 3, 8), initializer=tf.contrib.layers.xavier_initializer(seed = 0))
    W2 = tf.get_variable(name='W2', dtype=tf.float32, shape=(2, 2, 8, 16), initializer=tf.contrib.layers.xavier_initializer(seed = 0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters


Testing the parameter initialization:

tf.reset_default_graph()
with tf.Session() as sess_test:
    parameters = initialize_parameters()  # initialize the parameters
    init = tf.global_variables_initializer()
    sess_test.run(init)
    print("W1 = " + str(parameters["W1"].eval()[1,1,1]))
    print("W2 = " + str(parameters["W2"].eval()[1,1,1]))


Output of the parameter initialization:

W1 = [ 0.00131723  0.14176141 -0.04434952  0.09197326  0.14984085 -0.03514394
-0.06847463  0.05245192]
W2 = [-0.08566415  0.17750949  0.11974221  0.16773748 -0.0830943  -0.08058
-0.00577033 -0.14643836  0.24162132 -0.05857408 -0.19055021  0.1345228
-0.22779644 -0.1601823  -0.16117483 -0.10286498]


2-3 Forward Propagation

Forward propagation typically consists of convolutional layers, pooling layers, and activation functions. The output is then flattened into a one-dimensional vector and fed into a fully connected layer.

The functions used for each layer are described below:

tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME'): convolves the input X with the filter W1. The third argument, strides = [1,s,s,1], gives the stride along each of the input dimensions (m, n_H_prev, n_W_prev, n_C_prev). See the TensorFlow documentation for the full reference.

tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME'): performs max pooling on the input A with a window of size (f, f) and strides of size (s, s). Pooling is applied to each channel independently, so the number of channels of the output is unchanged.

tf.nn.relu(Z1): applies the ReLU activation to Z1 element-wise.

tf.contrib.layers.flatten(P): flattens each example in P into a one-dimensional vector so it can be fed into a fully connected layer. The batch dimension is preserved, so the output has shape [batch_size, k].

tf.contrib.layers.fully_connected(F, num_outputs): feeds the input F into a fully connected layer. The weights of this layer are initialized automatically and learned during training, which is why we did not initialize any fully connected parameters in initialize_parameters above.
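
As a quick sanity check on how these operations change tensor shapes, here is a small standalone snippet (not part of the assignment; the dummy tensors are made up) that mirrors the first half of the network:

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
X_demo = tf.constant(np.random.randn(2, 64, 64, 3), dtype=tf.float32)  # a dummy batch of 2 images
W_demo = tf.constant(np.random.randn(4, 4, 3, 8), dtype=tf.float32)    # 8 filters of size 4x4x3

Z = tf.nn.conv2d(X_demo, W_demo, strides=[1, 1, 1, 1], padding='SAME')           # (2, 64, 64, 8)
A = tf.nn.relu(Z)
P = tf.nn.max_pool(A, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')  # (2, 8, 8, 8)
F = tf.contrib.layers.flatten(P)                                                 # (2, 512)
print(Z.shape, P.shape, F.shape)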

The forward propagation built in this post is:

CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED


The parameters of the convolutional and pooling layers are:

- Conv2D: stride 1, padding is "SAME"
- ReLU
- Max pool: Use an 8 by 8 filter size and an 8 by 8 stride, padding is "SAME"
- Conv2D: stride 1, padding is "SAME"
- ReLU
- Max pool: Use a 4 by 4 filter size and a 4 by 4 stride, padding is "SAME"
- Flatten the previous output.
- FULLYCONNECTED (FC) layer: no non-linear activation function is applied here, and softmax is not called explicitly, because it is folded into the cost function.

Tracing the shapes with a 64×64×3 input: 64×64×8 after the first CONV, 8×8×8 after the first max pool, 8×8×16 after the second CONV, 2×2×16 after the second max pool, so 64 flattened features go into the FC layer, which outputs 6 values.


The forward-propagation code is as follows:

# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']

    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(input=X, filter=W1, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(value=A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(input=P1, filter=W2, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(value=A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    # FLATTEN
    P = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax here).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P, 6, activation_fn=None)  # no activation function
    ### END CODE HERE ###

    return Z3


Test code for the forward propagation:

tf.reset_default_graph()

with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)  # n_H0, n_W0, n_C0, n_y
    parameters = initialize_parameters()      # initialize the W parameters
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(2,64,64,3), Y: np.random.randn(2,6)})
    print("Z3 = " + str(a))


The test output is:

Z3 =    [[-0.44670227 -1.57208765 -1.53049231 -2.31013036 -1.29104376 0.46852064]
[-0.17601591 -1.57972014 -1.4737016 -2.61672091 -1.00810647 0.5747785 ]]


2-4 Cost Function

tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y)


This computes the softmax cross-entropy loss: it both applies the softmax activation to the logits and computes the loss for each example.

tf.reduce_mean: computes the mean along a given dimension of a tensor, or over all elements if no dimension is specified. We use it here to average the per-example losses into a single overall cost.
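
A tiny standalone illustration (with made-up logits and labels) of how these two calls combine into a single scalar cost:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])       # fake network outputs for 2 examples, 3 classes
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])       # one-hot labels
per_example = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
cost = tf.reduce_mean(per_example)            # average over the batch
with tf.Session() as sess:
    print(sess.run(per_example))              # one loss value per example
    print(sess.run(cost))                     # a single scalar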

Implementation of the cost function:

# GRADED FUNCTION: compute_cost

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y))  # average the per-example losses
    ### END CODE HERE ###

    return cost


Testing the cost function:

tf.reset_default_graph()

with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(cost, {X: np.random.randn(4,64,64,3), Y: np.random.randn(4,6)})
    print("cost = " + str(a))


Test output:

cost =  2.91034


3 - The Model

All of the steps above are now put together:

Create the placeholders

Initialize the parameters

Forward propagation

Compute the cost

Create the optimizer

The full code:

Note that random_mini_batches in the code is the mini-batch helper from an earlier course; a sketch of what it does follows.
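
A hedged sketch of random_mini_batches (consistent with the earlier course, not the exact original):

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    # shuffle the examples, then cut them into consecutive mini-batches of the given size
    np.random.seed(seed)
    m = X.shape[0]
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation]
    shuffled_Y = Y[permutation]
    mini_batches = []
    num_complete = m // mini_batch_size
    for k in range(num_complete):
        mini_batches.append((shuffled_X[k * mini_batch_size:(k + 1) * mini_batch_size],
                             shuffled_Y[k * mini_batch_size:(k + 1) * mini_batch_size]))
    if m % mini_batch_size != 0:
        # the last, smaller mini-batch
        mini_batches.append((shuffled_X[num_complete * mini_batch_size:],
                             shuffled_Y[num_complete * mini_batch_size:]))
    return mini_batches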

# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
          num_epochs = 100, minibatch_size = 64, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 5 epochs

    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)                             # to keep results consistent (tensorflow seed)
    seed = 3                                          # to keep results consistent (numpy seed)
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []                                        # To keep track of the cost

    # Create Placeholders of the correct shape
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables globally
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)  # helper defined in an earlier course

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost; the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###

                minibatch_cost += temp_cost / num_minibatches

            # Print the cost every 5 epochs
            if print_cost == True and epoch % 5 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

        return train_accuracy, test_accuracy, parameters


Calling the model:

_, _, parameters = model(X_train, Y_train, X_test, Y_test)


Check that the output after epochs 0 and 5 matches the values below; if it does not, go back and re-check the code above.

Cost after epoch 0 =    1.917929
Cost after epoch 5 =    1.506757
Train Accuracy =    0.940741
Test Accuracy =     0.783333


Testing with a real image:

fname = "images/thumbs_up.jpg"
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64,64))
plt.imshow(my_image)
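
The snippet above only displays the resized image. To actually classify it, the image would need the same preprocessing as the training data and then a run through the trained graph. A rough sketch of that step (it assumes access to the session sess, the placeholder X, and the output tensor Z3 from inside model, which the code above does not expose, so this is illustrative only):

my_image = my_image / 255.                    # same normalization as the training set
my_image = my_image.reshape(1, 64, 64, 3)     # a batch containing a single image
# prediction = sess.run(tf.argmax(Z3, 1), feed_dict={X: my_image})
# print("Predicted class:", np.squeeze(prediction))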