
Building a Deep Neural Network with TensorFlow

2019-04-11 21:49

In week 3 of the second course of Andrew Ng's Deep Learning specialization, a number of deep learning frameworks are mentioned, including Caffe/Caffe2, CNTK, DL4J, Keras, Lasagne, MXNet, PaddlePaddle, TensorFlow, Theano, Torch, and so on. Although Andrew says he does not particularly recommend any one framework, given his years at Google the later exercises end up using TensorFlow. Below we follow his approach and build a deep neural network step by step.

The libraries required by the program are as follows:

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import *
np.random.seed(1)
tf_utils is a helper module provided with Andrew Ng's course materials; it can be obtained here.

I. Warming up

1. The steps of a TensorFlow program

We first use TensorFlow to compute a simple loss function, defined as

loss = (ŷ − y)²
y_hat = tf.constant(36, name='y_hat')           # create a constant tensor, filled with a number or a list
y = tf.constant(39, name='y')
loss = tf.Variable((y - y_hat)**2, name='loss')  # create the variable loss
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    print(session.run(loss))
In this test program y_hat = 36 and y = 39, so running it prints:
9
From this test program we can see the steps a TensorFlow program goes through (a minimal annotated sketch follows the list):

(1) Create the tensors (constants or variables);

(2) Define the operations between the tensors;

(3) Initialize the variables*;

(4) Create a Session*;

(5) Run the operations inside the session*.

The steps marked with * are the essential ones.
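A minimal annotated sketch (a toy example, not from the original post) that maps each of the five steps onto a line of code:

a = tf.constant(3, name='a')                  # (1) create tensors (constants or variables)
b = tf.Variable(a * 2, name='b')              # (2) define operations between tensors
init = tf.global_variables_initializer()      # (3) initialize the variables
with tf.Session() as session:                 # (4) create a session
    session.run(init)
    print(session.run(b))                     # (5) run the operations in the session -> prints 6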

2. Using placeholders

In TensorFlow a placeholder is an object whose value can be supplied later; when the session is run, values are passed to placeholders through feed_dict.

x = tf.placeholder(tf.int64, name='x')
with tf.Session() as sess:
    print(sess.run(2 * x, feed_dict = {x: 3}))
3. A linear function

We compute the linear function Y = WX + b, where W is a (4,3) matrix, X is a (3,1) vector, and b is a (4,1) vector.

The implementation is as follows:

def linear_function():
    np.random.seed(1)
    X = tf.constant(np.random.randn(3,1), name = "X")
    W = tf.constant(np.random.randn(4,3), name = "W")
    b = tf.constant(np.random.randn(4,1), name = "b")
    Y = tf.add(tf.matmul(W, X), b)
    sess = tf.Session()
    result = sess.run(Y)
    sess.close()
    return result

print("Result = " + str(linear_function()))
Result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]
4. Computing the sigmoid

TensorFlow provides many functions commonly used in neural networks that can be called directly, such as sigmoid and softmax, available as tf.sigmoid() and tf.nn.softmax(), which makes life easier for developers.

def sigmoid(z):
    x = tf.placeholder(tf.float32, name = "x")
    sigmoid = tf.sigmoid(x)
    with tf.Session() as sess:
        result = sess.run(sigmoid, feed_dict = {x: z})
    return result
  1. print("sigmoid(0) =" + str(sigmoid(0)))
  2. print("sigmoid(12) =" + str(sigmoid(12)))
sigmoid(0) =0.5
sigmoid(12) =0.9999938
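Only tf.sigmoid is exercised above, but tf.nn.softmax follows the same placeholder/session pattern; here is a small sketch (not part of the original assignment):

def softmax(z):
    x = tf.placeholder(tf.float32, name = "x")
    s = tf.nn.softmax(x)
    with tf.Session() as sess:
        result = sess.run(s, feed_dict = {x: z})
    return result

print("softmax([1,2,3]) = " + str(softmax(np.array([1.0, 2.0, 3.0]))))
# roughly [0.09  0.245 0.665]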
5. Computing the cost

The cross-entropy cost is computed as

J = −(1/m) · Σᵢ₌₁..ₘ [ y⁽ⁱ⁾ log σ(z⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − σ(z⁽ⁱ⁾)) ]

Previously we had to loop over i from 1 to m ourselves; under the TensorFlow framework this formula takes just one line of code (tf.nn.sigmoid_cross_entropy_with_logits below returns the per-example losses, i.e. the terms inside the sum).

def cost(logits, labels):
    z = tf.placeholder(tf.float32, name = "z")
    y = tf.placeholder(tf.float32, name = "y")
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)
    sess = tf.Session()
    cost = sess.run(cost, feed_dict = {z: logits, y: labels})
    sess.close()
    return cost
logits = sigmoid(np.array([0.2,0.4,0.7,0.9]))
cost = cost(logits, np.array([0,0,1,1]))
print ("cost = " + str(cost))
cost = [1.0053872  1.0366408  0.41385433 0.39956617]
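As a quick sanity check (not part of the original notebook), the same per-example values can be reproduced in NumPy directly from the cross-entropy formula above:

z = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))    # same logits as above
y = np.array([0, 0, 1, 1])
a = 1 / (1 + np.exp(-z))                       # sigmoid of the logits
print(-(y * np.log(a) + (1 - y) * np.log(1 - a)))
# roughly [1.0054 1.0366 0.4139 0.3996]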
6. "one-hot" conversion

For multi-class problems we need to convert the label vector into a (C, m) matrix in which column i contains a 1 in the row given by label y⁽ⁱ⁾ and 0 everywhere else (the illustrating figure is omitted here). Implementing this in plain NumPy takes several lines of code, whereas under the TensorFlow framework a single line is enough.

def one_hot_matrix(labels, C):
    C = tf.constant(value = C, name = "C")
    one_hot_matrix = tf.one_hot(labels, C, axis = 0)
    sess = tf.Session()
    one_hot = sess.run(one_hot_matrix)
    sess.close()
    return one_hot
labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C = 4)
print("one_hot = " +'\n'+ str(one_hot))
one_hot = [[ 0. 0. 0. 1. 0. 0.]
 [ 1. 0. 0. 0. 0. 1.]
 [ 0. 1. 0. 0. 1. 0.]
 [ 0. 0. 1. 0. 0. 0.]]
II. Building a neural network with TensorFlow

Anyone who has followed Andrew Ng's course will already be familiar with the steps for implementing a neural network, so below we simply follow his approach and rebuild a deep neural network with TensorFlow. Since several earlier posts already describe implementing neural networks in detail, those details are not repeated here.

1. Data processing

Andrew provides an interesting dataset here: images of hand signs representing digits (example images omitted).

The full dataset can be downloaded here.

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
index = 0
plt.imshow(X_train_orig[index])
print("y=" + str(np.squeeze(Y_train_orig[:,index])))
The displayed image is the hand sign for the digit 5 (figure omitted), and the printed label is

y= 5
After loading the data, we usually still need to flatten, normalize, and one-hot encode it:
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
X_train = X_train_flatten / 255
X_test = X_test_flatten / 255
Y_train = convert_to_one_hot(Y_train_orig, C = 6)
Y_test = convert_to_one_hot(Y_test_orig, C = 6)
print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)
The code above uses the convert_to_one_hot function:
def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y
It cleverly uses NumPy fancy indexing on the identity matrix to implement the one-hot conversion. Hint: indexing a matrix A with an integer i returns row i (counting from 0), so np.eye(C)[Y.reshape(-1)] stacks, for each label, the matching row of the C×C identity matrix, and transposing yields the (C, m) one-hot matrix. A small demo follows.
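A tiny NumPy demo (not from the original post) showing the identity-matrix trick:

Y = np.array([1, 2, 3, 0, 2, 1])
eye = np.eye(4)                       # 4x4 identity matrix
print(eye[1])                         # row 1 -> [0. 1. 0. 0.], the one-hot vector for label 1
print(np.eye(4)[Y.reshape(-1)].T)     # same (4, 6) matrix as one_hot_matrix(labels, C = 4)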

2. Creating placeholders

To feed in the training data later, we first create placeholders for X and Y.

def create_placeholder(n_x, n_y):
    X = tf.placeholder(tf.float32, shape = [n_x, None])
    Y = tf.placeholder(tf.float32, shape = [n_y, None])
    return X, Y

X, Y = create_placeholder(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))
X = Tensor("Placeholder:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Placeholder_1:0", shape=(6, ?), dtype=float32)
3. Parameter initialization

We use TensorFlow to initialize the parameters W and b:
def initialize_parameters():
    tf.set_random_seed(1)
    W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12,25], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b2 = tf.get_variable("b2", [12,1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6,12], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b3 = tf.get_variable("b3", [6,1], initializer = tf.zeros_initializer())
    parameters = {"W1" : W1,
                  "b1" : b1,
                  "W2" : W2,
                  "b2" : b2,
                  "W3" : W3,
                  "b3" : b3}
    return parameters
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
  1. W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
  2. b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
  3. W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
  4. b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>
4. Forward propagation
def forward_propagation(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)
    return Z3
Because the TensorFlow framework handles backpropagation automatically, forward propagation does not need to compute A3 (the cost function below takes the logits Z3 directly), nor does it need to return a cache.
tf.reset_default_graph()
with tf.Session() as sess:
    X, Y = create_placeholder(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 =", Z3)
Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)
5. Computing the cost
def compute_cost(Z3, Y):
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
    return cost
tf.reset_default_graph()
with tf.Session() as sess:
    X, Y = create_placeholder(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost =", cost)
cost = Tensor("Mean:0", shape=(), dtype=float32)
6. Backpropagation and parameter updates

Because we are using a deep learning framework, backpropagation and the parameter update can be condensed into a single line of code that is easy to integrate into the model. After computing the cost, we create an "optimizer" object that performs gradient descent with the chosen method and learning rate:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)
To make the optimization actually run, sess.run must be called as follows:
_, c = sess.run([optimizer, cost], feed_dict = {X: minibatch_X, Y:minibatch_Y})
Note: "_" is the conventional name for a throwaway variable. Here we only need the cost value returned by sess.run(), so we store it in c; the value returned for optimizer is not needed, so it is received by "_". A tiny self-contained illustration of this pattern follows.
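As a toy example (not from the notebook), the same optimizer / sess.run pattern applied to a one-variable problem, minimizing (w − 5)² by gradient descent:

w = tf.Variable(0.0, name = "w")
toy_cost = tf.square(w - 5.0)
toy_optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.1).minimize(toy_cost)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        _, c = sess.run([toy_optimizer, toy_cost])   # "_" discards the optimizer's return value
    print(sess.run(w))   # close to 5.0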

7. Building the model

Now we put the functions above together to build a TensorFlow-based neural network model.

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    ops.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3
    (n_x, m) = X_train.shape
    n_y = Y_train.shape[0]
    costs = []
    X, Y = create_placeholder(n_x, n_y)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(num_epochs):
            epoch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict = {X: minibatch_X, Y: minibatch_Y})
                epoch_cost += minibatch_cost / num_minibatches
            if print_cost and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost and epoch % 5 == 0:
                costs.append(epoch_cost)
        plt.plot(np.squeeze(costs))
        plt.xlabel("iterations (per five)")
        plt.ylabel("cost")
        plt.title("learning_rate:" + str(learning_rate))
        plt.show()
        parameters = sess.run(parameters)
        print("Parameters have been trained!")
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
    return parameters
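The model relies on random_mini_batches from tf_utils, which is not shown in this post; it shuffles the columns of (X, Y) and cuts them into minibatches of the requested size. A simplified sketch of that idea (an approximation, not the exact tf_utils implementation):

def random_mini_batches_sketch(X, Y, mini_batch_size = 32, seed = 0):
    np.random.seed(seed)
    m = X.shape[1]
    permutation = list(np.random.permutation(m))       # shuffle the columns
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation]
    mini_batches = []
    for k in range(0, m, mini_batch_size):             # cut into chunks; the last one may be smaller
        mini_batches.append((shuffled_X[:, k:k + mini_batch_size],
                             shuffled_Y[:, k:k + mini_batch_size]))
    return mini_batches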
Training takes a while. Note that if the cost after epoch 100 is not 1.016458, you can stop the program and look for the problem rather than waste time.
parameters = model(X_train, Y_train, X_test, Y_test)
Cost after epoch 0: 1.855702
Cost after epoch 100: 1.016458
Cost after epoch 200: 0.733102
Cost after epoch 300: 0.572938
Cost after epoch 400: 0.468799
Cost after epoch 500: 0.380979
Cost after epoch 600: 0.313819
Cost after epoch 700: 0.254258
Cost after epoch 800: 0.203795
Cost after epoch 900: 0.166410
Cost after epoch 1000: 0.141497
Cost after epoch 1100: 0.107579
Cost after epoch 1200: 0.086229
Cost after epoch 1300: 0.059415
Cost after epoch 1400: 0.052237
Parameters have been trained!
Train Accuracy: 0.9990741
Test Accuracy: 0.71666664
With this we have completed a neural network based on the TensorFlow framework and used it to learn the hand-sign digit dataset. It overfits somewhat, but we have nonetheless finished our first TensorFlow neural network model.
III. Predicting on our own image

Having built the model, let's test it on a hand-sign photo of our own (image omitted).

We first process the image into the 64*64*3 format so it can be fed to the model.

import scipy
import scipy.misc          # scipy.misc.imresize requires an older SciPy release
from scipy import ndimage

my_image = "myfigure.jpg"
fname = "images\\" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64,64)).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)
plt.imshow(image)
print("your algorithm predicts : y=" + str(np.squeeze(my_image_prediction)))
your algorithm predicts : y=4
The prediction is 4, so the model clearly still has room for improvement. (predict here also comes from tf_utils; a rough sketch of what it does is given below.)
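A rough sketch of the idea behind predict (an approximation, not the exact tf_utils implementation): rebuild the forward pass from the trained parameter values and take the argmax of the output layer:

def predict_sketch(X_value, parameters):
    params = {k: tf.convert_to_tensor(v) for k, v in parameters.items()}
    x = tf.placeholder(tf.float32, shape = [12288, 1])
    z3 = forward_propagation(x, params)
    prediction = tf.argmax(z3)                 # index of the largest logit = predicted class
    with tf.Session() as sess:
        return sess.run(prediction, feed_dict = {x: X_value})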
IV. Summary

1. TensorFlow is a deep learning framework; its two most important concepts are Tensors and Operations.

2. When programming with the framework, follow the steps described in Part I.

3. A graph can be executed multiple times.

4. Backpropagation and the optimization process are handled automatically by the framework.


