Getting Started with TensorFlow: MNIST & CNN

Following the official TensorFlow tutorial Deep MNIST for Experts, this post builds a CNN that recognizes handwritten digits from the MNIST dataset.

# load MNIST data
# (this tutorial module and the session/placeholder API used below require TensorFlow 1.x)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("Mnist_data/", one_hot=True)

# start a TensorFlow InteractiveSession
import tensorflow as tf
sess = tf.InteractiveSession()

# weight initialization
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# convolution
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Create the model
# placeholder
x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", [None, 10])
# variables of the plain softmax-regression model (not used by the CNN below)
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)

# first convolutional layer
w_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# second convolutional layer
w_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# densely connected layer
w_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)

# dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# readout layer
w_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)

# train and evaluate the model
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdagradOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# initialize all variables before training
sess.run(tf.global_variables_initializer())

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # report accuracy on the current batch with dropout disabled
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, train accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
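The 7*7*64 input size of the densely connected layer follows from the layer shapes. With padding='SAME', the spatial output size is ceil(input / stride), so each stride-1 convolution keeps the size and each 2x2 max-pool with stride 2 halves it. A minimal sketch in plain Python, where the helper name same_padding_output_size is made up for illustration and is not a TensorFlow function:

```python
import math

def same_padding_output_size(input_size, stride):
    """Spatial output size of a conv/pool op that uses padding='SAME'."""
    return int(math.ceil(float(input_size) / stride))

size = 28          # MNIST images are 28x28
sizes = [size]
for layer in range(2):
    size = same_padding_output_size(size, 1)  # conv2d, stride 1: size unchanged
    size = same_padding_output_size(size, 2)  # max_pool_2x2, stride 2: size halved
    sizes.append(size)

# 64 feature maps come out of the second convolutional layer
flat_features = sizes[-1] * sizes[-1] * 64
print(sizes, flat_features)
```

The sizes go 28, 14, 7, which is why h_pool2 is reshaped to [-1, 7*7*64] before the fully connected layer.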


Output of the last training steps:

step 19300, train accuracy 0.94

step 19400, train accuracy 0.92

step 19500, train accuracy 0.86

step 19600, train accuracy 0.98

step 19700, train accuracy 0.94

step 19800, train accuracy 0.96

step 19900, train accuracy 0.94


ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,28,28,32]

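This means an out-of-memory (OOM) error occurred when all 10,000 test images were evaluated at once (it is not a memory leak). As a first workaround, I reduced the test set to 2,000 images:

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images[:2000], y_: mnist.test.labels[:2000], keep_prob: 1.0}))

Output: test accuracy 0.9155.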
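To get a feeling for why 10,000 images at once is expensive, one can estimate the size of the single tensor named in the error message. The sketch below assumes 4 bytes per element (float32) and counts only that one tensor; the real memory footprint also includes the other layers' activations, so treat the numbers as a lower bound. The helper tensor_bytes is a made-up name for illustration.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for a dense tensor of the given shape (float32 assumed)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element

# Output of the first convolutional layer for the full test set and for the subset
full = tensor_bytes([10000, 28, 28, 32])
subset = tensor_bytes([2000, 28, 28, 32])

print("10000 images: %.2f GiB" % (full / float(2 ** 30)))
print(" 2000 images: %.2f GiB" % (subset / float(2 ** 30)))
```

The first-layer activations alone take roughly 1 GB for the full test set, and one fifth of that for 2,000 images.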


Gradient Descent

step 19200, train accuracy 1

step 19300, train accuracy 1

step 19400, train accuracy 1

step 19500, train accuracy 1

step 19600, train accuracy 1

step 19700, train accuracy 1

step 19800, train accuracy 1

step 19900, train accuracy 1

test accuracy 0.9895



ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10000,28,28,32]

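This error is caused by insufficient GPU memory during testing. The fix is to split the test set into batches, evaluate each batch separately, and combine the results into an overall accuracy, as follows: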

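# count the correct predictions in a batch instead of averaging them
accuracy_sum = tf.reduce_sum(tf.cast(correct_prediction, tf.float32))
batch_size = 50
good = 0
total = 0
# 10,000 test images / 50 per batch = 200 batches, so the whole test set is covered
for i in range(mnist.test.images.shape[0] // batch_size):
    testSet = mnist.test.next_batch(batch_size)
    good += accuracy_sum.eval(feed_dict={x: testSet[0], y_: testSet[1], keep_prob: 1.0})
    total += testSet[0].shape[0]
print("test accuracy %g" % (good / float(total)))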
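The same idea can be written as a small framework-independent helper that slices the data sequentially, so every example is counted exactly once even when the last batch is smaller. This is only a sketch: batched_accuracy and the toy count_correct stand-in are hypothetical names, and the toy "model" simply treats each input as its own prediction.

```python
def batched_accuracy(count_correct, images, labels, batch_size):
    """Accuracy over the whole set, evaluated one slice at a time."""
    good = 0.0
    total = 0
    for start in range(0, len(images), batch_size):
        batch_x = images[start:start + batch_size]
        batch_y = labels[start:start + batch_size]
        good += count_correct(batch_x, batch_y)  # number of correct predictions in this slice
        total += len(batch_x)
    return good / total

# Toy stand-in for the model: the "prediction" is the input value itself.
def toy_count_correct(batch_x, batch_y):
    return sum(1 for x, y in zip(batch_x, batch_y) if x == y)

toy_images = [0, 1, 2, 3, 4, 5, 6]
toy_labels = [0, 1, 2, 0, 4, 5, 0]  # 5 of the 7 labels match
result = batched_accuracy(toy_count_correct, toy_images, toy_labels, batch_size=3)
print(result)
```

With the graph from this post, count_correct could presumably be a function that returns accuracy_sum.eval(feed_dict={x: bx, y_: by, keep_prob: 1.0}), called with mnist.test.images and mnist.test.labels.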


1. http://stackoverflow.com/questions/39076388/tensorflow-deep-mnist-resource-exhausted-oom-when-allocating-tensor-with-shape

2. https://github.com/tensorflow/tensorflow/pull/157


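The MNIST training set contains 55,000 images, yet training runs for 20,000 iterations (steps, not epochs) with a batch size of 50. That means 20,000 × 50 = 1,000,000 images are fed to the network in total, so each image is used about 18 times on average, which corresponds to roughly 18 epochs. Does training this way improve accuracy, or does it lead to overfitting?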
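The arithmetic behind this question can be checked in a few lines. The numbers are the ones used in this post; the variable names are only for illustration.

```python
num_train = 55000   # images in the MNIST training split
steps = 20000       # training iterations in the loop above
batch_size = 50

images_seen = steps * batch_size             # total images fed to the network
steps_per_epoch = num_train // batch_size    # iterations needed for one pass over the data
epochs = images_seen / float(num_train)      # average number of passes per image

print(images_seen, steps_per_epoch, round(epochs, 2))
```

One epoch is 1,100 steps, so 20,000 steps is a little over 18 passes over the training data. Whether that many passes overfits is best judged by comparing the training accuracy with the accuracy on held-out data rather than from the step count alone.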




