
Implementation of and Reflections on the MNIST Digit Recognition Program from 《TensorFlow 实战Google深度学习框架》

2018-11-10 20:25

The program from the book:

[code]import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

__author__: str = 'zhangkun'
INPUT_NODE = 784  # number of input nodes
OUTPUT_NODE = 10  # number of output nodes

LAYER1_NODE = 500  # number of hidden-layer nodes
BATCH_SIZE = 100  # batch size

LEARNING_RATE_BASE = 0.8  # base learning rate
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate

REGULARIZATION_RATE = 0.0001  # regularization coefficient
TRAINING_STEPS = 30000  # number of training iterations
MOVING_AVERAGE_DECAY = 0.99  # moving-average decay rate


def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):  # what is avg_class?
    """
    :param input_tensor: the input
    :param avg_class: the class used to compute the moving averages of the parameters
    :param weights1: first-layer weights
    :param biases1: first-layer biases
    :param weights2: second-layer weights
    :param biases2: second-layer biases
    :return: the forward-pass result of the network
    """
    # Without the moving average
    if avg_class is None:
        layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
        return tf.matmul(layer1, weights2) + biases2
    # With the moving average
    else:
        layer1 = tf.nn.relu(
            tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1)
        )
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)

def train(mnist):
    x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
    # The ground-truth labels y
    y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')

    weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))  # with stddev=0.1 convergence is very slow -- why?
    biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))

    weights2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
    biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

    y = inference(x, None, weights1, biases1, weights2, biases2)

    global_step = tf.Variable(0, trainable=False)

    # Moving-average class; it stores the moving averages of the parameters
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

    variable_averages_op = variable_averages.apply(tf.trainable_variables())

    # y computed from the moving-averaged parameters
    average_y = inference(x, variable_averages, weights1, biases1, weights2, biases2)

    # Classification loss
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y)
    # cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=average_y)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)

    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    regularization = regularizer(weights1) + regularizer(weights2)
    loss = cross_entropy_mean + regularization

    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,  # base learning rate
        global_step,  # current iteration
        mnist.train.num_examples / BATCH_SIZE,  # iterations needed to see all training data once
        LEARNING_RATE_DECAY  # learning-rate decay rate
    )
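    # Note: with the default staircase=False, exponential_decay yields
    # LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step / decay_steps),
    # i.e. the rate shrinks smoothly by a factor of 0.99 per epoch
    # (55000 / 100 = 550 batches for this MNIST split).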

    # Why not Adam? Is this where weights1 and weights2 get updated?
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Run train_step and variable_averages_op together:
    # update the network parameters and the moving-average parameters;
    # the moving averages play no part in updating the network parameters themselves.
    with tf.control_dependencies([train_step, variable_averages_op]):
        train_op = tf.no_op(name='train')

    '''
    Flow control:
    the usage is simple -- the operations created inside the context manager
    only run after control_inputs have executed. For example:
    with tf.control_dependencies([a, b, c]):
        # `d` and `e` will only run after `a`, `b`, and `c` have executed.
        d = ...
        e = ...
    '''

    # Compute the accuracy
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
    # correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Initialize the session and start the training process.
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}
        test_feed = {x: mnist.test.images, y_: mnist.test.labels}

        for i in range(TRAINING_STEPS):
            if i % 1000 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print("after %d training steps, validation accuracy using the average model is %g" % (i, validate_acc))
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            sess.run(train_op, feed_dict={x: xs, y_: ys})

        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("after %d training steps, test accuracy using the average model is %g" % (TRAINING_STEPS, test_acc))


def main(argv=None):  # what is this for?
    mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)
    train(mnist)


if __name__ == '__main__':  # entry point
    tf.app.run()
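
To answer the question next to main(argv=None): in TensorFlow 1.x, tf.app.run() is just a thin entry-point wrapper -- it parses any command-line flags defined through tf.app.flags and then calls the module-level main() with the remaining arguments. A minimal sketch of the same pattern (standalone, not part of the book's program):

[code]import tensorflow as tf

def main(argv=None):
    # argv receives whatever command-line arguments are left after flag parsing
    print('main called with', argv)

if __name__ == '__main__':
    tf.app.run()  # parses flags, then invokes main()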

While debugging I hit a truly bizarre bug, caused by this block:

[code]    else:
        layer1 = tf.nn.relu(
            tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1)
        )
        return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)

The parentheses were misplaced; I had written:

[code]    else:
        layer1 = tf.nn.relu(
            tf.matmul(input_tensor, avg_class.average(weights1) + avg_class.average(biases1))
        )
        return tf.matmul(layer1, avg_class.average(weights2) + avg_class.average(biases2))

The result: remarkably, this version still builds and runs without complaint, yet the misplaced parentheses silently corrupt the computed predictions.
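
The reason it still runs is NumPy-style broadcasting: adding the shape-[500] bias vector to the shape-[784, 500] weight matrix is a legal elementwise operation (the bias is broadcast across all 784 rows), so tf.matmul still receives a shape-valid [784, 500] operand and the graph builds without complaint. A small NumPy sketch of the shape arithmetic (illustrative, not from the book):

[code]import numpy as np

x = np.random.rand(1, 784)
weights1 = np.random.rand(784, 500)
biases1 = np.random.rand(500)

# Buggy version: the bias is folded into the weight matrix before matmul.
wrong = x @ (weights1 + biases1)  # (784, 500) + (500,) broadcasts to (784, 500)

# Correct version: the bias is added after matmul.
right = x @ weights1 + biases1

print(wrong.shape, right.shape)  # both (1, 500) -- the shapes agree
print(np.allclose(wrong, right))  # False -- the values do not
# The buggy version computes (x @ W)[j] + b[j] * x.sum(), not (x @ W)[j] + b[j].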

A further note, on understanding the moving average:

The moving average exists to improve evaluation accuracy, but the averaged parameters should not be the quantity the optimizer trains against.
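
This also answers the avg_class question in the code: tf.train.ExponentialMovingAverage keeps a non-trainable "shadow" copy of each variable it is applied to, and avg_class.average(var) returns that shadow copy. Each time variable_averages_op runs, the shadow is updated as shadow = decay * shadow + (1 - decay) * var; because global_step is passed as num_updates, TensorFlow caps the decay at (1 + step) / (10 + step) early in training so the shadows can catch up quickly. A plain-Python sketch of the update rule:

[code]MOVING_AVERAGE_DECAY = 0.99

def shadow_update(shadow, value, step):
    # The rule tf.train.ExponentialMovingAverage uses when num_updates is given.
    decay = min(MOVING_AVERAGE_DECAY, (1 + step) / (10 + step))
    return decay * shadow + (1 - decay) * value

shadow = 1.0  # shadow variables start at the variable's initial value
for step in range(5):
    value = 1.0 - 0.1 * step  # pretend the real weight is drifting downward
    shadow = shadow_update(shadow, value, step)
    print(step, value, round(shadow, 4))  # the shadow lags behind the raw value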

Comparing the two runs (accuracy screenshots omitted here): evaluating without the moving average works, but evaluating through the moving-averaged parameters seems to predict slightly better.

What happens, then, if the moving-averaged parameters are used for training itself (the commented-out average_y loss)?

The program then optimizes the parameters extremely slowly. A likely explanation: avg_class.average() returns the non-trainable shadow variables, so the cross-entropy gradient never reaches the real weights; only the regularization term still produces gradients, and the shadow values merely track the slowly shrinking weights.
