Implementing Classic Deep Learning Networks in TensorFlow (7): A Bidirectional Long Short-Term Memory Recurrent Neural Network in TensorFlow

2017-11-03 10:05

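In 1997, Schuster and Paliwal first proposed the bidirectional recurrent neural network (Bi-RNN); the long short-term memory model (LSTM) was proposed in the same year. The main goal of Bi-RNN is to increase the information available to an RNN. The previous post covered RNN theory. A standard RNN processes a sequence in time order, which lets it handle sequences of variable length, but it tends to ignore future context. One obvious fix is to add a delay between the input and the target, giving the network a few time steps to take in future context, i.e., letting it read M frames of future input before it predicts the output. In theory M could be made very large to capture all available future information, but in practice predictions get worse when M is too large: the network spends its capacity memorizing large amounts of input and loses some of its ability to model how knowledge from different input vectors combines into a prediction. M therefore has to be tuned by hand. Bi-RNN takes the opposite approach: it uses both the past and the future data around each input in the sequence.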
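To make the delay idea concrete, here is a minimal sketch assuming a unidirectional RNN that emits one output per step. With a delay of M frames, the output emitted at step t + M is trained against the target for step t, so by then the network has already read inputs x_0 to x_{t+M}. The helper name `delayed_pairs` is hypothetical and only illustrates the alignment.

```python
import numpy as np

def delayed_pairs(outputs, targets, M):
    """Pair outputs with targets delayed by M frames (hypothetical helper).

    outputs: array of shape (T, ...), one RNN output per step.
    targets: array of shape (T, ...), the target for each step.
    The output at step t + M is paired with the target at step t,
    so each prediction has seen M frames of future input.
    """
    T = len(targets)
    if not 0 <= M < T:
        raise ValueError("M must satisfy 0 <= M < T")
    return outputs[M:], targets[:T - M]

# Toy example: T = 5 steps, delay M = 2
outs = np.arange(5) * 10   # pretend outputs emitted at steps 0..4
tgts = np.arange(5)        # targets for steps 0..4
paired_out, paired_tgt = delayed_pairs(outs, tgts, M=2)
print(list(zip(paired_out.tolist(), paired_tgt.tolist())))
# → [(20, 0), (30, 1), (40, 2)]
```

The first M outputs are thrown away and the last M targets are never paired, and M itself has to be chosen by hand; a Bi-RNN sidesteps that choice.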
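The principle behind Bi-RNN is simple: two RNNs that run in opposite temporal directions are connected to the same output, so the output layer receives both past and future information. Concretely, each training sequence is presented forwards to one recurrent network and backwards to another, and both are connected to the same output layer. This gives the output layer complete past and future context for every point in the input sequence. The figure below shows a bidirectional RNN unrolled over time. Six distinct weights are reused at every time step: input to the forward and backward hidden layers (w1, w3), each hidden layer to itself (w2, w5), and the forward and backward hidden layers to the output layer (w4, w6). Note that no information flows between the forward and backward hidden layers, which guarantees that the unrolled graph is acyclic.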

A bidirectional recurrent neural network (BRNN) unrolled over time
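To make the six shared weights concrete, here is a minimal NumPy sketch of one Bi-RNN forward pass with plain tanh units, assuming the weight roles listed above (w1/w3 input to forward/backward hidden, w2/w5 recurrent, w4/w6 hidden to output). Biases are left out, and `birnn_forward` is a hypothetical name for illustration. The forward and backward loops never read each other's state; they meet only when the output is formed.

```python
import numpy as np

def birnn_forward(xs, w1, w2, w3, w4, w5, w6):
    """Forward pass of a simple tanh Bi-RNN unrolled over T steps (no biases).

    xs: (T, d_in) input sequence.
    w1: (d_in, d_h) input -> forward hidden    w3: (d_in, d_h) input -> backward hidden
    w2: (d_h, d_h)  forward hidden -> itself   w5: (d_h, d_h)  backward hidden -> itself
    w4: (d_h, d_out) forward -> output         w6: (d_h, d_out) backward -> output
    """
    T = xs.shape[0]
    d_h = w2.shape[0]
    h_f = np.zeros((T, d_h))
    h_b = np.zeros((T, d_h))

    # Forward layer: t = 0 .. T-1, state carried from the past
    prev = np.zeros(d_h)
    for t in range(T):
        prev = np.tanh(xs[t] @ w1 + prev @ w2)
        h_f[t] = prev

    # Backward layer: t = T-1 .. 0, state carried from the future
    nxt = np.zeros(d_h)
    for t in reversed(range(T)):
        nxt = np.tanh(xs[t] @ w3 + nxt @ w5)
        h_b[t] = nxt

    # The two hidden layers only meet at the output layer
    ys = h_f @ w4 + h_b @ w6
    return ys, h_f, h_b

rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 6, 3, 4, 2
shapes = [(d_in, d_h), (d_h, d_h), (d_in, d_h), (d_h, d_out), (d_h, d_h), (d_h, d_out)]
ws = [rng.normal(size=s) for s in shapes]   # w1 .. w6 in order
xs = rng.normal(size=(T, d_in))
ys, h_f, h_b = birnn_forward(xs, *ws)
print(ys.shape)  # → (6, 2)
```

Because the forward layer only carries state from the past, changing the last input leaves every earlier forward state untouched while still changing the backward states, which is the "no information flow between directions" property in action.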

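The core of the Bi-RNN structure is to split an ordinary unidirectional RNN into two directions, one running forward in time and one running backward, so that the output at the current time step can use information from both directions. Each RNN unit in a Bi-RNN can be a plain RNN cell, an LSTM cell, or a GRU cell. With this understanding of Bi-LSTM, we can build the network. The code reads MNIST from an MNIST_data directory in the project; if the files are missing, input_data.read_data_sets downloads them, and it reads the .gz archives directly, so there is no need to extract them. The code below was put together from my understanding of Bi-LSTM and existing resources (the book 《TensorFlow实战》, TensorFlow's open-source implementations, and so on), with comments added according to my own understanding. Please point out any errors in the comments.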

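# -*- coding: utf-8 -*-
import os
# Suppress TensorFlow's C++ INFO and WARNING log messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Load dependencies
import tensorflow as tf
import numpy as np

# Load the MNIST dataset (downloaded into MNIST_data/ if not already present)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Training parameters
learning_rate = 0.01
max_samples = 400000  # total number of training samples to draw
batch_size = 128
display_step = 10     # log every display_step batches

# Network structure parameters
n_input = 28    # input per time step: one image row (img shape: 28*28)
n_steps = 28    # timesteps: the 28 rows of each image
n_hidden = 256  # number of hidden units in each LSTM
n_classes = 10  # number of MNIST classes

# Placeholders for the input x and the target y
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Weights and biases of the final softmax layer; the input dimension is
# 2 * n_hidden because forward and backward outputs are concatenated
weights = {
    'out': tf.Variable(tf.random_normal([2 * n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}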

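# Function that builds the network
def BiRNN(x, weights, biases):
    # Input shape: (batch_size, n_steps, n_input)
    # Swap the first two axes -> (n_steps, batch_size, n_input)
    x = tf.transpose(x, [1, 0, 2])
    # Flatten to (n_steps * batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split into a list of n_steps tensors of shape (batch_size, n_input),
    # the input format static_bidirectional_rnn expects
    x = tf.split(x, n_steps)

    # Create the forward and backward LSTM cells
    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Run the bidirectional RNN; each element of outputs concatenates the
    # forward and backward outputs, shape (batch_size, 2 * n_hidden)
    outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                                            dtype=tf.float32)
    # Map the output at the last time step to class logits
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = BiRNN(x, weights, biases)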

# Define the loss and the optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate the model: fraction of predictions whose argmax matches the label
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize all variables
init = tf.global_variables_initializer()

# Run training and testing
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the maximum number of samples is reached
    while step * batch_size < max_samples:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Each flat 784-pixel image becomes 28 time steps of 28 pixels (one row per step)
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Compute the minibatch loss and accuracy in a single run
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step * batch_size) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    # Evaluate on the test set and report accuracy
    test_len = 10000
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
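The three reshaping lines at the start of BiRNN turn a (batch_size, n_steps, n_input) tensor into the list of n_steps per-step tensors that static_bidirectional_rnn takes as input. The NumPy sketch below mirrors those steps with np.transpose, reshape and np.split (an analogy for checking shapes, not TensorFlow code), assuming a small batch of 4. It then uses zero arrays as stand-ins for one step of LSTM output to show why the softmax weight matrix has shape [2 * n_hidden, n_classes].

```python
import numpy as np

batch_size, n_steps, n_input, n_hidden, n_classes = 4, 28, 28, 256, 10

# A fake batch shaped the way the placeholder x expects
x = np.random.default_rng(0).random((batch_size, n_steps, n_input))

xt = np.transpose(x, (1, 0, 2))       # (n_steps, batch_size, n_input)
flat = xt.reshape(-1, n_input)        # (n_steps * batch_size, n_input)
pieces = np.split(flat, n_steps)      # n_steps arrays of (batch_size, n_input)
print(len(pieces), pieces[0].shape)   # → 28 (4, 28)

# Stand-ins for the forward and backward LSTM outputs at one time step
fw_out = np.zeros((batch_size, n_hidden))
bw_out = np.zeros((batch_size, n_hidden))
last = np.concatenate([fw_out, bw_out], axis=1)  # (batch_size, 2 * n_hidden)
W_out = np.zeros((2 * n_hidden, n_classes))
logits = last @ W_out
print(logits.shape)                   # → (4, 10)
```

Piece t of the split list holds row t of every image in the batch, which is what lets each LSTM step consume one image row.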

Running the TensorFlow script prints output like the following (the end of the training log is shown):

Iter 369920, Minibatch Loss= 0.010986, Training Accuracy= 1.00000
Iter 371200, Minibatch Loss= 0.004468, Training Accuracy= 1.00000
Iter 372480, Minibatch Loss= 0.026122, Training Accuracy= 0.99219
Iter 373760, Minibatch Loss= 0.021567, Training Accuracy= 0.99219
Iter 375040, Minibatch Loss= 0.031608, Training Accuracy= 0.98438
Iter 376320, Minibatch Loss= 0.003522, Training Accuracy= 1.00000
Iter 377600, Minibatch Loss= 0.021465, Training Accuracy= 0.99219
Iter 378880, Minibatch Loss= 0.029406, Training Accuracy= 0.99219
Iter 380160, Minibatch Loss= 0.019673, Training Accuracy= 0.99219
Iter 381440, Minibatch Loss= 0.023889, Training Accuracy= 0.99219
Iter 382720, Minibatch Loss= 0.006301, Training Accuracy= 1.00000
Iter 384000, Minibatch Loss= 0.041616, Training Accuracy= 0.98438
Iter 385280, Minibatch Loss= 0.045547, Training Accuracy= 0.99219
Iter 386560, Minibatch Loss= 0.009119, Training Accuracy= 1.00000
Iter 387840, Minibatch Loss= 0.004345, Training Accuracy= 1.00000
Iter 389120, Minibatch Loss= 0.021394, Training Accuracy= 0.99219
Iter 390400, Minibatch Loss= 0.004244, Training Accuracy= 1.00000
Iter 391680, Minibatch Loss= 0.009501, Training Accuracy= 1.00000
Iter 392960, Minibatch Loss= 0.001563, Training Accuracy= 1.00000
Iter 394240, Minibatch Loss= 0.009096, Training Accuracy= 1.00000
Iter 395520, Minibatch Loss= 0.004810, Training Accuracy= 1.00000
Iter 396800, Minibatch Loss= 0.062289, Training Accuracy= 0.98438
Iter 398080, Minibatch Loss= 0.009722, Training Accuracy= 1.00000
Iter 399360, Minibatch Loss= 0.033058, Training Accuracy= 0.99219
Optimization Finished!
Testing Accuracy: 0.9837
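The Iter values in the log count samples, not optimizer steps, and the last one follows directly from the loop condition step * batch_size < max_samples. The plain-Python check below replays only that loop arithmetic (no training) to show why the final logged line is Iter 399360 and roughly how many epochs the run covers, assuming the default 55,000-image training split returned by input_data.read_data_sets.

```python
batch_size = 128
max_samples = 400000
display_step = 10
train_size = 55000  # assumed: default mnist.train size (60,000 minus a 5,000-image validation split)

# Replay the training loop's counter logic without doing any training
step = 1
logged = []  # the "Iter" values that would be printed
while step * batch_size < max_samples:
    if step % display_step == 0:
        logged.append(step * batch_size)
    step += 1
last_step = step - 1  # the final step that actually ran

print(last_step, last_step * batch_size, logged[-1])  # → 3124 399872 399360
print(round(last_step * batch_size / train_size, 2))   # → 7.27
```

So the run stops 128 samples short of 400,000 and covers just over seven passes through the training set.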

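We can see that although Bi-LSTM does not match convolutional neural networks on MNIST, it still reaches a very respectable level. After training on roughly 400,000 samples, the logged minibatch training accuracy stays between about 0.984 and 1.0, and accuracy on the 10,000-image test set reaches 0.9837. By combining the past and future context of the sequence, Bi-LSTM achieves good performance.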
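Each direction of a Bi-RNN can use an LSTM cell, and swapping the plain RNN step for an LSTM step is all it takes to get a Bi-LSTM. The NumPy sketch below illustrates this under an assumed gate layout that follows my understanding of BasicLSTMCell (one matrix producing input gate i, candidate j, forget gate f and output gate o, with forget_bias added to f). It is an illustration, not TensorFlow's code, and the helpers lstm_step, run, bidirectional and make_params are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, state, W, b, forget_bias=1.0):
    """One LSTM step (hypothetical helper). W: (d_in + d_h, 4 * d_h), b: (4 * d_h,)."""
    c, h = state
    z = np.concatenate([x, h]) @ W + b
    i, j, f, o = np.split(z, 4)  # input gate, candidate, forget gate, output gate
    c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    h = np.tanh(c) * sigmoid(o)
    return h, (c, h)

def run(params, xs, reverse=False):
    """Run one LSTM direction over xs of shape (T, d_in); outputs come back in time order."""
    W, b = params
    d_h = W.shape[1] // 4
    state = (np.zeros(d_h), np.zeros(d_h))
    steps = reversed(range(len(xs))) if reverse else range(len(xs))
    outs = [None] * len(xs)
    for t in steps:
        outs[t], state = lstm_step(xs[t], state, W, b)
    return np.stack(outs)

def bidirectional(fw_params, bw_params, xs):
    """Concatenate forward and backward outputs at each step -> (T, 2 * d_h)."""
    return np.concatenate([run(fw_params, xs), run(bw_params, xs, reverse=True)], axis=1)

rng = np.random.default_rng(1)
T, d_in, d_h = 5, 3, 4

def make_params():
    return rng.normal(scale=0.5, size=(d_in + d_h, 4 * d_h)), np.zeros(4 * d_h)

fw_params, bw_params = make_params(), make_params()
xs = rng.normal(size=(T, d_in))
out = bidirectional(fw_params, bw_params, xs)
print(out.shape)  # → (5, 8)

# At the last step, the backward half has read only the last input
solo = run(bw_params, xs[-1:], reverse=True)
print(np.allclose(solo[0], out[-1, d_h:]))  # → True
```

The last line is worth keeping in mind when reading BiRNN, which classifies from outputs[-1]: at that step the forward half of the feature vector has read all 28 image rows, while the backward half has read only the final row.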

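In future posts I will keep showing the fun that TensorFlow and deep learning networks bring, and explore the mysteries of deep learning together with you. If you are interested, my Weibo shares cutting-edge techniques in artificial intelligence, machine learning, deep learning, and computer vision.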
