
Demystifying LSTM Neural Networks

2016-04-02 19:15
A Chinese translation is available at: http://www.csdn.net/article/2015-06-05/2824880

This article provides a basic introduction to Long Short Term Memory neural networks. For a more thorough review of RNNs, see the full 33-page review hosted on arXiv.

Given its wide applicability to real-world tasks, deep learning has attracted the attention of a wide audience of interested technologists, investors, and spectators. While the most celebrated results use feedforward convolutional neural networks (convnets) to solve problems in computer vision, less public attention has been paid to developments using recurrent neural networks to model relationships in time.

(Note: To help you begin experimenting with LSTM recurrent nets, I've attached a snapshot of a simple micro instance preloaded with numpy, theano, and a git clone of Jonathan Raiman's LSTM example.)

In a recent post, "Learning to Read with Recurrent Neural Networks," I explained why, despite their incredible successes, feedforward networks are limited by their inability to explicitly model relationships in time and by their assumption that all data points consist of vectors of fixed length. At the post's conclusion, I promised a forthcoming post explaining the basics of recurrent nets and introducing the Long Short Term Memory (LSTM) model.





First, the basics of neural networks. A neural network can be represented as a graph of artificial neurons, also called nodes, and directed edges, which model synapses. Each neuron is a processing unit that takes as input the outputs of the nodes connected to it. Before emitting output, each neuron first applies a nonlinear activation function. It is this activation function that gives neural networks the ability to model nonlinear relationships.
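To make this concrete, here is a minimal sketch of a single artificial neuron in NumPy. The sigmoid activation and all weight and bias values are illustrative assumptions, not taken from the post:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: a linear combination of its inputs
    followed by a nonlinear (here sigmoid) activation."""
    z = np.dot(weights, inputs) + bias   # weighted sum of incoming outputs
    return 1.0 / (1.0 + np.exp(-z))     # sigmoid squashes to (0, 1)

x = np.array([0.5, -1.0, 2.0])   # outputs of connected nodes
w = np.array([0.4, 0.6, -0.2])   # synapse weights
out = neuron(x, w, bias=0.1)
```

Without the nonlinearity, any stack of such units would collapse into a single linear map, which is why the activation function matters.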

Now, consider the recent famous paper, "Playing Atari with Deep Reinforcement Learning," which combines convnets with reinforcement learning to train a computer to play video games. The system achieves superhuman performance on games like Breakout!, where the proper strategy at any point can be deduced by looking at the screen. However, the system falls far short of human performance when optimal strategies require planning over long spans of time, as in Space Invaders.

With this motivation we introduce recurrent neural networks, an approach which endows neural networks with the ability to explicitly model time by adding a self-connected hidden layer which spans time points. In other words, the hidden layer feeds not only into the output, but also into the hidden layer at the next time step. Throughout this post I'll use some illustrations of recurrent networks pilfered from my forthcoming review of the literature on the subject.
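The self-connected hidden layer can be sketched in a few lines of NumPy. The dimensions, weight scale, and tanh activation below are illustrative assumptions; note that the same weight matrices are reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W_xh = rng.standard_normal((n_hidden, n_in)) * 0.1      # input -> hidden
W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # hidden -> hidden (the self-connection)
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One time step: the hidden layer sees the current input
    AND its own activation from the previous time step."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(n_hidden)                        # initial hidden state
for x_t in rng.standard_normal((5, n_in)):    # a sequence of 5 inputs
    h = rnn_step(x_t, h)
```

Unrolling this loop across time steps yields exactly the acyclic picture described in the next paragraph, with identical weights at each layer.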



We can now unfold this network across two time steps to visualize the connections in an acyclic way. Note that the weights (from input to hidden and hidden to output) are identical at each time step. A recurrent net is sometimes described as a deep network
where the depth occurs not between input and output, but across time steps, where each time step can be thought of as a layer.



Once unfolded, these networks can be trained end to end using backpropagation. This extension of backpropagation to span time steps is called backpropagation through time.
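As a concrete sketch of backpropagation through time, the toy example below uses a scalar RNN for clarity: the forward pass stores the hidden states, and the backward pass walks back through the time steps, accumulating the gradient of the shared recurrent weight. All numbers are made up for illustration:

```python
import numpy as np

# Scalar RNN: h_t = tanh(w * h_{t-1} + u * x_t); loss L = 0.5 * (h_T - y)^2.
w, u = 0.5, 1.0
xs = [0.1, -0.2, 0.3]   # a short input sequence
y = 0.5                 # target for the final hidden state

# Forward pass, keeping every hidden state for the backward pass.
hs = [0.0]
for x in xs:
    hs.append(np.tanh(w * hs[-1] + u * x))

# Backward pass (BPTT): carry the error signal back one time step
# at a time, accumulating the gradient of the SHARED weight w.
dL_dh = hs[-1] - y
dw = 0.0
for t in reversed(range(len(xs))):
    dpre = dL_dh * (1.0 - hs[t + 1] ** 2)  # back through tanh
    dw += dpre * hs[t]                     # w is shared, so gradients add up
    dL_dh = dpre * w                       # propagate error to the earlier step
```

Because the weight is identical at every time step, its gradient is the sum of contributions from all steps of the unrolled network.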

One problem, however, is the vanishing gradient, as described by Yoshua Bengio et al. in the frequently cited paper "Learning Long-Term Dependencies with Gradient Descent is Difficult." In other words, the error signal from later time steps often doesn't make it far enough back in time to influence the network at much earlier time steps. This makes it difficult to learn long-range effects, such as the fact that taking that pawn will come back to bite you twelve moves later.
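The vanishing gradient is easy to observe numerically. The sketch below (with assumed dimensions and weight scale) multiplies the per-step Jacobians of a tanh RNN together, as the chain rule does during backpropagation through time, and tracks how the gradient norm shrinks:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W_hh = rng.standard_normal((n, n)) * 0.1  # modest recurrent weights

h = np.zeros(n)
grad = np.eye(n)   # Jacobian d h_t / d h_t at the final step
norms = []
for _ in range(30):
    h = np.tanh(W_hh @ h + rng.standard_normal(n))
    step_jac = np.diag(1 - h**2) @ W_hh   # Jacobian of one time step
    grad = step_jac @ grad                # chain rule: one more step back
    norms.append(np.linalg.norm(grad))
```

Each step multiplies in a Jacobian whose norm is well below 1, so after a few dozen steps the error signal reaching the early time steps is vanishingly small.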

A remedy to this problem is the Long Short Term Memory (LSTM) model, first described in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. In this model, ordinary neurons, i.e., units which apply a sigmoidal activation to a linear combination of their inputs, are replaced by memory cells. Each memory cell is associated with an input gate, an output gate, and an internal state that feeds into itself unperturbed across time steps.



In this model, for each memory cell, three sets of weights are learned from the input as well as the entire hidden state at the previous time step. One feeds into the input node, pictured at bottom. One feeds into the input gate, shown on the far right side
of the cell at bottom. Another feeds into the output gate, shown on the far right side of the cell at top. Each blue node is associated with an activation function, typically sigmoidal, and the Pi nodes represent multiplication. The centermost node in the
cell is called the internal state and feeds into itself with a fixed weight of 1 across time steps. The self-connected edge attached to the internal state is referred to as the constant error carousel or CEC.
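Putting the pieces together, here is a minimal sketch of one forward step of the memory cell as described: an input node, an input gate, an output gate, and an internal state with a fixed self-weight of 1 (the CEC). This follows the original 1997 formulation discussed in the post, with no forget gate; dimensions, weight scales, and activations are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, s_prev, params):
    """One forward step of a 1997-style LSTM memory cell."""
    W_g, W_i, W_o, b_g, b_i, b_o = params
    z = np.concatenate([x, h_prev])   # input plus previous hidden state
    g = np.tanh(W_g @ z + b_g)        # input node (bottom of the cell)
    i = sigmoid(W_i @ z + b_i)        # input gate: when to let activation in
    o = sigmoid(W_o @ z + b_o)        # output gate: when to let activation out
    s = s_prev + i * g                # internal state; self-weight fixed at 1 (CEC)
    h = o * np.tanh(s)                # gated cell output
    return h, s

rng = np.random.default_rng(2)
n_in, n_cell = 3, 4
shape = (n_cell, n_in + n_cell)
params = [rng.standard_normal(shape) * 0.1 for _ in range(3)] + [np.zeros(n_cell)] * 3

h = np.zeros(n_cell)
s = np.zeros(n_cell)
for x in rng.standard_normal((5, n_in)):
    h, s = lstm_cell_step(x, h, s, params)
```

Because the internal state `s` carries forward with a weight of exactly 1, error flowing backward along that edge is neither amplified nor attenuated, which is what lets LSTMs bridge long time lags.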

Thinking in terms of the forward pass, the input gate learns to decide when to let activation pass into the memory cell, and the output gate learns when to let activation pass out of it. Alternatively, in terms of the backward pass, the output gate learns when to let error flow into the memory cell, and the input gate learns when to let it flow out of the memory cell and through the rest of the network. These models have proven remarkably successful on tasks as varied as handwriting recognition and image captioning. Perhaps with some love they can be made to win at Space Invaders.

For a more thorough review of RNNs, see my full 33-page review hosted on arXiv.

from: http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/