Attention Mechanisms in Neural Networks -- Attention in Neural Networks
2017-10-16 14:14
Attention in Neural Networks and How to Use It
http://akosiorek.github.io/ml/2017/10/14/visual-attention.html
This post introduces attention mechanisms in neural networks; the code in the original blog implements two kinds of soft visual attention.
What is Attention? First, let's look at what an attention mechanism actually is.
Informally, a neural attention mechanism equips a neural network with the ability to focus on a subset of its inputs (or features): it selects specific inputs.
Attention is implemented as

a = f_φ(x), g = a ⊙ z

where f_φ is an attention network that produces an attention vector a from the input x, and a is then multiplied element-wise with the feature vector z of the input. The values of a lie in [0, 1]: with soft attention, the values range anywhere between 0 and 1; with hard attention, each value is constrained to be exactly 0 or 1.
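The masking step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the blog's implementation: the attention network f_φ is stood in for by a fixed sigmoid (soft) or a threshold (hard) over some precomputed scores, and all names are made up.

```python
import numpy as np

def soft_attention(scores, features):
    """Squash raw scores into a [0, 1] mask and apply it element-wise: g = a * z."""
    a = 1.0 / (1.0 + np.exp(-scores))       # sigmoid keeps each weight in (0, 1)
    return a * features

def hard_attention(scores, features):
    """Threshold the same scores so each mask value is exactly 0 or 1."""
    a = (scores > 0).astype(features.dtype)
    return a * features

z = np.array([1.0, 2.0, 3.0, 4.0])          # feature vector
s = np.array([-2.0, -0.5, 0.5, 2.0])        # stand-in for the attention network's output
print(soft_attention(s, z))                 # every element scaled by a weight in (0, 1)
print(hard_attention(s, z))                 # [0. 0. 3. 4.]
```

In a real model the scores would themselves be produced by a trained network, but the multiply-by-a-mask step is exactly this simple.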
Why does attention matter? To answer that, we have to start from what a neural network fundamentally is: a function approximator. Its architecture determines which family of functions it can approximate. A typical neural net is implemented as a chain of matrix multiplications and element-wise non-linearities, where elements of the input or feature vectors interact with each other only by addition.
Attention mechanisms, by contrast, let inputs interact multiplicatively: they compute a mask which is used to multiply features.
Neural networks are universal function approximators and can approximate an arbitrary function to arbitrary precision, but only in the limit of an infinite number of hidden units. In any practical setting, that is not the case: we are limited by the number of hidden units we can use, so the functions we can actually fit are limited too.
The above definition of attention as multiplicative interactions allows us to consider a broader class of models if we relax the constraints on the values of the attention mask. In other words, the multiplicative interactions that attention introduces let us fit more complicated functions.
Visual Attention
Attention can be applied to any type of input, regardless of its shape. For inputs arranged as matrices, such as images, we can talk about visual attention.
Hard Attention
For images, hard attention is simply image cropping: g = I[y:y+h, x:x+w]. The problem with this form of hard attention is that it is non-differentiable, which can be addressed with a score-function estimator.
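The crop itself is one line of indexing; a toy sketch (the image and window parameters here are made-up examples):

```python
import numpy as np

def hard_attention_crop(image, y, x, h, w):
    """Hard visual attention over an image is just a crop: g = I[y:y+h, x:x+w]."""
    return image[y:y + h, x:x + w]

I = np.arange(36).reshape(6, 6)   # a toy 6x6 "image"
g = hard_attention_crop(I, y=1, x=2, h=3, w=2)
print(g.shape)                    # (3, 2)
```

The difficulty is not the crop but training: in practice y, x, h, w are produced by a network, and indexing gives no gradient with respect to them, hence the need for a score-function estimator.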
Soft Attention
This type of attention is used in Show, Attend and Tell:
The model learns to attend to specific parts of the image while generating the word describing that part.
Closing Thoughts
Attention mechanisms expand capabilities of neural networks: they allow approximating more complicated functions, or in more intuitive terms, they enable focusing on specific parts of the input.
Attention mechanisms should be able to play an even bigger role!