Paper Reading: ByteNet, Neural Machine Translation in Linear Time
2018-01-12 19:01
Neural Translation Model
Given a source-language string s, the network estimates the distribution p(t|s) over target-language strings t. As in PixelCNN, the joint probability of t factorizes by the chain rule into a product of conditionals p(t_i | t_<i, s), one per target token.
The strings are typically sentences in their respective languages, and each token in a string is a character (or a word).
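The chain-rule factorization is easy to make concrete in a few lines of plain Python; `cond_log_prob` below is a hypothetical stand-in for the network's per-token conditional, not anything from the paper.

```python
import math

def sequence_log_prob(target_tokens, cond_log_prob):
    """Score a target sequence as the sum of per-token conditionals
    log p(t_i | t_<i, s), following the chain-rule factorization."""
    total = 0.0
    prefix = []
    for tok in target_tokens:
        total += cond_log_prob(tok, tuple(prefix))  # log p(t_i | t_<i, s)
        prefix.append(tok)
    return total

# Toy model: uniform over a 4-character vocabulary, independent of the prefix.
uniform = lambda tok, prefix: math.log(0.25)
print(sequence_log_prob(list("abc"), uniform))  # 3 * log(0.25)
```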
The model has two parts:
Encoder: maps the source-language string to a feature representation.
Decoder: maps that feature representation to the target-language string.
Desiderata
The model's running time should be linear in the input length. The encoder's output representation should also scale linearly with the input length (rather than being compressed to a fixed-size vector), so that the size of the representation is proportional to the amount of information it carries.
The forward (and backward) propagation path between any two tokens should be as short as possible, i.e. decoupled from their distance in the sequence; short paths help the network learn long-range dependencies in language.
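ByteNet meets these desiderata with stacks of dilated convolutions whose dilation rate doubles at each layer, so the receptive field grows exponentially with depth. A back-of-envelope helper (my own illustration, not code from the paper) shows why the path between distant tokens stays short:

```python
def layers_to_cover(distance, kernel_size=3):
    """Layers of dilated convolution (dilations 1, 2, 4, ...) needed
    before the receptive field spans `distance` tokens.  The answer
    grows like log2(distance), so the propagation path between two
    distant tokens is short even for long inputs."""
    receptive, layers, dilation = 1, 0, 1
    while receptive < distance:
        receptive += (kernel_size - 1) * dilation
        dilation *= 2
        layers += 1
    return layers

print(layers_to_cover(100))   # 6
print(layers_to_cover(1000))  # 9 -- depth grows logarithmically
```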
ByteNet
The ByteNet Decoder is stacked directly on top of the Encoder (rather than feeding the Decoder a fixed-length vector or an attention-pooled summary of the Encoder output), and variable-length outputs are generated via dynamic unfolding.
Dynamic Unfolding
Let |s| be the length of the source sequence s and |t| the length of the target sequence t. The target length has an upper bound |t_up| that is a linear function of |s|: |t_up| = a|s| + b.
|t_up| is chosen so that it (1) is almost always greater than |t|, and (2) does not inflate the amount of computation too much.
In our case, we let a = 1.20 and b = 0 when translating from English into German, as German sentences tend to be somewhat longer than their English counterparts.
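The bound itself is trivial to compute; a minimal sketch with the paper's English-to-German constants:

```python
import math

def target_length_bound(source_len, a=1.20, b=0.0):
    """|t_up| = a * |s| + b, rounded up to a whole number of tokens."""
    return int(math.ceil(a * source_len + b))

print(target_length_bound(10))  # 12
```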
Question: the paper says that the dynamic unfolding steps
may freely proceed beyond the estimated length |t_up| of the encoder representation.
But if generation can exceed |t_up| anyway, what is the point of computing |t_up|? Just to guarantee that the encoder representation is long enough?
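As I read the paper, |t_up| bounds the span over which source features are injected: past that point the decoder conditions on zero padding instead of encoder states, and keeps running until it emits end-of-sequence. A toy sketch of that reading (`step` is a hypothetical stand-in for the decoder, not the paper's code):

```python
def dynamic_unfold(encoder_states, step, eos, max_extra=50):
    """Unfold the decoder one token at a time.  Source features exist
    only for the first |t_up| positions; beyond them a zero vector is
    fed instead, so generation may exceed |t_up|."""
    out = []
    t_up = len(encoder_states)
    for i in range(t_up + max_extra):
        enc = encoder_states[i] if i < t_up else 0.0  # zero-pad past |t_up|
        tok = step(enc, out)
        if tok == eos:
            break
        out.append(tok)
    return out

# Toy decoder that emits 'x' three times and then stops: generation
# proceeds past |t_up| = 2 without any special handling.
step = lambda enc, out: "<eos>" if len(out) == 3 else "x"
print(dynamic_unfold([1.0, 1.0], step, "<eos>"))  # ['x', 'x', 'x']
```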
Two kinds of Residual Blocks
With ReLU: used in the machine translation experiments.
With MU (multiplicative units): previously used in the language modelling experiments.
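A structural sketch of the two variants, written with scalars for brevity: the conv arguments are hypothetical callables standing in for the 1x1 and dilated convolutions, layer normalization is omitted, and the MU form follows the multiplicative units of Video Pixel Networks.

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
relu = lambda z: max(z, 0.0)

def residual_block_relu(x, conv1x1_in, dilated_conv, conv1x1_out):
    """ReLU variant: ReLU -> 1x1 conv -> ReLU -> dilated conv
    -> ReLU -> 1x1 conv, plus the skip connection.  (Layer norm and
    the channel dimension are omitted; a scalar stands in for an
    activation vector.)"""
    h = conv1x1_in(relu(x))
    h = dilated_conv(relu(h))
    h = conv1x1_out(relu(h))
    return x + h

def multiplicative_unit(x, w1, w2, w3, w4):
    """MU variant core (as in Video Pixel Networks): sigmoid gates
    modulate a tanh update of the input."""
    g1, g2, g3 = sigmoid(w1(x)), sigmoid(w2(x)), sigmoid(w3(x))
    u = math.tanh(w4(x))
    return g1 * math.tanh(g2 * x + g3 * u)

# With identity stand-ins for the convolutions:
identity = lambda z: z
print(residual_block_relu(2.0, identity, identity, identity))   # 4.0
print(residual_block_relu(-1.0, identity, identity, identity))  # -1.0
```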
Original paper: https://arxiv.org/pdf/1610.10099.pdf