Gradient-Based Learning Applied to Document Recognition: Reading Notes on the LeNet-5 Sections
2018-02-03 17:15
《Gradient-Based Learning Applied to Document Recognition》
Background knowledge
1. Gradient-based learning
2. Back propagation: gradients can be computed efficiently by propagating them from the output to the input; the error is propagated backward to update the weights.
In the modular view X_n = F_n(W_n, X_{n-1}): X_n is a vector representing the output of module n, W_n is the vector of tunable parameters of the module (a subset of W), and X_{n-1} is the module's input vector as well as the previous module's output vector.
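The back-propagation recursion above can be sketched numerically. A minimal, hypothetical example with one linear module and a squared-error loss (all names and values illustrative, not from the paper):

```python
# Chain rule through one module: y = w * x, E = 0.5 * (y - target)^2.
# Gradients flow from the output back to both the parameters (dE/dw)
# and the module's input (dE/dx), as in the backprop description above.

def forward(w, x, target):
    y = w * x                       # module output X_n = F(W_n, X_{n-1})
    loss = 0.5 * (y - target) ** 2  # scalar error E
    return y, loss

def backward(w, x, target):
    y = w * x
    dE_dy = y - target              # gradient at the module's output
    dE_dw = dE_dy * x               # propagate to the tunable parameters
    dE_dx = dE_dy * w               # propagate to the module's input
    return dE_dw, dE_dx

# One gradient-descent step on the weight:
w, x, target, lr = 0.5, 2.0, 3.0, 0.1
dE_dw, _ = backward(w, x, target)
w -= lr * dE_dw                     # W <- W - lr * dE/dW
print(w)                            # weight moves toward reducing the error
```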
3. Convolutional Networks
Convolutional networks combine three architectural ideas to ensure some degree of shift, scale, and distortion invariance: local receptive fields, shared weights (or weight replication), and spatial or temporal sub-sampling.
The three key ideas of convolutional networks: local receptive fields, weight sharing, and sub-sampling.
local receptive fields: Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer. Each unit sees only a small local patch of the previous layer, and the local units share their weights.
feature map: Units in a layer are organized in planes within which all the units share the same set of weights. The set of outputs of the units in such a plane is called a feature map. The outputs of the weight-sharing local units form one feature map.
sub-sampling: The receptive field of each unit is a 2 by 2 area in the previous layer's corresponding feature map, and the units are non-overlapping. Sub-sampling performs a local averaging and reduces the spatial resolution of the feature map.
Sub-sampling shrinks the size of the convolutional layer's output: local averaging lowers the feature map's resolution and also reduces the output's sensitivity to shifts and distortions.
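The 2×2 non-overlapping local averaging described above can be sketched in a few lines (a toy 4×4 feature map; pure Python for self-containment):

```python
# 2x2 non-overlapping sub-sampling by local averaging:
# each output unit is the mean of a 2x2 block of the input map,
# so both spatial dimensions are halved.

def subsample_2x2(fmap):
    """Average each non-overlapping 2x2 block of a 2D list."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1]
             + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(subsample_2x2(fmap))  # 4x4 map -> 2x2 map: [[3.5, 5.5], [11.5, 13.5]]
```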
4. Loss Function
Maximum Likelihood Estimation criterion (MLE)
Maximum A Posteriori criterion (MAP): posterior ∝ likelihood × prior (Bayes' theorem)
Loss function: minimizing the loss is equivalent to maximizing the likelihood.
Bayesian approach: maximize the posterior, i.e. the product of likelihood and prior.
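The relation between the two criteria can be written out as a short sketch (D denotes the training data, W the parameters):

```latex
% Minimizing the negative log-likelihood loss is equivalent to
% maximizing the likelihood; MAP additionally weights by the prior.
\begin{aligned}
W^{*}_{\text{MLE}} &= \arg\max_{W}\; p(D \mid W)
                    = \arg\min_{W}\; \bigl(-\log p(D \mid W)\bigr) \\
W^{*}_{\text{MAP}} &= \arg\max_{W}\; p(W \mid D)
                    = \arg\max_{W}\; p(D \mid W)\, p(W)
\end{aligned}
```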
LeNet-5 Network Architecture
Input: a 32×32 pixel image.
The network has 7 layers (not counting the input).
C1: convolutional layer, 6 feature maps, 5×5 kernels; output size 28×28 (32 − (5 − 1) = 28).
trainable parameters: (5×5+1)×6 = 156; connections: (5×5+1)×28×28×6 = 122304
S2: sub-sampling layer, 6 feature maps, 2×2 units; output size 14×14 (28/2 = 14).
The four inputs to a unit in S2 are added, then multiplied by a trainable coefficient, and a trainable bias is added.
trainable parameters: (1+1)×6 = 12; connections: (2×2+1)×14×14×6 = 5880
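One S2 unit can be sketched directly from the description above: sum the four inputs, scale by the trainable coefficient, add the trainable bias, then squash with a sigmoid (the coefficient and bias values here are illustrative):

```python
import math

# One S2 sub-sampling unit: the four inputs are added, multiplied by a
# trainable coefficient, a trainable bias is added, and the result is
# passed through a sigmoidal squashing function.

def s2_unit(inputs_2x2, coeff, bias):
    s = sum(inputs_2x2)                                  # add the 4 inputs
    return 1.0 / (1.0 + math.exp(-(coeff * s + bias)))  # sigmoid

print(s2_unit([0.1, 0.2, 0.3, 0.4], coeff=0.5, bias=-0.25))
```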
C3: convolutional layer, 16 feature maps, 5×5 kernels; output size 10×10 (14 − (5 − 1) = 10).
Each unit in each feature map is connected to several 5×5 neighborhoods at identical locations in a subset of S2's feature maps.
Each C3 feature map is connected to only a subset of S2's feature maps, not all of them.
trainable parameters: (25×3+1)×6 + (25×4+1)×9 + (25×6+1)×1 = 1516; connections: 1516×10×10 = 151600
S4: sub-sampling layer, 16 feature maps, 2×2 units; output size 5×5 (10/2 = 5).
trainable parameters: 2×16 = 32; connections: (2×2+1)×5×5×16 = 2000
C5: convolutional layer, 120 feature maps, 5×5 kernels; output size 1×1, fully connected to all of S4's feature maps.
C5 is labeled as a convolutional layer, instead of a fully connected layer, because if the LeNet-5 input were made bigger with everything else kept constant, the feature-map dimension would be larger than 1×1; it is therefore still a convolutional layer.
trainable connections: (5×5×16+1)×120 = 48120
F6: fully connected to C5, 84 units.
Fully connected layer with 84 units: each unit computes the dot product between its input vector and its weight vector, adds a bias, and passes the result through a sigmoid (squashing) function.
trainable parameters: (120+1)×84 = 10164
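The parameter and connection counts quoted for each layer above can be sanity-checked mechanically (the C3 grouping follows the note's breakdown: 6 maps take 3 S2 inputs, 9 take 4, and 1 takes all 6):

```python
# Verify the trainable-parameter and connection counts quoted in the note.

c1_params = (5 * 5 + 1) * 6                 # 5x5 kernel + bias, 6 maps
c1_conns = c1_params * 28 * 28              # applied at every 28x28 position

s2_params = (1 + 1) * 6                     # 1 coefficient + 1 bias per map
s2_conns = (2 * 2 + 1) * 14 * 14 * 6

# C3 is only partially connected to S2's 6 feature maps:
c3_params = (25 * 3 + 1) * 6 + (25 * 4 + 1) * 9 + (25 * 6 + 1) * 1
c3_conns = c3_params * 10 * 10

s4_params = 2 * 16
s4_conns = (2 * 2 + 1) * 5 * 5 * 16

c5_params = (5 * 5 * 16 + 1) * 120          # fully connected to S4
f6_params = (120 + 1) * 84

assert c1_params == 156 and c1_conns == 122304
assert s2_params == 12 and s2_conns == 5880
assert c3_params == 1516 and c3_conns == 151600
assert s4_params == 32 and s4_conns == 2000
assert c5_params == 48120
assert f6_params == 10164
print("all layer counts match the figures quoted in the note")
```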
output layer: one Euclidean Radial Basis Function (RBF) unit per class.
The output layer has one unit per class; each unit outputs the RBF value for that class, i.e. the squared Euclidean distance between the F6 output vector and that class's parameter vector.
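One Euclidean RBF output unit can be sketched as the squared distance between the F6 vector and a class parameter vector (a toy 3-dimensional example; real LeNet-5 uses 84-dimensional vectors, and the vectors here are hypothetical):

```python
# One Euclidean RBF output unit: y_i = sum_j (x_j - w_ij)^2, where x is
# the F6 output vector and w_i is class i's parameter vector. A smaller
# output means the input is closer to that class's codeword.

def rbf_output(x, w):
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

x = [0.9, -1.0, 0.8]   # hypothetical F6 activations
w = [1.0, -1.0, 1.0]   # hypothetical class parameter vector (+/-1 entries)
print(rbf_output(x, w))
```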