
Gradient-Based Learning Applied to Document Recognition: Partial Reading Notes on LeNet-5

《Gradient-Based Learning Applied to Document Recognition》


Background knowledge

1. Gradient-based learning
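
The equations this item points to (lost in these notes) are presumably the paper's training criterion and the plain gradient-descent update: the loss averaged over P training samples Z^p, minimized by following its gradient with learning rate ε:

    E_train(W) = (1/P) Σ_p E(Z^p, W)
    W_k = W_(k-1) - ε ∂E(W)/∂W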






2. Back propagation: gradients can be computed efficiently by propagation from the output to the input. The error is back-propagated to update the weights.



In the paper's notation, each module in the system computes X_n = F_n(W_n, X_(n-1)), where X_n is a vector representing the output of module n, W_n is the vector of tunable parameters in the module (a subset of W), and X_(n-1) is the module's input vector as well as the previous module's output vector.
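
For reference, the backward recurrence the paper builds on this notation is

    ∂E/∂W_n = (∂F/∂W)(W_n, X_(n-1)) · ∂E/∂X_n
    ∂E/∂X_(n-1) = (∂F/∂X)(W_n, X_(n-1)) · ∂E/∂X_n

where ∂F/∂W and ∂F/∂X are the Jacobians of F_n with respect to the parameters and the input, evaluated at the current operating point. Running the second equation from the last module back to the first propagates the gradients from the output to the input in a single backward pass.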

 

3. Convolutional Networks


Convolutional networks combine three architectural ideas to ensure some degree of shift, scale and distortion invariance: local receptive fields, shared weights (or weight replication), and spatial or temporal sub-sampling.

The three key ideas of convolutional networks: local receptive fields, shared weights, and sub-sampling.

 

local receptive fields: Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer. Each unit is connected only to a local receptive field, and the local units share their weights.

feature map: Units in a layer are organized in planes within which all the units share the same set of weights. The set of outputs of the units in such a plane is called a feature map. The outputs of the weight-sharing local units form one feature map.

sub-sampling: The receptive field of each unit is a 2 by 2 area in the previous layer's corresponding feature map. Units are non-overlapping. Sub-sampling performs a local averaging and reduces the spatial resolution of the feature map.

Sub-sampling shrinks the convolutional layer's output: local averaging lowers the feature map's resolution and reduces the output's sensitivity to shifts and distortions.
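
To make these three ideas concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) of one convolutional feature map followed by a 2×2 sub-sampling step:

    import numpy as np

    def conv2d_valid(image, kernel, bias=0.0):
        # Shared weights: the same kernel is applied to every
        # local receptive field, producing one feature map.
        h, w = image.shape
        k = kernel.shape[0]
        out = np.zeros((h - k + 1, w - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + k, j:j + k]  # local receptive field
                out[i, j] = np.sum(patch * kernel) + bias
        return out

    def subsample_2x2(fmap, coeff=0.25, bias=0.0):
        # Non-overlapping 2x2 local averaging, as in S2/S4: the four
        # inputs are added, multiplied by a coefficient, plus a bias.
        h, w = fmap.shape
        assert h % 2 == 0 and w % 2 == 0  # assumes even dimensions
        pooled = fmap.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
        return coeff * pooled + bias

    image = np.random.randn(32, 32)
    fmap = conv2d_valid(image, np.random.randn(5, 5) * 0.1)  # 28x28
    pooled = subsample_2x2(fmap)                             # 14x14
    print(fmap.shape, pooled.shape)  # (28, 28) (14, 14)

With coeff=0.25 the sub-sampling step is a plain average; in LeNet-5 the coefficient and bias are trainable.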

 

4. Loss Function


Maximum Likelihood Estimation criterion (MLE)



maximum a posteriori criterion (MAP): posterior ∝ likelihood × prior (Bayes' theorem)

 


Loss function: minimizing the loss function is equivalent to maximizing the likelihood.

Bayesian approach: maximize the posterior rather than the likelihood alone.
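
Written out (standard definitions, not notation copied from the paper), with training set Z = {Z^p} and parameters W:

    MLE:  W* = argmax_W Π_p p(Z^p | W) = argmin_W -Σ_p log p(Z^p | W)
    MAP:  W* = argmax_W p(W | Z) = argmax_W p(Z | W) p(W)

A negative log-likelihood loss therefore realizes MLE, and adding a log-prior (regularization) term turns it into MAP.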

 

LeNet-5 Network Architecture

[Figure: the LeNet-5 architecture diagram from the paper]

Input: a 32×32 pixel image.

7 layers in total, not counting the input.

C1: 5×5 receptive fields, 6 feature maps. Convolutional layer, 28×28 (32-(5-1)=28).

trainable parameters: (5×5+1)×6=156; connections: (5×5+1)×28×28×6=122304

S2: 2×2 receptive fields, 6 feature maps. Sub-sampling layer, 14×14 (28/2=14).

The four inputs to a unit in S2 are added, then multiplied by a trainable coefficient, and added to a trainable bias.

trainable parameters: (1+1)×6=12; connections: (2×2+1)×14×14×6=5880

C3: 5×5 receptive fields, 16 feature maps. Convolutional layer, 10×10 (14-(5-1)=10).

Each unit in each feature map is connected to several 5×5 neighborhoods at identical locations in a subset of S2's feature maps.



Each feature map in C3 is connected to only a subset of S2's feature maps, not all of them: 6 maps take input from 3 S2 maps, 9 maps from 4 S2 maps, and 1 map from all 6, which is where the parameter count below comes from.

trainable parameters: (25×3+1)×6+(25×4+1)×9+(25×6+1)×1=1516; connections: 1516×10×10=151600

S4: 2×2 receptive fields, 16 feature maps. Sub-sampling layer, 5×5 (10/2=5).

trainable parameters: (1+1)×16=32; connections: (2×2+1)×5×5×16=2000

C5: 5×5 receptive fields, 120 feature maps. Convolutional layer, 1×1, fully connected to S4.

C5 is labeled as a convolutional layer, instead of a fully-connected layer, because if the LeNet-5 input were made bigger with everything else kept constant, the feature map dimension would be larger than 1×1, so it is still a convolutional layer.

trainable parameters (equal to the connections, since C5 is fully connected to S4): (5×5×16+1)×120=48120
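
A small sketch of this point (my own illustration; the 36×36 input size is an arbitrary example):

    # Feature-map side lengths through LeNet-5 for an n x n input.
    def lenet_map_sizes(n):
        c1 = n - 4       # 5x5 'valid' convolution
        s2 = c1 // 2     # 2x2 sub-sampling
        c3 = s2 - 4
        s4 = c3 // 2
        c5 = s4 - 4
        return c1, s2, c3, s4, c5

    print(lenet_map_sizes(32))  # (28, 14, 10, 5, 1) -> C5 is 1x1
    print(lenet_map_sizes(36))  # (32, 16, 12, 6, 2) -> C5 would be 2x2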

F6: fully connected to C5, 84 units

Fully connected layer with 84 units: compute the dot product between the input vector and the weight vector, add a bias, then pass the result through a sigmoid squashing function.

trainable parameters: (120+1)×84=10164

output layer: Euclidean Radial Basis Function (RBF) units, one for each class
Output layer: one output per class; each RBF unit computes the squared Euclidean distance between its input vector and its parameter vector, y_i = Σ_j (x_j - w_ij)².
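
As a sanity check on the numbers above, a short standalone script (my own, not from the paper) that recomputes every layer's trainable-parameter count:

    # Trainable-parameter counts for LeNet-5, layer by layer.
    c1 = (5 * 5 + 1) * 6                                          # 156
    s2 = (1 + 1) * 6                                              # 12
    c3 = (25 * 3 + 1) * 6 + (25 * 4 + 1) * 9 + (25 * 6 + 1) * 1   # 1516
    s4 = (1 + 1) * 16                                             # 32
    c5 = (5 * 5 * 16 + 1) * 120                                   # 48120
    f6 = (120 + 1) * 84                                           # 10164
    total = c1 + s2 + c3 + s4 + c5 + f6
    print(c1, s2, c3, s4, c5, f6, total)                          # total = 60000

The counts sum to 60,000 trainable parameters, matching the figure quoted in the paper (the RBF parameter vectors are hand-set and kept fixed, so they are not counted).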

 