
[Deep Learning Paper Notes][Weight Initialization] Random walk initialization for training very deep feedforward networks

2016-09-20 10:08
Sussillo, David, and L. F. Abbott. “Random walk initialization for training very deep feedforward networks.” arXiv preprint arXiv:1412.6558 (2014). [Citations: 3].

1 Motivation

[Motivation] The vanishing gradient problem in very deep feedforward networks.

[Idea] Keep the norm of the backpropagated gradient approximately constant from layer to layer during backprop.

2 Linear Random Walk Initialization

[Network Form]

h^(l) = W^(l) h^(l-1),  l = 1, ..., D  (a deep linear network with D layers).

[Backprop]

δ^(l-1) = (W^(l))^T δ^(l), so the squared gradient norm changes at each layer by the fraction

|δ^(l-1)|^2 / |δ^(l)|^2 = |(W^(l))^T u^(l)|^2,  where u^(l) = δ^(l) / |δ^(l)| is a unit vector.

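To make the recursion concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): it propagates an error vector backwards through D random layers and records the per-layer log squared-norm ratios. These logs add up across layers, so the log of the gradient norm performs a random walk in depth.

```python
import numpy as np

def backprop_log_norm_walk(D, n, weight_std, seed=0):
    """Propagate delta^(l-1) = (W^(l))^T delta^(l) through D random layers and
    return the per-layer log squared-norm ratios (the steps of the random walk)."""
    rng = np.random.default_rng(seed)
    delta = rng.standard_normal(n)
    steps = []
    for _ in range(D):
        W = rng.normal(0.0, weight_std, size=(n, n))  # W^(l), i.i.d. Gaussian entries
        new_delta = W.T @ delta                       # one backward step
        steps.append(np.log(np.sum(new_delta ** 2) / np.sum(delta ** 2)))
        delta = new_delta
    return np.array(steps)

steps = backprop_log_norm_walk(D=200, n=128, weight_std=1.0 / np.sqrt(128))
print("mean step:", steps.mean(), "  total log change over depth:", steps.sum())
```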
[Simplifications]

• All layers have the same width n.

• Initialize each W^(l) with i.i.d. entries drawn from N(0, 1/n).



(W^(l))^T u^(l) is Gaussian, since the product of a Gaussian matrix and a unit vector is a Gaussian vector (each entry ∼ N(0, 1/n)).

|(W^(l))^T u^(l)|^2 is the squared magnitude of a Gaussian vector, so this term is distributed as (1/n) χ^2_n, a chi-squared variable with n degrees of freedom scaled by the entry variance 1/n.
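A quick Monte Carlo check of this claim (my own illustration, not from the paper): with the entries of W drawn from N(0, 1/n), the squared magnitude should have mean 1 and variance 2/n, while the mean of its log is slightly negative (about −1/n), which is exactly the drift the next step removes.

```python
import numpy as np

n, trials = 128, 10000
rng = np.random.default_rng(1)

u = np.zeros(n)
u[0] = 1.0  # any unit vector works (rotational invariance of the Gaussian)

samples = np.empty(trials)
for t in range(trials):
    W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
    samples[t] = np.sum((W.T @ u) ** 2)  # |(W)^T u|^2

print("empirical mean:", samples.mean(), "   (chi2_n / n has mean 1)")
print("empirical var :", samples.var(), "   (chi2_n / n has var 2/n =", 2.0 / n, ")")
print("mean of log   :", np.log(samples).mean(), "   (approximately -1/n =", -1.0 / n, ")")
```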

[Goal] Solving the vanishing gradient problem amounts to keeping this fraction on the order of 1 at every layer. Because the W^(l)'s are random, the fraction is random as well, so we average its logarithm and require the mean step to be zero:

E[ log( |δ^(l-1)|^2 / |δ^(l)|^2 ) ] = 0 .

With weight variance σ^2 (instead of 1/n), the fraction is distributed as σ^2 χ^2_n, so the condition becomes log(σ^2) + E[log χ^2_n] = 0.

This is equivalent to initializing the weights from a Gaussian with variance

σ^2 = exp( −E[log χ^2_n] ) ≈ e^(1/n) / n ,

i.e. scaling the naive N(0, 1/n) initialization by a gain g = σ √n ≈ e^(1/(2n)).
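A sketch of a linear random walk initializer following this formula (function names are my own; SciPy's digamma gives the exact value E[log χ^2_n] = log 2 + ψ(n/2)):

```python
import numpy as np
from scipy.special import digamma

def linear_rwi_std(n):
    """Weight std for a width-n linear layer such that the mean log step
    E[log |delta^(l-1)|^2 / |delta^(l)|^2] is exactly zero."""
    e_log_chi2 = np.log(2.0) + digamma(n / 2.0)  # E[log chi^2_n]
    return np.exp(-0.5 * e_log_chi2)             # sigma = exp(-E[log chi^2_n] / 2)

def init_linear_rwi(n, num_layers, seed=0):
    """A stack of square weight matrices with the linear RWI scaling."""
    rng = np.random.default_rng(seed)
    std = linear_rwi_std(n)
    return [rng.normal(0.0, std, size=(n, n)) for _ in range(num_layers)]

n = 128
print("RWI std          :", linear_rwi_std(n))
print("naive 1/sqrt(n)  :", 1.0 / np.sqrt(n))
print("implied gain sigma*sqrt(n):", linear_rwi_std(n) * np.sqrt(n),
      "  vs exp(1/(2n)) =", np.exp(1.0 / (2 * n)))
```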

3 ReLU Random Walk Initialization

[Equivalent Form of ReLU Activations] Backpropagating through a ReLU layer is equivalent to zeroing out about half of the rows of the backward weight matrix and leaving the other rows unchanged.

• I.e., set n − β rows of the matrix to 0 and leave the remaining β rows with their Gaussian entries.

• β ∼ Bin(n, 1/2 ) .

• Then, with weights drawn from N(0, g^2 / n), the per-layer fraction becomes

|δ^(l-1)|^2 / |δ^(l)|^2 ∼ (g^2 / n) χ^2_β ,

a scaled chi-squared with a binomial-random number of degrees of freedom (checked numerically in the sketch below).
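A Monte Carlo sketch of this masked model (my own check, not code from the paper): draw W with N(0, 1/n) entries (unit gain), apply an independent Bernoulli(1/2) mask to the components of the backward step, and measure the mean log step; the gain needed to cancel it comes out slightly above √2.

```python
import numpy as np

n, trials = 128, 5000
rng = np.random.default_rng(2)

log_steps = np.empty(trials)
for t in range(trials):
    W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))  # unit-gain initialization
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                # unit vector u = delta / |delta|
    mask = rng.random(n) < 0.5            # each unit "active" with probability 1/2
    v = mask * (W.T @ u)                  # masked backward step
    log_steps[t] = np.log(np.sum(v ** 2))

mean_step = log_steps.mean()
print("mean log step at unit gain:", mean_step)               # roughly -log(2)
print("gain needed to cancel it  :", np.exp(-mean_step / 2))  # slightly above sqrt(2)
```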

[Optimal g] Computing E[log χ^2_β] numerically (also averaging over β ∼ Bin(n, 1/2)) and choosing g so that the mean log step is zero gives the ReLU random walk initialization

g = √2 · exp( 1.2 / (max(n, 6) − 2.4) ) ,

i.e. weight variance g^2 / n.
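A sketch of a ReLU random walk initializer built on this gain (the function names are my own assumptions, not an existing API):

```python
import numpy as np

def relu_rwi_gain(n):
    """ReLU random walk initialization gain: sqrt(2) * exp(1.2 / (max(n, 6) - 2.4))."""
    return np.sqrt(2.0) * np.exp(1.2 / (max(n, 6) - 2.4))

def init_relu_rwi(n, num_layers, seed=0):
    """Square weight matrices with variance g^2 / n for a width-n ReLU network."""
    rng = np.random.default_rng(seed)
    std = relu_rwi_gain(n) / np.sqrt(n)
    return [rng.normal(0.0, std, size=(n, n)) for _ in range(num_layers)]

for n in (64, 128, 512):
    print("n =", n, "  gain g =", relu_rwi_gain(n), "  (plain sqrt(2) =", np.sqrt(2.0), ")")
```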