[Deep Learning Paper Notes][Weight Initialization] Random Walk Initialization for Training Very Deep Feedforward Networks
2016-09-20 10:08
Sussillo, David, and L. F. Abbott. “Random walk initialization for training very deep feedforward networks.” arXiv preprint arXiv:1412.6558 (2014). [Citations: 3].
1 Motivation
[Motivation] The vanishing gradient problem: in very deep networks, the backpropagated gradient magnitude shrinks (or explodes) exponentially with depth.
[Idea] Choose the initial weight scale so that the gradient norm stays, on average, the same across layers during backprop.
2 Linear Random Walk Initialization
[Network Form] A depth-D linear network: h^(l) = W^(l) h^(l-1) for l = 1, ..., D.
[Backprop] The error signal propagates as δ^(l-1) = (W^(l))^T δ^(l), so the per-layer gradient-norm ratio is z_l = ||δ^(l-1)||^2 / ||δ^(l)||^2 = ||(W^(l))^T u^(l)||^2, where u^(l) = δ^(l) / ||δ^(l)|| is a unit vector.
[Simplifications]
• All layers have the same width n.
• Initialize each entry of W^(l) i.i.d. from N(0, g^2/n), where g is a scale factor to be determined (g = 1 gives the baseline variance 1/n).
• (W^(l))^T u^(l) is Gaussian, since the product of a Gaussian matrix and a unit vector is a Gaussian vector (here with i.i.d. N(0, g^2/n) entries).
• z_l = ||(W^(l))^T u^(l)||^2 is the squared magnitude of that Gaussian vector, so z_l ~ (g^2/n) χ^2_n.
[Goal] Solving the vanishing gradient problem amounts to keeping the ratio z_l of order 1, so that the total log gradient norm, log ||δ^(0)||^2 = log ||δ^(D)||^2 + Σ_l log z_l, does not drift with depth. Because the W^(l)'s are random, each log z_l is a random step and the sum is a random walk; so we require the average step to vanish: E[log z_l] = 0. Using E[log(χ^2_n / n)] ≈ -1/n for large n (from E[log χ^2_n] = log 2 + ψ(n/2) and ψ(n/2) ≈ log(n/2) - 1/n), this gives 2 log g = 1/n, i.e. g = e^(1/2n). This is equivalent to initializing the weights with variance g^2/n = e^(1/n)/n, slightly larger than the naive 1/n.
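As a sanity check on this claim, here is a minimal Monte Carlo sketch (mine, not from the paper; the width n and sample count are arbitrary choices) showing that with variance e^(1/n)/n the average log step E[log z] is close to 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                     # layer width (arbitrary choice for the demo)
g = np.exp(1 / (2 * n))     # random-walk scale factor for a linear net
sigma = g / np.sqrt(n)      # entry std dev, i.e. variance g^2/n = e^(1/n)/n

# Monte Carlo estimate of E[log z] with z = ||W^T u||^2, u a random unit vector
log_z = []
for _ in range(5000):
    W = rng.normal(0.0, sigma, size=(n, n))
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)              # unit backprop error direction
    log_z.append(np.log(np.sum((W.T @ u) ** 2)))

print(np.mean(log_z))  # ≈ 0: the log gradient norm is an unbiased random walk
```

With g = 1 instead, the mean comes out near -1/n per layer, which compounds to a factor of about e^(-D/n) over D layers.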
3 ReLU Random Walk Initialization
[Equivalent Form of ReLU Activations] On the backward pass, a ReLU layer is equivalent to zeroing out the rows of W corresponding to inactive units (roughly half of them) and leaving the other rows unchanged.
• I.e., set n - β rows of W to 0 and leave β rows with Gaussian entries, where β is the number of active units.
• β ~ Bin(n, 1/2).
• Then z_l ~ (g^2/n) χ^2_β: the mask keeps β of the n i.i.d. Gaussian components, so the squared magnitude is a χ^2 with a random, binomially distributed number of degrees of freedom.
[Optimal g] E[log z] = 2 log g - log n + E[log χ^2_β] has no closed form (β is random), so compute E[log χ^2_β] numerically and solve E[log z] = 0 for g. The paper's fit is g = sqrt(2) · exp(1.2 / (max(n, 6) - 2.4)): essentially the factor sqrt(2) that compensates for ReLU silencing half the units, plus a small finite-width correction.
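A minimal sketch of the resulting initializer (the function name and shapes are my own; the fitted constants are the ones quoted above):

```python
import numpy as np

def rwi_relu(n_out: int, n_in: int, rng=None) -> np.ndarray:
    """Random Walk Initialization for a ReLU layer (Sussillo & Abbott, 2014).

    Entries are N(0, g^2/n_in) with the paper's fitted scale
    g = sqrt(2) * exp(1.2 / (max(n_in, 6) - 2.4)).
    """
    rng = rng or np.random.default_rng()
    g = np.sqrt(2.0) * np.exp(1.2 / (max(n_in, 6) - 2.4))
    return rng.normal(0.0, g / np.sqrt(n_in), size=(n_out, n_in))

# Example: weights for a 200-layer, 100-unit-wide ReLU stack
weights = [rwi_relu(100, 100) for _ in range(200)]
```

For large n the correction term exp(1.2 / (n - 2.4)) tends to 1, recovering the familiar variance 2/n; the correction mainly matters for narrow layers.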