[Deep Learning Paper Notes][Weight Initialization] Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
2016-09-20 09:58
Saxe, Andrew M., James L. McClelland, and Surya Ganguli. “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks.” arXiv preprint arXiv:1312.6120 (2013). [Citations: 97].
1 General Learning Dynamics of Gradient Descent
[Timescale of Learning]
• The learning time of a deep network depends on the optimal (largest stable) learning rate.
• The optimal learning rate can be estimated as the inverse of the maximal eigenvalue of the Hessian over the region of interest (see the sketch after this list).
• The optimal learning rate scales as O(1/L), where L is the number of layers.
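A minimal sketch of the second bullet, not code from the paper: the toy quadratic loss, the matrix A, and the iteration count are assumptions made for illustration. For a real network the Hessian-vector product would be computed with automatic differentiation rather than an explicit matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
M = rng.standard_normal((d, d))
A = M @ M.T / d  # PSD Hessian of the toy quadratic loss L(w) = 0.5 * w^T A w

def hessian_vector_product(v):
    # For this toy quadratic the Hessian-vector product is simply A @ v;
    # a real network would use autodiff (double backprop) instead.
    return A @ v

# Power iteration: repeated Hessian-vector products converge to the top eigenvector.
v = rng.standard_normal(d)
for _ in range(200):
    v = hessian_vector_product(v)
    v /= np.linalg.norm(v)
lambda_max = v @ hessian_vector_product(v)

eta = 1.0 / lambda_max  # estimate of the largest stable learning rate
print(f"lambda_max ~ {lambda_max:.3f}  =>  learning rate ~ {eta:.3f}")
```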
2 Finding Good Weight Initializations
[Motivations] Unsupervised pretraining speeds up optimization and acts as a special regularizer towards solutions with better generalization performance.
• Unsupervised pretraining finds a special class of orthogonalized, decoupled initial conditions.
• These initial conditions allow rapid supervised learning, since the network does not need to adapt its principal directions, only the strength of each layer.
[Idea] Use random orthogonal weight matrices (W^T W = I).
• Preserving the statistics of activations across layers implies faster learning.
• Gaussian matrices are almost guaranteed to have many small singular values, so many vectors propagating forward or backward through the network are severely attenuated, which hinders learning.
• Xavier initialization preserves the norm of a random vector only on average.
• Orthogonal initialization preserves the norm of every vector exactly (a minimal sketch follows below).
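A minimal sketch of the contrast in the last two bullets, assuming an n×n layer; the size and test vector are illustrative, not from the paper. A random orthogonal matrix is drawn here as the Q factor of the QR decomposition of a Gaussian matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256

# Random orthogonal matrix: the Q factor of a Gaussian matrix satisfies Q^T Q = I.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Xavier-style Gaussian matrix: variance 1/n, so norms are preserved only on average.
W = rng.standard_normal((n, n)) / np.sqrt(n)

x = rng.standard_normal(n)
print("||x||   :", np.linalg.norm(x))
print("||Q x|| :", np.linalg.norm(Q @ x))  # exactly ||x||
print("||W x|| :", np.linalg.norm(W @ x))  # only approximately ||x||
```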
[Nonlinear Case] A good initialization is one for which the singular values of the input-output Jacobian J = ∂a/∂x are concentrated around 1 (see the sketch below).
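For the linear case this criterion can be checked directly, since the end-to-end Jacobian of a deep linear network is simply the product of its weight matrices. The sketch below uses an illustrative width and depth (not from the paper) to compare orthogonal and Gaussian initializations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 100, 20  # illustrative width and depth

def orthogonal(n):
    # Q factor of a Gaussian matrix is orthogonal.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

def end_to_end_jacobian(sample_layer):
    # For a deep linear network the input-output Jacobian is the product of weights.
    J = np.eye(n)
    for _ in range(L):
        J = sample_layer() @ J
    return J

J_orth = end_to_end_jacobian(lambda: orthogonal(n))
J_gauss = end_to_end_jacobian(lambda: rng.standard_normal((n, n)) / np.sqrt(n))

sv_orth = np.linalg.svd(J_orth, compute_uv=False)
sv_gauss = np.linalg.svd(J_gauss, compute_uv=False)
print("orthogonal init: min/max singular value =", sv_orth.min(), sv_orth.max())
print("Gaussian init  : min/max singular value =", sv_gauss.min(), sv_gauss.max())
```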
[Deep networks + large weights] Such networks train exceptionally quickly.
• However, large weights incur a heavy cost in generalization performance.
• Small initial weights regularize towards smoother functions.
• The difficulty of training arises from saddle points, not from local minima.
3 References
[1]. Pillow Lab Blog. https://pillowlab.wordpress.com/2015/10/04/exact-solutions-to-the-nonlinear-dynamics-of-learning-in-deep-linear-neural-netw
[2]. ICLR 2014 Talk. https://www.youtube.com/watch?v=Ap7atx-Ki3Q.