
Overview of "Neural Networks and Deep Learning"

2015-03-12 12:34

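I recently read "Neural Networks and Deep Learning" (an online book, not yet published in print) fairly carefully. The first few chapters are fairly basic, so I focused on Chapter 3, "Improving the way neural networks learn", which covers a range of techniques for optimizing and training deep neural networks. I took detailed notes on Chapter 3, drawing on other material as well, and I will add to or revise them as I read related papers. Anyone else reading the book is welcome to discuss it with me. What follows is my own understanding; please point out any mistakes.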

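What this book is about?

The code in the book is written in Python. Starting from the MNIST example, it introduces artificial neural networks, then works its way up to deep learning, along with code implementations and a number of optimization methods. It is a good introductory book.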

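1. Using neural nets to recognize handwritten digits

Chapter summary

A neural network is used to recognize digits from the MNIST dataset. The implementation is in Python and depends only on the NumPy library.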
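As a rough illustration of what "only NumPy" means here, the following is a minimal sketch of a feedforward pass, not the book's actual code. The class and method names are modelled on the book's Network class, while the 784-30-10 layer sizes, the seed, and the all-zero placeholder image are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

class Network:
    def __init__(self, sizes, seed=0):
        # sizes lists the neurons per layer, input layer first, e.g. [784, 30, 10].
        rng = np.random.default_rng(seed)
        self.sizes = sizes
        self.biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
        self.weights = [rng.standard_normal((n, m))
                        for m, n in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        # a is a column vector of shape (sizes[0], 1).
        for w, b in zip(self.weights, self.biases):
            a = sigmoid(w @ a + b)
        return a

net = Network([784, 30, 10])
image = np.zeros((784, 1))        # placeholder for one flattened 28x28 digit
output = net.feedforward(image)   # 10 activations, one per digit class
predicted_digit = int(np.argmax(output))
```

With untrained random weights the prediction is meaningless; the sketch only shows the shape of the computation.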

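2. How the backpropagation algorithm works

Chapter summary

The previous chapter did not explain how to compute the gradient of the cost function, and so left out how the network is actually optimized. That is the subject of this chapter: the backpropagation (BP) algorithm.

BP was introduced in the 1970s, but it only became popular after the 1986 paper by Rumelhart, Hinton, and Williams.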

BP implementation code

The code is contained in the update_mini_batch and backprop methods of the Network class. In particular, the update_mini_batch method updates the Network's weights and biases by computing the gradient for the current mini_batch of training examples.
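The listing itself is not reproduced in these notes. The following is a minimal sketch of how the two pieces fit together, written as standalone functions, not as the book's methods. It assumes sigmoid neurons and the quadratic cost; the 3-4-2 network, the random data, and eta = 0.1 are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(weights, biases, x, y):
    """Return (nabla_b, nabla_w): gradient of 0.5*||a_L - y||^2 for one example."""
    # Forward pass, storing weighted inputs zs and activations layer by layer.
    activation, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Backward pass: output error first, then propagate it back one layer at a time.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta
    nabla_w[-1] = delta @ activations[-2].T
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = delta @ activations[-l - 1].T
    return nabla_b, nabla_w

def update_mini_batch(weights, biases, mini_batch, eta):
    """One gradient-descent step using the gradient averaged over mini_batch."""
    sum_b = [np.zeros(b.shape) for b in biases]
    sum_w = [np.zeros(w.shape) for w in weights]
    for x, y in mini_batch:  # one backprop call per training example
        nb, nw = backprop(weights, biases, x, y)
        sum_b = [s + d for s, d in zip(sum_b, nb)]
        sum_w = [s + d for s, d in zip(sum_w, nw)]
    step = eta / len(mini_batch)
    new_weights = [w - step * s for w, s in zip(weights, sum_w)]
    new_biases = [b - step * s for b, s in zip(biases, sum_b)]
    return new_weights, new_biases

# Hypothetical 3-4-2 network and a mini-batch of 5 random examples.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
mini_batch = [(rng.standard_normal((3, 1)), rng.random((2, 1))) for _ in range(5)]
new_weights, new_biases = update_mini_batch(weights, biases, mini_batch, eta=0.1)
```

Unlike the book's method, this sketch returns new parameter lists and leaves the originals unchanged, which makes it easier to compare the state before and after a step.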

Fully matrix-based approach to backpropagation over a mini-batch

Our implementation of stochastic gradient descent loops over training examples in a mini-batch. It's possible to modify the backpropagation algorithm so that it computes the gradients for all training examples in a mini-batch simultaneously. The idea is that instead of beginning with a single input vector, x, we can begin with a matrix X = [x_1 x_2 … x_m] whose columns are the vectors in the mini-batch.

In other words, all the examples in a mini-batch are stacked into one large matrix and the gradient is computed on that matrix. This lets the work be handed to a linear algebra library, which greatly reduces the running time.
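One possible way to write the matrix-based version is sketched below, assuming sigmoid neurons and the quadratic cost. The function name backprop_batch and the 5-4-3 network are hypothetical; the point is only that every per-example vector becomes a matrix with one column per example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_batch(weights, biases, X, Y):
    """Gradients of the quadratic cost, summed over all columns of X.

    X has shape (n_inputs, m) and Y has shape (n_outputs, m); column i of
    each is one training example, so m is the mini-batch size.
    """
    A, activations = X, [X]
    for w, b in zip(weights, biases):
        Z = w @ A + b          # b has shape (n, 1) and broadcasts across the m columns
        A = sigmoid(Z)
        activations.append(A)
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    # Since A = sigmoid(Z), the product A * (1 - A) equals sigmoid'(Z).
    delta = (A - Y) * A * (1.0 - A)
    for l in range(1, len(weights) + 1):
        if l > 1:
            A_l = activations[-l]
            delta = (weights[-l + 1].T @ delta) * A_l * (1.0 - A_l)
        nabla_b[-l] = delta.sum(axis=1, keepdims=True)  # sum over the examples
        nabla_w[-l] = delta @ activations[-l - 1].T      # the product sums over examples
    return nabla_b, nabla_w

# Hypothetical 5-4-3 network and a mini-batch of 8 random examples.
rng = np.random.default_rng(0)
sizes = [5, 4, 3]
weights = [rng.standard_normal((n, k)) for k, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
X = rng.standard_normal((5, 8))
Y = rng.random((3, 8))
batch_b, batch_w = backprop_batch(weights, biases, X, Y)
```

Calling the same function on one column at a time and adding up the results gives the same gradients; the matrix form simply replaces the Python loop with a few matrix products.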

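How fast is the BP algorithm?

When BP was first invented, computing power was extremely limited. BP is now used widely in deep learning thanks to the huge increase in computing power and to many useful tricks.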

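What's the algorithm really doing?

This part examines BP in more depth and is essentially a proof. A change at some node early in the network propagates layer by layer toward the output and eventually changes the cost function; the book writes down the relationship between these two changes explicitly.

Expanding that relationship layer by layer gives a second, longer expression.

Many more steps follow after that...

For the principles behind BP, I recommend Andrew Ng's UFLDL tutorial, along with related blog posts.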
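For reference, the two relationships described above can be written as follows. The notation is assumed to match the book's: $w^l_{jk}$ is the weight from neuron $k$ in layer $l-1$ to neuron $j$ in layer $l$, $a^l_j$ is the activation of neuron $j$ in layer $l$, $L$ is the output layer, and $C$ is the cost. This is a sketch of the argument, not a full proof.

```latex
% A small change in one weight changes the cost, to first order, by
\Delta C \approx \frac{\partial C}{\partial w^l_{jk}} \, \Delta w^l_{jk}

% The same weight change first changes the activation of its own neuron:
\Delta a^l_j \approx \frac{\partial a^l_j}{\partial w^l_{jk}} \, \Delta w^l_{jk}

% Following every path from that neuron to the output, layer by layer:
\Delta C \approx \sum_{m n p \ldots q}
  \frac{\partial C}{\partial a^L_m}
  \frac{\partial a^L_m}{\partial a^{L-1}_n}
  \frac{\partial a^{L-1}_n}{\partial a^{L-2}_p}
  \cdots
  \frac{\partial a^{l+1}_q}{\partial a^l_j}
  \frac{\partial a^l_j}{\partial w^l_{jk}} \, \Delta w^l_{jk}
```

Comparing the first and last lines gives an expression for $\partial C / \partial w^l_{jk}$ as a sum over all paths from the weight to the cost, which is what backpropagation computes efficiently.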

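3. Improving the way neural networks learn

This chapter discusses techniques for speeding up BP and improving the performance of neural networks. These techniques and tricks are used routinely when training and optimizing networks and are listed below. (I have not finished organizing my notes for every part, and they are long, so I am splitting them across several blog posts, which will be linked at [article link] as they appear.)

A better choice than the quadratic (squared-error) cost function: the cross-entropy cost function [article link]

Four regularization methods (to improve generalization and avoid overfitting): [article link]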

L1 regularization

L2 regularization

dropout

artificial expansion of the training data

Weight initialization methods [article link]

How to choose hyperparameters (learning rate, regularization coefficient, mini-batch size) [article link]
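Three of the items above can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions (sigmoid output neurons, plain gradient descent), and all function names and numbers are hypothetical. It shows why the cross-entropy cost avoids the learning slowdown of the quadratic cost, what an L2-regularized weight update looks like, and how weights can be initialized with standard deviation 1/sqrt(n_in).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_output_error(z, y):
    """Output-layer error for the quadratic cost: (a - y) * sigmoid'(z)."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def cross_entropy_output_error(z, y):
    """Output-layer error for the cross-entropy cost: a - y, with no sigmoid' factor."""
    return sigmoid(z) - y

def l2_regularized_step(w, grad_w, eta, lmbda, n):
    """Gradient step with L2 weight decay; n is the training-set size."""
    return (1.0 - eta * lmbda / n) * w - eta * grad_w

def scaled_gaussian_weights(n_out, n_in, rng):
    """Gaussian weights with standard deviation 1/sqrt(n_in)."""
    return rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)

# A badly wrong, saturated output neuron: the target is 0 but z = 6 gives a close to 1.
z, y = np.array([6.0]), np.array([0.0])
slow = quadratic_output_error(z, y)       # tiny, because sigmoid'(6) is tiny
fast = cross_entropy_output_error(z, y)   # close to 1, proportional to the error itself
```

The gradient with respect to the output weights is proportional to this error term, so the saturated neuron learns very slowly under the quadratic cost and quickly under cross-entropy.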

4. A visual proof that neural nets can compute any function

Please credit the source when reposting: /article/1524286.html
