
Chapter 2 - Neural Networks and Deep Learning


Chapter 2: How the backpropagation algorithm works

Backpropagation (BP)

Computational graph

Chain rule: $\dfrac{\partial y}{\partial x} = \dfrac{\partial y}{\partial z} \cdot \dfrac{\partial z}{\partial x}$

Computational Graph

In gradient descent (GD) algorithms, we update each weight and bias by adding $-\eta\,\nabla C$, where $\nabla C$ collects the partial derivatives of the cost $C$ with respect to those weights and biases.
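Written out for an individual weight and bias, this update step is:

$$w_k \;\to\; w_k' = w_k - \eta \frac{\partial C}{\partial w_k}, \qquad b_m \;\to\; b_m' = b_m - \eta \frac{\partial C}{\partial b_m}$$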

An example of using a computational graph to evaluate partial derivatives:
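As a minimal sketch (not from the original notes), the Python snippet below evaluates $\frac{dy}{dx}$ for $y = \sin(x^2)$ by multiplying the local derivatives along the graph $x \to z \to y$; the particular functions and the input value are illustrative assumptions.

```python
import math

# Tiny computational graph: x -> z = x**2 -> y = sin(z)

def forward(x):
    z = x ** 2            # intermediate node
    y = math.sin(z)       # output node
    return z, y

def backward(x, z):
    dy_dz = math.cos(z)   # local derivative of y = sin(z)
    dz_dx = 2 * x         # local derivative of z = x**2
    return dy_dz * dz_dx  # chain rule: dy/dx = dy/dz * dz/dx

x = 1.5
z, y = forward(x)
print("dy/dx =", backward(x, z))  # equals cos(x**2) * 2 * x
```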



Notations

elementwise application of a function: $f(v)$

elementwise product of two vectors of the same shape: $s \odot v$

a quantity associated with the $l$th layer (computed from the $(l-1)$th layer): $v^l$

weight from neuron $k$ in layer $l-1$ to neuron $j$ in layer $l$: $w^l_{jk}$

the weighted input to neuron $j$ in layer $l$: $z^l_j \equiv w^l_j \cdot a^{l-1} + b^l_j$

the activation of the $j$th neuron in layer $l$: $a^l_j = \sigma(z^l_j)$

the error of neuron $j$ in layer $l$: $\delta^l_j \equiv \dfrac{\partial C}{\partial z^l_j}$
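To make the notation concrete, here is a minimal NumPy sketch of the forward pass $z^l = w^l a^{l-1} + b^l$, $a^l = \sigma(z^l)$; the layer sizes and the sigmoid activation are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed layer sizes: 3 inputs -> 4 hidden units -> 2 outputs.
sizes = [3, 4, 2]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]  # w^l
biases  = [rng.standard_normal((m, 1)) for m in sizes[1:]]                      # b^l

def feedforward(a):
    # a^l = sigma(z^l), with z^l = w^l a^{l-1} + b^l, applied layer by layer
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

x = rng.standard_normal((3, 1))
print(feedforward(x))  # shape (2, 1)
```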

Back Propagation

If we traced the network forwards from every weight separately, we would revisit some neurons many times. Backpropagation uses dynamic programming, reusing each layer's intermediate results, to save that time.

Try to build an intuition with the following equations.

Calculate $\delta^l$

1) For the output layer

Apply the chain rule:

$$\frac{\partial C}{\partial z^L_j} = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j) \tag{1}$$

in shorthand: $\delta^L = \nabla_a C \odot \sigma'(z^L)$
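For example, with the quadratic cost $C = \frac{1}{2}\lVert a^L - y \rVert^2$ (used here only as an illustration), $\nabla_a C = a^L - y$, and equation (1) becomes:

$$\delta^L = (a^L - y) \odot \sigma'(z^L)$$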

2) For a layer $l$ before the output layer

According to the chain rule, $\delta^l_k = \sum_j \delta^{l+1}_j \dfrac{\partial z^{l+1}_j}{\partial z^l_k} = \sum_j \delta^{l+1}_j w^{l+1}_{jk}\,\sigma'(z^l_k)$, therefore:

$$\delta^l = \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \tag{2}$$

Calculate the gradients of the biases and weights

1) Biases

$$\frac{\partial C}{\partial b^l_j} = \delta^l_j \tag{3}$$

in shorthand: $\dfrac{\partial C}{\partial b} = \delta$

2) Weights

$$\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\,\delta^l_j \tag{4}$$

Depicted:



A Vanilla Implementation
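Below is a minimal sketch of such a vanilla implementation in NumPy, following equations (1)-(4); the sigmoid activation, quadratic cost, layer sizes, and names are assumptions made for concreteness, not code from the original post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

class Network:
    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.standard_normal((m, n))
                        for n, m in zip(sizes[:-1], sizes[1:])]
        self.biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]

    def backprop(self, x, y):
        """Return (nabla_b, nabla_w), the gradients of the quadratic
        cost C = 0.5 * ||a^L - y||^2 for a single training example."""
        # Forward pass: store every z^l and a^l.
        activation = x
        activations = [x]
        zs = []
        for w, b in zip(self.weights, self.biases):
            z = w @ activation + b          # z^l = w^l a^{l-1} + b^l
            zs.append(z)
            activation = sigmoid(z)         # a^l = sigma(z^l)
            activations.append(activation)

        # Backward pass.
        nabla_b = [np.zeros_like(b) for b in self.biases]
        nabla_w = [np.zeros_like(w) for w in self.weights]

        # Equation (1): delta^L = (a^L - y) * sigma'(z^L) for the quadratic cost.
        delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
        nabla_b[-1] = delta                      # equation (3)
        nabla_w[-1] = delta @ activations[-2].T  # equation (4)

        # Equation (2): propagate delta backwards through the hidden layers.
        for l in range(2, len(self.weights) + 1):
            delta = (self.weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
            nabla_b[-l] = delta
            nabla_w[-l] = delta @ activations[-l - 1].T
        return nabla_b, nabla_w

# Example usage with illustrative sizes: 3 inputs, 4 hidden units, 2 outputs.
net = Network([3, 4, 2])
x = np.ones((3, 1))
y = np.zeros((2, 1))
nabla_b, nabla_w = net.backprop(x, y)
print([g.shape for g in nabla_w])  # [(4, 3), (2, 4)]
```

The gradients returned here would then feed the gradient descent update from the Computational Graph section, subtracting $\eta$ times each gradient from the corresponding weight or bias.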
