
DeepLearning.ai Code Notes 1: Neural Networks and Deep Learning

2018-04-02 22:52

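A note up front: this series collects excerpts, translations, and explanations of the parts of the programming assignments that I consider most important. The main goal is to record the key ideas or workflow of each model, along with some common coding mistakes, as a reference for filling gaps later.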

Assignment link: https://github.com/Wasim37/deeplearning-assignment. Thanks to everyone who shared their work on GitHub.

1. Generating random numbers

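The difference between np.random.randn() and np.random.rand(): the n in the former stands for normal, so it samples from the standard normal distribution, while the latter samples uniformly from [0, 1). Early on I kept dropping the n and then found that my random numbers did not match the assignment's.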
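A quick way to see the difference is to draw many samples from each function and compare their ranges and summary statistics. The sketch below is only an illustration: the seed value 0 and the sample size are arbitrary choices, not values from the assignment.

```python
import numpy as np

# Arbitrary seed and sample size, chosen only for this illustration.
rng = np.random.RandomState(0)
n_samples = 100000

uniform = rng.rand(n_samples)    # uniform on [0, 1)
normal = rng.randn(n_samples)    # standard normal: mean 0, standard deviation 1

print("rand : min=%.3f max=%.3f mean=%.3f" % (uniform.min(), uniform.max(), uniform.mean()))
print("randn: min=%.3f max=%.3f mean=%.3f std=%.3f"
      % (normal.min(), normal.max(), normal.mean(), normal.std()))
```

The rand samples never leave [0, 1) and average about 0.5, while the randn samples include negative values and are centered on 0, which is why mixing the two up changes every downstream number.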

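np.random.seed(): setting a seed fixes the pseudo-random sequence, as if a fixed list of numbers had been generated in advance and each call returned the next item in order. Running the program again with the same seed therefore reproduces the same values.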

import numpy as np
# np.random.seed(1)     # uncomment this line and rerun to see what the seed does
print(np.random.random())
for i in range(5):
    print(np.random.random())


Comment kept	Comment removed
0.22199317108973948	0.22199317108973948
0.8707323061773764	0.8707323061773764
0.20671915533942642	0.20671915533942642
0.9186109079379216	0.9186109079379216
0.48841118879482914	0.48841118879482914
0.6117438629026457
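The effect of the seed can also be checked in a single run by re-seeding and comparing the draws. This is a minimal sketch; the seed value 1 is taken from the snippet above, and the variable names are just for illustration.

```python
import numpy as np

np.random.seed(1)
first = np.random.random(3)     # first three values of the seeded sequence

np.random.seed(1)               # reset the generator to the same state
second = np.random.random(3)    # same three values again
third = np.random.random(3)     # the sequence continues with new values

same = np.array_equal(first, second)
differs = not np.array_equal(second, third)
print(same, differs)
```

Re-seeding restarts the sequence from the beginning, while further calls without re-seeding keep moving forward through it.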

2. Basic steps for building a neural network

1. Define the model structure (such as number of input features)

2. Initialize the model’s parameters

3. Loop:

    Calculate current loss (forward propagation)

    Calculate current gradient (backward propagation)

    Update parameters (gradient descent)

You often build 1-3 separately and integrate them into one function we call model().
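As a rough illustration of how the three steps fit into one model() function, here is a minimal logistic-regression sketch. It is not the assignment's code: the toy data, the default hyperparameters, and the zero initialization are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(X, Y, num_iterations=2000, learning_rate=0.1):
    # Step 1: define the model structure (number of input features).
    n_x, m = X.shape
    # Step 2: initialize the parameters (zeros are fine for logistic regression).
    w = np.zeros((n_x, 1))
    b = 0.0
    costs = []
    # Step 3: loop.
    for _ in range(num_iterations):
        # Forward propagation: current activations and cost.
        A = sigmoid(np.dot(w.T, X) + b)
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        # Backward propagation: current gradients.
        dZ = A - Y
        dw = np.dot(X, dZ.T) / m
        db = np.sum(dZ) / m
        # Gradient descent: update the parameters.
        w = w - learning_rate * dw
        b = b - learning_rate * db
        costs.append(cost)
    return w, b, costs

# Toy data (hypothetical): one feature, label 1 when the feature is large.
X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0, 0, 1, 1]])
w, b, costs = model(X, Y)
predictions = (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(int)
print(costs[0], costs[-1], predictions)
```

Keeping initialization, the forward pass, the backward pass, and the update as separate pieces makes each one easy to test before they are combined.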


def initialize_parameters_deep(layer_dims):
    ...
    return parameters

def L_model_forward(X, parameters):
    ...
    return AL, caches  # activations of the last layer, and the collection of all layers' cached activations

def compute_cost(AL, Y):
    ...
    return cost

def L_model_backward(AL, Y, caches):
    ...
    return grads

def update_parameters(parameters, grads, learning_rate):
    ...
    return parameters
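The bodies above are elided in these notes. To give a feel for what the first function might look like, here is one possible sketch of initialize_parameters_deep. The 0.01 scaling, the zero biases, the "W1"/"b1" dictionary keys, and the seed argument are assumptions for illustration, not the assignment's reference solution.

```python
import numpy as np

def initialize_parameters_deep(layer_dims, seed=3):
    """Assumed sketch: small random weights and zero biases for every layer.

    layer_dims lists the layer sizes, input layer first. The seed parameter
    is hypothetical and only makes the example reproducible.
    """
    rng = np.random.RandomState(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        # W for layer l maps layer_dims[l-1] inputs to layer_dims[l] units.
        parameters["W" + str(l)] = rng.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

parameters = initialize_parameters_deep([5, 4, 3, 1])
shapes = {name: value.shape for name, value in parameters.items()}
print(shapes)
```

Checking the shapes right after initialization catches most dimension mistakes before they surface deep inside the forward pass.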

Main formulas for forward propagation:

$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$

$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1 - y^{(i)}) \log(1 - a^{(i)}) \tag{3}$$

The cost is then computed by summing over all training examples:

$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{4}$$
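The four forward-propagation formulas translate almost line by line into vectorized NumPy. The sketch below assumes X has shape (n_x, m) with one example per column and Y has shape (1, m); the function name forward_cost and the sample data are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_cost(w, b, X, Y):
    """Hypothetical helper: returns the activations A and the cost J."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b                                 # formula (1), all examples at once
    A = sigmoid(Z)                                         # formula (2)
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))    # formula (3), per example
    J = np.sum(losses) / m                                 # formula (4)
    return A, J

# Made-up data: 2 features, 3 examples.
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
w = np.zeros((2, 1))
b = 0.0

A, J = forward_cost(w, b, X, Y)
print(A, J)   # with zero parameters every activation is 0.5, so J = log(2)
```

With all parameters at zero, each prediction is 0.5 and every per-example loss is log(2), which gives a handy sanity check for a fresh implementation.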

Main formulas for backward propagation:

For layer $l$, the linear part is $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (followed by an activation).

Suppose you have already calculated the derivative $dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}}$. You want to get $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$.

The three outputs $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$ are computed using the input $dZ^{[l]}$. Here are the formulas you need:

$$dW^{[l]} = \frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1]T} \tag{1}$$

$$db^{[l]} = \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} \tag{2}$$

$$dA^{[l-1]} = \frac{\partial \mathcal{L}}{\partial A^{[l-1]}} = W^{[l]T} dZ^{[l]} \tag{3}$$

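import numpy as np

def linear_backward(dZ, cache):
    """
    Compute the gradients for the linear part of one layer during backpropagation.
    :param dZ: gradient of the loss with respect to the current layer's linear output Z; for the output layer L this is usually A - Y
    :param cache: tuple (A_pre, W, b) stored during forward propagation
    :return: dA_pre, dW, db
    """
    A_pre, W, b = cache
    m = A_pre.shape[1]  # number of examples

    dW = np.dot(dZ, A_pre.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    # Chain rule: dL/dA_pre = dL/dZ * dZ/dA_pre, and dZ/dA_pre = W.
    # The variable names drop the "dL/" prefix, so the product keeps the same form.
    dA_pre = np.dot(W.T, dZ)  # note: dA_pre and dZ are not divided by m
    return dA_pre, dW, db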
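One way to gain confidence in linear_backward is a numerical gradient check. The sketch below repeats the function so that it runs on its own, and compares dW against central finite differences. The toy cost (the average over examples of half the squared norm of z, chosen so that dZ equals Z) and the random shapes are assumptions made purely for the check.

```python
import numpy as np

def linear_backward(dZ, cache):
    A_pre, W, b = cache
    m = A_pre.shape[1]
    dW = np.dot(dZ, A_pre.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_pre = np.dot(W.T, dZ)
    return dA_pre, dW, db

def toy_cost(A_pre, W, b):
    # Hypothetical cost for the check: J = (1/m) * sum_i 0.5 * ||z_i||^2,
    # so the per-example derivative with respect to Z is simply Z.
    Z = np.dot(W, A_pre) + b
    return 0.5 * np.sum(Z ** 2) / A_pre.shape[1]

rng = np.random.RandomState(0)   # arbitrary seed
A_pre = rng.randn(3, 4)          # 3 input units, 4 examples
W = rng.randn(2, 3)              # 2 output units
b = rng.randn(2, 1)

Z = np.dot(W, A_pre) + b
dA_pre, dW, db = linear_backward(Z, (A_pre, W, b))

# Central finite differences for every entry of W.
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        num_dW[i, j] = (toy_cost(A_pre, W_plus, b) - toy_cost(A_pre, W_minus, b)) / (2 * eps)

max_err = np.max(np.abs(num_dW - dW))
print(max_err)
```

Each gradient has the same shape as the quantity it belongs to. The 1/m factor appears only in dW and db because those average over examples, whereas dA_pre stays a per-example quantity.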
