深度学习DeepLearning.ai系列课程学习总结:12. 优化算法实战
2017-09-17 21:20
1046 查看
转载过程中,图片丢失,代码显示错乱。
为了更好的学习内容,请访问原创版本:
http://www.missshi.cn/api/view/blog/59bbcae0e519f50d04000204
Ps:初次访问由于js文件较大,请耐心等候(8s左右)
到目前为止,我们始终都是在使用梯度下降法来优化代价函数。
本文中,我们将使用一些更加高级的优化算法,利用这些优化算法,通常可以提高我们算法的收敛速度,并在最终得到更好的分离结果。
假设我们的代价函数就像这样一个山峰:首先,我们需要引入一些相关的库:
import numpy as np import matplotlib.pyplot as plt import scipy.io import math import sklearn import sklearn.datasets from opt_utils import load_params_and_grads, initialize_parameters, forward_propagation, backward_propagation from opt_utils import compute_cost, predict, predict_dec, plot_decision_boundary, load_dataset from testCases import * %matplotlib inline plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots plt.rcParams['image.interpolation'] = 'nearest' plt.rcParams['image.cmap'] = 'gray'
其中,一些相关的函数如下:
def load_params_and_grads(seed=1): np.random.seed(seed) W1 = np.random.randn(2,3) b1 = np.random.randn(2,1) W2 = np.random.randn(3,3) b2 = np.random.randn(3,1) dW1 = np.random.randn(2,3) db1 = np.random.randn(2,1) dW2 = np.random.randn(3,3) db2 = np.random.randn(3,1) return W1, b1, W2, b2, dW1, db1, dW2, db2 def initialize_parameters(layer_dims): """ Arguments: layer_dims -- python array (list) containing the dimensions of each layer in our network Returns: parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": W1 -- weight matrix of shape (layer_dims[l], layer_dims[l-1]) b1 -- bias vector of shape (layer_dims[l], 1) Wl -- weight matrix of shape (layer_dims[l-1], layer_dims[l]) bl -- bias vector of shape (1, layer_dims[l]) Tips: - For example: the layer_dims for the "Planar Data classification model" would have been [2,2,1]. This means W1's shape was (2,2), b1 was (1,2), W2 was (2,1) and b2 was (1,1). Now you have to generalize it! - In the for loop, use parameters['W' + str(l)] to access Wl, where l is the iterative integer. """ np.random.seed(3) parameters = {} L = len(layer_dims) # number of layers in the network for l in range(1, L): parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])* np.sqrt(2 / layer_dims[l-1]) parameters['b' + str(l)] = np.zeros((layer_dims[l], 1)) assert(parameters['W' + str(l)].shape == layer_dims[l], layer_dims[l-1]) assert(parameters['W' + str(l)].shape == layer_dims[l], 1) return parameters def forward_propagation(X, parameters): """ Implements the forward propagation (and computes the loss) presented in Figure 2. Arguments: X -- input dataset, of shape (input size, number of examples) parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3": W1 -- weight matrix of shape () b1 -- bias vector of shape () W2 -- weight matrix of shape () b2 -- bias vector of shape () W3 -- weight matrix of shape () b3 -- bias vector of shape () Returns: loss -- the loss function (vanilla logistic loss) """ # retrieve parameters W1 = parameters["W1"] b1 = parameters["b1"] W2 = parameters["W2"] b2 = parameters["b2"] W3 = parameters["W3"] b3 = parameters["b3"] # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID z1 = np.dot(W1, X) + b1 a1 = relu(z1) z2 = np.dot(W2, a1) + b2 a2 = relu(z2) z3 = np.dot(W3, a2) + b3 a3 = sigmoid(z3) cache = (z1, a1, W1, b1, z2, a2, W2, b2, z3, a3, W3, b3) return a3, cache def backward_propagation(X, Y, cache): """ Implement the backward propagation presented in figure 2. Arguments: X -- input dataset, of shape (input size, number of examples) Y -- true "label" vector (containing 0 if cat, 1 if non-cat) cache -- cache output from forward_propagation() Returns: gradients -- A dictionary with the gradients with respect to each parameter, activation and pre-activation variables """ m = X.shape[1] (z1, a1, W1, b1, z2, a2, W2, b2, z3, a3, W3, b3) = cache dz3 = 1./m * (a3 - Y) dW3 = np.dot(dz3, a2.T) db3 = np.sum(dz3, axis=1, keepdims = True) da2 = np.dot(W3.T, dz3) dz2 = np.multiply(da2, np.int64(a2 > 0)) dW2 = np.dot(dz2, a1.T) db2 = np.sum(dz2, axis=1, keepdims = True) da1 = np.dot(W2.T, dz2) dz1 = np.multiply(da1, np.int64(a1 > 0)) dW1 = np.dot(dz1, X.T) db1 = np.sum(dz1, axis=1, keepdims = True) gradients = {"dz3": dz3, "dW3": dW3, "db3": db3, "da2": da2, "dz2": dz2, "dW2": dW2, "db2": db2, "da1": da1, "dz1": dz1, "dW1": dW1, "db1": db1} return gradients def compute_cost(a3, Y): """ Implement the cost function Arguments: a3 -- post-activation, output of forward propagation Y -- "true" labels vector, same shape as a3 Returns: cost - value of the cost function """ m = Y.shape[1] logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y) cost = 1./m * np.sum(logprobs) return cost def predict(X, y, parameters): """ This function is used to predict the results of a n-layer neural network. Arguments: X -- data set of examples you would like to label parameters -- parameters of the trained model Returns: p -- predictions for the given dataset X """ m = X.shape[1] p = np.zeros((1,m), dtype = np.int) # Forward propagation a3, caches = forward_propagation(X, parameters) # convert probas to 0/1 predictions for i in range(0, a3.shape[1]): if a3[0,i] > 0.5: p[0,i] = 1 else: p[0,i] = 0 # print results #print ("predictions: " + str(p[0,:])) #print ("true labels: " + str(y[0,:])) print("Accuracy: " + str(np.mean((p[0,:] == y[0,:])))) return p def predict_dec(parameters, X): """ Used for plotting decision boundary. Arguments: parameters -- python dictionary containing your parameters X -- input data of size (m, K) Returns predictions -- vector of predictions of our model (red: 0 / blue: 1) """ # Predict using forward propagation and a classification threshold of 0.5 a3, cache = forward_propagation(X, parameters) predictions = (a3 > 0.5) return predictions def plot_decision_boundary(model, X, y): # Set min and max values and give it some padding x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1 y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1 h = 0.01 # Generate a grid of points with distance h between them xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h)) # Predict the function value for the whole grid Z = model(np.c_[xx.ravel(), yy.ravel()]) Z = Z.reshape(xx.shape) # Plot the contour and training examples plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral) plt.ylabel('x2') plt.xlabel('x1') plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral) plt.show() def load_dataset(): np.random.seed(3) train_X, train_Y = sklearn.datasets.make_moons(n_samples=300, noise=.2) #300 #0.2 # Visualize the data plt.scatter(train_X[:, 0], train_X[:, 1], c=train_Y, s=40, cmap=plt.cm.Spectral); train_X = train_X.T train_Y = train_Y.reshape((1, train_Y.shape[0])) return train_X, train_Y
梯度下降算法
更新参数的函数实现如下:def update_parameters_with_gd(parameters, grads, learning_rate): """ Update parameters using one step of gradient descent Arguments: parameters -- python dictionary containing your parameters to be updated: parameters['W' + str(l)] = Wl parameters['b' + str(l)] = bl grads -- python dictionary containing your gradients to update each parameters: grads['dW' + str(l)] = dWl grads['db' + str(l)] = dbl learning_rate -- the learning rate, scalar. Returns: parameters -- python dictionary containing your updated parameters """ L = len(parameters) // 2 # number of layers in the neural networks # Update rule for each parameter for l in range(L): ### START CODE HERE ### (approx. 2 lines) parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads['dW' + str(l+1)] parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads['db' + str(l+1)] ### END CODE HERE ### return parameters
由梯度下降算法演变来的还有随机梯度下降算法和小批量梯度下降算法。
其中,对于梯度下降算法,实现如下:
X = data_input Y = labels parameters = initialize_parameters(layers_dims) for i in range(0, num_iterations): # Forward propagation a, caches = forward_propagation(X, parameters) # Compute cost. cost = compute_cost(a, Y) # Backward propagation. grads = backward_propagation(a, caches, parameters) # Update parameters. parameters = update_parameters(parameters, grads)
对于随机梯度下降算法,实现如下:
X = data_input Y = labels parameters = initialize_parameters(layers_dims) for i in range(0, num_iterations): for j in range(0, m): # Forward propagation a, caches = forward_propagation(X[:,j], parameters) # Compute cost cost = compute_cost(a, Y[:,j]) # Backward propagation grads = backward_propagation(a, caches, parameters) # Update parameters. parameters = update_parameters(parameters, grads)
在随机梯度下降算法中,每次迭代中仅使用其中一个样本。
当训练集很大时,使用随机梯度下降算法的运行速度会很快,但是会存在一定的波动。
而在实践中,一个更好的实践是使用小批量梯度下降算法。
小批量梯度下降算法是一种综合批梯度下降算法和随机梯度下降算法的方法。
每次迭代过程中,既不是选择全部数据、也不是仅选择一个样本,而是选择一个批量。例如64,128等。
一方面,充分利用的GPU的并行性,更一方面,没有造成特别大的计算时间。
小批量梯度下降算法
首先,我们需要学习如果将训练集进行分批。分批一共可以分为两个步骤:
第一步:打乱顺序。
首先,我们需要将输入样本和标定结果按照相同的顺序随机打乱,保证我们得到的每个批次都是随机的。
第二步:切分。
当我们已经把训练集随机打乱后,接下来就是对其进行切分。批次的大小可以选择64,128,256等。
def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0): """ Creates a list of random minibatches from (X, Y) Arguments: X -- input data, of shape (input size, number of examples) Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples) mini_batch_size -- size of the mini-batches, integer Returns: mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y) """ np.random.seed(seed) # To make your "random" minibatches the same as ours m = X.shape[1] # number of training examples mini_batches = [] # Step 1: Shuffle (X, Y) permutation = list(np.random.permutation(m)) shuffled_X = X[:, permutation] shuffled_Y = Y[:, permutation].reshape((1,m)) # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case. num_complete_minibatches = math.floor(m/mini_batch_size) # number of mini batches of size mini_batch_size in your partitionning for k in range(0, num_complete_minibatches): ### START CODE HERE ### (approx. 2 lines) mini_batch_X = shuffled_X[:, mini_batch_size * k : (k+1) * mini_batch_size] mini_batch_Y = shuffled_Y[:, mini_batch_size * k : (k+1) * mini_batch_size] ### END CODE HERE ### mini_batch = (mini_batch_X, mini_batch_Y) mini_batches.append(mini_batch) # Handling the end case (last mini-batch < mini_batch_size) if m % mini_batch_size != 0: ### START CODE HERE ### (approx. 2 lines) mini_batch_X = shuffled_X[:, mini_batch_size * (k+1) : m] mini_batch_Y = shuffled_Y[:, mini_batch_size * (k+1) : m] ### END CODE HERE ### mini_batch = (mini_batch_X, mini_batch_Y) mini_batches.append(mini_batch) return mini_batches
Momentum
当我们使用小批量梯度下降法时,每次对于一个训练样本的子集进行一次迭代。因此,计算得到的梯度与真实的梯度可能存在一定的偏差。
此时,我们可以利用Momentum来减小偏差。
Momentum方法在更新的过程中,考虑了之前时刻的运行方向的影响,最终结合的作用可以克服一些非真实梯度引入的上下抖动现象。
momentum的更新规则如下:
def initialize_velocity(parameters): """ Initializes the velocity as a python dictionary with: - keys: "dW1", "db1", ..., "dWL", "dbL" - values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters. Arguments: parameters -- python dictionary containing your parameters. parameters['W' + str(l)] = Wl parameters['b' + str(l)] = bl Returns: v -- python dictionary containing the current velocity. v['dW' + str(l)] = velocity of dWl v['db' + str(l)] = velocity of dbl """ L = len(parameters) // 2 # number of layers in the neural networks v = {} # Initialize velocity for l in range(L): ### START CODE HERE ### (approx. 2 lines) v["dW" + str(l+1)] = np.zeros(parameters['W' + str(l+1)].shape) v["db" + str(l+1)] = np.zeros(parameters['b' + str(l+1)].shape) ### END CODE HERE ### return v def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate): """ Update parameters using Momentum Arguments: parameters -- python dictionary containing your parameters: parameters['W' + str(l)] = Wl parameters['b' + str(l)] = bl grads -- python dictionary containing your gradients for each parameters: grads['dW' + str(l)] = dWl grads['db' + str(l)] = dbl v -- python dictionary containing the current velocity: v['dW' + str(l)] = ... v['db' + str(l)] = ... beta -- the momentum hyperparameter, scalar learning_rate -- the learning rate, scalar Returns: parameters -- python dictionary containing your updated parameters v -- python dictionary containing your updated velocities """ L = len(parameters) // 2 # number of layers in the neural networks # Momentum update for each parameter for l in range(L): ### START CODE HERE ### (approx. 4 lines) # compute velocities v["dW" + str(l+1)] = beta * v["dW" + str(l+1)] + (1-beta) * grads['dW' + str(l+1)] v["db" + str(l+1)] = beta * v["db" + str(l+1)] + (1-beta) * grads['db' + str(l+1)] # update parameters parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * v["dW" + str(l+1)] parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * v["db" + str(l+1)] ### END CODE HERE ### return parameters, v
在momentum中,有一个参数beta。
当beta=0时,此时,momentum相当于没有使用momentum算法的标准梯度下降算法。
当beta越到,说明平滑的作用越明显。通常,在实践中,0.9是比较适当的值。
Adam算法
Adam算法是训练神经网络中最有效的算法之一。它是RMSProp算法与Momentum算法的结合体。
其迭代公式如下:
实现过程如下:
def initialize_adam(parameters) : """ Initializes v and s as two python dictionaries with: - keys: "dW1", "db1", ..., "dWL", "dbL" - values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters. Arguments: parameters -- python dictionary containing your parameters. parameters["W" + str(l)] = Wl parameters["b" + str(l)] = bl Returns: v -- python dictionary that will contain the exponentially weighted average of the gradient. v["dW" + str(l)] = ... v["db" + str(l)] = ... s -- python dictionary that will contain the exponentially weighted average of the squared gradient. s["dW" + str(l)] = ... s["db" + str(l)] = ... """ L = len(parameters) // 2 # number of layers in the neural networks v = {} s = {} # Initialize v, s. Input: "parameters". Outputs: "v, s". for l in range(L): ### START CODE HERE ### (approx. 4 lines) v["dW" + str(l+1)] = np.zeros(parameters['W' + str(l+1)].shape) v["db" + str(l+1)] = np.zeros(parameters['b' + str(l+1)].shape) s["dW" + str(l+1)] = np.zeros(parameters['W' + str(l+1)].shape) s["db" + str(l+1)] = np.zeros(parameters['b' + str(l+1)].shape) ### END CODE HERE ### return v, s def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate = 0.01, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8): """ Update parameters using Adam Arguments: parameters -- python dictionary containing your parameters: parameters['W' + str(l)] = Wl parameters['b' + str(l)] = bl grads -- python dictionary containing your gradients for each parameters: grads['dW' + str(l)] = dWl grads['db' + str(l)] = dbl v -- Adam variable, moving average of the first gradient, python dictionary s -- Adam variable, moving average of the squared gradient, python dictionary learning_rate -- the learning rate, scalar. beta1 -- Exponential decay hyperparameter for the first moment estimates beta2 -- Exponential decay hyperparameter for the second moment estimates epsilon -- hyperparameter preventing division by zero in Adam updates Returns: parameters -- python dictionary containing your updated parameters v -- Adam variable, moving average of the first gradient, python dictionary s -- Adam variable, moving average of the squared gradient, python dictionary """ L = len(parameters) // 2 # number of layers in the neural networks v_corrected = {} # Initializing first moment estimate, python dictionary s_corrected = {} # Initializing second moment estimate, python dictionary # Perform Adam update on all parameters for l in range(L): # Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v". ### START CODE HERE ### (approx. 2 lines) v["dW" + str(l+1)] = beta1 * v["dW" + str(l+1)] + (1-beta1) * grads['dW' + str(l+1)] v["db" + str(l+1)] = beta1 * v["db" + str(l+1)] + (1-beta1) * grads['db' + str(l+1)] ### END CODE HERE ### # Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected". ### START CODE HERE ### (approx. 2 lines) v_corrected["dW" + str(l+1)] = v["dW" + str(l+1)] / (1 - beta1 ** t) v_corrected["db" + str(l+1)] = v["db" + str(l+1)] / (1 - beta1 ** t) ### END CODE HERE ### # Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s". ### START CODE HERE ### (approx. 2 lines) s["dW" + str(l+1)] = beta2 * s["dW" + str(l+1)] + (1-beta2) * (grads['dW' + str(l+1)] ** 2) s["db" + str(l+1)] = beta2 * s["db" + str(l+1)] + (1-beta2) * (grads['db' + str(l+1)] ** 2) ### END CODE HERE ### # Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected". ### START CODE HERE ### (approx. 2 lines) s_corrected["dW" + str(l+1)] = s["dW" + str(l+1)] / (1 - beta2 ** t) s_corrected["db" + str(l+1)] = s["db" + str(l+1)] / (1 - beta2 ** t) ### END CODE HERE ### # Update parameters. Inputs: "parameters, learning_rate, v_corrected, s_corrected, epsilon". Output: "parameters". ### START CODE HERE ### (approx. 2 lines) parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * v_corrected["dW" + str(l+1)] / (s_corrected["dW" + str(l+1)] ** 0.5 + epsilon) parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * v_corrected["db" + str(l+1)] / (s_corrected["db" + str(l+1)] ** 0.5 + epsilon) ### END CODE HERE ### return parameters, v, s
使用不同优化算法的模型效果对比
train_X, train_Y = load_dataset()
基本模型实现如下(一个三层的神经网络模型):
def model(X, Y, layers_dims, optimizer, learning_rate = 0.0007, mini_batch_size = 64, beta = 0.9, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8, num_epochs = 10000, print_cost = True): """ 3-layer neural network model which can be run in different optimizer modes. Arguments: X -- input data, of shape (2, number of examples) Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples) layers_dims -- python list, containing the size of each layer learning_rate -- the learning rate, scalar. mini_batch_size -- the size of a mini batch beta -- Momentum hyperparameter beta1 -- Exponential decay hyperparameter for the past gradients estimates beta2 -- Exponential decay hyperparameter for the past squared gradients estimates epsilon -- hyperparameter preventing division by zero in Adam updates num_epochs -- number of epochs print_cost -- True to print the cost every 1000 epochs Returns: parameters -- python dictionary containing your updated parameters """ L = len(layers_dims) # number of layers in the neural networks costs = [] # to keep track of the cost t = 0 # initializing the counter required for Adam update seed = 10 # For grading purposes, so that your "random" minibatches are the same as ours # Initialize parameters parameters = initialize_parameters(layers_dims) # Initialize the optimizer if optimizer == "gd": pass # no initialization required for gradient descent elif optimizer == "momentum": v = initialize_velocity(parameters) elif optimizer == "adam": v, s = initialize_adam(parameters) # Optimization loop for i in range(num_epochs): # Define the random minibatches. We increment the seed to reshuffle differently the dataset after each epoch seed = seed + 1 minibatches = random_mini_batches(X, Y, mini_batch_size, seed) for minibatch in minibatches: # Select a minibatch (minibatch_X, minibatch_Y) = minibatch # Forward propagation a3, caches = forward_propagation(minibatch_X, parameters) # Compute cost cost = compute_cost(a3, minibatch_Y) # Backward propagation grads = backward_propagation(minibatch_X, minibatch_Y, caches) # Update parameters if optimizer == "gd": parameters = update_parameters_with_gd(parameters, grads, learning_rate) elif optimizer == "momentum": parameters, v = update_parameters_with_momentum(parameters, grads, v, beta, learning_rate) elif optimizer == "adam": t = t + 1 # Adam counter parameters, v, s = update_parameters_with_adam(parameters, grads, v, s, t, learning_rate, beta1, beta2, epsilon) # Print the cost every 1000 epoch if print_cost and i % 1000 == 0: print ("Cost after epoch %i: %f" %(i, cost)) if print_cost and i % 100 == 0: costs.append(cost) # plot the cost plt.plot(costs) plt.ylabel('cost') plt.xlabel('epochs (per 100)') plt.title("Learning rate = " + str(learning_rate)) plt.show() return parameters
小批量梯度下降算法
layers_dims = [train_X.shape[0], 5, 2, 1] parameters = model(train_X, train_Y, layers_dims, optimizer = "gd") # Predict predictions = predict(train_X, train_Y, parameters) # Plot decision boundary plt.title("Model with Gradient Descent optimization") axes = plt.gca() axes.set_xlim([-1.5,2.5]) axes.set_ylim([-1,1.5]) plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
使用Momentum的小批量梯度算法
layers_dims = [train_X.shape[0], 5, 2, 1] parameters = model(train_X, train_Y, layers_dims, beta = 0.9, optimizer = "momentum") # Predict predictions = predict(train_X, train_Y, parameters) # Plot decision boundary plt.title("Model with Momentum optimization") axes = plt.gca() axes.set_xlim([-1.5,2.5]) axes.set_ylim([-1,1.5]) plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
使用Adam优化的小批量梯度算法
layers_dims = [train_X.shape[0], 5, 2, 1] parameters = model(train_X, train_Y, layers_dims, optimizer = "adam") # Predict predictions = predict(train_X, train_Y, parameters) # Plot decision boundary plt.title("Model with Adam optimization") axes = plt.gca() axes.set_xlim([-1.5,2.5]) axes.set_ylim([-1,1.5]) plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
总结一下:
不同优化算法的对比结果如下表:
对比三种方法,我们通常会发现Adam算法往往可以得到更好的结果。
更多更详细的内容,请访问原创网站:
http://www.missshi.cn/api/view/blog/59bbcae0e519f50d04000204
Ps:初次访问由于js文件较大,请耐心等候(8s左右)
相关文章推荐
- 深度学习DeepLearning.ai系列课程学习总结:11. 优化算法理论讲解
- 深度学习DeepLearning.ai系列课程学习总结:10. 初始化、正则化、梯度检查实战
- 深度学习DeepLearning.ai系列课程学习总结:6. 具有一个隐藏层的平面数据分类代码实战
- 深度学习DeepLearning.ai系列课程学习总结:8. 多层神经网络代码实战
- 深度学习DeepLearning.ai系列课程学习总结:2. 神经网络基础
- 深度学习DeepLearning.ai系列课程学习总结:课程概述
- 深度学习DeepLearning.ai系列课程学习总结:13. 超参数调优、Batch正则化理论及深度学习框架学习
- 深度学习DeepLearning.ai系列课程学习总结:5. 浅层神经网络
- 深度学习DeepLearning.ai系列课程学习总结:14. Tensorflow入门
- 深度学习DeepLearning.ai系列课程学习总结:1. 深度学习简介
- 深度学习DeepLearning.ai系列课程学习总结:3. Python矢量化实现神经网络
- Coursera deeplearning.ai 深度学习笔记2-2-Optimization algorithms-优化算法与代码实现
- 【备忘】深度学习实战决胜AI-强化学习实战系列视频课程
- 吴恩达Coursera深度学习课程 DeepLearning.ai 提炼笔记(2-2)-- 优化算法
- 吴恩达Coursera深度学习课程 DeepLearning.ai 提炼笔记(2-2)-- 优化算法
- Fast.ai: Practical Deep Learning for coders 课程学习笔记:Part1 Lesson1 (Lesson0)
- Fast.ai: Practical Deep Learning for coders 课程学习笔记:Part1 Lesson1 (Lesson0)
- Fast.ai: Practical Deep Learning for coders 课程学习笔记:Part1 Lesson1 (Lesson0)
- DeepLearning.ai学习笔记(一)神经网络和深度学习--Week4深层神经网络
- Fast.ai: Practical Deep Learning for coders 课程学习笔记:Part1 Lesson1 (Lesson0)