COURSE 1 Neural Networks and Deep Learning
Week 1
What is a neural network?
It is a powerful learning algorithm inspired by how the brain works.
Example 1 – Single neural network
Given data about the size of houses on the real estate market, you want to fit a function that will predict their price. It is a linear regression problem because the price as a function of size is a continuous output.
We know the prices can never be negative so we are creating a function called Rectified Linear Unit (ReLU)
which starts at zero.
The input is the size of the house (x)
The output is the price (y)
The “neuron” implements the function ReLU (blue line)
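As a concrete sketch (the weight and bias below are made-up numbers for illustration, not from the course), a single ReLU neuron predicting price from size might look like:

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: returns 0 for negative inputs, z otherwise."""
    return np.maximum(0, z)

# Hypothetical weight and bias, chosen only to illustrate the shape of the function.
w, b = 150.0, -30000.0             # price per unit size, offset
size = np.array([100, 200, 500])   # house sizes (x)
price = relu(w * size + b)         # predicted prices (y), never negative
print(price)                       # [    0.     0. 45000.]
```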
Example 2 – Multiple neural networks
The price of a house can be affected by other features such as size, number of bedrooms, zip code, and wealth. The role of the neural network is to predict the price, and it will automatically generate the hidden units. We only need to give the inputs x and the output y.
Supervised Learning with Neural Networks
In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
Supervised learning problems are categorized into “regression” and “classification” problems. In a
regression problem, we are trying to predict results within a continuous output, meaning that we are
trying to map input variables to some continuous function. In a classification problem, we are instead
trying to predict results in a discrete output. In other words, we are trying to map input variables into
discrete categories.
There are different types of neural networks, for example the Convolutional Neural Network (CNN), often used for image applications, and the Recurrent Neural Network (RNN), used for one-dimensional sequence data such as translating English to Chinese, or data with a temporal component such as a text transcript. Autonomous driving uses a hybrid neural network architecture.
Neural Network examples
Structured vs unstructured data
Structured data refers to things that have a defined meaning, such as price or age, whereas unstructured data refers to things like pixels, raw audio, or text.
Why is deep learning taking off?
Deep learning is taking off due to the large amount of data made available through the digitization of society, faster computation, and innovation in the development of neural network algorithms.
Two things have to be considered to get to a high level of performance:
Being able to train a big enough neural network
A huge amount of labeled data
The process of training a neural network is iterative.
It can take a good amount of time to train a neural network, which affects your productivity. Faster computation helps you iterate on and improve algorithms quickly.
Week 2
Binary Classification
In a binary classification problem, the result is a discrete value output.
Notation
a training example: $(x, y)$, $x \in \mathbb{R}^{n_x}$, $y \in \{0, 1\}$
m training examples:
$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$, where $m = m_{\text{train}} =$ number of training examples
matrix:
$X = [x^{(1)}, x^{(2)}, \dots, x^{(m)}] \in \mathbb{R}^{n_x \times m}$, $\quad Y = [y^{(1)}, y^{(2)}, \dots, y^{(m)}] \in \mathbb{R}^{1 \times m}$
goal:
Given $x$, compute $\hat{y} = P(y = 1 \mid x)$, where $0 \le \hat{y} \le 1$
Logistic Regression
Parameters
The input feature vector: $x \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features
The training label:
$y \in \{0, 1\}$
The weights:
$w \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features
The bias:
$b \in \mathbb{R}$
The output:
$\hat{y} = \sigma(w^T x + b)$
Sigmoid function:
$s = \sigma(w^T x + b) = \sigma(z) = \dfrac{1}{1 + e^{-z}}$
Loss (error) function:
$\ell(\hat{y}, y) = -\left(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\right)$
Cost function:
$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \ell(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(\hat{y}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]$
Gradient Descent
We want to find $w$ and $b$ that minimize $J(w, b)$.
Process
Repeat:
$w := w - \alpha \dfrac{\partial J(w, b)}{\partial w}$
$b := b - \alpha \dfrac{\partial J(w, b)}{\partial b}$
Logistic Regression Gradient Descent
Recap:
$z = w^T x + b$
$\hat{y} = a = \sigma(z)$
$\ell(a, y) = -\left(y\log(a) + (1 - y)\log(1 - a)\right)$
Gradient Descent
$dz = \dfrac{\partial \ell}{\partial z} = \dfrac{\partial \ell}{\partial a} \cdot \dfrac{\partial a}{\partial z} = \left(-\dfrac{y}{a} + \dfrac{1 - y}{1 - a}\right) \cdot a(1 - a) = a - y$
$dw_1 = \dfrac{\partial \ell}{\partial w_1} = x_1 \cdot dz$
$dw_2 = \dfrac{\partial \ell}{\partial w_2} = x_2 \cdot dz$
$\dots$
$db = \dfrac{\partial \ell}{\partial b} = dz$
Process
$w_1 := w_1 - \alpha \, dw_1$
$w_2 := w_2 - \alpha \, dw_2$
$\dots$
$b := b - \alpha \, db$
Gradient Descent on m examples
Recap:
$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \ell(a^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1 - y^{(i)})\log(1 - a^{(i)})\right]$
$a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(w^T x^{(i)} + b)$
Gradient descent:
$dz^{(i)} = \dfrac{\partial \ell}{\partial z^{(i)}} = a^{(i)} - y^{(i)}$
$dw_1 = \frac{1}{m}\sum_{i=1}^{m} x_1^{(i)} \, dz^{(i)}$
$dw_2 = \frac{1}{m}\sum_{i=1}^{m} x_2^{(i)} \, dz^{(i)}$
$\dots$
$db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}$
Pseudocode
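The notes leave this section empty; a minimal non-vectorized sketch in NumPy, looping over the $m$ examples exactly as the derivatives above prescribe (function and variable names are mine, not the course's), could be:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, Y, w, b, alpha):
    """One step of logistic-regression gradient descent with explicit loops.
    X: (n_x, m) inputs, Y: (m,) labels, w: (n_x,) weights, b: scalar bias."""
    n_x, m = X.shape
    dw, db, J = np.zeros(n_x), 0.0, 0.0
    for i in range(m):                    # loop over the m training examples
        z = np.dot(w, X[:, i]) + b
        a = sigmoid(z)
        J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))
        dz = a - Y[i]                     # dz = a - y
        dw += X[:, i] * dz                # accumulate dw_j = x_j * dz
        db += dz
    J, dw, db = J / m, dw / m, db / m     # average over m examples
    return w - alpha * dw, b - alpha * db, J
```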
Vectorization
Logistic Regression Derivatives
Vectorizing Logistic Regression
$X = [x^{(1)}, x^{(2)}, \dots, x^{(m)}]$
$Y = [y^{(1)}, y^{(2)}, \dots, y^{(m)}]$
$Z = [z^{(1)}, z^{(2)}, \dots, z^{(m)}] = w^T X + b$
$A = [a^{(1)}, a^{(2)}, \dots, a^{(m)}] = \sigma(Z)$
Implementing Logistic Regression
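A hedged NumPy sketch of the vectorized training loop implied by these definitions (names and defaults are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.01, iterations=1000):
    """Vectorized logistic regression. X: (n_x, m), Y: (1, m)."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for _ in range(iterations):
        Z = w.T @ X + b          # (1, m): all z^(i) in one matrix product
        A = sigmoid(Z)           # (1, m): all activations at once
        dZ = A - Y               # (1, m)
        dw = (X @ dZ.T) / m      # (n_x, 1)
        db = np.sum(dZ) / m      # scalar
        w -= alpha * dw          # gradient descent update
        b -= alpha * db
    return w, b
```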
Broadcasting in Python
General Principle
$(m, n) \;[+\,-\,*\,/]\; (1, n) \;\to\; (m, n) \;[+\,-\,*\,/]\; (m, n)$
$(m, n) \;[+\,-\,*\,/]\; (m, 1) \;\to\; (m, n) \;[+\,-\,*\,/]\; (m, n)$
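A quick NumPy check of this rule:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
row = np.array([[10, 20, 30]])   # shape (1, 3): copied down the rows
col = np.array([[100], [200]])   # shape (2, 1): copied across the columns
print(A + row)   # [[11 22 33] [14 25 36]]
print(A * col)   # [[ 100  200  300] [ 800 1000 1200]]
```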
Week 3
Neural Networks Overview
Neural Network Representation
Computing a Neural Network’s Output
$z^{[1]} = W^{[1]} x + b^{[1]} = W^{[1]} a^{[0]} + b^{[1]}$
$a^{[1]} = \sigma(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = \sigma(z^{[2]})$
$\dots$
Vectorizing across multiple examples
$a^{[2](i)}$: example $i$, layer 2
Activation functions
sigmoid: $a = \dfrac{1}{1 + e^{-z}}$, $\quad a' = a(1 - a)$
tanh: $a = \dfrac{e^z - e^{-z}}{e^z + e^{-z}}$, $\quad a' = 1 - a^2$
ReLU: $a = \max(0, z)$, $\quad a' = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}$
Leaky ReLU: $a = \max(0.01z, z)$, $\quad a' = \begin{cases} 0.01 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}$
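Written out in NumPy (a sketch; following the notes, the derivative of ReLU at $z = 0$ is taken to be 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1 - a)              # a' = a(1 - a)

def tanh_prime(z):
    return 1 - np.tanh(z) ** 2      # a' = 1 - a^2

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return np.where(z < 0, 0.0, 1.0)

def leaky_relu(z):
    return np.maximum(0.01 * z, z)

def leaky_relu_prime(z):
    return np.where(z < 0, 0.01, 1.0)
```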
Why do you need non-linear activation functions?
Suppose
$z^{[1]} = W^{[1]} x + b^{[1]}, \quad a^{[1]} = g^{[1]}(z^{[1]}) = z^{[1]}$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \quad a^{[2]} = g^{[2]}(z^{[2]}) = z^{[2]}$
Then
$a^{[2]} = W^{[2]}(W^{[1]} x + b^{[1]}) + b^{[2]} = (W^{[2]} W^{[1]}) x + (W^{[2]} b^{[1]} + b^{[2]})$
which has the form
$a^{[2]} = W' x + b'$
If you use linear activation functions (also called identity activation functions), the network just outputs a linear function of its input. This holds for deep networks with many hidden layers too: no matter how many layers you stack, a composition of linear functions is still linear, so with linear activation functions (or no activation function at all) the network can only ever compute a linear function of the input.
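A small numerical check of this argument: two stacked linear layers are indistinguishable from the single linear layer $W'x + b'$ derived above (all matrices below are arbitrary random examples).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 1))
x = rng.standard_normal((3, 5))

a2 = W2 @ (W1 @ x + b1) + b2                   # two "linear activation" layers
W_prime, b_prime = W2 @ W1, W2 @ b1 + b2       # the collapsed single layer
assert np.allclose(a2, W_prime @ x + b_prime)  # identical outputs
```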
Gradient Descent for Neural Networks
Backpropagation
$dZ^{[2]} = A^{[2]} - Y$
$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$
$db^{[2]} = \frac{1}{m}\,\text{np.sum}(dZ^{[2]}, \text{axis}=1, \text{keepdims}=\text{True})$
$dZ^{[1]} = W^{[2]T} dZ^{[2]} \circ g^{[1]\prime}(Z^{[1]})$
$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^T$
$db^{[1]} = \frac{1}{m}\,\text{np.sum}(dZ^{[1]}, \text{axis}=1, \text{keepdims}=\text{True})$
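Assuming a tanh hidden layer and a sigmoid output (so $g^{[1]\prime}(Z^{[1]}) = 1 - A^{[1]2}$), one forward/backward pass implementing the six equations above might be sketched as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(X, Y, W1, b1, W2, b2, alpha=0.1):
    """X: (n0, m), Y: (1, m). Hidden activation g1 = tanh, output = sigmoid."""
    m = X.shape[1]
    # forward pass
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    # backward pass: the six equations above
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # g1'(Z1) = 1 - tanh(Z1)^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # gradient descent update
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
    return W1, b1, W2, b2
```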
Random Initialization
If the weights are initialized to zeros, all units in a layer compute the same value and receive the same gradient, so they update symmetrically. No matter how many nodes a layer has, it then behaves as if it had a single node. Initializing the weights to small random values breaks this symmetry.
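For example (layer sizes here are arbitrary placeholders):

```python
import numpy as np

n0, n1, n2 = 3, 4, 1                  # example layer sizes, hypothetical
W1 = np.random.randn(n1, n0) * 0.01   # small random values break symmetry
b1 = np.zeros((n1, 1))                # biases may start at zero
W2 = np.random.randn(n2, n1) * 0.01
b2 = np.zeros((n2, 1))
```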
Week 4
Building Blocks of Deep Neural Networks
Forward and Backward Propagation
Forward Propagation for Layer l
Input: $a^{[l-1]}$
Cache: $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$
Output: $a^{[l]} = g^{[l]}(z^{[l]})$
Vectorized
Input: $A^{[l-1]}$
Cache: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
Output: $A^{[l]} = g^{[l]}(Z^{[l]})$
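As a function (a sketch; names are mine), the vectorized forward step for layer $l$, returning the cache needed later by backpropagation:

```python
import numpy as np

def forward_layer(A_prev, W, b, g):
    """Forward propagation for one layer.
    A_prev: (n[l-1], m), W: (n[l], n[l-1]), b: (n[l], 1), g: activation."""
    Z = W @ A_prev + b          # cached for the backward pass
    A = g(Z)
    return A, (A_prev, Z)       # layer output and cache
```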
Backward Propagation for Layer l
Input: $da^{[l]}$
Local: $dz^{[l]} = da^{[l]} \circ g^{[l]\prime}(z^{[l]})$
Output:
$dW^{[l]} = dz^{[l]} a^{[l-1]T}$
$db^{[l]} = dz^{[l]}$
$da^{[l-1]} = W^{[l]T} dz^{[l]}$
Vectorized
Input: $dA^{[l]}$
Local: $dZ^{[l]} = dA^{[l]} \circ g^{[l]\prime}(Z^{[l]})$
Output:
$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}$
$db^{[l]} = \frac{1}{m}\,\text{np.sum}(dZ^{[l]}, \text{axis}=1, \text{keepdims}=\text{True})$
$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$
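And the matching backward step (again a sketch), consuming the cache from the forward pass; g_prime is the derivative of layer $l$'s activation:

```python
import numpy as np

def backward_layer(dA, cache, W, g_prime):
    """Backward propagation for one layer, using the forward cache."""
    A_prev, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                        # element-wise product
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                          # passed on to layer l-1
    return dA_prev, dW, db
```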
Parameters vs Hyperparameters
Parameters
$W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, \dots$
Hyperparameters
Hyperparameters control how $W$ and $b$ end up being learned:
learning rate $\alpha$
number of iterations
number of hidden layers $L$
number of hidden units $n^{[1]}, n^{[2]}, \dots$
choice of activation function
momentum term
mini-batch size
various forms of regularization parameters