
CS231n Assignment 1: Two-layer Neural Network

2018-01-21 14:13
Training a neural network consists mainly of forward propagation and backward propagation.

Forward propagation evaluates the score function and the loss function; backward propagation then computes, for each layer, the gradient of the loss with respect to W and b, and these gradients are used to update W and b.

The overall process can be understood as: forward -> backward -> update -> forward -> backward -> update ...
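
Schematically, one iteration looks like the minimal pseudo-Python sketch below; net, X_batch, y_batch, reg and learning_rate are placeholders, and the real loop is the train() method shown later in this post.

for it in range(num_iters):
    # forward + backward: compute the minibatch loss and all parameter gradients
    loss, grads = net.loss(X_batch, y=y_batch, reg=reg)
    # update: take a small gradient-descent step on every parameter
    for name in ('W1', 'b1', 'W2', 'b2'):
        net.params[name] -= learning_rate * grads[name]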

I. Theory

1. Intuition



This is a two-layer fully-connected neural network consisting of an input layer, a hidden layer, and an output layer (the input layer is not counted as a layer). The 4 nodes of the input layer mean that each sample is 4-dimensional; the 3 nodes of the output layer mean that there are 3 classes, and the network outputs a score for each class.

FC (fully-connected) layer: a layer in which every neuron is connected to all of the neurons in the previous layer.

For neural networks, bigger = better: larger models generally work better, and more neurons = more capacity, because in practice a model whose expressive power is too small (underfitting) is a common problem. Overfitting caused by the extra capacity is then controlled with regularization rather than by shrinking the network.

For a more intuitive, visual picture of a neural network:



2. score function



For this two-layer network the score function is f(x) = W2 · max(0, W1 · x + b1) + b2, where the activation function is ReLU (max(0, ·) applied element-wise).

Why do we need an activation function?

Answer: without an activation function, the output of every layer is a linear function of its input, so no matter how many hidden layers are stacked, the network behaves exactly like one with no hidden layers. The activation function applies a nonlinear transformation to the input before passing it on, which is what makes the hidden layers meaningful.
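
A quick numerical illustration of this point (a standalone sketch, not part of the assignment code): two stacked linear layers with no activation in between collapse into a single linear layer, while inserting a ReLU breaks that equivalence.

import numpy as np

np.random.seed(0)
X = np.random.randn(5, 4)            # 5 samples, 4 features
W1 = np.random.randn(4, 10)
W2 = np.random.randn(10, 3)

# Two linear layers with no activation are the same as one linear layer W1.dot(W2):
two_linear = X.dot(W1).dot(W2)
one_linear = X.dot(W1.dot(W2))
print(np.allclose(two_linear, one_linear))      # True -> no extra expressive power

# With a ReLU in between, the mapping is no longer a single linear map:
with_relu = np.maximum(0, X.dot(W1)).dot(W2)
print(np.allclose(with_relu, one_linear))       # False (in general)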

3. Loss function and its gradient

This assignment uses the softmax loss function; see the notes on the softmax classifier.

The key to completing the assignment is still the gradient of the loss function with respect to W. The gradient is derived below (note: my hand-written derivation contains some mistakes in the subscripts, but they do not affect the understanding or the code):
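
For reference, these are the standard softmax-loss gradients that the code below implements, writing p_ij for the softmax probability of class j on sample i and λ for the regularization strength reg (the 1/2 factor matches the 0.5 * reg * Σ W² term used in the code):

L = \frac{1}{N}\sum_i \Big(-s_{i,y_i} + \log \sum_j e^{s_{ij}}\Big) + \frac{\lambda}{2}\left(\|W_1\|_2^2 + \|W_2\|_2^2\right), \qquad p_{ij} = \frac{e^{s_{ij}}}{\sum_k e^{s_{ik}}}

\frac{\partial L}{\partial s_{ij}} = \frac{1}{N}\left(p_{ij} - \mathbf{1}[j = y_i]\right)

\frac{\partial L}{\partial W_2} = h_1^{\top}\,\frac{\partial L}{\partial s} + \lambda W_2, \qquad \frac{\partial L}{\partial b_2} = \sum_i \frac{\partial L}{\partial s_i}

\frac{\partial L}{\partial h_1} = \frac{\partial L}{\partial s}\, W_2^{\top}, \qquad \frac{\partial L}{\partial W_1} = X^{\top}\!\left(\frac{\partial L}{\partial h_1} \odot \mathbf{1}[h_1 > 0]\right) + \lambda W_1, \qquad \frac{\partial L}{\partial b_1} = \sum_i \left(\frac{\partial L}{\partial h_1} \odot \mathbf{1}[h_1 > 0]\right)_i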



With the score, loss, and gradient all written out, the two-layer neural network can be implemented.

II. Two-layer Neural Network

1. Forward Propagation & Backward Propagation

Forward: compute the scores, then compute the loss from the scores.

Backward: compute the gradients with respect to W2, b2, W1, and b1.

def loss(self, X, y=None, reg=0.0):
    """
    Compute the loss and gradients for a two layer fully connected neural
    network.

    Inputs:
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape

    # Compute the forward pass
    scores = None
    #############################################################################
    # TODO: Perform the forward pass, computing the class scores for the input. #
    # Store the result in the scores variable, which should be an array of      #
    # shape (N, C).                                                             #
    #############################################################################
    h1 = np.maximum(0, np.dot(X, W1) + b1)  # hidden layer activations, (5, 10) for the toy data
    scores = np.dot(h1, W2) + b2            # class scores, (5, 3) for the toy data
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    # If the targets are not given then jump out, we're done
    if y is None:
        return scores

    # Compute the loss
    loss = None
    #############################################################################
    # TODO: Finish the forward pass, and compute the loss. This should include  #
    # both the data loss and L2 regularization for W1 and W2. Store the result  #
    # in the variable loss, which should be a scalar. Use the Softmax           #
    # classifier loss.                                                          #
    #############################################################################
    exp_S = np.exp(scores)                  # (5, 3)
    sum_exp_S = np.sum(exp_S, axis=1)
    sum_exp_S = sum_exp_S.reshape(-1, 1)    # (5, 1)
    # use np.sum (not the builtin sum) so that loss stays a scalar
    loss = np.sum(-scores[range(N), list(y)]) + np.sum(np.log(sum_exp_S))
    loss = loss / N + 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    # Backward pass: compute gradients
    grads = {}
    #############################################################################
    # TODO: Compute the backward pass, computing the derivatives of the weights #
    # and biases. Store the results in the grads dictionary. For example,       #
    # grads['W1'] should store the gradient on W1, and be a matrix of same size #
    #############################################################################
    '''
    The toy network is 4-10-3.
    Layer 2: input h1 (5, 10), weights W2 (10, 3), output scores (5, 3).
    Layer 1: input X (5, 4), weights W1 (4, 10), output h1 (5, 10).
    '''
    # ---------------------------------#
    dscores = np.zeros(scores.shape)
    dscores[range(N), list(y)] = -1
    dscores += (exp_S / sum_exp_S)          # softmax probabilities minus the one-hot labels, (5, 3)
    dscores /= N
    grads['W2'] = np.dot(h1.T, dscores)
    grads['W2'] += reg * W2
    grads['b2'] = np.sum(dscores, axis=0)
    # ---------------------------------#
    dh1 = np.dot(dscores, W2.T)             # (5, 10)
    dh1_ReLU = (h1 > 0) * dh1               # backprop through ReLU: pass gradient only where h1 > 0
    grads['W1'] = X.T.dot(dh1_ReLU) + reg * W1
    grads['b1'] = np.sum(dh1_ReLU, axis=0)
    # ---------------------------------#
    '''
    My earlier version. Drawbacks: it barely reuses intermediate variables
    and is harder to read:

    buf = np.zeros(scores.shape)
    buf[range(N), list(y)] = -1
    grads['W2'] = np.dot(h1.T, (buf + (exp_S / sum_exp_S)))  # (10, 3)
    grads['W2'] = grads['W2'] / N + reg * W2
    grads['b2'] = np.sum((exp_S / sum_exp_S) + buf, axis=0)  # (3,)
    '''
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    return loss, grads
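
As a quick sanity check, the loss function can be exercised on toy data with the same shapes as the comments above (N=5 samples, D=4 inputs, H=10 hidden units, C=3 classes). This sketch assumes the TwoLayerNet constructor takes (input_size, hidden_size, output_size, std), as in the assignment's neural_net.py; the exact numbers are only illustrative.

import numpy as np
from cs231n.classifiers.neural_net import TwoLayerNet

np.random.seed(0)
net = TwoLayerNet(input_size=4, hidden_size=10, output_size=3, std=1e-1)
X = 10 * np.random.randn(5, 4)
y = np.array([0, 1, 2, 2, 1])

scores = net.loss(X)                    # y omitted -> returns (5, 3) scores
loss, grads = net.loss(X, y, reg=0.05)  # y given   -> returns loss and grads
print(scores.shape, loss, sorted(grads.keys()))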


2. Gradient Check

First, compute the numerical gradient:

def eval_numerical_gradient(f, x, verbose=True, h=0.00001):
    """
    a naive implementation of numerical gradient of f at x
    - f should be a function that takes a single argument
    - x is the point (numpy array) to evaluate the gradient at
    """

    fx = f(x)  # evaluate function value at original point
    # x is one of the parameter arrays: W1, W2, b1 or b2
    grad = np.zeros_like(x)
    # iterate over all indexes in x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])  # build an iterator over x
    while not it.finished:

        # evaluate function at x+h
        ix = it.multi_index        # multi-dimensional index of the current element
        oldval = x[ix]             # current parameter value
        x[ix] = oldval + h         # increment by h
        fxph = f(x)                # evaluate f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)                # evaluate f(x - h)
        x[ix] = oldval             # restore

        # compute the partial derivative with centered formula
        grad[ix] = (fxph - fxmh) / (2 * h)  # the slope
        if verbose:
            print(ix, grad[ix])    # print this entry of the numerical gradient
        it.iternext()              # step to the next element; without this the iterator stays at (0, 0) forever

    return grad  # the numerical gradient, same shape as x
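
The rel_error helper used in the check below is not shown in this post; it is the usual relative-error helper defined in the assignment notebook, reproduced here for completeness.

def rel_error(x, y):
    """Returns the maximum relative error between two arrays."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))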


Then run the check:

from cs231n.gradient_check import eval_numerical_gradient

# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.

loss, grads = net.loss(X, y, reg=0.1)

# these should all be less than 1e-8 or so
for param_name in grads:
    # the lambda ignores its argument: eval_numerical_gradient perturbs
    # net.params[param_name] in place, and we only need the resulting loss
    f = lambda W: net.loss(X, y, reg=0.1)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)  # numerical gradient
    print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))


3. Train a Network

With the forward and backward passes implemented and the gradient check passing, the next step is to train a good model.

The basic training loop is: forward -> backward -> update -> forward -> backward -> update ...

def train(self, X, y, X_val, y_val,
          learning_rate=1e-3, learning_rate_decay=0.95,
          reg=5e-6, num_iters=100,
          batch_size=200, verbose=False):
    """
    Train this neural network using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) giving training data.
    - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
      X[i] has label c, where 0 <= c < C.
    - X_val: A numpy array of shape (N_val, D) giving validation data.
    - y_val: A numpy array of shape (N_val,) giving validation labels.
    - learning_rate: Scalar giving learning rate for optimization.
    - learning_rate_decay: Scalar giving factor used to decay the learning rate
      after each epoch.
    - reg: Scalar giving regularization strength.
    - num_iters: Number of steps to take when optimizing.
    - batch_size: Number of training examples to use per step.
    - verbose: boolean; if true print progress during optimization.
    """
    num_train = X.shape[0]
    iterations_per_epoch = max(num_train / batch_size, 1)

    # Use SGD to optimize the parameters in self.model
    loss_history = []
    train_acc_history = []
    val_acc_history = []

    for it in range(num_iters):
        X_batch = None
        y_batch = None

        #########################################################################
        # TODO: Create a random minibatch of training data and labels, storing  #
        # them in X_batch and y_batch respectively.                             #
        #########################################################################
        mask = np.random.choice(num_train, batch_size, replace=True)
        X_batch = X[mask]
        y_batch = y[mask]
        #########################################################################
        #                             END OF YOUR CODE                          #
        #########################################################################

        # Compute loss and gradients using the current minibatch
        loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
        loss_history.append(loss)

        #########################################################################
        # TODO: Use the gradients in the grads dictionary to update the         #
        # parameters of the network (stored in the dictionary self.params)      #
        # using stochastic gradient descent. You'll need to use the gradients   #
        # stored in the grads dictionary defined above.                         #
        #########################################################################
        self.params['W1'] += -learning_rate * grads['W1']
        self.params['b1'] += -learning_rate * grads['b1']
        self.params['W2'] += -learning_rate * grads['W2']
        self.params['b2'] += -learning_rate * grads['b2']
        #########################################################################
        #                             END OF YOUR CODE                          #
        #########################################################################

        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        # Every epoch, check train and val accuracy and decay learning rate.
        if it % iterations_per_epoch == 0:
            # Check accuracy
            train_acc = (self.predict(X_batch) == y_batch).mean()
            val_acc = (self.predict(X_val) == y_val).mean()
            train_acc_history.append(train_acc)
            val_acc_history.append(val_acc)

            # Decay learning rate once per epoch
            learning_rate *= learning_rate_decay

    return {
        'loss_history': loss_history,
        'train_acc_history': train_acc_history,
        'val_acc_history': val_acc_history,
    }


Once the model is trained, it can be used to make predictions on unseen samples.
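
For example, on the toy problem the network could be trained and used for prediction roughly as below (a hedged sketch: net, X, y are the toy objects from the sanity check earlier, X_val/y_val stand in for a held-out validation split, and the hyperparameters are illustrative rather than tuned values).

stats = net.train(X, y, X_val, y_val,
                  learning_rate=1e-1, reg=5e-6,
                  num_iters=100, verbose=False)

print('Final training loss:', stats['loss_history'][-1])
val_acc = (net.predict(X_val) == y_val).mean()   # predict returns the argmax class per sample
print('Validation accuracy:', val_acc)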

III. Important NumPy functions used in this assignment

1. np.random.seed(a)

Function: fixes the random seed, where a is any integer (e.g. 0). After calling it, every subsequent sequence of "random" numbers is reproducible.
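
A minimal illustration (the actual values are whatever the seeded generator produces; the point is that they repeat):

import numpy as np

np.random.seed(0)
a = np.random.randn(3)
np.random.seed(0)
b = np.random.randn(3)
print(np.array_equal(a, b))   # True: the same seed reproduces the same "random" numbers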



2. np.nditer(array, flags=['multi_index'], op_flags=['readwrite'])

Function: builds an iterator over all elements of an array.

Arguments: the array to iterate over; flags=['multi_index'] makes the iterator track each element's multi-dimensional index; op_flags=['readwrite'] allows the elements to be read and modified.

In this assignment it is used to obtain the index of every parameter element so the numerical gradient can be evaluated entry by entry.
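
A small example of how it is used in the gradient check (iterating over every element of a 2-D array and reading/writing it via its multi-index):

import numpy as np

x = np.arange(6.0).reshape(2, 3)
it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    ix = it.multi_index          # e.g. (0, 0), (0, 1), ...
    x[ix] += 0.0                 # the element can be read and modified in place
    print(ix, x[ix])
    it.iternext()                # advance to the next element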
