
cs231n - assignment1 - linear-svm gradient derivation

2016-07-15 22:59

Multiclass Support Vector Machine exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

- implement a fully-vectorized loss function for the SVM

- implement the fully-vectorized expression for its analytic gradient

- check your implementation using numerical gradient

- use a validation set to tune the learning rate and regularization strength

- optimize the loss function with SGD

- visualize the final learned weights

The main difficulty in this exercise is working out how to take the partial derivative of the loss with respect to W; once that is clear, the implementation is straightforward.

Start from the relevant formulas in lecture 3:

L = \frac{1}{N}\sum_i L_i + \lambda \sum_k \sum_l W_{k,l}^2

L_i = \sum_{j \neq y_i} \max(0, L_{ij}) = \sum_{j \neq y_i} \max\left(0,\; w_j^T x_i - w_{y_i}^T x_i + 1\right)

First, for a single example, take one component L_{ij} of L_i and compute its partial derivative with respect to the columns w_j of W; only the components with L_{ij} > 0 contribute to the derivative:

Each term that is greater than zero contributes to two columns of the gradient. For the column with j ≠ y_i, it contributes x_i to the j-th column of dW (dW_j has as many elements as one example x_i, so each component of x_i adds to the corresponding component of dW_j); for the correct-class column j = y_i, it contributes -x_i:

For j \neq y_i:

\frac{\partial L_{ij}}{\partial w_j} = x_i^T

For j = y_i:

\frac{\partial L_{ij}}{\partial w_{y_i}} = -x_i^T

Summing the contribution of every positive component L_{ij} gives the contribution of L_i to dW. Accumulating this over all examples, dividing by the number of examples, and adding the regularization term yields the dW we want.
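To make the contribution pattern concrete, below is a minimal sketch with made-up numbers (one example, D = 3, C = 3); the array values are arbitrary and exist only to show which columns of dW receive +x_i and -x_i:

import numpy as np

# One example x_i (D = 3), its label y_i, and a small weight matrix W (D x C).
# All values here are made up purely for illustration.
x_i = np.array([1.0, 2.0, -1.0])
y_i = 0
W = np.array([[ 0.1,  0.3, -0.2],
              [ 0.2, -0.1,  0.4],
              [-0.3,  0.2,  0.1]])

scores = x_i.dot(W)                    # s_j = w_j^T x_i, shape (C,)
margins = scores - scores[y_i] + 1.0   # hinge terms with delta = 1
margins[y_i] = 0.0                     # the correct class is excluded

dW_i = np.zeros_like(W)                # contribution of this one example
for j in range(W.shape[1]):
  if j == y_i or margins[j] <= 0:
    continue
  dW_i[:, j] += x_i      # column j gets +x_i
  dW_i[:, y_i] -= x_i    # the correct-class column gets -x_i

print(margins)   # only the positive entries contributed
print(dW_i)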

# linear_svm.py
import numpy as np
from random import shuffle

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  num_dimension = W.shape[0]
  loss = 0.0
  for i in xrange(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in xrange(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin

        # each positive margin s_j - s_{y_i} + 1 (j != y_i) contributes
        # +x_i to column j of dW and -x_i to column y_i
        dW[:,j] += X[i,:].T
        dW[:,y[i]] -= X[i,:].T

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train

  # Add regularization to the loss.
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W

  #############################################################################
  # TODO:                                                                     #
  # Compute the gradient of the loss function and store it dW.                #
  # Rather than first computing the loss and then computing the derivative,   #
  # it may be simpler to compute the derivative at the same time that the     #
  # loss is being computed. As a result you may need to modify some of the    #
  # code above to compute the gradient.                                       #
  #############################################################################

  return loss, dW

def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the structured SVM loss, storing the    #
  # result in loss.                                                           #
  #############################################################################
  XW = X.dot(W)            # scores, shape (N, C)
  num_train = X.shape[0]
  Sy = np.zeros(num_train) # correct-class score for each example

  for i in xrange(num_train):
    Sy[i] = XW[i, y[i]]

  # margins, shape (C, N): s_j - s_{y_i} + 1 for every class and example
  WX = XW.T - Sy + 1

  # the correct-class entries are currently 1; zero them out
  for i in xrange(num_train):
    WX[y[i], i] -= 1

  loss = np.sum(WX[WX > 0])
  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)  # regularization, as in svm_loss_naive
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the gradient for the structured SVM     #
  # loss, storing the result in dW.                                           #
  #                                                                           #
  # Hint: Instead of computing the gradient from scratch, it may be easier    #
  # to reuse some of the intermediate values that you used to compute the     #
  # loss.                                                                     #
  #############################################################################
  # keep only positive elements
  XW = WX.T                # margins back in shape (N, C)
  num_classes = W.shape[1]
  for i in xrange(num_train):
    for j in xrange(num_classes):
      if XW[i, j] > 0:
        dW[:,j] += X[i,:].T
        dW[:,y[i]] -= X[i,:].T

  dW /= num_train
  dW += reg * W
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  return loss, dW
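The gradient above is correct but still uses a double Python loop. It can also be computed fully vectorized from an indicator matrix over the positive margins. The following sketch shows that common approach as a hypothetical standalone helper; its name and its margins argument (the (N, C) matrix WX.T from the loss computation, with the correct-class entries zeroed) are assumptions of this sketch, not part of the assignment scaffold:

import numpy as np

def svm_grad_vectorized_sketch(W, X, y, reg, margins):
  # margins: (N, C) hinge terms, correct-class column already zeroed out
  num_train = X.shape[0]
  # indicator: 1 where a margin is positive, so that column receives +x_i
  counts = (margins > 0).astype(float)              # shape (N, C)
  # the correct-class column receives -x_i once per positive margin in its row
  counts[np.arange(num_train), y] = -np.sum(counts, axis=1)
  # dW[:, j] = sum_i counts[i, j] * x_i, then average over examples and regularize
  dW = X.T.dot(counts) / num_train + reg * W        # shape (D, C)
  return dW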


# svm.ipynb
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [1.4e-7, 1.5e-7, 1.6e-7]
regularization_strengths = [3e4, 3.1e4, 3.2e4, 3.3e4, 3.4e4]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
params = [(x, y) for x in learning_rates for y in regularization_strengths]
for lrate, regular in params:
  svm = LinearSVM()
  loss_hist = svm.train(X_train, y_train, learning_rate=lrate, reg=regular,
                        num_iters=700, verbose=False)
  y_train_pred = svm.predict(X_train)
  accuracy_train = np.mean(y_train == y_train_pred)
  y_val_pred = svm.predict(X_val)
  accuracy_val = np.mean(y_val == y_val_pred)
  results[(lrate, regular)] = (accuracy_train, accuracy_val)
  if best_val < accuracy_val:
    best_val = accuracy_val
    best_svm = svm

################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
  train_accuracy, val_accuracy = results[(lr, reg)]
  print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
      lr, reg, train_accuracy, val_accuracy)

print 'best validation accuracy achieved during cross-validation: %f' % best_val
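Once the tuning loop finishes, the natural next step in the notebook is to evaluate the best model on the test set. A minimal sketch, assuming X_test and y_test are the preprocessed test arrays prepared in the earlier data-loading cells:

# Evaluate best_svm on the held-out test set.
# X_test / y_test are assumed to come from the notebook's preprocessing cells.
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy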
Tags: cs231n