cs231n:assignment1——Q3: Implement a Softmax classifier
2016-11-25 12:14
Contents of the Jupyter notebook softmax.ipynb
Softmax exercise
Softmax Classifier
Inline Question 1
Contents of softmax.py
Contents of linear_classifier.py
Contents of the Jupyter notebook softmax.ipynb:
Softmax exercise
Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.
This exercise is analogous to the SVM exercise. You will:
implement a fully-vectorized loss function for the Softmax classifier
implement the fully-vectorized expression for its analytic gradient
check your implementation with numerical gradient
use a validation set to tune the learning rate and regularization strength
optimize the loss function with SGD
visualize the final learned weights
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]

    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis = 0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image

    # add bias dimension and transform into columns
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

    return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
print 'dev data shape: ', X_dev.shape
print 'dev labels shape: ', y_dev.shape
Train data shape: (49000, 3073)
Train labels shape: (49000,)
Validation data shape: (1000, 3073)
Validation labels shape: (1000,)
Test data shape: (1000, 3073)
Test labels shape: (1000,)
dev data shape: (500, 3073)
dev labels shape: (500,)
Softmax Classifier
Your code for this section will all be written inside cs231n/classifiers/softmax.py.

# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.

from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print 'loss: %f' % loss
print 'sanity check: %f' % (-np.log(0.1))
loss: 2.382097
sanity check: 2.302585
Inline Question 1:
Why do we expect our loss to be close to -log(0.1)? Explain briefly.
Your answer: Because the weights are initialized to small random values, the scores of all 10 classes are nearly equal, so the softmax assigns each class a probability of about 1/10. The probability given to the correct class is therefore about 0.1, and the loss is about -log(0.1).
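The reasoning in the answer can be checked numerically. The following is a minimal sketch, assuming synthetic standard-normal inputs as a hypothetical stand-in for the CIFAR-10 dev set (the names `num_examples` and `probs` are illustrative, not from the assignment). It shows that weights drawn as in the notebook give near-uniform class probabilities and a loss close to -log(1/10).

```python
import numpy as np

np.random.seed(0)
num_classes, dim, num_examples = 10, 3073, 500

# Tiny random weights, scaled as in the notebook: randn * 0.0001
W = np.random.randn(dim, num_classes) * 0.0001
# Assumption: synthetic inputs and labels instead of the real dev images
X = np.random.randn(num_examples, dim)
y = np.random.randint(num_classes, size=num_examples)

scores = X.dot(W)
scores -= scores.max(axis=1, keepdims=True)  # shift rows for numerical stability
probs = np.exp(scores)
probs /= probs.sum(axis=1, keepdims=True)

rows = np.arange(num_examples)
loss = -np.log(probs[rows, y]).mean()
print('mean correct-class probability: %f' % probs[rows, y].mean())
print('loss: %f' % loss)
print('-log(1/10): %f' % -np.log(1.0 / num_classes))
```

The loss of 2.382097 recorded in the notebook is somewhat higher than -log(0.1), presumably because real pixel values have a much larger scale than this synthetic input, which spreads the initial scores further apart.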
# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# similar to SVM case, do another gradient check with regularization
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
numerical: -2.808017 analytic: -2.808017, relative error: 2.427338e-08
numerical: 2.623199 analytic: 2.623199, relative error: 1.099454e-08
numerical: 2.103685 analytic: 2.103685, relative error: 2.239864e-09
numerical: 0.707572 analytic: 0.707572, relative error: 1.118403e-08
numerical: 0.075667 analytic: 0.075667, relative error: 1.060205e-07
numerical: 0.518481 analytic: 0.518480, relative error: 6.611405e-08
numerical: -0.330835 analytic: -0.330835, relative error: 7.891339e-08
numerical: -0.231122 analytic: -0.231122, relative error: 2.356736e-07
numerical: -3.721387 analytic: -3.721387, relative error: 2.974035e-09
numerical: 3.969571 analytic: 3.969571, relative error: 2.303792e-08
numerical: -2.865714 analytic: -2.865714, relative error: 2.069379e-08
numerical: -0.233447 analytic: -0.233447, relative error: 5.429056e-09
numerical: -2.300726 analytic: -2.300726, relative error: 1.802526e-08
numerical: -4.972360 analytic: -4.972360, relative error: 9.030189e-09
numerical: 1.826103 analytic: 1.826103, relative error: 2.318045e-08
numerical: 2.912138 analytic: 2.912138, relative error: 2.626010e-08
numerical: 4.397912 analytic: 4.397911, relative error: 2.148299e-08
numerical: 1.548278 analytic: 1.548278, relative error: 4.536950e-08
numerical: -1.672722 analytic: -1.672722, relative error: 8.208277e-09
numerical: 1.472942 analytic: 1.472942, relative error: 3.173142e-09
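For readers without the course code at hand, the idea behind such a check can be sketched as follows. This is not the course's grad_check_sparse. The helpers `numerical_gradient_at` and `relative_error` are hypothetical names, and the toy function f(W) = sum(W**2) is chosen only because its gradient, 2 * W, is known exactly.

```python
import numpy as np

def numerical_gradient_at(f, W, index, h=1e-5):
    """Centered-difference estimate of df/dW[index]; W is restored afterwards."""
    old = W[index]
    W[index] = old + h
    f_plus = f(W)
    W[index] = old - h
    f_minus = f(W)
    W[index] = old
    return (f_plus - f_minus) / (2.0 * h)

def relative_error(a, b):
    """Symmetric relative error between two scalars (assumes they are not both zero)."""
    return abs(a - b) / (abs(a) + abs(b))

# Toy function with a known analytic gradient: f(W) = sum(W**2), df/dW = 2 * W
np.random.seed(0)
W = np.random.randn(4, 3)
f = lambda w: np.sum(w ** 2)
analytic = 2.0 * W

# Compare the two gradients at a few randomly chosen coordinates
for _ in range(5):
    index = tuple(np.random.randint(n) for n in W.shape)
    numeric = numerical_gradient_at(f, W, index)
    print('numerical: %f analytic: %f, relative error: %e'
          % (numeric, analytic[index], relative_error(numeric, analytic[index])))
```

Checking only a handful of random coordinates keeps the cost low, since each estimate needs two full evaluations of the loss.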
# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'Loss difference: %f' % np.abs(loss_naive - loss_vectorized)
print 'Gradient difference: %f' % grad_difference
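naive loss: 2.382097e+00 computed in 0.143677s
vectorized loss: 9.704155e-02 computed in 0.063678s
Loss difference: 2.285055
Gradient difference: 0.000000

Note: the two losses should agree. The vectorized value recorded here (about 0.097) is consistent with the mean correct-class probability rather than its negative log, so this output was produced with a data-loss line in softmax_loss_vectorized that omitted -np.log. The gradients are unaffected and match.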
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [5e4, 1e8]

################################################################################
# TODO:
# Use the validation set to set the learning rate and regularization strength.
# This should be identical to the validation that you did for the SVM; save
# the best trained softmax classifier in best_softmax.
################################################################################
num_splt_lr = 3
num_splt_rs = 8
for i in xrange(num_splt_lr):
    for j in xrange(num_splt_rs):
        learning_rate_ij = learning_rates[0] + i * (learning_rates[1] - learning_rates[0]) / num_splt_lr
        reg_ij = regularization_strengths[0] + j * (regularization_strengths[1] - regularization_strengths[0]) / num_splt_rs

        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=learning_rate_ij, reg=reg_ij,
                                  num_iters=1500, verbose=False)

        y_train_pred = softmax.predict(X_train)
        accuracy_train = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        accuracy_val = np.mean(y_val == y_val_pred)

        results[(learning_rate_ij, reg_ij)] = (accuracy_train, accuracy_val)

        if accuracy_val > best_val:
            best_val = accuracy_val
            best_softmax = softmax
################################################################################
# END OF YOUR CODE
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy)

print 'best validation accuracy achieved during cross-validation: %f' % best_val
cs231n/classifiers/softmax.py:79: RuntimeWarning: overflow encountered in exp
  exp_scores = np.exp(scores)
cs231n/classifiers/softmax.py:84: RuntimeWarning: invalid value encountered in divide
  norm_exp_scores = exp_scores / row_sum
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.324122 val accuracy: 0.340000
lr 1.000000e-07 reg 1.254375e+07 train accuracy: 0.210694 val accuracy: 0.220000
lr 1.000000e-07 reg 2.503750e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e-07 reg 3.753125e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e-07 reg 5.002500e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e-07 reg 6.251875e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e-07 reg 7.501250e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e-07 reg 8.750625e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.333333e-07 reg 5.000000e+04 train accuracy: 0.333796 val accuracy: 0.355000
lr 2.333333e-07 reg 1.254375e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.333333e-07 reg 2.503750e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.333333e-07 reg 3.753125e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.333333e-07 reg 5.002500e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.333333e-07 reg 6.251875e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.333333e-07 reg 7.501250e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.333333e-07 reg 8.750625e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.666667e-07 reg 5.000000e+04 train accuracy: 0.323347 val accuracy: 0.345000
lr 3.666667e-07 reg 1.254375e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.666667e-07 reg 2.503750e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.666667e-07 reg 3.753125e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.666667e-07 reg 5.002500e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.666667e-07 reg 6.251875e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.666667e-07 reg 7.501250e+07 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.666667e-07 reg 8.750625e+07 train accuracy: 0.100265 val accuracy: 0.087000
best validation accuracy achieved during cross-validation: 0.355000
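The overflow warnings and the many rows stuck at train accuracy 0.100265 are consistent with the regularization step diverging. With the gradient term reg * W, each SGD update multiplies W by (1 - learning_rate * reg) before the data gradient is added, and the magnitude of W grows whenever learning_rate * reg exceeds 2. A minimal sketch that evaluates this factor over the same grid, ignoring the data gradient (the helper name `decay_factor` is hypothetical):

```python
import numpy as np

def decay_factor(learning_rate, reg):
    """Per-step multiplier applied to W by the L2 term alone: 1 - lr * reg."""
    return 1.0 - learning_rate * reg

learning_rates = [1e-7, 5e-7]
regularization_strengths = [5e4, 1e8]
num_splt_lr, num_splt_rs = 3, 8

# Rebuild the same linearly spaced grid as the search above
for i in range(num_splt_lr):
    lr = learning_rates[0] + i * (learning_rates[1] - learning_rates[0]) / num_splt_lr
    for j in range(num_splt_rs):
        reg = (regularization_strengths[0]
               + j * (regularization_strengths[1] - regularization_strengths[0]) / num_splt_rs)
        factor = decay_factor(lr, reg)
        # |factor| > 1 means the regularization step alone makes W grow each iteration
        status = 'diverges' if abs(factor) > 1.0 else 'stable'
        print('lr %e reg %e factor %+.3f %s' % (lr, reg, factor, status))
```

Only four combinations come out as stable, and they are the same four that score above chance in the results. One possible alternative, not tried in the original post, is a logarithmically spaced grid such as np.logspace(3, 5, 8) for the regularization strength, which keeps learning_rate * reg far below 2 for these learning rates.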
# evaluate on test set
# Evaluate the best softmax on test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'softmax on raw pixels final test set accuracy: %f' % (test_accuracy, )
softmax on raw pixels final test set accuracy: 0.346000
# Visualize the learned weights for each class
w = best_softmax.W[:-1,:]  # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
Contents of softmax.py:
import numpy as np
from random import shuffle


def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops)

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using explicit loops.
    # Store the loss in loss and the gradient in dW. If you are not careful
    # here, it is easy to run into numeric instability. Don't forget the
    # regularization!
    #############################################################################
    num_classes = W.shape[1]
    num_train = X.shape[0]

    for i in xrange(num_train):
        scores = X[i].dot(W)
        correct_class = y[i]
        exp_scores = np.zeros_like(scores)
        row_sum = 0
        for j in xrange(num_classes):
            exp_scores[j] = np.exp(scores[j])
            row_sum += exp_scores[j]
        loss += -np.log(exp_scores[correct_class] / row_sum)
        # compute dW loops:
        for k in xrange(num_classes):
            if k != correct_class:
                dW[:, k] += exp_scores[k] / row_sum * X[i]
            else:
                dW[:, k] += (exp_scores[correct_class] / row_sum - 1) * X[i]

    loss /= num_train
    reg_loss = 0.5 * reg * np.sum(W**2)
    loss += reg_loss
    dW /= num_train
    dW += reg * W
    #############################################################################
    # END OF YOUR CODE
    #############################################################################

    return loss, dW


def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.
    # Store the loss in loss and the gradient in dW. If you are not careful
    # here, it is easy to run into numeric instability. Don't forget the
    # regularization!
    #############################################################################
    num_train = X.shape[0]

    scores = X.dot(W)
    exp_scores = np.exp(scores)
    row_sum = exp_scores.sum(axis=1)
    row_sum = row_sum.reshape((num_train, 1))

    # compute loss
    norm_exp_scores = exp_scores / row_sum
    row_index = np.arange(num_train)
    # cross-entropy: negative log of each example's correct-class probability
    data_loss = -np.log(norm_exp_scores[row_index, y]).sum()
    loss = data_loss / num_train + 0.5 * reg * np.sum(W * W)

    # compute gradient
    norm_exp_scores[row_index, y] -= 1
    dW = X.T.dot(norm_exp_scores)
    dW = dW / num_train + reg * W
    #############################################################################
    # END OF YOUR CODE
    #############################################################################

    return loss, dW
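The TODO comments warn about numeric instability, and the hyperparameter search in the notebook does report an overflow in np.exp. A common remedy is to subtract each row's maximum score before exponentiating, which leaves the softmax probabilities unchanged. The following is a minimal sketch of that variant, not the author's submitted code; the function name `softmax_loss_stable` is hypothetical.

```python
import numpy as np

def softmax_loss_stable(W, X, y, reg):
    """Vectorized softmax loss and gradient with max-shifted scores."""
    num_train = X.shape[0]
    rows = np.arange(num_train)

    scores = X.dot(W)
    # Shifting by the row maximum makes every exponent <= 0, so exp cannot overflow
    scores -= scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

    loss = -log_probs[rows, y].mean() + 0.5 * reg * np.sum(W * W)

    dscores = np.exp(log_probs)   # softmax probabilities
    dscores[rows, y] -= 1         # subtract 1 at the correct class
    dW = X.T.dot(dscores) / num_train + reg * W
    return loss, dW

# Usage on deliberately large inputs, where a plain np.exp(scores) would overflow
np.random.seed(0)
X = np.random.randn(5, 4) * 1000.0
W = np.random.randn(4, 3)
y = np.array([0, 1, 2, 1, 0])
loss, dW = softmax_loss_stable(W, X, y, 0.0)
print('loss is finite: %s' % np.isfinite(loss))
```

Working with log-probabilities directly also avoids taking the log of a probability that has underflowed to zero. Note that this only protects the loss computation; it does not stop the weights themselves from diverging under an unsuitable learning rate or regularization strength.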
Contents of linear_classifier.py:
import numpy as np
from cs231n.classifiers.linear_svm import *
from cs231n.classifiers.softmax import *


class LinearClassifier(object):

    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this linear classifier using stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c
          means that X[i] has label 0 <= c < C for C classes.
        - learning_rate: (float) learning rate for optimization.
        - reg: (float) regularization strength.
        - num_iters: (integer) number of steps to take when optimizing
        - batch_size: (integer) number of training examples to use at each step.
        - verbose: (boolean) If true, print progress during optimization.

        Outputs:
        A list containing the value of the loss function at each training iteration.
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W
        loss_history = []
        for it in xrange(num_iters):
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO:
            # Sample batch_size elements from the training data and their
            # corresponding labels to use in this round of gradient descent.
            # Store the data in X_batch and their corresponding labels in
            # y_batch; after sampling X_batch should have shape (dim, batch_size)
            #
            # Note: the shape above may be wrong; it should be (batch_size, dim)
            #
            # and y_batch should have shape (batch_size,)
            #
            # Hint: Use np.random.choice to generate indices. Sampling with
            # replacement is faster than sampling without replacement.
            #########################################################################
            batch_inx = np.random.choice(num_train, batch_size)
            X_batch = X[batch_inx, :]
            y_batch = y[batch_inx]
            #########################################################################
            # END OF YOUR CODE
            #########################################################################

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform parameter update
            #########################################################################
            # TODO:
            # Update the weights using the gradient and the learning rate.
            #########################################################################
            self.W = self.W - learning_rate * grad
            #########################################################################
            # END OF YOUR CODE
            #########################################################################

            if verbose and it % 100 == 0:
                print 'iteration %d / %d: loss %f' % (it, num_iters, loss)

        return loss_history

    def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: D x N array of training data. Each column is a D-dimensional point.
          Note: this should be X: N x D, with each row a D-dimensional point.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted
          class.
        """
        y_pred = np.zeros(X.shape[0])
        ###########################################################################
        # TODO:
        # Implement this method. Store the predicted labels in y_pred.
        ###########################################################################
        y_scores = np.dot(X, self.W)
        y_pred = np.argmax(y_scores, axis=1)
        ###########################################################################
        # END OF YOUR CODE
        ###########################################################################
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        """
        Compute the loss function and its derivative.
        Subclasses will override this.

        Inputs:
        - X_batch: A numpy array of shape (N, D) containing a minibatch of N
          data points; each point has dimension D.
        - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
        - reg: (float) regularization strength.

        Returns: A tuple containing:
        - loss as a single float
        - gradient with respect to self.W; an array of the same shape as W
        """
        pass


class LinearSVM(LinearClassifier):
    """ A subclass that uses the Multiclass SVM loss function """

    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
    """ A subclass that uses the Softmax + Cross-entropy loss function """

    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
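The train and predict steps above can be tried without the CIFAR-10 data. The following is a minimal, self-contained sketch on synthetic data, assuming three well-separated Gaussian clusters in 2-D. The data, the hyperparameters, and the local `softmax_loss` helper are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def softmax_loss(W, X, y, reg):
    """Softmax loss and gradient on a minibatch, with max-shifted scores."""
    n = X.shape[0]
    rows = np.arange(n)
    scores = X.dot(W)
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[rows, y]).mean() + 0.5 * reg * np.sum(W * W)
    probs[rows, y] -= 1
    return loss, X.T.dot(probs) / n + reg * W

# Assumption: synthetic data, three Gaussian clusters with unit variance
np.random.seed(0)
num_train = 600
centers = np.array([[5.0, 0.0], [-5.0, 0.0], [0.0, 5.0]])
y = np.random.randint(3, size=num_train)
X = centers[y] + np.random.randn(num_train, 2)
X = np.hstack([X, np.ones((num_train, 1))])  # append the bias column

W = 0.001 * np.random.randn(X.shape[1], 3)
loss_history = []
for it in range(500):
    # Sample a minibatch of indices with replacement, as in train()
    batch_idx = np.random.choice(num_train, 100)
    loss, grad = softmax_loss(W, X[batch_idx], y[batch_idx], reg=1e-3)
    loss_history.append(loss)
    W -= 0.01 * grad  # plain SGD update

# Predict as in predict(): the class with the highest score
y_pred = np.argmax(X.dot(W), axis=1)
accuracy = np.mean(y_pred == y)
print('first loss: %f, last loss: %f' % (loss_history[0], loss_history[-1]))
print('training accuracy: %f' % accuracy)
```

The first recorded loss should sit near -log(1/3), by the same argument as in Inline Question 1, and then fall as the weights separate the clusters.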