Deep Learning by Andrew Ng --- stacked autoencoder
2015-04-08 20:05
When should we use fine-tuning?
It is typically used only if you have a large labeled training set; in this setting, fine-tuning can significantly improve the performance of your classifier. However, if you have a large unlabeled dataset (for unsupervised feature learning/pre-training) and only a relatively small labeled training set, then fine-tuning is significantly less likely to help.

Stacked Autoencoders (Training):
The idea is to use several autoencoders in succession to capture features of the input. The first autoencoder is trained on the raw input and yields a first feature matrix (its hidden-layer weights). The raw input is then fed forward through this first autoencoder, and the resulting hidden-layer activations become the input used to learn a higher-level feature matrix with the second autoencoder, and so on. Finally, the activations produced by the last autoencoder are fed as input to a softmax classifier (or some other classifier) for training. Note that what is passed to the next autoencoder is not the trained feature matrix itself, but the activations obtained by running the current input through that autoencoder with feedForward; in other words, the output of one autoencoder becomes the input of the next. After every stage has been trained, the feature matrices from the individual stages and the classifier parameters are assembled into a single network. A minimal sketch of this greedy layer-wise pipeline follows.
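The sketch below only shows the chaining of the stages; it assumes the variables (inputSize, hiddenSizeL1, hiddenSizeL2, lambda, sparsityParam, beta, trainData, trainLabels, numClasses, options) and the helper functions (initializeParameters, sparseAutoencoderCost, feedForwardAutoencoder, softmaxTrain, minFunc) are set up exactly as in stackedAEExercise.m further down.

% Greedy layer-wise pre-training (sketch only; real settings are in stackedAEExercise.m)
theta1 = initializeParameters(hiddenSizeL1, inputSize);
theta1 = minFunc(@(p) sparseAutoencoderCost(p, inputSize, hiddenSizeL1, ...
                      lambda, sparsityParam, beta, trainData), theta1, options);

% Feed the raw input through autoencoder 1; its activations are the input of the next stage
features1 = feedForwardAutoencoder(theta1, hiddenSizeL1, inputSize, trainData);

theta2 = initializeParameters(hiddenSizeL2, hiddenSizeL1);
theta2 = minFunc(@(p) sparseAutoencoderCost(p, hiddenSizeL1, hiddenSizeL2, ...
                      lambda, sparsityParam, beta, features1), theta2, options);

% Activations of autoencoder 2 are what the classifier is trained on
features2 = feedForwardAutoencoder(theta2, hiddenSizeL2, hiddenSizeL1, features1);
softmaxModel = softmaxTrain(hiddenSizeL2, numClasses, lambda, features2, trainLabels, options);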
fine-tuning:
Fine-tuning simply takes the hidden-layer weights learned in the separate pre-training stages and the softmaxTheta learned for the softmax regression as the initial parameters of the assembled network, and then adjusts them with the usual feedforward and backpropagation algorithms. Note that the assembled network must include the classifier; without it there is nothing to backpropagate from, so the parameters cannot be fine-tuned (softmax regression is used as the classifier here). For details see softmaxCost.m and sparseAutoencoderCost.m. A short sketch of the backpropagated error terms is given below, followed by the exercise answers (I recommend attempting them yourself before reading):
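Concretely, backpropagation starts from an error term produced by the softmax layer, which is why the classifier has to be attached to the stack. With $f$ the sigmoid, $h$ the softmax output probabilities, $1\{y\}$ the one-hot labels, and $n_l$ the top layer of the stack (notation as in the UFLDL notes; this is just the math behind the delta terms computed in stackedAECost.m below):

$$
\delta^{(n_l)} = -\bigl(\theta_{\mathrm{softmax}}^{T}\,(1\{y\} - h)\bigr) \bullet f'\bigl(z^{(n_l)}\bigr),
\qquad
\delta^{(l)} = \bigl((W^{(l)})^{T}\,\delta^{(l+1)}\bigr) \bullet f'\bigl(z^{(l)}\bigr).
$$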
stackedAEExercise.m

%% CS294A/CS294W Stacked Autoencoder Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  stacked autoencoder exercise. You will need to complete code in
%  stackedAECost.m.
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises. You will need the initializeParameters.m,
%  loadMNISTImages.m, and loadMNISTLabels.m files from previous exercises.
%
%  For the purpose of completing the assignment, you do not need to
%  change the code in this file.

%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

inputSize = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200;    % Layer 1 Hidden Size
hiddenSizeL2 = 200;    % Layer 2 Hidden Size
sparsityParam = 0.1;   % desired average activation of the hidden units
                       % (denoted by rho in the lecture notes)
lambda = 3e-3;         % weight decay parameter
beta = 3;              % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Load data from the MNIST database
%
%  This loads our training data from the MNIST database files.

trainData = loadMNISTImages('mnist/train-images-idx3-ubyte');
trainLabels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');

trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1

%%======================================================================
%% STEP 2: Train the first sparse autoencoder
%  This trains the first sparse autoencoder on the unlabelled STL training
%  images.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

%  Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the first layer sparse autoencoder; this layer has
%                a hidden size of "hiddenSizeL1".
%                You should store the optimal parameters in sae1OptTheta.

addpath minFunc/
options.Method = 'lbfgs'; % Use L-BFGS to optimize the cost function.
                          % minFunc needs a function handle with two outputs:
                          % the function value and the gradient.
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 40;     % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[sae1OptTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                     inputSize, hiddenSizeL1, ...
                                     lambda, sparsityParam, ...
                                     beta, trainData), ...
                                sae1Theta, options);

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 2: Train the second sparse autoencoder
%  This trains the second sparse autoencoder on the first autoencoder
%  features.
%  If you've correctly implemented sparseAutoencoderCost.m, you don't need
%  to change anything here.

[sae1Features] = feedForwardAutoencoder(sae1OptTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

%  Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the second layer sparse autoencoder; this layer has
%                a hidden size of "hiddenSizeL2" and an input size of
%                "hiddenSizeL1".
%
%                You should store the optimal parameters in sae2OptTheta.

addpath minFunc/
options.Method = 'lbfgs';
options.maxIter = 40;
options.display = 'on';

[sae2OptTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                     hiddenSizeL1, hiddenSizeL2, ...
                                     lambda, sparsityParam, ...
                                     beta, sae1Features), ...
                                sae2Theta, options);

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 3: Train the softmax classifier
%  This trains the softmax classifier on the second autoencoder features.
%  If you've correctly implemented softmaxCost.m, you don't need
%  to change anything here.

[sae2Features] = feedForwardAutoencoder(sae2OptTheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

%  Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the softmax classifier; the classifier takes in
%                input of dimension "hiddenSizeL2" corresponding to the
%                hidden layer size of the 2nd layer.
%
%                You should store the optimal parameters in saeSoftmaxOptTheta.
%
%  NOTE: If you used softmaxTrain to complete this part of the exercise,
%        set saeSoftmaxOptTheta = softmaxModel.optTheta(:);

options.maxIter = 100;
softmaxModel = softmaxTrain(hiddenSizeL2, numClasses, lambda, ...
                            sae2Features, trainLabels, options);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 5: Finetune softmax model

%  Implement stackedAECost to give the combined cost of the whole model,
%  then run this cell.

%  Initialize the stack using the parameters learned
stack = cell(2,1);
stack{1}.w = reshape(sae1OptTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1OptTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2OptTheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2OptTheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

%  Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack);
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];

%% ---------------------- YOUR CODE HERE ---------------------------------
%  Instructions: Train the deep network; hidden size here refers to the
%                dimension of the input to the classifier, which corresponds
%                to "hiddenSizeL2".

[stackedAEOptTheta, cost] = minFunc( @(p) stackedAECost(p, inputSize, hiddenSizeL2, ...
                                          numClasses, netconfig, lambda, ...
                                          trainData, trainLabels), ...
                                     stackedAETheta, options);

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 6: Test
%  Instructions: You will need to complete the code in stackedAEPredict.m
%                before running this part of the code.

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
testLabels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy:  97.6%
%
% If your values are too low (accuracy less than 95%), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
stackedAECost.m
function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                        numClasses, netconfig, ...
                                        lambda, data, labels)

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.
%
% theta:      trained weights from the autoencoder
% inputSize:  the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% netconfig:  the network configuration of the stack
% lambda:     the weight regularization penalty
% data:       matrix containing the training data as columns, so data(:,i) is the i-th training example
% labels:     vector containing labels, where labels(i) is the label for the i-th training example

%% Unroll softmaxTheta parameter

% We first extract the part which computes the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1)); % one-hot encoding of the input labels

%% --------------------------- YOUR CODE HERE -----------------------------
%  Instructions: Compute the cost function and gradient vector for
%                the stacked autoencoder.
%
%                You are given a stack variable which is a cell-array of
%                the weights and biases for every layer. In particular, you
%                can refer to the weights of Layer d, using stack{d}.w and
%                the biases using stack{d}.b. To get the total number of
%                layers, you can use numel(stack).
%
%                The last layer of the network is connected to the softmax
%                classification layer, softmaxTheta.
%
%                You should compute the gradients for softmaxTheta,
%                storing that in softmaxThetaGrad. Similarly, you should
%                compute the gradients for each layer in the stack, storing
%                the gradients in stackgrad{d}.w and stackgrad{d}.b.
%                Note that the size of the matrices in stackgrad should
%                match exactly that of the size of the matrices in stack.
% -------------------------------------------------------------------------

depth = numel(stack);       % number of layers in the stack (not counting the softmax layer)
z = cell(depth+1, 1);
a = cell(depth+1, 1);       % pre-activations and activations for the feedforward pass
a{1} = data;                % input layer

for index = 1:depth         % feedforward pass (indices are offset by 1 because a{1} is the input)
    z{index+1} = stack{index}.w * a{index} + repmat(stack{index}.b, 1, size(a{index}, 2));
    a{index+1} = sigmoid(z{index+1});
end

% The activation of the last layer of the stack is fed into the softmax regression
model = softmaxTheta * a{depth+1};
model = bsxfun(@minus, model, max(model, [], 1));  % subtract the max for numerical stability
h = exp(model);
h = bsxfun(@rdivide, h, sum(h));

cost = -1/M * sum(sum(groundTruth .* log(h))) + lambda/2 * sum(sum(softmaxTheta.^2));
softmaxThetaGrad = -1/M * ((groundTruth - h) * a{depth+1}') + lambda * softmaxTheta;

% Backpropagation
delta = cell(depth+1, 1);
% groundTruth holds the one-hot labels and h the conditional probabilities
delta{depth+1} = -(softmaxTheta' * (groundTruth - h)) .* a{depth+1} .* (1 - a{depth+1});

for layer = (depth:-1:2)
    delta{layer} = (stack{layer}.w' * delta{layer+1}) .* a{layer} .* (1 - a{layer});
end

for layer = (depth:-1:1)
    stackgrad{layer}.w = (1/M) * delta{layer+1} * a{layer}';
    stackgrad{layer}.b = (1/M) * sum(delta{layer+1}, 2);
end

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
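Before fine-tuning on the full MNIST set it is worth checking stackedAECost against a numerical gradient on a tiny random problem. The sketch below assumes computeNumericalGradient.m from the earlier sparse autoencoder exercise is on the path and uses the provided stack2params helper; the debug sizes, the lambda value, and the dbg* variable names are arbitrary choices for illustration.

% Tiny synthetic problem for gradient checking (sizes are arbitrary debug values)
dbgInput = 8;  dbgHidden = 5;  dbgClasses = 4;  dbgExamples = 12;
dbgData   = rand(dbgInput, dbgExamples);
dbgLabels = repmat((1:dbgClasses)', dbgExamples/dbgClasses, 1);  % every class appears at least once

% Build a random two-layer stack and a random softmax layer
dbgStack = cell(2,1);
dbgStack{1}.w = 0.1 * randn(dbgHidden, dbgInput);
dbgStack{1}.b = zeros(dbgHidden, 1);
dbgStack{2}.w = 0.1 * randn(dbgHidden, dbgHidden);
dbgStack{2}.b = zeros(dbgHidden, 1);
[dbgStackParams, dbgNetconfig] = stack2params(dbgStack);
dbgTheta = [ 0.005 * randn(dbgHidden * dbgClasses, 1) ; dbgStackParams ];

% Compare the analytic gradient with the numerical one
costFun = @(p) stackedAECost(p, dbgInput, dbgHidden, dbgClasses, ...
                             dbgNetconfig, 1e-4, dbgData, dbgLabels);
[dbgCost, dbgGrad] = costFun(dbgTheta);
numGrad = computeNumericalGradient(costFun, dbgTheta);
disp(norm(numGrad - dbgGrad) / norm(numGrad + dbgGrad));  % should be very small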
stackedAEPredict.m
function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.
%
% theta:      trained weights from the autoencoder
% inputSize:  the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% data:       matrix containing the data as columns, so data(:,i) is the i-th example
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

%% Unroll theta parameter

% We first extract the part which computes the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

% Feedforward pass through the stack
depth = numel(stack);
z = cell(depth+1, 1);
a = cell(depth+1, 1);
a{1} = data;

for layer = (1:depth)
    z{layer+1} = stack{layer}.w * a{layer} + repmat(stack{layer}.b, [1, size(a{layer}, 2)]);
    a{layer+1} = sigmoid(z{layer+1});
end

% Predict: pick the class with the largest softmax score
[~, pred] = max(softmaxTheta * a{depth+1});

% -----------------------------------------------------------

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end