
UFLDL Exercise: Convolutional Neural Network

2014-01-22 14:47
Structure: Input layer --> Conv layer --> Mean-pooling layer --> Softmax layer

Property:
1. Cross-entropy loss function
2. Sigmoid activation function
3. Stochastic Gradient Descent with weight decay and momentum
4. Accuracy on MNIST:  a little more than 97%
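
For reference, here are the layer sizes this configuration produces (a quick sketch; the values follow from the STEP 0 parameters below):

% 28x28 input, 9x9 filters -> 20x20 feature maps  (28 - 9 + 1 = 20)
% 2x2 mean pooling         -> 10x10 pooled maps   (20 / 2 = 10)
% softmax input            -> 10x10 maps x 20 filters = 2000 units per image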

%% Convolutional Neural Network Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started in building a single-
%  layer convolutional neural network. In this exercise, you will only
%  need to modify cnnCost.m and minFuncSGD.m. You will not need to
%  modify this file.

%%======================================================================
%% STEP 0: Initialize Parameters and Load Data
%  Here we initialize some parameters used for the exercise.

% Configuration
imageDim = 28;
numClasses = 10;  % Number of classes (MNIST images fall into 10 classes)
filterDim = 9;    % Filter size for conv layer
numFilters = 20;   % Number of filters for conv layer
poolDim = 2;      % Pooling dimension, (should divide imageDim-filterDim+1)
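
% Sanity check (added sketch): mean pooling requires poolDim to evenly
% divide the convolved output size; here (28-9+1) = 20 and 20/2 = 10.
if mod(imageDim-filterDim+1, poolDim) ~= 0
    error('poolDim must divide imageDim-filterDim+1');
end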

% Load MNIST Train
addpath ../common/;
images = loadMNISTImages('train-images-idx3-ubyte');
images = reshape(images,imageDim,imageDim,[]);
labels = loadMNISTLabels('train-labels-idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

% Initialize Parameters
theta = cnnInitParams(imageDim,filterDim,numFilters,poolDim,numClasses);

%%======================================================================
%% STEP 1: Implement convNet Objective
%  Implement the function cnnCost.m.

%%======================================================================
%% STEP 2: Gradient Check
%  Use the file computeNumericalGradient.m to check the gradient
%  calculation for your cnnCost.m function.  You may need to add the
%  appropriate path or copy the file to this directory.

DEBUG = false;  % set this to true to check gradient
if DEBUG
    % To speed up gradient checking, we will use a reduced network and
    % a debugging data set
    db_numFilters = 2;
    db_filterDim = 9;
    db_poolDim = 5;
    db_images = images(:,:,1:10);
    db_labels = labels(1:10);
    db_theta = cnnInitParams(imageDim,db_filterDim,db_numFilters,...
        db_poolDim,numClasses);

    [cost, grad] = cnnCost(db_theta,db_images,db_labels,numClasses,...
        db_filterDim,db_numFilters,db_poolDim);

    % Check gradients
    numGrad = computeNumericalGradient( @(x) cnnCost(x,db_images,...
        db_labels,numClasses,db_filterDim,...
        db_numFilters,db_poolDim), db_theta);

    % Use this to visually compare the gradients side by side
    disp([numGrad grad]);

    diff = norm(numGrad-grad)/norm(numGrad+grad);
    % Should be small. In our implementation, these values are usually
    % less than 1e-9.
    disp(diff);

    assert(diff < 1e-9,...
        'Difference too large. Check your gradient computation again');

end;
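
% (Added note) diff is the scale-invariant relative error
% norm(numGrad-grad)/norm(numGrad+grad); a value below 1e-9 means the
% analytic and numerical gradients agree to roughly nine significant digits.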

%%======================================================================
%% STEP 3: Learn Parameters
%  Implement minFuncSGD.m, then train the model.

options.epochs = 5;
options.minibatch = 256;
options.alpha = 1e-1;
options.momentum = .95;

opttheta = minFuncSGD(@(x,y,z) cnnCost(x,y,z,numClasses,filterDim,...
    numFilters,poolDim),theta,images,labels,options);

%%======================================================================
%% STEP 4: Test
%  Test the performance of the trained model using the MNIST test set. Your
%  accuracy should be above 97% after 3 epochs of training

testImages = loadMNISTImages('t10k-images-idx3-ubyte');
testImages = reshape(testImages,imageDim,imageDim,[]);
testLabels = loadMNISTLabels('t10k-labels-idx1-ubyte');
testLabels(testLabels==0) = 10; % Remap 0 to 10

[~,cost,preds] = cnnCost(opttheta,testImages,testLabels,numClasses,...
    filterDim,numFilters,poolDim,true);

acc = sum(preds==testLabels)/length(preds);

% Accuracy should be around 97.4% after 3 epochs
fprintf('Accuracy is %f\n',acc);
function [cost, grad, preds] = cnnCost(theta,images,labels,numClasses,filterDim,numFilters,poolDim,pred)
% Calculate cost and gradient for a single layer convolutional
% neural network followed by a softmax layer with cross entropy
% objective.
%
% Parameters:
%  theta      -  unrolled parameter vector
%  images     -  stores images in an imageDim x imageDim x numImages
%                array
%  labels     -  corresponding labels in a numImages x 1 vector
%  numClasses -  number of classes to predict
%  filterDim  -  dimension of convolutional filter
%  numFilters -  number of convolutional filters
%  poolDim    -  dimension of pooling area
%  pred       -  boolean; if true, only forward propagate and return
%                predictions
%
% Returns:
%  cost       -  cross entropy cost
%  grad       -  gradient with respect to theta (if pred==false)
%  preds      -  list of predictions for each example (if pred==true)

if ~exist('pred','var')
    pred = false;
end;

imageDim = size(images,1); % height/width of image
numImages = size(images,3); % number of images
numImages_inv = 1./numImages;
lambda = 0.0001;  % weight decay parameter

%% Reshape parameters and setup gradient matrices

% Wc is filterDim x filterDim x numFilters parameter matrix
% bc is the corresponding bias

% Wd is numClasses x hiddenSize parameter matrix where hiddenSize
% is the number of output units from the convolutional layer
% bd is corresponding bias
[Wc, Wd, bc, bd] = cnnParamsToStack(theta,imageDim,filterDim,numFilters,...
    poolDim,numClasses);

% Same sizes as Wc,Wd,bc,bd. Used to hold gradient w.r.t above params.
Wc_grad = zeros(size(Wc));
Wd_grad = zeros(size(Wd));
bc_grad = zeros(size(bc));
bd_grad = zeros(size(bd));

%%======================================================================
%% STEP 1a: Forward Propagation
%  In this step you will forward propagate the input through the
%  convolutional and subsampling (mean pooling) layers.  You will then use
%  the responses from the convolution and pooling layer as the input to a
%  standard softmax layer.

%% Convolutional Layer
%  For each image and each filter, convolve the image with the filter, add
%  the bias and apply the sigmoid nonlinearity.  Then subsample the
%  convolved activations with mean pooling.  Store the results of the
%  convolution in activations and the results of the pooling in
%  activationsPooled.  You will need to save the convolved activations for
%  backpropagation.
convDim = imageDim-filterDim+1; % dimension of convolved output
outputDim = (convDim)/poolDim; % dimension of subsampled output

% convDim x convDim x numFilters x numImages tensor for storing activations
activations = zeros(convDim,convDim,numFilters,numImages);

% outputDim x outputDim x numFilters x numImages tensor for storing
% subsampled activations
activationsPooled = zeros(outputDim,outputDim,numFilters,numImages);

%%% YOUR CODE HERE %%%
activations = cnnConvolve(filterDim, numFilters, images, Wc, bc);
activationsPooled = cnnPool(poolDim, activations);

% Reshape activations into 2-d matrix, hiddenSize x numImages,
% for Softmax layer
activationsPooled = reshape(activationsPooled,[],numImages);

%% Softmax Layer
%  Forward propagate the pooled activations calculated above into a
%  standard softmax layer. For your convenience we have reshaped
%  activationsPooled into a hiddenSize x numImages matrix.  Store the
%  results in probs.

% numClasses x numImages for storing probability that each image belongs to
% each class.
probs = zeros(numClasses,numImages);

%%% YOUR CODE HERE %%%
probs = Wd * activationsPooled + repmat(bd, [1, numImages]);
probs = bsxfun(@minus, probs, max(probs, [], 1));
probs = exp(probs);
probs = bsxfun(@rdivide, probs, sum(probs));
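
% (Added note) Subtracting the column-wise max above is the standard
% numerical-stability trick: softmax is invariant to shifting each
% column by a constant, and the shift prevents overflow in exp for
% large logits.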

%%======================================================================
%% STEP 1b: Calculate Cost
%  In this step you will use the labels given as input and the probs
%  calculated above to evaluate the cross entropy objective.  Store your
%  results in cost.

cost = 0; % save objective into cost

%%% YOUR CODE HERE %%%
groundTruth = full(sparse(labels, 1:numImages, 1));
cost = -numImages_inv*(groundTruth(:)'*log(probs(:))) + (lambda/2.)*(sum(Wd(:).^2)+sum(Wc(:).^2));
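
% (Added note) In equation form, this computes
%   J = -(1/m) * sum_i log p(y_i | x_i) + (lambda/2)*(||Wd||^2 + ||Wc||^2),
% where the one-hot groundTruth matrix picks out the log-probability of
% the correct class for each example; the biases are not regularized.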

% Makes predictions given probs and returns without backpropagating errors.
if pred
    [~,preds] = max(probs,[],1);
    preds = preds';
    grad = 0;
    return;
end;

%%======================================================================
%% STEP 1c: Backpropagation
%  Backpropagate errors through the softmax and convolutional/subsampling
%  layers.  Store the errors for the next step to calculate the gradient.
%  Backpropagating the error w.r.t the softmax layer is as usual.  To
%  backpropagate through the pooling layer, you will need to upsample the
%  error with respect to the pooling layer for each filter and each image.
%  Use the kron function and a matrix of ones to do this upsampling
%  quickly.
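%  For example, with poolDim = 2 (added illustration):
%      kron([1 2; 3 4], ones(2)) = [1 1 2 2
%                                   1 1 2 2
%                                   3 3 4 4
%                                   3 3 4 4]
%  For mean pooling, each upsampled entry is additionally scaled by
%  1/poolDim^2, since every convolved unit contributed equally to its
%  pooled mean.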

%%% YOUR CODE HERE %%%
delta = -(groundTruth - probs);
delta_pool = reshape(Wd'*delta, outputDim, outputDim, numFilters, numImages);
delta_conv = zeros(convDim,convDim,numFilters,numImages);
% upsample delta_pool to delta_conv
for i = 1:numImages
    for j = 1:numFilters
        delta_conv(:,:,j,i) = (1./poolDim^2) .* kron(squeeze(delta_pool(:,:,j,i)), ones(poolDim));
    end
end
delta_conv = activations .* (1-activations) .* delta_conv;
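
% (Added note) For the sigmoid activation a = sigmoid(z), the derivative
% is f'(z) = a.*(1-a), which is why the upsampled error is scaled
% elementwise by activations .* (1-activations) above.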

%%======================================================================
%% STEP 1d: Gradient Calculation
%  After backpropagating the errors above, we can use them to calculate the
%  gradient with respect to all the parameters.  The gradient w.r.t the
%  softmax layer is calculated as usual.  To calculate the gradient w.r.t.
%  a filter in the convolutional layer, convolve the backpropagated error
%  for that filter with each image and aggregate over images.

%%% YOUR CODE HERE %%%
Wd_grad = numImages_inv .* delta * activationsPooled' + lambda .* Wd;
bd_grad = numImages_inv .* sum(delta, 2);

for i = 1:numFilters
    for j = 1:numImages
        Wc_grad(:,:,i) = Wc_grad(:,:,i) + conv2(squeeze(images(:,:,j)), rot90(squeeze(delta_conv(:,:,i,j)),2), 'valid');
    end
    Wc_grad(:,:,i) = numImages_inv .* Wc_grad(:,:,i) + lambda .* Wc(:,:,i);

    temp = delta_conv(:,:,i,:);
    bc_grad(i) = numImages_inv .* sum(temp(:));
end
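
% (Added note) conv2 performs a true convolution and therefore flips its
% second argument; rot90(...,2) pre-flips the error map so that the
% operation above is effectively the cross-correlation the gradient
% formula calls for.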

%% Unroll gradient into grad vector for minFunc
grad = [Wc_grad(:) ; Wd_grad(:) ; bc_grad(:) ; bd_grad(:)];

end
function [opttheta] = minFuncSGD(funObj,theta,data,labels,options)
% Runs stochastic gradient descent with momentum to optimize the
% parameters for the given objective.
%
% Parameters:
%  funObj     -  function handle which accepts as input theta,
%                data, labels and returns cost and gradient w.r.t
%                to theta.
%  theta      -  unrolled parameter vector
%  data       -  stores data in m x n x numExamples tensor
%  labels     -  corresponding labels in numExamples x 1 vector
%  options    -  struct to store specific options for optimization
%
% Returns:
%  opttheta   -  optimized parameter vector
%
% Options (* required)
%  epochs*     - number of epochs through data
%  alpha*      - initial learning rate
%  minibatch*  - size of minibatch
%  momentum    - momentum constant, defaults to 0.9

%%======================================================================
%% Setup
assert(all(isfield(options,{'epochs','alpha','minibatch'})),...
    'Some options not defined');
if ~isfield(options,'momentum')
    options.momentum = 0.9;
end;
epochs = options.epochs;
alpha = options.alpha;
minibatch = options.minibatch;
m = length(labels); % training set size

% Setup for momentum
mom = 0.5;
momIncrease = 20;
velocity = zeros(size(theta));
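
% (Added note) Momentum warm-up: the first momIncrease iterations run
% with mom = 0.5, after which mom jumps to options.momentum (0.95 in
% this exercise); low initial momentum keeps the large, noisy early
% gradients from being amplified.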

%%======================================================================
%% SGD loop
it = 0;
for e = 1:epochs

    % randomly permute indices of data for quick minibatch sampling
    rp = randperm(m);

    for s = 1:minibatch:(m-minibatch+1)
        it = it + 1;

        % increase momentum after momIncrease iterations
        if it == momIncrease
            mom = options.momentum;
        end;

        % get next randomly selected minibatch
        mb_data = data(:,:,rp(s:s+minibatch-1));
        mb_labels = labels(rp(s:s+minibatch-1));

        % evaluate the objective function on the next minibatch
        [cost, grad] = funObj(theta,mb_data,mb_labels);

        % Instructions: Add the weighted velocity vector to the gradient
        % evaluated above, scaled by the learning rate. Then update the
        % current weights theta according to the SGD update rule.

        %%% YOUR CODE HERE %%%
        velocity = mom .* velocity + alpha .* grad;
        theta = theta - velocity;
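
        % (Added note) This is the classical momentum update:
        %   v <- mom*v + alpha*grad,   theta <- theta - v.
        % No separate weight-decay step is needed here because cnnCost
        % already folds the lambda*W term into grad.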
        fprintf('Epoch %d: Cost on iteration %d is %f\n',e,it,cost);
    end;

    % anneal learning rate by a factor of two after each epoch
    alpha = alpha/2.0;

end;

opttheta = theta;

end
function convolvedFeatures = cnnConvolve(filterDim, numFilters, images, W, b)
%cnnConvolve Returns the convolution of the features given by W and b with
%the given images
%
% Parameters:
%  filterDim - filter (feature) dimension
%  numFilters - number of feature maps
%  images - large images to convolve with, matrix in the form
%           images(r, c, image number)
%  W, b - W, b for features from the sparse autoencoder
%         W is of shape (filterDim,filterDim,numFilters)
%         b is of shape (numFilters,1)
%
% Returns:
%  convolvedFeatures - matrix of convolved features in the form
%                      convolvedFeatures(imageRow, imageCol, featureNum, imageNum)

numImages = size(images, 3);
imageDim = size(images, 1);
convDim = imageDim - filterDim + 1;

convolvedFeatures = zeros(convDim, convDim, numFilters, numImages);

% Instructions:
%   Convolve every filter with every image here to produce the
%   (imageDim - filterDim + 1) x (imageDim - filterDim + 1) x numFeatures x numImages
%   matrix convolvedFeatures, such that
%   convolvedFeatures(imageRow, imageCol, featureNum, imageNum) is the
%   value of the convolved featureNum feature for the imageNum image over
%   the region (imageRow, imageCol) to (imageRow + filterDim - 1, imageCol + filterDim - 1)
%
% Expected running times:
%   Convolving with 100 images should take less than 30 seconds
%   Convolving with 5000 images should take around 2 minutes
%   (So to save time when testing, you should convolve with fewer images,
%   as described earlier)

for imageNum = 1:numImages
    for filterNum = 1:numFilters

        % convolution of image with feature matrix
        convolvedImage = zeros(convDim, convDim);

        % Obtain the feature (filterDim x filterDim) needed during the convolution

        %%% YOUR CODE HERE %%%
        filter = W(:,:,filterNum);

        % Flip the feature matrix because of the definition of convolution:
        % conv2 flips its second argument, so pre-flipping yields the
        % desired cross-correlation
        filter = rot90(squeeze(filter),2);

        % Obtain the image
        im = squeeze(images(:, :, imageNum));

        % Convolve "filter" with "im", adding the result to convolvedImage;
        % be sure to do a 'valid' convolution

        %%% YOUR CODE HERE %%%
        convolvedImage = conv2(im,filter,'valid');

        % Add the bias unit
        % Then, apply the sigmoid function to get the hidden activation

        %%% YOUR CODE HERE %%%
        convolvedImage = sigmoid(convolvedImage + b(filterNum,1));

        convolvedFeatures(:, :, filterNum, imageNum) = convolvedImage;
    end
end

end

function sigm = sigmoid(x)
sigm = 1./(1+exp(-x));
end
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
%cnnPool Pools the given convolved features
%
% Parameters:
%  poolDim - dimension of pooling region
%  convolvedFeatures - convolved features to pool (as given by cnnConvolve)
%                      convolvedFeatures(imageRow, imageCol, featureNum, imageNum)
%
% Returns:
%  pooledFeatures - matrix of pooled features in the form
%                   pooledFeatures(poolRow, poolCol, featureNum, imageNum)
%

numImages = size(convolvedFeatures, 4);
numFilters = size(convolvedFeatures, 3);
convolvedDim = size(convolvedFeatures, 1);

pooledFeatures = zeros(convolvedDim / poolDim, ...
    convolvedDim / poolDim, numFilters, numImages);

% Instructions:
%   Now pool the convolved features in regions of poolDim x poolDim,
%   to obtain the
%   (convolvedDim/poolDim) x (convolvedDim/poolDim) x numFeatures x numImages
%   matrix pooledFeatures, such that
%   pooledFeatures(poolRow, poolCol, featureNum, imageNum) is the
%   value of the featureNum feature for the imageNum image pooled over the
%   corresponding (poolRow, poolCol) pooling region.
%
%   Use mean pooling here.

%%% YOUR CODE HERE %%%
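% (Added note, sketch) conv2 with ones(poolDim) computes the sum over
% every poolDim x poolDim window of the feature map; sampling every
% poolDim-th entry keeps only the non-overlapping windows, and dividing
% by poolDim^2 turns the window sums into means.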
for imageNum = 1:numImages
    for filterNum = 1:numFilters
        pooled = conv2(convolvedFeatures(:,:,filterNum,imageNum), ...
            ones(poolDim,poolDim), 'valid');
        pooledFeatures(:,:,filterNum,imageNum) = pooled(1:poolDim:end,1:poolDim:end) ./ (poolDim*poolDim);
    end
end
end


Reference:
1. Exercise: Convolutional Neural Network
2. CNN卷积神经网络推导和实现 (Derivation and Implementation of Convolutional Neural Networks)
3. CNN的反向求导及练习 (Backpropagation Derivation and Exercises for CNNs)
Tags: UFLDL, CNN, neural network