
UFLDL run_train.m

2016-08-08
The original script calls minFunc to search for the optimum. I later made a small modification so that training can also be done with hand-rolled stochastic gradient descent; it is quite simple and it works.

I also found that the learning rate can be set to a/(b+t), where t is the iteration number and a and b are parameters chosen at initialization.
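For reference, here is a minimal sketch of how such a decaying schedule could be dropped into the SGD loop further down; the values of a and b are illustrative, not tuned:

% Hypothetical decaying learning-rate schedule lr(t) = a / (b + t)
a = 1;      % illustrative initial scale
b = 10;     % illustrative offset; larger b means slower decay
for iter = 1:sum_iter
    lr = a / (b + iter);   % learning rate shrinks as training progresses
    % ... compute [cost, grad] and update: params = params - lr * grad; ...
end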

Results with the original training method (minFunc):

10,000 samples, minFunc: 92.55%
60,000 samples, minFunc: 96.41%

Step Size below progTol
test accuracy: 0.964100
train accuracy: 1.000000


% runs training procedure for supervised multilayer network
% softmax output layer with cross entropy loss function

%% setup environment
% experiment information
% a struct containing network layer sizes etc
clear;clc;close all;
ei = [];

% add common directory to your path for
% minfunc and mnist data helpers
addpath ../common;
addpath(genpath('../common/minFunc_2012/minFunc'));

%% load mnist data
[data_train, labels_train, data_test, labels_test] = load_preprocess_mnist();

%% select a small subset of the training data (first num_select examples)
num_select = 100;
data_train = data_train(:,1:num_select);
labels_train = labels_train(1:num_select,:);

%% populate ei with the network architecture to train
% ei is a structure you can use to store hyperparameters of the network
% the architecture specified below should produce  100% training accuracy
% You should be able to try different network architectures by changing ei
% only (no changes to the objective function code)

% dimension of input features
ei.input_dim = 784;
% number of output classes
ei.output_dim = 10;
% sizes of all hidden layers and the output layer
ei.layer_sizes = [256, 128, 256, 128, ei.output_dim];
% scaling parameter for l2 weight regularization penalty
ei.lambda = 1e-4;
% which type of activation function to use in hidden layers
% feel free to implement support for only the logistic sigmoid function
ei.activation_fun = 'logistic';

%% setup random initial weights
stack = initialize_weights(ei);
params = stack2params(stack);

%% setup minfunc options
options = [];
options.display = 'iter';
options.maxFunEvals = 1e6;
options.Method = 'lbfgs';
options.MaxIter = 500;

%% run training
% [opt_params,opt_value,exitflag,output] = minFunc(@supervised_dnn_cost,...
%     params,options,ei, data_train, labels_train);

lr = 0.1;
sum_iter = 10000;
cost_rec = [];
figure;

for iter = 1:sum_iter
    % forward/backward pass on the (sub-sampled) training set
    [cost, grad, pred_prob] = supervised_dnn_cost(params, ei, data_train, labels_train);

    % rescale the gradient if its range exceeds the range of the parameters
    scale = (max(grad) - min(grad)) / (max(params) - min(params));
    if scale > 1
        grad = grad / scale;
    end

    % plain gradient-descent update with fixed learning rate
    params = params - lr * grad;

    % record and plot the training loss
    cost_rec = [cost_rec, cost];
    plot(1:iter, cost_rec, 'r');
    drawnow();
    xlabel('iter');
    ylabel('loss');

    % evaluate test/train accuracy every 10 iterations
    if mod(iter, 10) == 0
        [~, ~, pred] = supervised_dnn_cost(params, ei, data_test, [], true);
        [~, pred] = max(pred);
        acc_test = mean(pred' == labels_test);
        fprintf('test accuracy: %f\n', acc_test);

        [~, ~, pred] = supervised_dnn_cost(params, ei, data_train, [], true);
        [~, pred] = max(pred);
        acc_train = mean(pred' == labels_train);
        fprintf('train accuracy: %f\n', acc_train);
    end
end

%% compute accuracy on the test and train set
% params holds the SGD solution; substitute opt_params here if minFunc is used instead
[~, ~, pred] = supervised_dnn_cost(params, ei, data_test, [], true);
[~, pred] = max(pred);
acc_test = mean(pred' == labels_test);
fprintf('test accuracy: %f\n', acc_test);

[~, ~, pred] = supervised_dnn_cost(params, ei, data_train, [], true);
[~, pred] = max(pred);
acc_train = mean(pred' == labels_train);
fprintf('train accuracy: %f\n', acc_train);
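supervised_dnn_cost itself is not listed here. As a rough sketch of the softmax output layer with cross-entropy loss that the header comments describe (h, W, b, and labels below are illustrative names for the final hidden activation, output-layer weights/bias, and class-index labels, not identifiers from the actual function):

% Sketch of a softmax + cross-entropy output layer.
% Assumes: h is hidden activations (hidden_dim x m), W is (output_dim x hidden_dim),
% b is (output_dim x 1), labels is (m x 1) of class indices in 1..output_dim.
z = bsxfun(@plus, W * h, b);             % output_dim x m pre-activations
z = bsxfun(@minus, z, max(z, [], 1));    % subtract column max for numerical stability
p = exp(z);
p = bsxfun(@rdivide, p, sum(p, 1));      % softmax probabilities (pred_prob)
m = size(h, 2);
idx = sub2ind(size(p), labels', 1:m);    % linear indices of the true-class entries
cost = -sum(log(p(idx))) / m;            % mean cross-entropy loss

The bsxfun style matches the 2012-era MATLAB conventions used by the rest of the starter code; on recent MATLAB versions implicit expansion would make the bsxfun calls unnecessary.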