
Implementing Regression Analysis in MATLAB

2013-04-28 09:51
In the Machine Learning course, the first topic Andrew covers is regression analysis under supervised learning, and Programming Exercise 1 is about implementing regression in MATLAB. It consists of two main parts: computing the cost and running gradient descent.

Computing the cost can be summarized by the following equations:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Here $h_\theta(x)$ is the predicted value and $y$ is the true value; the point of training the parameters is to make the gap between predictions and true values as small as possible. With the objective function in hand, the next step is the iterative process of gradient descent.
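To make the cost formula concrete, here is a tiny hand-checkable sketch (toy numbers, not the assignment data):

% Toy example: two points (1,1) and (2,2), with theta = [0; 0.5]
X = [1 1; 1 2];     % first column is the intercept term x0 = 1
y = [1; 2];
theta = [0; 0.5];
m = length(y);
J = sum((X*theta - y).^2) / (2*m)   % = ((0.5-1)^2 + (1-2)^2)/4 = 0.3125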

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

Here there are only two parameters, theta0 and theta1, but the rule extends to any number of parameters:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

with the convention $x_0^{(i)} = 1$ and all $\theta_j$ updated simultaneously. In matrix form the entire update collapses to $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$, which is exactly the vectorized line used in gradientDescent.m below.
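Continuing the toy sketch above, one vectorized gradient step with a hypothetical learning rate of 0.1 looks like this:

% One gradient step on the same two toy points
alpha = 0.1;
err = X*theta - y;                     % = [-0.5; -1]
theta = theta - alpha*(1/m)*(X'*err)   % X'*err = [-1.5; -2.5], so theta -> [0.075; 0.625]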
Once the formulas and the idea behind them are clear, implementing them in MATLAB is not particularly difficult.

Below is ex1.m from the assignment. The two functions that matter most are computeCost() and gradientDescent(); the other helper functions are not covered here.

%% Machine Learning Online Class - Exercise 1: Linear Regression

% Instructions
% ------------
%
% This file contains code that helps you get started on the
% linear exercise. You will need to complete the following functions
% in this exericse:
%
% warmUpExercise.m
% plotData.m
% gradientDescent.m
% computeCost.m
% gradientDescentMulti.m
% computeCostMulti.m
% featureNormalize.m
% normalEqn.m
%
% For this exercise, you will not need to change any code in this file,
% or any other files other than those mentioned above.
%
% x refers to the population size in 10,000s
% y refers to the profit in $10,000s
%

%% Initialization
clear ; close all; clc

%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples

% Plot Data
% Note: You have to complete the code in plotData.m

plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =================== Part 3: Gradient descent ===================
fprintf('Running Gradient Descent ...\n')

X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters

% Some gradient descent settings
iterations = 1500;
alpha = 0.01;

% compute and display initial cost
computeCost(X, y, theta)

% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

% print theta to screen
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));

% Plot the linear fit
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n', ...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n', ...
    predict2*10000);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = computeCost(X, y, t);
    end
end

% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);


Below is computeCost.m. Note that the computation should be vectorized wherever possible; a loop produces the same result, but once the data set grows large it becomes painfully slow.
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
% You should set J to the cost.
% Loop version (correct, but slow on large data sets):
% for j = 1:m
%     J = J + (X(j,:)*theta - y(j))^2;
% end
% J = J/(2*m);

% Vectorized version:
J = sum((X*theta - y).^2) / (2*m);

% =========================================================================

end
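As a quick sanity check (a sketch assuming ex1data1.txt is in the working directory), the cost at theta = [0; 0] on the exercise data should come out to roughly 32.07, which is the value the assignment expects:

data = load('ex1data1.txt');
m = size(data, 1);
X = [ones(m, 1), data(:, 1)];   % add the intercept column
y = data(:, 2);
J = computeCost(X, y, zeros(2, 1))   % should print approximately 32.07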

Below is gradientDescent.m:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    % Loop version (note that theta(1) and theta(2) are updated
    % simultaneously: K1 and K2 are both computed from the old theta):
    % K1 = 0;
    % K2 = 0;
    % for j = 1:m
    %     K1 = K1 + (X(j,:)*theta - y(j))*X(j,1);
    %     K2 = K2 + (X(j,:)*theta - y(j))*X(j,2);
    % end
    % theta(1) = theta(1) - alpha*(1/m)*K1;
    % theta(2) = theta(2) - alpha*(1/m)*K2;

    % Vectorized version, which also works for any number of features:
    theta = theta - alpha*(1/m)*(X'*(X*theta - y));

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
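One habit worth keeping (a sketch, assuming X and y have already been built as in ex1.m): plot the returned J_history after running gradient descent. If alpha is small enough the cost decreases on every iteration, while a rising curve usually means alpha is too large.

[theta, J_history] = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);
figure;
plot(1:numel(J_history), J_history, '-', 'LineWidth', 2);
xlabel('Iteration'); ylabel('Cost J');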


Writing the code yourself really does cement the concepts. All of Exercise 1 revolves around regression, and linear regression is the simplest case; with more features, it can be viewed as regression in a higher-dimensional space. The key is being clear about what theta, X, and y each represent, what gradient descent is descending along, and what quantity you are actually solving for.
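For comparison, linear regression also has a closed-form solution, the normal equation, which needs no learning rate and no iteration; the assignment's normalEqn.m asks for exactly this. A minimal sketch (the function name here is hypothetical; the assignment's own version lives in normalEqn.m):

function theta = normalEqnSketch(X, y)
%NORMALEQNSKETCH Closed-form least-squares fit: theta = (X'*X)^(-1) * X'*y.
% pinv is used so this still works when X'*X is singular.
theta = pinv(X' * X) * X' * y;
end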



First programming assignment: pass. Time to relearn MATLAB.