
Classification and logistic regression

2015-08-21 22:18

Logistic Regression

1. Problem:

The regression problems discussed previously all had continuous outputs. What if we need to do classification, i.e., the output is a discrete value?

2. Solution:

Hypothesis:

h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}

where:

g(z) = \frac{1}{1 + e^{-z}}

The graph of g(z) is as follows:

[Figure: the S-shaped sigmoid curve g(z), rising from 0 to 1 and crossing 0.5 at z = 0]

From this we can see: when h_\theta(x) < 0.5 we treat the prediction as 0, otherwise as 1, so the output becomes a discrete value.
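
For example, once θ has been learned, thresholding the hypothesis at 0.5 gives discrete predictions. A minimal MATLAB sketch (not code from the original post; it assumes x already contains the intercept column of ones and theta is a column vector):

probs = 1 ./ (1 + exp(-(x * theta)));   % h_theta(x) for every example
labels = double(probs >= 0.5);          % threshold at 0.5 -> discrete 0/1 predictions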

Deriving the update rule:

Using probability theory, we find the distribution the samples follow and then use maximum likelihood to solve for the corresponding θ.

P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x), \qquad p(y \mid x;\theta) = (h_\theta(x))^{y} (1 - h_\theta(x))^{1-y}

\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

Therefore:

Taking the partial derivative with respect to \theta_j (for a single training example) gives:

\frac{\partial \ell(\theta)}{\partial \theta_j} = \left( y - h_\theta(x) \right) x_j

Result:

\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}

Note: this update rule is the incremental (stochastic) form, applied one training example at a time; a sketch follows below.
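
A minimal MATLAB sketch of one pass of this incremental update (not from the original post; it assumes x already contains the column of ones and theta is a column vector):

alpha = 0.001;                                    % learning rate
m = size(x,1);                                    % number of training examples
for i = 1:m                                       % one parameter update per example
    h = 1 ./ (1 + exp(-(x(i,:) * theta)));        % h_theta(x^(i))
    theta = theta + alpha * (y(i) - h) * x(i,:)'; % incremental gradient ascent step
end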

Newton's method:

1. Problem:

The iterative method above converges slowly. When solving by maximum likelihood we can instead use Newton's method: \theta := \theta - \frac{f(\theta)}{f'(\theta)}.

2. Solution:

Derivation:

Newton's method finds the θ at which f(\theta) = 0; here that is exactly the condition \ell'(\theta) = 0.

So Newton's method can be rewritten as:

\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}

Definition: in the vector case, H is the Hessian matrix of the log-likelihood,

H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \partial \theta_j}

where \ell'(\theta) is the gradient vector \nabla_\theta \ell(\theta), whose j-th component is

\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
Thus the matrix H plays the role of \ell''(\theta); in the one-dimensional case H^{-1} = 1/\ell''(\theta).

Therefore:

\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)

Application:

Suitable when the number of features is small; otherwise computing H^{-1} is very expensive. A sketch follows below.
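
For illustration only (the post itself does not implement Newton's method), a minimal MATLAB sketch under the same assumptions about x, y, and theta as above:

theta = zeros(size(x,2),1);
for iter = 1:10                               % Newton's method usually converges in a few iterations
    h = 1 ./ (1 + exp(-(x * theta)));         % predictions for all examples
    grad = x' * (y - h);                      % gradient of the log-likelihood
    H = -x' * diag(h .* (1 - h)) * x;         % Hessian of the log-likelihood
    theta = theta - H \ grad;                 % Newton update: theta - H^{-1} * gradient
end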

Logistic 0/1 classification:

1. Setting the number of iterations yourself

  Write the corresponding loop yourself, specifying the number of iterations and the learning rate alpha, and run incremental gradient descent.

Main functions and their roles:

Logistic_Regression: serves as the main script

gradientDecent: updates θ via gradient descent

computeCost: computes the cost J

Logistic_Regression

%%  part0: preparation -- load the data and plot the two classes
data = load('ex2data1.txt');
x = data(:,[1,2]);                     % two feature columns
y = data(:,3);                         % 0/1 labels
pos = find(y==1);
neg = find(y==0);

plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];                     % add the intercept column
theta = zeros(3,1);
J = computeCost(x,y,theta);            % cost at the initial theta

theta = gradientDecent(x, y, theta);
X = 25:100;                            % range of the first feature
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);   % decision boundary: theta'*[1; X; Y] = 0
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;


gradientDecent

function theta = gradientDecent(x, y, theta)

%% update theta; the gradient is accumulated over all examples before each step (batch gradient ascent)
m = size(x,1);
alph = 0.001;                                       % learning rate
for iter = 1:150000
    for j = 1:3                                     % one component of theta at a time
        dec = 0;
        for i = 1:m                                 % accumulate the gradient over all examples
            dec = dec + (y(i) - sigmoid(x(i,:)*theta))*x(i,j);
        end
        theta(j,1) = theta(j,1) + dec*alph/m;       % gradient step
    end
end
end
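
For comparison (not part of the original post), the same batch update can be written without the inner loops, which runs much faster in MATLAB:

alph = 0.001;
m = size(x,1);
for iter = 1:150000
    h = 1 ./ (1 + exp(-(x * theta)));            % predictions for all examples at once
    theta = theta + (alph/m) * (x' * (y - h));   % identical gradient step, vectorized
end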


sigmoid

function g = sigmoid(z)

%% SIGMOID Compute the sigmoid function (element-wise, so it also works on vectors)

g = 1.0 ./ (1.0 + exp(-z));

end


computeCost

function J = computeCost(x, y, theta)

%% compute cost: J (the negative average log-likelihood)

m = size(x,1);
J = 0;
for i = 1:m
    J = J + y(i)*log(sigmoid(x(i,:)*theta)) + (1 - y(i))*log(1 - sigmoid(x(i,:)*theta));
end
J = (-1/m)*J;
end


The results are as follows:

[Figure: scatter plot of the two classes with the fitted decision boundary]


2. Using the fminunc function:

  Provide the way to compute the cost J and the gradient for θ, then call fminunc to compute the optimal solution.

Main functions and their roles:

Logistics_Regression: serves as the main script

computeCost: returns the cost J and the gradient used to update θ

sigmoid: the sigmoid function

Logistics_Regression

%%  part0: preparation -- load the data and plot the two classes
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);

plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];                    % add the intercept column
theta = zeros(3,1);
options = optimset('GradObj', 'on', 'MaxIter', 400);   % tell fminunc we supply the gradient

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(computeCost(x,y,t)), theta, options);
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);   % decision boundary: theta'*[1; X; Y] = 0
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;


sigmoid

function g = sigmoid(z)

%% SIGMOID Compute the sigmoid function element-wise

g = 1.0 ./ (1.0 + exp(-z));

end


computeCost

function [J,grad] = computeCost(x, y, theta)

%% compute cost J and its gradient (both are returned to fminunc)

m = size(x,1);
hx = sigmoid(x * theta);                                      % predictions for all examples
J = (1.0/m) * sum(-y .* log(hx) - (1.0 - y) .* log(1.0 - hx));
grad = (1.0/m) .* x' * (hx - y);                              % gradient of J w.r.t. theta
end


Result:

[Figure: scatter plot with the decision boundary found by fminunc]


Logistic multi_class

1. Setup

Data I made myself (each line: feature 1, feature 2, class label):

1,5,1
1,6,1
1.5,3.5,1
2.5,3.5,1
2,6,1
3,7,1
4,6,1
3.5,4.5,1
2,4,1
2,5,1
4,4,1
5,5,1
6,4,1
5,3,1
4,2,1
4,3,2
5,3,2
5,2,2
5,1.5,2
7,1.5,2
5,2.5,2
6,2.5,2
5.5,2.5,2
5,1,2
6,2,2
6,3,2
5,4,2
7,5,2
7,2,2
8,1,2
8,3,2
7,4,3
7,5,3
8.5,5.5,3
9,4,3
8,5.5,3
8,4.5,3
9.5,5.5,3
8,4.5,3
8.5,4.5,3
7,6,3
6,5,3
9,5,3
9,6,3
8,6,3
8,7,3
10,6,3
10,4,3


Scatter plot of the data:

[Figure: scatter plot of the three classes]

2. Algorithm derivation

Cost J:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

Updating θ:

\theta_j := \theta_j - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Algorithm idea (this algorithm is also called one_vs_all):

[Figure: one_vs_all -- one binary classifier is trained per class]

If the samples fall into K classes, we train K sets of θ. Considering each class in turn, we treat all the other samples as a single class, which separates that class from the rest. The y values of the class under consideration are set to 1 and all the others to 0. This yields K sets of θ values (a prediction sketch follows below).
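
At prediction time, each of the K classifiers is evaluated and the class with the largest h_\theta(x) wins. A minimal MATLAB sketch (not from the original post), assuming thetas is the matrix of per-class parameters returned by one_vs_all below (one column per class) and x already has the intercept column:

scores = 1 ./ (1 + exp(-(x * thetas)));    % m x K matrix: probability of each class for each example
[~, predicted_class] = max(scores, [], 2); % pick the classifier with the highest probability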

3. Code implementation:

Implemented here using the fminunc function.

1. Function overview:

Logistic_Regression: serves as the main script

one_vs_all: a loop that computes the K sets of θ in turn, calling the cost function through fminunc

computeCost: returns the cost J and the gradient used to update θ

2. Code:

Logistic_Regression:

%%  part0: preparation -- load the data and plot the three classes
data = load('data.txt');
x = data(:,[1,2]);
y = data(:,3);
y1 = find(y==1);
y2 = find(y==2);
y3 = find(y==3);

plot(x(y1,1),x(y1,2),'r*',x(y2,1),x(y2,2),'c+',x(y3,1),x(y3,2),'bo');
pause;

%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];                    % add the intercept column
theta = zeros(3,3);                   % one column of theta per class

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost

[thetas,cost] = one_vs_all(x,y,theta);
X = 1:10;
Y1 = -(thetas(1,1) + thetas(2,1)*X)/thetas(3,1);   % decision boundary for class 1 vs the rest
Y2 = -(thetas(1,2) + thetas(2,2)*X)/thetas(3,2);   % class 2 vs the rest
Y3 = -(thetas(1,3) + thetas(2,3)*X)/thetas(3,3);   % class 3 vs the rest
plot(x(y1,2),x(y1,3),'r*',x(y2,2),x(y2,3),'c+',x(y3,2),x(y3,3),'bo');
hold on
plot(X,Y1,'r',X,Y2,'g',X,Y3,'c');


one_vs_all:

function [theta,cost] = one_vs_all(x, y, theta)

%% train one binary classifier per class (one_vs_all)

options = optimset('GradObj', 'on', 'MaxIter', 400);
num_labels = 3;
cost = zeros(num_labels,1);
for i = 1:num_labels
    L = logical(y==i);                         % 1 for the current class, 0 for every other class
    [theta(:,i), cost(i,1)] = ...
        fminunc(@(t)(computeCost(x,L,t)), theta(:,i), options);
end
end


computeCost:

function [J,grad] = computeCost(x, y, thetas)

%% compute cost J and its gradient (both are returned to fminunc)

m = size(x,1);
hx = sigmoid(x * thetas);                                   % predictions for all examples
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);                            % gradient of J w.r.t. thetas
end


3. Results:

θ and the cost J:

thetas =

6.3988    5.1407  -24.4266
-2.0773    0.2173    2.1641
0.9857   -1.9490    2.2038

>> cost

cost =

0.1715
0.2876
0.1031


Graphical display:

[Figure: the three one_vs_all decision boundaries over the class scatter plot]

Note the triangle formed by the three lines: points in that region are not assigned to any class.

Additional notes:

1. Regularized Logistic Regression

Regularized logistic regression is not very different from the ordinary version; a regularization term is simply added to the computation of J and to the update of θ (see the sketch after the formulas below).
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m}\,\theta_j \right], \quad j \ge 1 \ (\theta_0 \text{ is not regularized})

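As a sketch (not code from the original post), the regularization can be added to the computeCost above along these lines; the function name computeCostReg and the parameter lambda are hypothetical:

function [J,grad] = computeCostReg(x, y, theta, lambda)
%% regularized cost and gradient; the intercept term theta(1) is not penalized
m = size(x,1);
hx = sigmoid(x * theta);
reg = theta; reg(1) = 0;                                      % do not regularize the intercept term
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx)) ...
    + (lambda/(2*m)) * sum(reg .^ 2);
grad = (1.0/m) .* x' * (hx - y) + (lambda/m) .* reg;
end
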
2.one_vs_all:

1. Overview:

In fact one_vs_all can also be viewed in another way: treat θ as a single-hidden-layer feedforward network. If there are K classes of samples, the first class can be encoded as the K-element vector [1,0,0,0,...], and so on; a 1 in the i-th position means the sample belongs to class i. The computation is the same as in the multi_class case above (a small encoding sketch follows).
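
For instance, the label vector y can be converted to this one-hot encoding in MATLAB like so (an illustrative snippet, not from the original post):

num_labels = 3;                             % K classes
Y = double(bsxfun(@eq, y, 1:num_labels));   % m x K matrix; row i has a 1 in column y(i)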

The feedforward neural network model is as follows:

[Figure: feedforward neural network diagram for one_vs_all]

2. Code:

Function overview:

oneVsAll: serves as the main function

lrCostFunction: computes the cost J and the gradient used to update θ

myPredict: reports the training-set accuracy

Data and the trained θ:

Click here to download

Training output:

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Training Set Accuracy: 100.000000


oneVsAll:

function [all_theta,cost] = oneVsAll(X, y, num_labels)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logisitc regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables

m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(n+1,num_labels);

% Add ones to the X data matrix
X = [ones(m, 1),X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell use
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

cost = zeros(num_labels,1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for i = 1:num_labels
    L = logical(y==i);                          % binary labels: current class vs all other classes
    [all_theta(:,i),cost(i,1)] = ...
        fminunc (@(t)(lrCostFunction(t, X, L)),all_theta(:,i), options);
end

myPredict(all_theta,X,y);                       % report the training-set accuracy

% =========================================================================

end


lrCostFunction:

function [J,grad] = lrCostFunction(thetas,x, y)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression
%   J = LRCOSTFUNCTION(theta, X, y) computes the cost of using theta as
%   the parameter for logistic regression and the gradient of the cost
%   w.r.t. the parameters. (The regularization term from the original
%   assignment is omitted in this version.)

% Initialize some useful values
m = length(y); % number of training examples

% code used only when debugging this function on its own
%x = [ones(m,1),x];
%theta = zeros(size(x,2),1);
%y = logical(y==1);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

hx = sigmoid(x * thetas);                                   % predictions for all examples
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));  % logistic regression cost
grad = (1.0/m) .* x' * (hx - y);                            % gradient of J w.r.t. thetas
% =============================================================

end


myPredict:

function p = myPredict(Theta1,X,y)
%MYPREDICT Predict the label of each input example from the trained
%   one-vs-all weights Theta1 and report the training-set accuracy

% Useful values
m = size(X, 1);
num_labels = 10;

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

z_2 = X*Theta1;
a_2 = sigmoid(z_2);
for i = 1:m
for j = 1:num_labels
if a_2(i,j) >= 0.5
p(i,1) = j;
break;
end
end
end
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);

% =========================================================================

end
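
As the hint in the comments suggests, the threshold loop can also be replaced by a max over the class scores. A vectorized sketch (not from the original post), using the sigmoid function defined earlier:

scores = sigmoid(X * Theta1);            % m x num_labels matrix of class probabilities
[~, p] = max(scores, [], 2);             % predicted label = column with the largest score
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);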


Links to related topics for this post:

From the one_vs_all algorithm in multi-class Logistic Regression ——> two-hidden-layer feedforward neural networks: BP neural networks

From Logistic Regression ——> SVM: feature-space mapping

Theoretical explanation of Logistic Regression: a probabilistic interpretation