
Classification and logistic regression

2015-08-21 22:18

Logistic Regression

1. Problem:

The regression problems discussed previously all had continuous outputs. What if we need to do classification, i.e., the output is a discrete value?

2. Solution:

Hypothesis:

h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}

where:

g(z) = \frac{1}{1 + e^{-z}}

The graph of g(z) is as follows:

[Figure: the S-shaped sigmoid curve g(z), rising from 0 to 1 and crossing 0.5 at z = 0]

From this we can see: when h_\theta(x) < 0.5 we treat the prediction as 0, otherwise as 1, so the output becomes a discrete value.
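
For example, once θ has been learned, thresholding the hypothesis at 0.5 gives discrete predictions. A minimal MATLAB sketch (not code from the original post; it assumes x already contains the intercept column of ones and theta is a column vector):

probs = 1 ./ (1 + exp(-(x * theta)));   % h_theta(x) for every example
labels = double(probs >= 0.5);          % threshold at 0.5 -> discrete 0/1 predictions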

Deriving the update rule:

Using probability theory, we find the distribution the samples follow and then use maximum likelihood to solve for the corresponding θ.

P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x), \qquad p(y \mid x;\theta) = (h_\theta(x))^{y} (1 - h_\theta(x))^{1-y}

\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

Therefore:

Taking the partial derivative with respect to \theta_j (for a single training example) gives:

\frac{\partial \ell(\theta)}{\partial \theta_j} = \left( y - h_\theta(x) \right) x_j

Result:

\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}

Note: this update rule is the incremental (stochastic) form, applied one training example at a time; a sketch follows below.
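
A minimal MATLAB sketch of one pass of this incremental update (not from the original post; it assumes x already contains the column of ones and theta is a column vector):

alpha = 0.001;                                    % learning rate
m = size(x,1);                                    % number of training examples
for i = 1:m                                       % one parameter update per example
    h = 1 ./ (1 + exp(-(x(i,:) * theta)));        % h_theta(x^(i))
    theta = theta + alpha * (y(i) - h) * x(i,:)'; % incremental gradient ascent step
end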

Newton's method:

1. Problem:

The iterative method above converges slowly. When solving by maximum likelihood we can instead use Newton's method: \theta := \theta - \frac{f(\theta)}{f'(\theta)}.

2. Solution:

Derivation:

Newton's method finds the θ at which f(\theta) = 0; here that is exactly the condition \ell'(\theta) = 0.

So Newton's method can be rewritten as:

\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}

Definition: in the vector case, H is the Hessian matrix of the log-likelihood,

H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \partial \theta_j}

where \ell'(\theta) is the gradient vector \nabla_\theta \ell(\theta), whose j-th component is

\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
Thus the matrix H plays the role of \ell''(\theta); in the one-dimensional case H^{-1} = 1/\ell''(\theta).

Therefore:

\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)

Application:

Suitable when the number of features is small; otherwise computing H^{-1} is very expensive. A sketch follows below.
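
For illustration only (the post itself does not implement Newton's method), a minimal MATLAB sketch under the same assumptions about x, y, and theta as above:

theta = zeros(size(x,2),1);
for iter = 1:10                               % Newton's method usually converges in a few iterations
    h = 1 ./ (1 + exp(-(x * theta)));         % predictions for all examples
    grad = x' * (y - h);                      % gradient of the log-likelihood
    H = -x' * diag(h .* (1 - h)) * x;         % Hessian of the log-likelihood
    theta = theta - H \ grad;                 % Newton update: theta - H^{-1} * gradient
end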

Logistic 0/1 classification:

1. Setting the number of iterations yourself

  Write the corresponding loop yourself, specifying the number of iterations and the learning rate alpha, and run incremental gradient descent.

Main functions and their roles:

Logistic_Regression: serves as the main script

gradientDecent: updates θ via gradient descent

computeCost: computes the cost J

Logistic_Regression

%%  part0: preparation -- load the data and plot the two classes
data = load('ex2data1.txt');
x = data(:,[1,2]);                     % two feature columns
y = data(:,3);                         % 0/1 labels
pos = find(y==1);
neg = find(y==0);

plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];                     % add the intercept column
theta = zeros(3,1);
J = computeCost(x,y,theta);            % cost at the initial theta

theta = gradientDecent(x, y, theta);
X = 25:100;                            % range of the first feature
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);   % decision boundary: theta'*[1; X; Y] = 0
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;


gradientDecent

function theta = gradientDecent(x, y, theta)

%% update theta; the gradient is accumulated over all examples before each step (batch gradient ascent)
m = size(x,1);
alph = 0.001;                                       % learning rate
for iter = 1:150000
    for j = 1:3                                     % one component of theta at a time
        dec = 0;
        for i = 1:m                                 % accumulate the gradient over all examples
            dec = dec + (y(i) - sigmoid(x(i,:)*theta))*x(i,j);
        end
        theta(j,1) = theta(j,1) + dec*alph/m;       % gradient step
    end
end
end
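
For comparison (not part of the original post), the same batch update can be written without the inner loops, which runs much faster in MATLAB:

alph = 0.001;
m = size(x,1);
for iter = 1:150000
    h = 1 ./ (1 + exp(-(x * theta)));            % predictions for all examples at once
    theta = theta + (alph/m) * (x' * (y - h));   % identical gradient step, vectorized
end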


sigmoid

function g = sigmoid(z)

%% SIGMOID Compute the sigmoid function (element-wise, so it also works on vectors)

g = 1.0 ./ (1.0 + exp(-z));

end


computeCost

function J = computeCost(x, y, theta)

%% compute cost: J (the negative average log-likelihood)

m = size(x,1);
J = 0;
for i = 1:m
    J = J + y(i)*log(sigmoid(x(i,:)*theta)) + (1 - y(i))*log(1 - sigmoid(x(i,:)*theta));
end
J = (-1/m)*J;
end


The results are as follows:

[Figure: scatter plot of the two classes with the fitted decision boundary]


2. Using the fminunc function:

  Provide the way to compute the cost J and the gradient for θ, then call fminunc to compute the optimal solution.

Main functions and their roles:

Logistics_Regression: serves as the main script

computeCost: returns the cost J and the gradient used to update θ

sigmoid: the sigmoid function

Logistics_Regression

%%  part0: preparation -- load the data and plot the two classes
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);

plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];                    % add the intercept column
theta = zeros(3,1);
options = optimset('GradObj', 'on', 'MaxIter', 400);   % tell fminunc we supply the gradient

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(computeCost(x,y,t)), theta, options);
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);   % decision boundary: theta'*[1; X; Y] = 0
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;


sigmoid

function g = sigmoid(z)

%% SIGMOID Compute the sigmoid function element-wise

g = 1.0 ./ (1.0 + exp(-z));

end


computeCost

function [J,grad] = computeCost(x, y, theta)

%% compute cost J and its gradient (both are returned to fminunc)

m = size(x,1);
hx = sigmoid(x * theta);                                      % predictions for all examples
J = (1.0/m) * sum(-y .* log(hx) - (1.0 - y) .* log(1.0 - hx));
grad = (1.0/m) .* x' * (hx - y);                              % gradient of J w.r.t. theta
end


Result:

[Figure: scatter plot with the decision boundary found by fminunc]


Logistic multi_class

1. Setup

Data I made myself (each line: feature 1, feature 2, class label):

1,5,1
1,6,1
1.5,3.5,1
2.5,3.5,1
2,6,1
3,7,1
4,6,1
3.5,4.5,1
2,4,1
2,5,1
4,4,1
5,5,1
6,4,1
5,3,1
4,2,1
4,3,2
5,3,2
5,2,2
5,1.5,2
7,1.5,2
5,2.5,2
6,2.5,2
5.5,2.5,2
5,1,2
6,2,2
6,3,2
5,4,2
7,5,2
7,2,2
8,1,2
8,3,2
7,4,3
7,5,3
8.5,5.5,3
9,4,3
8,5.5,3
8,4.5,3
9.5,5.5,3
8,4.5,3
8.5,4.5,3
7,6,3
6,5,3
9,5,3
9,6,3
8,6,3
8,7,3
10,6,3
10,4,3


Scatter plot of the data:

[Figure: scatter plot of the three classes]

2. Algorithm derivation

Cost J:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

Updating θ:

\theta_j := \theta_j - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Algorithm idea (this algorithm is also called one_vs_all):

[Figure: one_vs_all -- one binary classifier is trained per class]

If the samples fall into K classes, we train K sets of θ. Considering each class in turn, we treat all the other samples as a single class, which separates that class from the rest. The y values of the class under consideration are set to 1 and all the others to 0. This yields K sets of θ values (a prediction sketch follows below).
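
At prediction time, each of the K classifiers is evaluated and the class with the largest h_\theta(x) wins. A minimal MATLAB sketch (not from the original post), assuming thetas is the matrix of per-class parameters returned by one_vs_all below (one column per class) and x already has the intercept column:

scores = 1 ./ (1 + exp(-(x * thetas)));    % m x K matrix: probability of each class for each example
[~, predicted_class] = max(scores, [], 2); % pick the classifier with the highest probability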

3. Code implementation:

Implemented here using the fminunc function.

1. Function overview:

Logistic_Regression: serves as the main script

one_vs_all: a loop that computes the K sets of θ in turn, calling the cost function through fminunc

computeCost: returns the cost J and the gradient used to update θ

2. Code:

Logistic_Regression:

%%  part0: preparation -- load the data and plot the three classes
data = load('data.txt');
x = data(:,[1,2]);
y = data(:,3);
y1 = find(y==1);
y2 = find(y==2);
y3 = find(y==3);

plot(x(y1,1),x(y1,2),'r*',x(y2,1),x(y2,2),'c+',x(y3,1),x(y3,2),'bo');
pause;

%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];                    % add the intercept column
theta = zeros(3,3);                   % one column of theta per class

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost

[thetas,cost] = one_vs_all(x,y,theta);
X = 1:10;
Y1 = -(thetas(1,1) + thetas(2,1)*X)/thetas(3,1);   % decision boundary for class 1 vs the rest
Y2 = -(thetas(1,2) + thetas(2,2)*X)/thetas(3,2);   % class 2 vs the rest
Y3 = -(thetas(1,3) + thetas(2,3)*X)/thetas(3,3);   % class 3 vs the rest
plot(x(y1,2),x(y1,3),'r*',x(y2,2),x(y2,3),'c+',x(y3,2),x(y3,3),'bo');
hold on
plot(X,Y1,'r',X,Y2,'g',X,Y3,'c');


one_vs_all:

function [theta,cost] = one_vs_all(x, y, theta)

%% train one binary classifier per class (one_vs_all)

options = optimset('GradObj', 'on', 'MaxIter', 400);
num_labels = 3;
cost = zeros(num_labels,1);
for i = 1:num_labels
    L = logical(y==i);                         % 1 for the current class, 0 for every other class
    [theta(:,i), cost(i,1)] = ...
        fminunc(@(t)(computeCost(x,L,t)), theta(:,i), options);
end
end


computeCost:

function [J,grad] = computeCost(x, y, thetas)

%% compute cost J and its gradient (both are returned to fminunc)

m = size(x,1);
hx = sigmoid(x * thetas);                                   % predictions for all examples
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);                            % gradient of J w.r.t. thetas
end


3. Results:

θ and the cost J:

thetas =

6.3988    5.1407  -24.4266
-2.0773    0.2173    2.1641
0.9857   -1.9490    2.2038

>> cost

cost =

0.1715
0.2876
0.1031


Graphical display:

[Figure: the three one_vs_all decision boundaries over the class scatter plot]

Note the triangle formed by the three lines: points in that region are not assigned to any class.

Additional notes:

1. Regularized Logistic Regression

Regularized logistic regression is not very different from the ordinary version; a regularization term is simply added to the computation of J and to the update of θ (see the sketch after the formulas below).
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m}\,\theta_j \right], \quad j \ge 1 \ (\theta_0 \text{ is not regularized})

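As a sketch (not code from the original post), the regularization can be added to the computeCost above along these lines; the function name computeCostReg and the parameter lambda are hypothetical:

function [J,grad] = computeCostReg(x, y, theta, lambda)
%% regularized cost and gradient; the intercept term theta(1) is not penalized
m = size(x,1);
hx = sigmoid(x * theta);
reg = theta; reg(1) = 0;                                      % do not regularize the intercept term
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx)) ...
    + (lambda/(2*m)) * sum(reg .^ 2);
grad = (1.0/m) .* x' * (hx - y) + (lambda/m) .* reg;
end
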
2.one_vs_all:

1. Overview:

In fact one_vs_all can also be viewed in another way: treat θ as a single-hidden-layer feedforward network. If there are K classes of samples, the first class can be encoded as the K-element vector [1,0,0,0,...], and so on; a 1 in the i-th position means the sample belongs to class i. The computation is the same as in the multi_class case above (a small encoding sketch follows).
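
For instance, the label vector y can be converted to this one-hot encoding in MATLAB like so (an illustrative snippet, not from the original post):

num_labels = 3;                             % K classes
Y = double(bsxfun(@eq, y, 1:num_labels));   % m x K matrix; row i has a 1 in column y(i)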

The feedforward neural network model is as follows:

[Figure: feedforward neural network diagram for one_vs_all]

2. Code:

Function overview:

oneVsAll: serves as the main function

lrCostFunction: computes the cost J and the gradient used to update θ

myPredict: reports the training-set accuracy

Data and the trained θ:

Click here to download

Training output:

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Training Set Accuracy: 100.000000


oneVsAll:

function [all_theta,cost] = oneVsAll(X, y, num_labels)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logisitc regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables

m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(n+1,num_labels);

% Add ones to the X data matrix
X = [ones(m, 1),X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell use
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

cost = zeros(num_labels,1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for i = 1:num_labels
    L = logical(y==i);                          % binary labels: current class vs all other classes
    [all_theta(:,i),cost(i,1)] = ...
        fminunc (@(t)(lrCostFunction(t, X, L)),all_theta(:,i), options);
end

myPredict(all_theta,X,y);                       % report the training-set accuracy

% =========================================================================

end


lrCostFunction:

function [J,grad] = lrCostFunction(thetas,x, y)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression
%   J = LRCOSTFUNCTION(theta, X, y) computes the cost of using theta as
%   the parameter for logistic regression and the gradient of the cost
%   w.r.t. the parameters. (The regularization term from the original
%   assignment is omitted in this version.)

% Initialize some useful values
m = length(y); % number of training examples

% code used only when debugging this function on its own
%x = [ones(m,1),x];
%theta = zeros(size(x,2),1);
%y = logical(y==1);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

hx = sigmoid(x * thetas);                                   % predictions for all examples
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));  % logistic regression cost
grad = (1.0/m) .* x' * (hx - y);                            % gradient of J w.r.t. thetas
% =============================================================

end


myPredict:

function p = myPredict(Theta1,X,y)
%MYPREDICT Predict the label of each input example from the trained
%   one-vs-all weights Theta1 and report the training-set accuracy

% Useful values
m = size(X, 1);
num_labels = 10;

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

z_2 = X*Theta1;
a_2 = sigmoid(z_2);
for i = 1:m
for j = 1:num_labels
if a_2(i,j) >= 0.5
p(i,1) = j;
break;
end
end
end
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);

% =========================================================================

end
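
As the hint in the comments suggests, the threshold loop can also be replaced by a max over the class scores. A vectorized sketch (not from the original post), using the sigmoid function defined earlier:

scores = sigmoid(X * Theta1);            % m x num_labels matrix of class probabilities
[~, p] = max(scores, [], 2);             % predicted label = column with the largest score
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);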


Links to related topics for this post:

From the one_vs_all algorithm in multi-class Logistic Regression ——> two-hidden-layer feedforward neural networks: BP neural networks

From Logistic Regression ——> SVM: feature-space mapping

Theoretical explanation of Logistic Regression: a probabilistic interpretation