Classification and logistic regression
2015-08-21 22:18
Logistic regression
1. Problem:
In the regression problems discussed above, the outputs were all continuous. What if we are asked to do classification, i.e. the outputs are discrete values?
2. Answer:
Hypothesis: $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1+e^{-z}}$ is the sigmoid (logistic) function.
The graph of $g(z)$ is as follows:
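A minimal sketch (not part of the original code) that reproduces the S-shaped curve:

z = -10:0.1:10;
g = 1 ./ (1 + exp(-z));      % sigmoid, computed elementwise
plot(z, g); grid on;
xlabel('z'); ylabel('g(z)'); % rises from 0 to 1, with g(0) = 0.5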
From this we can see that when $h_\theta(x) < 0.5$ we predict 0, and otherwise 1, which turns the output into discrete values.
Deriving the update rule:
Using probability theory, we identify the distribution the samples follow and apply maximum likelihood to solve for $\theta$. Since $P(y=1 \mid x;\theta) = h_\theta(x)$ and $P(y=0 \mid x;\theta) = 1 - h_\theta(x)$, we can write
$$p(y \mid x;\theta) = h_\theta(x)^{y} \, (1-h_\theta(x))^{1-y}$$
Therefore the log-likelihood is:
$$\ell(\theta) = \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\bigl(1-h_\theta(x^{(i)})\bigr)$$
Result (gradient ascent on $\ell$):
$$\theta_j := \theta_j + \alpha \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr) x_j^{(i)}$$
Note: the update here is the incremental (stochastic) form, applied one example at a time.
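For contrast with the batch loop used later, here is a minimal sketch of one incremental pass, where $\theta$ is adjusted after every single example (it assumes x already carries the intercept column; the learning rate value is illustrative):

m = size(x,1);
theta = zeros(size(x,2),1);                 % initialize the parameters
alpha = 0.001;                              % learning rate (illustrative value)
for i = 1:m
    err = y(i) - sigmoid(x(i,:)*theta);     % residual of a single example
    theta = theta + alpha * err * x(i,:)';  % update theta immediately
end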
Newton's method:
1. Problem:
The iteration above converges slowly. When solving by maximum likelihood we can instead apply Newton's method, i.e. $\theta := \theta - \frac{f(\theta)}{f'(\theta)}$.
2. Answer:
Derivation: Newton's method finds the $\theta$ at which $f(\theta) = 0$, and at the maximum of the log-likelihood we have exactly $\ell'(\theta) = 0$.
So Newton's method can be rewritten as:
$$\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$$
Definition: in the vector case, let $H$ be the Hessian matrix with entries
$$H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}$$
where $\ell'(\theta) = \nabla_\theta \ell(\theta)$ is the gradient.
Thus the matrix $H$ plays the role of $\ell''(\theta)$, i.e. $H^{-1}$ plays the role of $1/\ell''(\theta)$.
Therefore:
$$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$$
Application:
Suitable when the number of features is small; otherwise computing $H^{-1}$ is very expensive.
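The post does not implement Newton's method, but a minimal sketch could look like the following (an assumption of this write-up, not the author's code; it uses the vectorized sigmoid from the fminunc section and minimizes the cost $J = -\ell/m$):

[m, n] = size(x);                            % x already carries the intercept column
theta = zeros(n, 1);
for iter = 1:10                              % Newton's method needs few iterations
    hx = sigmoid(x * theta);                 % m-by-1 vector of predictions
    grad = x' * (hx - y) / m;                % gradient of the cost J
    H = x' * diag(hx .* (1 - hx)) * x / m;   % Hessian of J
    theta = theta - H \ grad;                % theta := theta - H^{-1} * grad
end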
Logistic 0/1 classification:
1. Choosing the iteration count yourself
Write the corresponding loop by hand, supplying the iteration count and the learning rate alpha, and run gradient descent. Main functions and their roles:
Logistic_Regression — acts as the main script
gradientDecent — updates $\theta$ by gradient descent
computeCost — computes the cost $J$
Logistic_Regression
%% part0: prepare the data
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: gradient descent and the cost J
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
J = computeCost(x,y,theta);
theta = gradientDecent(x, y, theta);

% decision boundary theta(1) + theta(2)*x1 + theta(3)*x2 = 0, solved for x2
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;
gradientDecent
function theta = gradientDecent(x, y, theta)
%% gradient ascent on the log-likelihood: updates theta using all examples per step
m = size(x,1);
alph = 0.001;
for iter = 1:150000
    for j = 1:3
        dec = 0;
        for i = 1:m
            dec = dec + (y(i) - sigmoid(x(i,:)*theta))*x(i,j);
        end
        theta(j,1) = theta(j,1) + dec*alph/m;
    end
end
end
sigmoid
function g = sigmoid(z)
%% SIGMOID Compute the sigmoid function
g = 1/(1+exp(-z));
end
computeCost
function J = computeCost(x, y, theta)
%% compute the cost J (negative average log-likelihood)
m = size(x,1);
J = 0;
for i = 1:m
    J = J + y(i)*log(sigmoid(x(i,:)*theta)) + (1 - y(i))*log(1 - sigmoid(x(i,:)*theta));
end
J = (-1/m)*J;
end
The result is as follows:
2. Using fminunc:
Provide the computation of the cost $J$ and of its gradient with respect to $\theta$, then call fminunc to find the optimum. Main functions and their roles:
Logistics_Regression — acts as the main script
computeCost — returns $J$ and the gradient
sigmoid — the sigmoid function
Logistics_Regression
%% part0: prepare the data
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: minimize the cost J with fminunc
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(computeCost(x,y,t)), theta, options);

% decision boundary theta(1) + theta(2)*x1 + theta(3)*x2 = 0, solved for x2
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;
sigmoid
function g = sigmoid(z)
%% SIGMOID Compute the sigmoid function elementwise
g = 1.0 ./ (1.0 + exp(-z));
end
computeCost
function [J,grad] = computeCost(x, y, theta)
%% compute the cost J and its gradient
m = size(x,1);
hx = sigmoid(x * theta);
J = (1.0/m) * sum(-y .* log(hx) - (1.0 - y) .* log(1.0 - hx));
grad = (1.0/m) .* x' * (hx - y);
end
Result:
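As a quick sanity check (an addition, not part of the original post), the fitted theta can be used to predict the training labels and report the accuracy:

p = sigmoid(x * theta) >= 0.5;   % threshold the hypothesis at 0.5
fprintf('Training Set Accuracy: %f\n', mean(double(p == y)) * 100);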
Logistic multi_class
1. Data
Hand-made data, the contents of data.txt (each comma-separated triplet x1,x2,label is one row):
1,5,1 1,6,1 1.5,3.5,1 2.5,3.5,1 2,6,1 3,7,1 4,6,1 3.5,4.5,1 2,4,1 2,5,1 4,4,1 5,5,1 6,4,1 5,3,1 4,2,1
4,3,2 5,3,2 5,2,2 5,1.5,2 7,1.5,2 5,2.5,2 6,2.5,2 5.5,2.5,2 5,1,2 6,2,2 6,3,2 5,4,2 7,5,2 7,2,2 8,1,2 8,3,2
7,4,3 7,5,3 8.5,5.5,3 9,4,3 8,5.5,3 8,4.5,3 9.5,5.5,3 8,4.5,3 8.5,4.5,3 7,6,3 6,5,3 9,5,3 9,6,3 8,6,3 8,7,3 10,6,3 10,4,3
Scatter plot of the data:
2. Algorithm derivation
Cost $J$:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log\bigl(1-h_\theta(x^{(i)})\bigr)$$
Update of $\theta$ (its gradient):
$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) x_j^{(i)}$$
Algorithm idea (this algorithm is also called one-vs-all):
If the samples fall into K classes, we train K sets of parameters $\theta$. Consider each class in turn and treat all remaining samples as one negative class; this separates that class from the rest. Set $y$ to 1 for the class under consideration and to 0 for everything else. Repeating this yields K sets of $\theta$ values.
3. Implementation:
Implemented here with fminunc. 1. Function overview:
Logistic_Regression — acts as the main script
one_vs_all — a loop that computes the K sets of $\theta$ in turn, calling the cost function via fminunc
computeCost — computes $J$ and the gradient used to update $\theta$
2. Code:
Logistic_Regression:
%% part0: prepare the data
data = load('data.txt');
x = data(:,[1,2]);
y = data(:,3);
y1 = find(y==1);
y2 = find(y==2);
y3 = find(y==3);
plot(x(y1,1),x(y1,2),'r*',x(y2,1),x(y2,2),'c+',x(y3,1),x(y3,2),'bo');
pause;

%% part1: one-vs-all training via fminunc
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,3);

% Run fminunc (inside one_vs_all) to obtain the optimal theta and cost per class
[thetas,cost] = one_vs_all(x,y,theta);

% decision boundary of classifier k: thetas(1,k) + thetas(2,k)*x1 + thetas(3,k)*x2 = 0
X = 1:10;
Y1 = -(thetas(1,1) + thetas(2,1)*X)/thetas(3,1);
Y2 = -(thetas(1,2) + thetas(2,2)*X)/thetas(3,2);
Y3 = -(thetas(1,3) + thetas(2,3)*X)/thetas(3,3);
plot(x(y1,2),x(y1,3),'r*',x(y2,2),x(y2,3),'c+',x(y3,2),x(y3,3),'bo');
hold on
plot(X,Y1,'r',X,Y2,'g',X,Y3,'c');
one_vs_all:
function [theta,cost] = one_vs_all(x, y, theta)
%% train one logistic regression classifier per class with fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);
num_labels = 3;
cost = zeros(num_labels,1);
for i = 1:num_labels
    L = logical(y==i);   % one-vs-all labels: 1 for class i, 0 otherwise
    [theta(:,i), cost(i,1)] = ...
        fminunc(@(t)(computeCost(x,L,t)), theta(:,i), options);
end
end
computeCost:
function [J,grad] = computeCost(x, y, thetas)
%% compute the cost J and its gradient
m = size(x,1);
hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);
end
3. Results:
$\theta$ & cost $J$:
thetas =
    6.3988    5.1407  -24.4266
   -2.0773    0.2173    2.1641
    0.9857   -1.9490    2.2038
>> cost
cost =
    0.1715
    0.2876
    0.1031
Plot:
Note the triangle formed by the three lines: points in that region are not claimed by any class.
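One common way to resolve this (an addition, not the author's code) is to label a point by the classifier with the highest score, so every point, including those inside the triangle, receives a class:

% score a single point (x1, x2) against all three classifiers at once
scores = sigmoid([1, x1, x2] * thetas);   % 1-by-3 vector of probabilities
[~, label] = max(scores);                 % pick the most confident class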
Supplement:
1. Regularized logistic regression
Regularized logistic regression differs little from the plain version: a penalty term on $\theta$ is added to the computation of $J$ and to the $\theta$ update, as sketched below.
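A minimal sketch of what this looks like, assuming a regularization parameter lambda (the name computeCostReg is illustrative, not from the original code); by convention the intercept theta(1) is not penalized:

function [J, grad] = computeCostReg(x, y, theta, lambda)
%% regularized cost J and gradient (illustrative sketch)
m = size(x,1);
hx = sigmoid(x * theta);
temp = theta;
temp(1) = 0;                  % do not penalize the intercept term
J = (1/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx)) ...
    + (lambda/(2*m)) * sum(temp.^2);
grad = (1/m) .* x' * (hx - y) + (lambda/m) .* temp;
end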
2. one_vs_all:
1. Overview:
There is actually another formulation of one_vs_all: treat $\theta$ as a single-hidden-layer feed-forward neural network. For example, with K classes of samples, the first class can be encoded as $[1,0,0,0,\dots]$ with K entries; likewise, a 1 in the $i$-th position denotes the $i$-th class. The computation is the same as in the multi_class case above. The feed-forward network model is as follows:
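For illustration (an addition, not part of the original code; y and num_labels are assumed to come from the surrounding code), the K-dimensional targets described above can be built like this:

m = length(y);
Y = zeros(m, num_labels);   % one row per sample, one column per class
for k = 1:num_labels
    Y(:,k) = (y == k);      % 1 in the k-th position for samples of class k
end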
2. Code:
Function overview:
oneVsAll — acts as the main function
lrCostFunction — cost $J$ and the $\theta$ update
myPredict — reports the training accuracy
Data and the trained $\theta$:
Click here to download
Training results:
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
<stopping criteria details>

Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
<stopping criteria details>

Training Set Accuracy: 100.000000
oneVsAll:
function [all_theta,cost] = oneVsAll(X, y, num_labels)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th column of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels) trains num_labels logistic
%   regression classifiers, one per label, via fminunc.

m = size(X, 1);
n = size(X, 2);
all_theta = zeros(n+1,num_labels);

% Add the intercept column to the X data matrix
X = [ones(m, 1),X];

cost = zeros(num_labels,1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for i = 1:num_labels
    L = logical(y==i);   % one-vs-all labels: 1 for label i, 0 otherwise
    [all_theta(:,i),cost(i,1)] = ...
        fminunc(@(t)(lrCostFunction(t, X, L)), all_theta(:,i), options);
end

myPredict(all_theta,X,y);
end
lrCostFunction:
function [J,grad] = lrCostFunction(thetas, x, y)
%LRCOSTFUNCTION Compute the cost and gradient for logistic regression
%   J = LRCOSTFUNCTION(thetas, x, y) computes the cost of using thetas as
%   the parameters for logistic regression and the gradient of the cost
%   w.r.t. the parameters.

m = length(y); % number of training examples

% Code used when debugging this function on its own:
%x = [ones(m,1),x];
%theta = zeros(size(x,2),1);
%y = logical(y==1);

hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);
end
myPredict:
function p = myPredict(Theta1,X,y)
%MYPREDICT Predict the label of each input given the trained one-vs-all
%weights Theta1 (one column per class) and print the training accuracy

m = size(X, 1);
num_labels = 10;
p = zeros(size(X, 1), 1);

z_2 = X*Theta1;
a_2 = sigmoid(z_2);   % m-by-num_labels matrix of activations
% take the first class whose activation reaches 0.5
for i = 1:m
    for j = 1:num_labels
        if a_2(i,j) >= 0.5
            p(i,1) = j;
            break;
        end
    end
end
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);
end
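As the assignment hint suggests, picking the class with the largest activation via max is a more robust alternative to taking the first activation above 0.5 (a sketch, not the author's code):

[~, p] = max(a_2, [], 2);   % index of the largest activation in each row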
Related topics from this blog:
From the one_vs_all algorithm of multi-class logistic regression → two-hidden-layer feed-forward neural networks: BP neural networks. From logistic regression → SVM: feature-space mapping.
Theoretical explanation of logistic regression: the probabilistic interpretation.