Machine Learning Week 3
2015-11-01 22:12
Linear Regression can't be used for Classification Problems
Linear Regression isn't Working Well for Classification Problems
Logistic Regression Model
Decision Boundary
Cost Function and Gradient Descent for Logistic Regression
Multiclass Classification
One Vs All
Problem of Overfitting
How to solve overfitting - Regularisation
Cost Function and Gradient Descent With Regularization
Normal Equation
Regularized Logistic Regression
Weekly Matlab Exercise
Linear Regression can’t be used for Classification Problems
Linear Regression isn't Working Well for Classification Problems
An extra, unusual data point (an outlier) can shift the entire fitted linear function and thus cause errors in the classification process.
Logistic Regression Model
Sigmoid Function (or Logistic Function):
$h_\theta(x) = g(\theta^T x)$
$g(z) = \frac{1}{1 + e^{-z}}$
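As a quick sanity check, a tiny Matlab/Octave sketch of the sigmoid's key values (the numbers follow directly from the formula):

% Quick numeric check: g(0) = 0.5, and g saturates at 0 and 1.
z = [-10 0 10];
g = 1 ./ (1 + exp(-z))     % approximately [0.0000 0.5000 1.0000]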
Decision Boundary
The prediction rule is: if $h_\theta(x) \geq 0.5$ then $y = 1$, and if $h_\theta(x) < 0.5$ then $y = 0$. Since $g(z) \geq 0.5$ exactly when $z \geq 0$, this is equivalent to: if $\theta^T x \geq 0$ then $y = 1$, and if $\theta^T x < 0$ then $y = 0$.
Example: for $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$ with $\theta = [-3, 1, 1]^T$, we predict $y = 1$ whenever $-3 + x_1 + x_2 \geq 0$, so the decision boundary is the straight line $x_1 + x_2 = 3$.
Cost Function and Gradient Descent for Logistic Regression
Cost Function for Logistic Regression
Putting the per-example costs together, the overall cost is:
$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right)$
For linear regression:
$\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
For logistic regression:
$\mathrm{Cost}\left(h_\theta(x), y\right) = -y\log\left(h_\theta(x)\right) - (1-y)\log\left(1 - h_\theta(x)\right)$
The gradient descent update for logistic regression has the same form as for linear regression (only the hypothesis $h_\theta$ differs):
$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
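A minimal vectorized Matlab/Octave sketch of this cost and gradient (the graded versions appear in the exercise code at the end; X is assumed to be m x (n+1) with a leading column of ones):

% Vectorized cost and gradient for (unregularized) logistic regression.
h = 1 ./ (1 + exp(-X * theta));                        % h_theta(x) for all m examples
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));    % J(theta)
grad = (1/m) * X' * (h - y);                           % (n+1) x 1 gradient vector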
Beyond gradient descent, several other optimization algorithms (such as conjugate gradient, BFGS, and L-BFGS) can be used to minimize $J(\theta)$; they typically converge faster and need no manual choice of $\alpha$, at the cost of being more complex.
Multiclass Classification
One Vs All
One-vs-all turns a $K$-class problem into $K$ binary problems: for each class $i$, train a logistic regression classifier $h_\theta^{(i)}(x)$ that treats class $i$ as positive and all other classes as negative; to classify a new input $x$, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$, as in the sketch below.
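A minimal one-vs-all sketch in Matlab/Octave, assuming labels y in 1..K and the costFunction defined later in this post (K = 3 and the variable names are illustrative):

% One-vs-all: train one binary logistic regression per class, pick the max.
K = 3;                                      % number of classes (assumption)
all_theta = zeros(K, size(X, 2));           % one parameter row per class
options = optimset('GradObj', 'on', 'MaxIter', 400);
for c = 1:K
    initial_theta = zeros(size(X, 2), 1);
    % Train class c against all others: relabel y as 1 for class c, else 0.
    all_theta(c, :) = fminunc(@(t) costFunction(t, X, (y == c)), ...
                              initial_theta, options)';
end
h = 1 ./ (1 + exp(-X * all_theta'));        % m x K matrix of class probabilities
[~, p] = max(h, [], 2);                     % predicted class = argmax over classes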
Problem of Overfitting
(Figure: underfitting vs. just right vs. overfitting)
How to solve overfitting - Regularisation
Intuition of Regularisation - make the parameters $\theta$ small
Shrink all the parameters, starting from $\theta_1$ rather than $\theta_0$:
$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$
Regularisation Parameter
The regularisation parameter $\lambda$ can't be too big; otherwise all the $\theta_j$ will be shrunk almost to 0, leaving roughly $h_\theta(x) = \theta_0$ - underfitting.
Cost Function and Gradient Descent With Regularization
Cost Function:
$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$
Gradient Descent:
$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \quad (j = 1, 2, 3, \ldots, n)$
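One such regularized update step, as a minimal Matlab/Octave sketch (alpha and lambda are assumed to be set already; theta_0 is handled by zeroing its penalty):

% One regularized gradient descent step for linear regression.
h = X * theta;                             % linear hypothesis
reg = (lambda/m) * theta;                  % penalty term for each theta_j ...
reg(1) = 0;                                % ... except theta_0
theta = theta - alpha * ((1/m) * X' * (h - y) + reg);   % simultaneous update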
Normal Equation
The $\theta$ corresponding to the global minimum is given in closed form:
$\theta = \left(X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\right)^{-1} X^T y$
where the $(n+1)\times(n+1)$ matrix has a 0 in the top-left entry (so $\theta_0$ is not regularized) and 1s on the rest of the diagonal. For $\lambda > 0$, adding this term also makes the matrix invertible even when $X^T X$ itself is non-invertible.
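A minimal Matlab/Octave sketch of this closed-form solution (X is assumed to include the leading column of ones):

% Regularized normal equation for linear regression.
L = eye(size(X, 2));
L(1, 1) = 0;                               % top-left 0: theta_0 is not regularized
theta = (X' * X + lambda * L) \ (X' * y);  % backslash solve instead of inv()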
Regularized Logistic Regression
Cost Function:
$J(\theta) = -\left[\frac{1}{m}\sum_{i=1}^{m} y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
Repeat the gradient descent updates:
$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \quad (j = 1, 2, 3, \ldots, n)$
where now $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$. The updates look identical to regularized linear regression, but the hypothesis is the sigmoid.
Matlab syntax:
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, jVal] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);
where the cost function returns both the cost and its gradient:
function [jVal, gradient] = costFunction(theta, X, y)
Weekly Matlab Exercise
%sigmoid:
function g = sigmoid(z)
% Element-wise sigmoid: works for scalars, vectors and matrices.
g = 1 ./ (1 + exp(-z));
end

%plotData:
function plotData(X, y)
% Plot positive examples (y = 1) as '+' and negative examples (y = 0) as 'o'.
figure; hold on;
pos = find(y == 1); neg = find(y == 0);
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);
hold off;
end

%costFunction (without regularisation):
function [J, grad] = costFunction(theta, X, y)
m = length(y);                 % number of training examples
h = sigmoid(X * theta);        % hypothesis for all examples
J = (1/m) * (-log(h)' * y - log(1 - h)' * (1 - y));
grad = (1/m) * ((h - y)' * X)';
end

%predict:
function p = predict(theta, X)
% Predict 1 when sigmoid(X*theta) >= 0.5, otherwise 0.
p = round(sigmoid(X * theta));
end

%costFunction (with regularisation):
function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y);                 % number of training examples
h = sigmoid(X * theta);
theta2 = theta(2:end);         % exclude theta_0 from the penalty
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
    + lambda / (2*m) * (theta2' * theta2);
reg = (lambda/m) * theta;
reg(1) = 0;                    % theta_0 is not regularized
grad = (1/m) * ((h - y)' * X)' + reg;
end

%use the fminunc function:
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;
% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);