Andrew Ng Machine Learning Notes (5): Multivariate Linear Regression
These notes are based on: Andrew Ng's Machine Learning course.
1. Multiple Features
Linear regression with multiple variables is also known as “multivariate linear regression”.
| Notation | Meaning |
|---|---|
| $x_j^{(i)}$ | value of feature $j$ in the $i$-th training example |
| $x^{(i)}$ | the input (features) of the $i$-th training example |
| $m$ | the number of training examples |
| $n$ | the number of features |
The multivariable form of the hypothesis function accommodating these multiple features is as follows:
$h_\theta(x) = \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \dots + \theta_nx_n$, where $x_0 \equiv 1$.
Using the definition of matrix multiplication, our multivariable hypothesis function can be concisely represented as:
$h_\theta(x) = \begin{bmatrix} \theta_0 & \theta_1 & \dots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} = \theta^Tx$
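As a quick illustration, this inner product is a one-liner in Octave; the numbers below are made up for the example:

```octave
% A minimal sketch of h_theta(x) = theta' * x for a single example.
theta = [80; 0.1; 3];    % made-up parameters [theta_0; theta_1; theta_2]
x     = [1; 2104; 5];    % one training example with x0 = 1 prepended
h     = theta' * x;      % predicted value
```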
2. Gradient Descent For Multiple Variables
The gradient descent equation itself is generally the same form; we just have to repeat it for our ‘n’ features:
- repeat until convergence: {
  $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
  for $j := 0 \dots n$
  }
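In Octave this update can be vectorized so that all $\theta_j$ are updated simultaneously; a minimal sketch (function and variable names are my own, not from the course):

```octave
% X: m x (n+1) design matrix with a leading column of ones (x0 = 1)
% y: m x 1 target vector; alpha: learning rate; num_iters: iteration count
function theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    % X * theta - y stacks the errors h_theta(x^(i)) - y^(i) for all i;
    % X' * (...) sums them against each feature j in one matrix product.
    theta = theta - (alpha / m) * (X' * (X * theta - y));
  end
end
```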
1) Feature Scaling
We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.
- The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally: $-1 \leq x_i \leq 1$ (see the sketch below).
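One common way to do this is mean normalization; a minimal sketch, assuming X is the raw m x n feature matrix without the bias column:

```octave
% Scale each feature: subtract its mean and divide by its standard deviation.
mu     = mean(X);             % 1 x n row of feature means
sigma  = std(X);              % 1 x n row of feature standard deviations
X_norm = (X - mu) ./ sigma;   % relies on Octave's automatic broadcasting
```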
2) Learning Rate
This is the gradient descent algorithm:
$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$
We need to adjust the value of $\alpha$ so that gradient descent converges:
- If α is too small: slow convergence.
- If α is too large: may not decrease on every iteration and thus may not converge.
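A simple way to check this in practice is to run a few candidate values of $\alpha$ and watch whether the cost decreases on every iteration; a minimal sketch, assuming X (with its bias column) and y are already defined:

```octave
m = length(y);
for alpha = [0.001, 0.01, 0.1, 1]   % candidate learning rates
  theta = zeros(size(X, 2), 1);
  J_history = zeros(50, 1);
  for iter = 1:50
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);  % cost J(theta)
  end
  plot(1:50, J_history); hold on;   % a good alpha gives a steadily falling curve
end
```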
3. Polynomial Regression
Our hypothesis function need not be linear (a straight line) if that does not fit the data well.
- For example: $h_\theta(x) = \theta_0 + \theta_1x + \theta_2x^2$ (see the sketch below).
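In practice this just means adding powers of $x$ as extra columns of the design matrix and running ordinary linear regression on them; a minimal sketch, assuming x is an m x 1 vector:

```octave
% Quadratic hypothesis as linear regression over the features [1, x, x.^2].
X_poly = [ones(length(x), 1), x, x .^ 2];
% X_poly can now go through gradient descent or the normal equation.
% Note that x and x.^2 have very different ranges, so feature scaling
% becomes even more important here.
```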
4. Normal Equation
We can use the normal equation to obtain the optimal value of $\theta$ analytically, with no iterations.
- $\theta = (X^TX)^{-1}X^TY$
In Octave or MATLAB:
```octave
pinv(X' * X) * X' * Y
```
The function pinv() computes the pseudo-inverse, so even when $X^TX$ is non-invertible (e.g. because of redundant features) we still get a usable result.
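Putting the pieces together, a minimal end-to-end sketch; data_X and Y are placeholder names for the raw feature matrix and target vector:

```octave
X = [ones(size(data_X, 1), 1), data_X];  % prepend the x0 = 1 column
theta = pinv(X' * X) * X' * Y;           % closed-form optimum: no alpha, no iterations
predictions = X * theta;                 % fitted values on the training set
```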
So what’s the difference between gradient descent and normal equation?
| Difference | Gradient Descent | Normal Equation |
|---|---|---|
| Need to choose $\alpha$ | Yes | No |
| Need many iterations | Yes | No |
| When $n$ is large | Works well | Works slowly (computing $(X^TX)^{-1}$ is roughly $O(n^3)$) |