
Stanford Machine Learning, Lecture 6 (Part 1): Advice for Applying Machine Learning -- Deciding What to Try Next

2013-06-12 16:39

Advice for applying machine learning -- Deciding what to try next

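This part introduces some pretty simple techniques for ruling out some of the candidate adjustments listed above, so that you do not tune aimlessly, avoid pointless changes, and save time. These techniques are machine learning diagnostics, and they help you pick effective ways to improve an algorithm's performance.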

A diagnostic helps you find out in advance that something you were planning to try will lead nowhere.

1. How to evaluate algorithms

When fitting parameters, we try to choose the parameters that minimize the training error, but a smaller training error does not necessarily mean a better hypothesis.

First, split the dataset into a training set and a test set, as shown in the figure below.

Next, compute the error.
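As a rough illustration of the split-then-evaluate procedure, here is a minimal sketch for linear regression. The 70/30 split ratio, the synthetic data, and the helper names `fit_linear` and `squared_error` are assumptions made for this example, not something taken from the original figures.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit; X is assumed to already contain a column of ones."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def squared_error(theta, X, y):
    """J(theta) = 1/(2m) * sum((X @ theta - y)^2), with m = number of rows."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data (hypothetical): y = 1.5 * x + 0.5 plus Gaussian noise.
rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-3, 3, size=m)
y = 1.5 * x + 0.5 + rng.normal(scale=0.3, size=m)
X = np.column_stack([np.ones(m), x])

# Shuffle first, then take 70% for training and 30% for testing (assumed ratio).
perm = rng.permutation(m)
n_train = int(0.7 * m)
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Fit theta on the training set only, then measure both errors.
theta = fit_linear(X[train_idx], y[train_idx])
train_error = squared_error(theta, X[train_idx], y[train_idx])
test_error = squared_error(theta, X[test_idx], y[test_idx])
print("train error:", train_error, "test error:", test_error)
```

Shuffling before splitting matters when the data is stored in some order; otherwise the test set may not resemble the training set.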

2. Model Selection and Train/Validation/Test Sets

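Suppose you want to decide the degree of the polynomial used to fit the dataset (that is, which features the algorithm should include), or the regularization parameter lambda. Deciding this is the model selection problem.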

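Following the earlier approach, we would split the dataset into a training set and a test set, fit theta on the training set, and use that theta to compute the test error on the test set, as shown in the figure below.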


Here the test error is smallest when d=5, so we take the d=5 model as the final model.

At this point we might ask: how well does the model generalize? What we can do is look at how well the chosen 5th-order polynomial hypothesis performs on the test set.

But this is not a fair estimate of how well the hypothesis generalizes.

The reason is that we fit an extra parameter, d (the degree of the polynomial), on the test set. Because we used the test set to choose d, evaluating the hypothesis on that same test set is no longer fair: the hypothesis is likely to do better on this test set than it would on new examples it has not seen before. What we need to do is correct for this.

Put another way: in the earlier setting, where the dataset was split into a training set and a test set, we saw that if we fit some set of parameters to the training set, then the performance of the fitted model on the training set is not predictive of how well the hypothesis will generalize to new examples. This is because the parameters were fitted to the training set, so they are likely to do well on the training set even if they do not do well on other examples.

To summarize: what we did was fit the parameter d to the test set, and having fit a parameter to the test set means that the performance of the hypothesis on that test set is not a fair estimate of how well it is likely to do on new examples we have not seen before.

To address this problem in a model selection setting, we do not split the dataset into just a training set and a test set. Instead we split it into a training set, a validation set, and a test set, as shown in the figure below.

Correspondingly, the various errors are computed as follows.
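The formulas from the original figure are not preserved here. For linear regression with a squared-error cost, the three errors are usually written as below; the symbols $m$, $m_{cv}$ and $m_{test}$ (the sizes of the training, validation and test sets) are notation assumed for this sketch.

```latex
\begin{align*}
J_{train}(\theta) &= \frac{1}{2m} \sum_{i=1}^{m}
  \left( h_\theta\bigl(x^{(i)}\bigr) - y^{(i)} \right)^2 \\
J_{cv}(\theta) &= \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}}
  \left( h_\theta\bigl(x_{cv}^{(i)}\bigr) - y_{cv}^{(i)} \right)^2 \\
J_{test}(\theta) &= \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}}
  \left( h_\theta\bigl(x_{test}^{(i)}\bigr) - y_{test}^{(i)} \right)^2
\end{align*}
```

All three have the same form; they differ only in which subset of the data the sum runs over.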

We use the training set to obtain the parameters theta, and we use the validation set to choose d. The validation set therefore can no longer be used to estimate the generalization error, so we use the test set to estimate it.

We use the validation set to select the model, and then evaluate it on the test set, as shown in the figure below.
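The train/validation/test procedure for choosing the polynomial degree d can be sketched as follows. This is only an illustration: the 60/20/20 split, the synthetic target `sin(3x)`, the candidate degrees 1 to 10, and the helper names are all assumptions for the example.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit(X, y):
    """Unregularized least-squares fit of theta."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def error(theta, X, y):
    """Squared-error cost 1/(2m) * sum((X @ theta - y)^2)."""
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

# Synthetic data (hypothetical): a smooth nonlinear target plus noise.
rng = np.random.default_rng(0)
m = 300
x = rng.uniform(-1, 1, size=m)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=m)

# Assumed 60% / 20% / 20% split into training, validation and test sets.
perm = rng.permutation(m)
train, val, test = perm[:180], perm[180:240], perm[240:]

thetas, cv_errors = {}, {}
for d in range(1, 11):
    # theta is fitted on the training set only.
    thetas[d] = fit(poly_features(x[train], d), y[train])
    # d is compared using the validation set only.
    cv_errors[d] = error(thetas[d], poly_features(x[val], d), y[val])

best_d = min(cv_errors, key=cv_errors.get)

# The test set is touched once, for the selected model.
test_error = error(thetas[best_d], poly_features(x[test], best_d), y[test])
print("selected degree:", best_d, "test error:", test_error)
```

Because the test set plays no part in choosing either theta or d, `test_error` is a fairer estimate of the generalization error than the validation error of the winning model.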


3. Bias/Variance


The figure below sketches how the training error and the validation error change with the degree of the polynomial.

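When the degree of the polynomial is small, the model has low variance and high bias (underfitting): both the training error and the validation error are high. When the degree of the polynomial is large, the model has high variance and low bias (overfitting): the training error is low but the validation error is much higher.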

4. Regularization and Bias/Variance

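Regularization helps prevent overfitting. This part looks at how regularization affects bias and variance. The figure below uses linear regression as an example to show the effect of the regularization term on bias and variance.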

The definitions of the various errors are given below.

Following the model selection material in Part 2, here is the model selection procedure when there is a regularization term.

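For each candidate lambda, theta is computed on the training set. The validation error on the validation set is then used to pick the appropriate theta and lambda. As the figure above shows, the 5th model is the one finally selected, and the test error is computed for it.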
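Selecting lambda this way can be sketched as follows for regularized linear regression. The sketch assumes the usual cost $\frac{1}{2m}\sum (h_\theta(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m}\sum_{j \ge 1} \theta_j^2$, solved in closed form; the candidate list of lambdas (0, then 0.01 doubling up to 10.24), the degree-8 polynomial, the data, and the helper names are assumptions for illustration.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_regularized(X, y, lam):
    """Minimize 1/(2m)*||X theta - y||^2 + lam/(2m)*sum(theta[1:]**2)."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0  # the intercept theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def error(theta, X, y):
    """Squared-error cost WITHOUT the regularization term."""
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

# Synthetic data (hypothetical) and an assumed three-way split.
rng = np.random.default_rng(0)
m = 120
x = rng.uniform(-1, 1, size=m)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=m)
X = poly_features(x, 8)
perm = rng.permutation(m)
train, val, test = perm[:40], perm[40:80], perm[80:]

# Candidate values: 0, 0.01, 0.02, 0.04, ..., 10.24 (an assumed grid).
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]

thetas, train_errors, cv_errors = {}, {}, {}
for lam in lambdas:
    thetas[lam] = fit_regularized(X[train], y[train], lam)  # fit on training set
    train_errors[lam] = error(thetas[lam], X[train], y[train])
    cv_errors[lam] = error(thetas[lam], X[val], y[val])     # compare on validation set

best_lam = min(cv_errors, key=cv_errors.get)
test_error = error(thetas[best_lam], X[test], y[test])      # report on test set
print("selected lambda:", best_lam, "test error:", test_error)
```

Note that the regularization term is used only when fitting theta; the training, validation and test errors that are compared are plain squared errors without it.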

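The figure below sketches how the training error and the validation error change as lambda changes, and how bias and variance change with it. A small lambda tends toward high variance (overfitting), and a large lambda tends toward high bias (underfitting).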

5. Learning curves

A learning curve is used to judge whether an algorithm is suffering from bias, from variance, or from both.

The figure below sketches how the validation error and the training error change as the training set size changes.

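For the high bias case, the figure below sketches how the validation error and the training error change with the training set size. Both errors are large, and the gap between them keeps shrinking until they level off at a similar, high value. Note that when the algorithm already has high bias, adding more training data does not help improve it.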

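For the high variance case, the figure below sketches how the validation error and the training error change with the training set size. The gap between the validation error and the training error is large at first, and it shrinks as the number of examples grows. Note that when the algorithm already has high variance, adding more training data is likely to help improve it.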


It follows that each time we adjust the algorithm we can plot its learning curve and use it to decide how to adjust the algorithm next.
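Computing the data behind a learning curve can be sketched like this. The example deliberately fits a straight line to quadratic data, so it illustrates the high bias case; the data, the training set sizes, and the function names are assumptions for the sketch.

```python
import numpy as np

def fit(X, y):
    """Unregularized least-squares fit of theta."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def error(theta, X, y):
    """Squared-error cost 1/(2m) * sum((X @ theta - y)^2)."""
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

def learning_curve(X_train, y_train, X_val, y_val, sizes):
    """Train on the first n examples for each n in sizes.

    The training error is measured on those same n examples, while the
    validation error is always measured on the full validation set.
    """
    train_err, val_err = [], []
    for n in sizes:
        theta = fit(X_train[:n], y_train[:n])
        train_err.append(error(theta, X_train[:n], y_train[:n]))
        val_err.append(error(theta, X_val, y_val))
    return train_err, val_err

# Hypothetical high-bias setup: quadratic data, straight-line hypothesis.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=300)
y = x ** 2 + rng.normal(scale=0.2, size=300)
X = np.column_stack([np.ones_like(x), x])

X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]
sizes = list(range(5, 201, 5))

train_err, val_err = learning_curve(X_train, y_train, X_val, y_val, sizes)
print("final train error:", train_err[-1], "final validation error:", val_err[-1])
```

Plotting `train_err` and `val_err` against `sizes` gives the learning curve. In this setup both errors should stay high however many examples are used, which is the high bias pattern described above.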

6. Deciding what to do next

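At the start of this post we said that when an algorithm's prediction error is large, a number of adjustments can be tried. The figure below shows which of those adjustments help with a high bias problem and which help with a high variance problem. (The conclusions can be read off the curves in Part 3, Part 4 and Part 5.)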
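The figure itself is not reproduced here. The mapping below is the one usually given for this lecture and is consistent with the curves in Parts 3 to 5. The `diagnose` helper is a hypothetical heuristic added for illustration: its thresholds (`acceptable_error`, `gap_ratio`) are assumptions, not part of the original material.

```python
# Which adjustment addresses which problem.
REMEDIES = {
    "high variance": [
        "get more training examples",
        "try a smaller set of features",
        "increase lambda",
    ],
    "high bias": [
        "try getting additional features",
        "try adding polynomial features",
        "decrease lambda",
    ],
}

def diagnose(train_error, cv_error, acceptable_error, gap_ratio=2.0):
    """Rough heuristic (assumed thresholds, for illustration only).

    - High bias: the training error itself is above the acceptable level.
    - High variance: the validation error is above the acceptable level
      and much larger than the training error.
    """
    problems = []
    if train_error > acceptable_error:
        problems.append("high bias")
    if cv_error > acceptable_error and cv_error > gap_ratio * train_error:
        problems.append("high variance")
    return problems

# Usage: look up the suggested adjustments for each diagnosed problem.
for problem in diagnose(train_error=0.01, cv_error=0.4, acceptable_error=0.1):
    print(problem, "->", REMEDIES[problem])
```

In practice the learning curves give a more reliable picture than two numbers and a fixed threshold, so a helper like this is best treated as a first check.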


Next, a few words on overfitting in neural networks. A small neural network is less prone to overfitting, but using a large neural network with regularization to address overfitting usually works better.

The number of hidden layers can be chosen by cross validation.

Also, when the cross validation error is much larger than the training error, is increasing the number of hidden units likely to help? The answer is no. The network is currently suffering from high variance, so adding hidden units is unlikely to help.
