
Notes on Andrew Ng's Deep Learning Course

2018-02-08 11:22
1. Train / Dev / Test sets
Then traditionally you might take all the data you have and carve off some portion of it to be your training set. Some portion of it to be your hold-out cross validation set, and this is sometimes also called the development set. And for brevity I'm just going to call this the dev set, but all of these terms mean roughly the same thing. 
Andrew Ng's statement above does not seem entirely accurate; compare the following:
First, I think you're mistaken about what the three partitions do. You don't make any choices based on the test data. Your algorithms adjust their parameters based on the training data. You then run them on the validation data to compare your algorithms (and their trained parameters) and decide on a winner. You then run the winner on your test data to give you a forecast of how well it will do in the real world. You don't validate on the training data because that would overfit your models. You don't stop at the validation step's winner's score because you've iteratively been adjusting things to get a winner in the validation step, and so you need an independent test (that you haven't specifically been adjusting towards) to give you an idea of how well you'll do outside of the current arena.
From: https://stats.stackexchange.com/questions/9357/why-only-three-partitions-training-validation-test/9364#9364 . The Elements of Statistical Learning puts it this way:
The training set is used to fit the models; the validation set is used to estimate prediction error for model selection; the test set is used for assessment of the generalization error of the final chosen model. 
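As a minimal sketch of that workflow (the function name, the NumPy arrays X and y, and the 98/1/1-style fractions below are illustrative assumptions, not code from the course):

```python
import numpy as np

def train_dev_test_split(X, y, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle the data once, then carve off dev and test portions.

    X: (m, n_features) array of examples, y: (m,) array of labels.
    The fractions are just an illustration (e.g. 98/1/1 for a large dataset).
    """
    m = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(m)

    n_dev = int(m * dev_frac)
    n_test = int(m * test_frac)

    dev_idx = idx[:n_dev]
    test_idx = idx[n_dev:n_dev + n_test]
    train_idx = idx[n_dev + n_test:]

    return (X[train_idx], y[train_idx],
            X[dev_idx], y[dev_idx],
            X[test_idx], y[test_idx])

# Fit models on the training set, compare them on the dev set,
# and report the winner's error on the test set only once, at the end.
```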

2. The introduction to the Course 2, Week 1 assignment contains the sentence "Recognize that a model without regularization gives you a better accuracy on the training set but not necessarily on the test set" ???? Isn't that wrong???
Isn't the price regularization pays for removing overfitting precisely to make the accuracy on the training set worse?
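For reference, the trade-off being debated is visible in the L2-regularized cost itself; a minimal sketch, where the helper name and arguments are my own assumptions rather than the assignment's API:

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weight_matrices, lambd, m):
    """Add the L2 penalty (lambd / (2*m)) * sum(||W_l||_F^2) to the plain cost.

    cross_entropy_cost: the unregularized cost on the training batch.
    weight_matrices:    list of weight arrays [W1, W2, ...] (biases are not penalized).
    lambd:              regularization strength; larger values shrink the weights more,
                        which typically lowers training accuracy but can improve dev accuracy.
    m:                  number of training examples.
    """
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weight_matrices)
    return cross_entropy_cost + l2_penalty
```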
3. A well chosen initialization can:
Speed up the convergence of gradient descent
Increase the odds of gradient descent converging to a lower training (and generalization) error
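One common recipe behind those two claims is He initialization, i.e. scaling random weights by sqrt(2 / n_prev); a minimal sketch, assuming layer_dims lists the layer sizes (the function name is illustrative, not necessarily the assignment's):

```python
import numpy as np

def initialize_parameters_he(layer_dims, seed=0):
    """He initialization: W_l = randn * sqrt(2 / n_{l-1}), b_l = 0.

    Keeps the variance of the activations roughly constant across layers,
    which speeds up convergence compared to poorly scaled random weights.
    """
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = (rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
                                    * np.sqrt(2.0 / layer_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```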
4. There is also some evidence that the ease of learning an identity function--even more than skip connections helping with vanishing gradients--accounts for ResNets' remarkable performance.
5. The skip-connections help to address the Vanishing Gradient problem. They also make it easy for a ResNet block to learn an identity function.
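A minimal NumPy sketch of a plain two-layer residual block (my own illustrative code, not the course's) makes points 4 and 5 concrete: because the skip connection adds a_prev back in before the final activation, driving the block's weights toward zero makes the block compute (approximately) the identity function.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def identity_residual_block(a_prev, W1, b1, W2, b2):
    """Residual block: a_out = relu(W2 @ relu(W1 @ a_prev + b1) + b2 + a_prev).

    If W2 and b2 are driven toward zero (e.g. by L2 regularization), the output
    collapses to relu(a_prev), which equals a_prev when a_prev came out of a ReLU
    layer -- i.e. the block learns the identity function "for free".
    """
    z1 = W1 @ a_prev + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_prev)   # the skip connection adds a_prev before the final ReLU
```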
6. y = [pc, bx, by, bh, bw, c1, c2, c3] -- why is the x-axis the horizontal one here?
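As an illustrative sketch of that label convention (the helper below is hypothetical; bx runs along the horizontal axis and by along the vertical axis, both relative to the grid cell):

```python
import numpy as np

def make_label(object_present, bx, by, bh, bw, class_id, num_classes=3):
    """Build y = [pc, bx, by, bh, bw, c1, c2, c3] for one grid cell.

    pc     : 1 if an object's midpoint falls in this cell, else 0.
    bx, by : midpoint of the box; bx along the horizontal (x) axis,
             by along the vertical (y) axis, relative to the cell.
    bh, bw : height and width of the box relative to the cell.
    c1..c3 : one-hot class indicator.
    """
    if object_present:
        classes = np.zeros(num_classes)
        classes[class_id] = 1.0
        return np.array([1.0, bx, by, bh, bw, *classes])
    # When pc = 0 the remaining entries are "don't care" values.
    return np.array([0.0, *([0.0] * (4 + num_classes))])
```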
7. In the style cost, Gij is really just the covariance (uncentered, i.e. without subtracting the mean) between the activation values of channels i and j within a layer.
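A minimal sketch of that computation, assuming `a` holds one layer's activations with shape (n_C, n_H, n_W): G[i, j] is just the inner product of the flattened activation maps of channels i and j, i.e. an uncentered covariance.

```python
import numpy as np

def gram_matrix(a):
    """Compute G for one layer's activations a of shape (n_C, n_H, n_W).

    Each row of `unrolled` is one channel's activation map flattened into a vector;
    G[i, j] = sum over positions of a_i * a_j, the uncentered covariance
    (correlation) between channels i and j of this layer.
    """
    n_C = a.shape[0]
    unrolled = a.reshape(n_C, -1)    # (n_C, n_H * n_W)
    return unrolled @ unrolled.T     # (n_C, n_C)
```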