Notes on Andrew Ng's Deep Learning Course
2018-02-08 11:22
1. Train / Dev / Test sets
Then traditionally you might take all the data you have and carve off some portion of it to be your training set. Some portion of it to be your hold-out cross validation set, and this is sometimes also called the development set. And for brevity I'm just going to call this the dev set, but all of these terms mean roughly the same thing.
Ng's statement above does not seem entirely accurate; compare the following:
First, I think you're mistaken about what the three partitions do. You don't make any choices based on the test data. Your algorithms adjust their parameters based on the training data. You then run them on the validation data to compare your algorithms (and their trained parameters) and decide on a winner. You then run the winner on your test data to give you a forecast of how well it will do in the real world. You don't validate on the training data because that would overfit your models. You don't stop at the validation step's winner's score because you've iteratively been adjusting things to get a winner in the validation step, and so you need an independent test (that you haven't specifically been adjusting towards) to give you an idea of how well you'll do outside of the current arena.
From: https://stats.stackexchange.com/questions/9357/why-only-three-partitions-training-validation-test/9364#9364 . The Elements of Statistical Learning puts it this way:
The training set is used to fit the models; the validation set is used to estimate prediction error for model selection; the test set is used for assessment of the generalization error of the final chosen model.
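The three-way split described above can be sketched as follows (a minimal NumPy example; the function name and the 60/20/20 fractions are illustrative, not taken from the course):

```python
import numpy as np

def train_dev_test_split(X, y, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the data, then split into train / dev (validation) / test sets.

    Train fits parameters; dev compares models; test is touched only once,
    to estimate generalization error of the final chosen model.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_dev = int(len(X) * dev_frac)
    test_idx = idx[:n_test]
    dev_idx = idx[n_test:n_test + n_dev]
    train_idx = idx[n_test + n_dev:]
    return ((X[train_idx], y[train_idx]),
            (X[dev_idx], y[dev_idx]),
            (X[test_idx], y[test_idx]))
```

The key discipline is in the comments: only the dev set is used for iterative model selection, so the test set remains an unbiased estimate.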
2. The introduction to the Course 2, Week 1 assignment contains the sentence "Recognize that a model without regularization gives you a better accuracy on the training set but not necessarily on the test set". Isn't that wrong?
Isn't the price regularization pays for eliminating overfitting precisely that it makes the accuracy on the training set worse?
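To make the trade-off concrete, here is a minimal sketch of an L2-regularized cross-entropy cost (the function name and argument layout are illustrative, not the assignment's API):

```python
import numpy as np

def l2_regularized_cost(AL, Y, weights, lambd, m):
    """Cross-entropy cost plus an L2 penalty over all weight matrices.

    AL: predictions in (0, 1); Y: binary labels; weights: list of W matrices;
    lambd: regularization strength; m: number of training examples.
    The penalty pulls weights toward zero, which typically raises training
    error slightly while reducing overfitting.
    """
    cross_entropy = -np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy + l2_penalty
```

With lambd = 0 this reduces to the plain cross-entropy cost; any lambd > 0 adds a strictly positive penalty term.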
3. A well chosen initialization can:
Speed up the convergence of gradient descent
Increase the odds of gradient descent converging to a lower training (and generalization) error
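One such well-chosen scheme is He initialization, used in the course for ReLU networks. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def he_initialize(layer_dims, seed=0):
    """He initialization: W[l] ~ N(0, 2 / n_{l-1}), biases zero.

    Scaling each weight matrix by sqrt(2 / fan_in) keeps activation
    variance roughly constant across ReLU layers, which speeds up
    the convergence of gradient descent.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = (rng.standard_normal((layer_dims[l], layer_dims[l - 1]))
                           * np.sqrt(2.0 / layer_dims[l - 1]))
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params
```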
4. There is also some evidence that the ease of learning an identity function--even more than skip connections helping with vanishing gradients--accounts for ResNets' remarkable performance.
5. The skip-connections help to address the Vanishing Gradient problem. They also make it easy for a ResNet block to learn an identity function.
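The "easy to learn an identity function" point can be illustrated with a minimal NumPy sketch of a fully-connected residual block (function and variable names are made up for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(a_prev, W1, b1, W2, b2):
    """A plain residual block: output = ReLU(main path + skip connection).

    If the main-path weights and biases are driven to zero, z2 becomes 0
    and the block outputs ReLU(a_prev) -- i.e. for nonnegative activations
    it is trivially the identity, which deeper plain networks struggle
    to learn.
    """
    z1 = W1 @ a_prev + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_prev)   # the skip connection adds the input back
```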
6. In y = [pc, bx, by, bh, bw, c1, c2, c3], why is the x-axis the horizontal direction?
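For reference, a sketch of assembling this label vector for one grid cell (a hypothetical helper, not defined in the course). Here bx runs along the horizontal (x) axis and by along the vertical axis, following the image-coordinate convention the course uses:

```python
import numpy as np

def make_label(object_present, bx=0.0, by=0.0, bh=0.0, bw=0.0,
               class_id=None, n_classes=3):
    """Build y = [pc, bx, by, bh, bw, c1, ..., cn] for one grid cell.

    pc: 1 if an object is present, else 0 (and the rest is "don't care");
    (bx, by): box center; (bh, bw): box height and width;
    c1..cn: one-hot class indicator.
    """
    y = np.zeros(5 + n_classes)
    if object_present:
        y[0] = 1.0              # pc
        y[1:5] = [bx, by, bh, bw]
        y[5 + class_id] = 1.0   # one-hot class
    return y
```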
7. In the style cost, Gij is essentially the covariance (uncentered, i.e. without subtracting the mean) between the unit values of channels i and j within a single layer.
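A minimal sketch of the Gram (style) matrix under this reading, with the layer's activations flattened to one row per channel (names are illustrative):

```python
import numpy as np

def gram_matrix(A):
    """Gram (style) matrix of one layer's activations.

    A: activations of shape (n_C, n_H * n_W) -- one row per channel.
    G[i, j] = sum over spatial positions of A[i] * A[j], i.e. the
    uncentered covariance between channels i and j of the same layer.
    """
    return A @ A.T
```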