Machine Learning Week 1 Quiz: Linear Regression with One Variable
2015-11-07 21:36
Linear Regression with One Variable
1. Consider the problem of predicting how well a student does in their second year of college/university, given how well they did in their first year.
Specifically, let x be equal to the number of "A" grades (including A−, A, and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year).
Refer to the following training set of a small sample of different students' performances (note that this training set will also be referenced in other questions in this quiz). Here each row is one training example. Recall that in linear regression, our hypothesis is hθ(x)=θ0+θ1x, and we use m to denote the number of training examples.
For the training set given above, what is the value of m?
In the box below, please enter your answer (which should be a number between 0 and 10).
2. Consider the following training set of m=4 training examples:
| x | y |
|---|-----|
| 1 | 0.5 |
| 2 | 1 |
| 4 | 2 |
| 0 | 0 |
What are the values of θ0 and θ1 that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)
θ0=0, θ1=0.5
θ0=1, θ1=1
θ0=0.5, θ1=0
θ0=0.5, θ1=0.5
θ0=1, θ1=0.5
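Since the four points lie exactly on the line y=0.5x, gradient descent should converge to the parameters that fit the data perfectly. Below is a minimal sketch (my own illustration, not part of the quiz); the learning rate alpha = 0.1 and the iteration count are arbitrary choices.

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x on the m=4 training set.
xs = [1.0, 2.0, 4.0, 0.0]
ys = [0.5, 1.0, 2.0, 0.0]
m = len(xs)

theta0, theta1 = 0.0, 0.0
alpha = 0.1  # arbitrary illustrative learning rate

for _ in range(1000):
    # Partial derivatives of J(theta0, theta1) = (1/(2m)) * sum((h(x) - y)**2)
    grad0 = sum((theta0 + theta1 * x) - y for x, y in zip(xs, ys)) / m
    grad1 = sum(((theta0 + theta1 * x) - y) * x for x, y in zip(xs, ys)) / m
    # Simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # approaches theta0 = 0, theta1 = 0.5
```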
3. Suppose we set θ0=−2, θ1=0.5. What is hθ(6)?
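As a worked check: hθ(6) = θ0 + 6θ1 = −2 + 0.5×6 = 1.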
4. Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima).
Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)
If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values.
If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.
If the learning rate is too small, then gradient descent may take a very long time to converge.
Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1).
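The last two statements both concern the learning rate α. A tiny sketch (again my own illustration; the smooth function f(θ0,θ1)=θ0²+θ1² is an arbitrary stand-in) shows both behaviors: a small α still converges, just slowly, while a too-large α makes f grow on every iteration instead of decreasing.

```python
# Effect of the learning rate on gradient descent, using the arbitrary
# smooth function f(t0, t1) = t0**2 + t1**2, whose gradient is (2*t0, 2*t1).
def descend(alpha, steps=20):
    t0, t1 = 1.0, 1.0
    for _ in range(steps):
        t0, t1 = t0 - alpha * 2 * t0, t1 - alpha * 2 * t1  # simultaneous update
    return t0 ** 2 + t1 ** 2  # value of f after the run

print(descend(alpha=0.01))  # small alpha: f shrinks, but slowly
print(descend(alpha=1.5))   # too-large alpha: f grows every step (diverges)
```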
5. Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ0, θ1 such that J(θ0,θ1)=0.
Which of the statements below must then be true? (Check all that apply.)
For these values of θ0 and θ1 that satisfy J(θ0,θ1)=0, we have that hθ(x(i))=y(i) for every training example (x(i),y(i)).
For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0.
We can perfectly predict the value of y even for new examples that we have not yet seen (e.g., we can perfectly predict prices of even new houses that we have not yet seen).
This is not possible: by the definition of J(θ0,θ1), it is not possible for there to exist θ0 and θ1 so that J(θ0,θ1)=0.
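For reference, the cost in this course is the squared error J(θ0,θ1) = (1/(2m))·Σ(hθ(x(i))−y(i))². Since every term in the sum is non-negative, J(θ0,θ1)=0 forces hθ(x(i))=y(i) on each training example, and it says nothing about unseen examples. A minimal sketch, reusing the m=4 training set from question 2 for concreteness:

```python
# Squared-error cost J(theta0, theta1) = (1/(2m)) * sum((h(x_i) - y_i)**2).
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum(((theta0 + theta1 * x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1.0, 2.0, 4.0, 0.0]
ys = [0.5, 1.0, 2.0, 0.0]
print(cost(0.0, 0.5, xs, ys))  # 0.0: the hypothesis matches every training example
print(cost(1.0, 1.0, xs, ys))  # 2.03125: any training error makes J positive
```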