Columbia University Coursera course Natural Language Processing: Quiz 1: covers material from weeks 1 and 2
2016-02-18 09:46
Warning: The hard deadline has passed. You can attempt it, but you will not get credit for it. You are welcome to try it as a learning exercise.

This is an open-note quiz: you can use the slides from the class, and the notes at http://www.cs.columbia.edu/~mcollins/notes-spring2013.html as a resource.

In accordance with the Coursera Honor Code, I certify that the answers here are my own work.
Question 1
Say we'd like to derive the Viterbi algorithm for a bigram HMM tagger. The model takes the form

$p(x_1 \ldots x_n, y_1 \ldots y_{n+1}) = \prod_{i=1}^{n+1} q(y_i \mid y_{i-1}) \prod_{i=1}^{n} e(x_i \mid y_i)$

Which of the following statements is true?

- We can use a dynamic programming algorithm with entries $\pi(k, u)$, and definitions $\pi(0, *) = 1$ and $\pi(k, v) = \max_{u \in S_{k-1}} \left( \pi(k-1, u) \times q(v \mid u) \times e(x_k \mid v) \right)$
- We can use a dynamic programming algorithm with entries $\pi(k, u)$, and definitions $\pi(0, *) = 1$ and $\pi(k, v) = \max_{u \in S_{k-1}} \left( \pi(k-2, u) \times q(v \mid u) \times e(x_k \mid v) \right)$
- We can implement the Viterbi algorithm in exactly the same way as before, but with the following modification to the recursive definition: $\pi(k, u, v) = \max_{w \in S_{k-2}} \left( \pi(k-1, w, u) \times q(v \mid u) \times e(x_k \mid v) \right)$
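For reference, the first recursion above (entries $\pi(k, v)$) can be sketched in code. This is a minimal illustration only: the tag set and the `q`/`e` values below are made-up toy parameters, not part of the quiz.

```python
# Bigram Viterbi sketch: pi[(k, v)] = max over u of pi[(k-1, u)] * q(v|u) * e(x_k|v).
# Toy parameters only; '*' is the start symbol, 'STOP' ends the tag sequence.

def viterbi_bigram(x, tags, q, e):
    """Return max_y p(x, y) under a bigram HMM.

    q[(u, v)] = q(v | u); e[(tag, word)] = e(word | tag).
    """
    n = len(x)
    pi = {(0, '*'): 1.0}
    for k in range(1, n + 1):
        prev = tags if k > 1 else ['*']
        for v in tags:
            pi[(k, v)] = max(
                pi[(k - 1, u)] * q.get((u, v), 0.0) * e.get((v, x[k - 1]), 0.0)
                for u in prev
            )
    # Final transition into STOP.
    return max(pi[(n, v)] * q.get((v, 'STOP'), 0.0) for v in tags)

# Hypothetical toy parameters:
q = {('*', 'D'): 1.0, ('D', 'N'): 1.0, ('N', 'STOP'): 1.0}
e = {('D', 'the'): 1.0, ('N', 'dog'): 1.0}
print(viterbi_bigram(['the', 'dog'], ['D', 'N'], q, e))  # 1.0
```

Note the table is indexed by the tag at position $k$ alone, which is exactly what makes the bigram case cheaper than the trigram case.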
Question 2
Say we define a backed-off model $q_{BO}(w_i \mid w_{i-1})$ exactly as we defined it in lecture, and we define the discounted counts as $\text{Count}^*(w_{i-1}, w_i) = \text{Count}(w_{i-1}, w_i) - 1.5$.

Which of the following statements is true?

- There may be some words $u$ such that $\sum_{v \in V \cup \{STOP\}} q_{BO}(v \mid u) \neq 1$.
- There may be some bigrams $u, v$ such that $q_{BO}(v \mid u) < 0$.
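The mechanics are easy to check directly. In the lecture's formulation, a bigram seen in training gets the estimate $\text{Count}^*(u, v) / \text{Count}(u)$; the counts below are made up for illustration.

```python
# With a discount of 1.5, any bigram seen exactly once gets a negative
# discounted count, so its backed-off estimate is negative.  Toy counts only.

def q_bo_seen(count_uv, count_u, discount=1.5):
    """Backed-off estimate Count*(u, v) / Count(u) for a *seen* bigram."""
    return (count_uv - discount) / count_u

print(q_bo_seen(1, 2))  # (1 - 1.5) / 2 = -0.25
print(q_bo_seen(3, 4))  # (3 - 1.5) / 4 = 0.375
```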
Question 3
Consider the following two bigram language models (recall that a bigram language model defines $p(x_1 \ldots x_n) = \prod_{i=1}^{n} q(x_i \mid x_{i-1})$).

Language Model 1

- $V = \{the, dog\}$
- $q(the \mid *) = q(dog \mid the) = q(STOP \mid dog) = 1$
- All other $q$ parameters are equal to 0.

Language Model 2

- $V = \{the, a, dog\}$
- $q(the \mid *) = q(a \mid *) = 0.5$
- $q(dog \mid a) = q(dog \mid the) = q(STOP \mid dog) = 1$
- All other $q$ parameters are equal to 0.

Now assume that we have a test corpus consisting of a single sentence,

the dog STOP

Which language model gives lower perplexity on this test corpus?

- Language Model 1
- Language Model 2
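The comparison can be checked numerically. A minimal sketch follows, using one common convention: perplexity is $2^{-l}$ where $l$ is the average log2 probability per token, counting STOP as a token.

```python
import math

# Perplexity of a bigram LM on a test corpus: 2 ** (-l), where l is the
# average per-token log2 probability (STOP counted as a token).
# q1 and q2 encode Language Models 1 and 2 from the question.

def sentence_logprob(sentence, q):
    """log2 p(x1..xn) = sum_i log2 q(x_i | x_{i-1}), with x_0 = '*'."""
    lp, prev = 0.0, '*'
    for w in sentence:
        lp += math.log2(q[(prev, w)])
        prev = w
    return lp

def perplexity(sentences, q):
    M = sum(len(s) for s in sentences)
    l = sum(sentence_logprob(s, q) for s in sentences) / M
    return 2 ** (-l)

q1 = {('*', 'the'): 1.0, ('the', 'dog'): 1.0, ('dog', 'STOP'): 1.0}
q2 = {('*', 'the'): 0.5, ('*', 'a'): 0.5,
      ('a', 'dog'): 1.0, ('the', 'dog'): 1.0, ('dog', 'STOP'): 1.0}
corpus = [['the', 'dog', 'STOP']]
print(perplexity(corpus, q1), perplexity(corpus, q2))
```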
Question 4
We are now going to derive a version of the Viterbi algorithm that takes as input an integer $n$, and finds

$\max_{y_1 \ldots y_{n+1}, x_1 \ldots x_n} p(x_1 \ldots x_n, y_1 \ldots y_{n+1})$

for a trigram tagger, as defined in lecture. Hence the input to the algorithm is an integer $n$, and the output from the algorithm is the highest scoring pair of sequences $x_1 \ldots x_n$, $y_1 \ldots y_{n+1}$ under the model.

Which of the following recursive definitions gives a correct algorithm for this problem?

- $\pi(0, *, *) = 1$, and $\pi(k, u, v) = \max_{w \in S_{k-2}} \left( \pi(k-1, w, u) \times q(v \mid w, u) \right)$
- $\pi(0, *, *) = 1$, and $\pi(k, u, v) = \max_{w \in S_{k-2}} \left( \pi(k-1, w, u) \times q(v \mid w, u) \times m(v) \right)$, where $m(v) = \max_{x \in V} e(x \mid v)$
- None of the above.
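The quantity $m(v) = \max_{x \in V} e(x \mid v)$ that appears above is cheap to precompute, one max per tag over the vocabulary. A minimal sketch with made-up emission values:

```python
# m(v) = max over words x of e(x | v); the emission values are hypothetical.

def m(v, e, vocab):
    """Best achievable emission probability for tag v."""
    return max(e.get((v, x), 0.0) for x in vocab)

e = {('D', 'the'): 1.0, ('N', 'dog'): 0.75, ('N', 'cat'): 0.25}
print(m('N', e, ['the', 'dog', 'cat']))  # 0.75
```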
Question 5
We'd like to define a language model with $V = \{the, a, dog\}$, and

$p(x_1 \ldots x_n) = \gamma \times 0.5^n$

for any $x_1 \ldots x_n$ such that $x_i \in V$ for $i = 1 \ldots (n-1)$, and $x_n = STOP$, where $\gamma$ is some expression. What should our definition of $\gamma$ be?

(Hint: recall that $\sum_{n=1}^{\infty} 0.5^n = 1$.)

- $\gamma = \frac{1}{3^n}$
- $\gamma = \frac{1}{3^{n-1}}$
- $\gamma = 3^n$
- $\gamma = 3^{n-1}$
- $\gamma = 1$
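A candidate $\gamma$ can be checked numerically: positions $1 \ldots n-1$ each range over the three words of $V$ and position $n$ is STOP, so there are $3^{n-1}$ sentences of length $n$, each with probability $\gamma \times 0.5^n$. A correct $\gamma$ must make the total mass sum to 1. A minimal sketch (γ is passed in as a function of n; the truncation at `max_n` only approximates the infinite sum):

```python
# Normalization check: sum of gamma(n) * 0.5**n over all 3**(n-1) sentences
# of each length n.  A correct gamma gives a total of (approximately) 1.

def total_mass(gamma, max_n=60):
    return sum(gamma(n) * 0.5 ** n * 3 ** (n - 1) for n in range(1, max_n + 1))

print(total_mass(lambda n: 1.0))  # far from 1, so gamma = 1 cannot be right
```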
Question 6
Say we train a trigram HMM tagger on a training set with the following two sentences:

the dog saw the cat, D N V D N
the cat saw the saw, D N V D N

Assume that we estimate the parameters of the HMM with maximum-likelihood estimation (no smoothing). Now assume that we have the sentence

$x_1 \ldots x_n$ = the cat saw the saw

What is the value for $\max_{y_1 \ldots y_{n+1}} p(x_1 \ldots x_n, y_1 \ldots y_{n+1})$ in this case? (Please give your answer up to 3 decimal places.)
Answer for Question 6
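For a tag set this small the maximization can be checked by brute force. The sketch below (helper names are my own) estimates the maximum-likelihood parameters from the two tagged sentences and then enumerates every tag sequence instead of running Viterbi; it is a verification tool, not the lecture's algorithm.

```python
from collections import Counter
from itertools import product

# Brute-force check: MLE estimates for a trigram HMM, then exhaustive
# enumeration of tag sequences to find max_y p(x, y).

def mle_params(tagged_sentences):
    """Maximum-likelihood q(v | w, u) and e(x | s), no smoothing."""
    tri, bi, emit, tagc = Counter(), Counter(), Counter(), Counter()
    for words, tags in tagged_sentences:
        ts = ['*', '*'] + tags + ['STOP']
        for i in range(2, len(ts)):
            tri[tuple(ts[i - 2:i + 1])] += 1
            bi[tuple(ts[i - 2:i])] += 1
        for w, t in zip(words, tags):
            emit[(t, w)] += 1
            tagc[t] += 1
    q = {k: c / bi[k[:2]] for k, c in tri.items()}
    e = {k: c / tagc[k[0]] for k, c in emit.items()}
    return q, e

def best_joint_prob(x, tags, q, e):
    """max over y of p(x, y), by exhaustive enumeration (tiny tag sets only)."""
    best = 0.0
    for y in product(tags, repeat=len(x)):
        padded = ('*', '*') + y + ('STOP',)
        p = 1.0
        for i in range(2, len(padded)):
            p *= q.get(padded[i - 2:i + 1], 0.0)
        for word, tag in zip(x, y):
            p *= e.get((tag, word), 0.0)
        best = max(best, p)
    return best

train = [("the dog saw the cat".split(), "D N V D N".split()),
         ("the cat saw the saw".split(), "D N V D N".split())]
q, e = mle_params(train)
print(best_joint_prob("the cat saw the saw".split(), ['D', 'N', 'V'], q, e))
```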
Question 7
Assume we have a bigram language model with $V = \{the, a\}$,

$q(a \mid *) = 0.6$, $q(the \mid *) = 0.4$, $q(a \mid a) = 0.9$, $q(STOP \mid a) = 0.1$, $q(the \mid the) = 0.8$, $q(STOP \mid the) = 0.2$,

and all other parameter values equal to 0.

Now say we'd like to define a bigram HMM model which defines the same distribution over sentences as the language model. By this we mean the following. The bigram HMM defines a distribution over sentences $x_1 \ldots x_n$ paired with tag sequences $y_1 \ldots y_{n+1}$ as follows:

$p'(x_1 \ldots x_n, y_1 \ldots y_{n+1}) = \prod_{i=1}^{n+1} q'(y_i \mid y_{i-1}) \prod_{i=1}^{n} e'(x_i \mid y_i)$

(Note we use the notation $p'$, $q'$ and $e'$ to distinguish these from the distribution $p$ and parameters $q$ in the language model.)

The bigram HMM defines the same distribution over sentences as the language model if, for any sentence $x_1 \ldots x_n$,

$p(x_1 \ldots x_n) = \sum_{y_1 \ldots y_{n+1}} p'(x_1 \ldots x_n, y_1 \ldots y_{n+1})$

where $p$ and $p'$ are the distributions under the language model and the bigram HMM respectively.

Our HMM will have a set of tags $S = \{1, 2\}$ and a vocabulary $V = \{the, a\}$. We define $q'(1 \mid *) = 0.6$.

In this question you should choose the parameters of the HMM so that it gives the same distribution over sentences as the language model given above. What should be the values for $q'(2 \mid *)$, $q'(1 \mid 1)$, $q'(2 \mid 1)$, $q'(STOP \mid 1)$, $e'(the \mid 1)$, $e'(the \mid 2)$?
Write your answers in order in the box below, separated by spaces. For example, you could write
0.2 0.3 1 0 0.4 0.5
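A candidate parameter setting can be verified by brute force: sum $p'$ over all tag sequences for a few sentences and compare with $p$. A sketch (the candidate `qp` and `ep` dictionaries are left for the reader to fill in):

```python
from itertools import product

# q encodes the language model from the question.  hmm_marginal sums p' over
# every tag sequence; a correct candidate q'/e' makes it match lm_prob on all
# sentences.

q = {('*', 'a'): 0.6, ('*', 'the'): 0.4, ('a', 'a'): 0.9,
     ('a', 'STOP'): 0.1, ('the', 'the'): 0.8, ('the', 'STOP'): 0.2}

def lm_prob(x, q):
    """p(x1..xn) under the bigram language model; x excludes STOP."""
    p, prev = 1.0, '*'
    for w in x + ['STOP']:
        p *= q.get((prev, w), 0.0)
        prev = w
    return p

def hmm_marginal(x, states, qp, ep):
    """Sum of p'(x, y) over all tag sequences y, including the STOP transition."""
    total = 0.0
    for y in product(states, repeat=len(x)):
        p, prev = 1.0, '*'
        for s, w in zip(y, x):
            p *= qp.get((prev, s), 0.0) * ep.get((s, w), 0.0)
            prev = s
        total += p * qp.get((prev, 'STOP'), 0.0)
    return total

print(lm_prob(['a', 'a'], q))  # 0.6 * 0.9 * 0.1
```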