
Columbia University Coursera course Natural Language Processing: Quiz 1: covers material from weeks 1 and 2



Quiz 1: covers material from weeks 1 and 2

Warning: The hard deadline has passed. You can attempt it, but you will not get credit for it. You are welcome to try it as a learning exercise.

This is an open note quiz: you can use the slides from the class, and the notes at http://www.cs.columbia.edu/~mcollins/notes-spring2013.html as a resource.

In accordance with the Coursera Honor Code, I certify that the answers here are my own work.


Question 1

Say we'd like to derive the Viterbi algorithm for a bigram HMM tagger. The model takes the form $p(x_1 \ldots x_n, y_1 \ldots y_{n+1}) = \prod_{i=1}^{n+1} q(y_i \mid y_{i-1}) \prod_{i=1}^{n} e(x_i \mid y_i)$.
Which of the following statements is true?

We can use a dynamic programming algorithm with entries $\pi(k,u)$, and definitions $\pi(0,*) = 1$ and $\pi(k,v) = \max_{u \in \mathcal{S}_{k-1}} \left( \pi(k-1,u) \times q(v \mid u) \times e(x_k \mid v) \right)$
We can use a dynamic programming algorithm with entries $\pi(k,u)$, and definitions $\pi(0,*) = 1$ and $\pi(k,v) = \max_{u \in \mathcal{S}_{k-1}} \left( \pi(k-2,u) \times q(v \mid u) \times e(x_k \mid v) \right)$
We can implement the Viterbi algorithm in exactly the same way as before, but with the following modification to the recursive definition: $\pi(k,u,v) = \max_{w \in \mathcal{S}_{k-2}} \left( \pi(k-1,w,u) \times q(v \mid u) \times e(x_k \mid v) \right)$
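As a quick illustration of the bigram recursion in the first option, here is a minimal Python sketch (added here, not part of the original quiz); the dictionary keys and the handling of the final STOP transition are assumptions made for the example.

```python
# Minimal sketch of the bigram-HMM Viterbi recursion pi(k, v).
# q[(v, u)] stands for q(v|u) and e[(x, v)] for e(x|v); both are assumed
# to be plain dictionaries, with missing entries treated as probability 0.

def viterbi_bigram(words, tags, q, e):
    n = len(words)
    pi = [{"*": 1.0}]                      # pi(0, *) = 1
    for k in range(1, n + 1):
        pi.append({})
        for v in tags:
            pi[k][v] = max(
                pi[k - 1][u] * q.get((v, u), 0.0) * e.get((words[k - 1], v), 0.0)
                for u in pi[k - 1]
            )
    # fold in the final transition to STOP, i.e. the q(y_{n+1} | y_n) factor
    return max(pi[n][v] * q.get(("STOP", v), 0.0) for v in tags)
```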


Question 2

Say we define a backed-off model $q_{BO}(w_i \mid w_{i-1})$ exactly as we defined it in lecture, and we define the discounted counts as $\mathrm{Count}^*(w_{i-1}, w_i) = \mathrm{Count}(w_{i-1}, w_i) - 1.5$.
Which of the following statements is true?

There may be some words $u$ such that $\sum_{v \in \mathcal{V} \cup \{\mathrm{STOP}\}} q_{BO}(v \mid u) \neq 1$.
There may be some bigrams $u, v$ such that $q_{BO}(v \mid u) < 0$.
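A quick sanity check on the 1.5 discount (a worked step added here, not part of the original quiz): any bigram observed exactly once in training already yields a negative discounted count.

```latex
\mathrm{Count}(w_{i-1}, w_i) = 1
\;\Rightarrow\;
\mathrm{Count}^*(w_{i-1}, w_i) = 1 - 1.5 = -0.5
\;\Rightarrow\;
q_{BO}(w_i \mid w_{i-1}) = \frac{\mathrm{Count}^*(w_{i-1}, w_i)}{\mathrm{Count}(w_{i-1})} < 0 .
```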


Question 3

Consider the following two bigram language models (recall that a bigram language model defines $p(x_1 \ldots x_n) = \prod_{i=1}^{n} q(x_i \mid x_{i-1})$).

Language Model 1

$\mathcal{V} = \{\text{the}, \text{dog}\}$

$q(\text{the} \mid *) = q(\text{dog} \mid \text{the}) = q(\text{STOP} \mid \text{dog}) = 1$

All other $q$ parameters are equal to 0.

Language Model 2

$\mathcal{V} = \{\text{the}, \text{a}, \text{dog}\}$

$q(\text{the} \mid *) = q(\text{a} \mid *) = 0.5$

$q(\text{dog} \mid \text{a}) = q(\text{dog} \mid \text{the}) = q(\text{STOP} \mid \text{dog}) = 1$

All other $q$ parameters are equal to 0.

Now assume that we have a test corpus consisting of a single sentence,

the dog STOP

Which language model gives lower perplexity on this test corpus?

Language Model 1
Language Model 2
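For readers who want to check the two models numerically, here is a small Python sketch (added here, not part of the original quiz), assuming the usual definition of perplexity as 2 raised to the negative average log2 probability per token, with STOP counted as a token.

```python
import math

def perplexity(q, sentence):
    # log2 probability of the sentence under a bigram model q[(w, prev)] = q(w | prev)
    logprob = sum(math.log2(q[(w, prev)])
                  for prev, w in zip(["*"] + sentence[:-1], sentence))
    return 2 ** (-logprob / len(sentence))

sentence = ["the", "dog", "STOP"]
lm1 = {("the", "*"): 1.0, ("dog", "the"): 1.0, ("STOP", "dog"): 1.0}
lm2 = {("the", "*"): 0.5, ("a", "*"): 0.5, ("dog", "a"): 1.0,
       ("dog", "the"): 1.0, ("STOP", "dog"): 1.0}

print(perplexity(lm1, sentence))  # 1.0
print(perplexity(lm2, sentence))  # 2^(1/3), roughly 1.26
```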


Question 4

We are now going to derive a version of the Viterbi algorithm that takes as input an integer $n$, and finds

$\max_{y_1 \ldots y_{n+1},\, x_1 \ldots x_n} p(x_1 \ldots x_n, y_1 \ldots y_{n+1})$

for a trigram tagger, as defined in lecture. Hence the input to the algorithm is an integer $n$, and the output from the algorithm is the highest scoring pair of sequences $x_1 \ldots x_n$, $y_1 \ldots y_{n+1}$ under the model.

Which of the following recursive definitions gives a correct algorithm for this problem?

$\pi(0,*,*) = 1$, and $\pi(k,u,v) = \max_{w \in \mathcal{S}_{k-2}} \left( \pi(k-1,w,u) \times q(v \mid w,u) \right)$
$\pi(0,*,*) = 1$, and $\pi(k,u,v) = \max_{w \in \mathcal{S}_{k-2}} \left( \pi(k-1,w,u) \times q(v \mid w,u) \times m(v) \right)$, where $m(v) = \max_{x \in \mathcal{V}} e(x \mid v)$
None of the above.
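A short reasoning sketch (added here, not part of the original quiz): because each word $x_k$ appears in exactly one emission factor, the maximization over $x_k$ can be carried out independently for each tag, which is where the $m(v)$ term in the second option comes from.

```latex
\max_{x_k \in \mathcal{V}} e(x_k \mid v) = m(v)
\quad\Longrightarrow\quad
\max_{x_1 \ldots x_n,\, y_1 \ldots y_{n+1}} p(x_1 \ldots x_n, y_1 \ldots y_{n+1})
= \max_{y_1 \ldots y_{n+1}} \prod_{i=1}^{n+1} q(y_i \mid y_{i-2}, y_{i-1}) \prod_{i=1}^{n} m(y_i) .
```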


Question 5

We'd like to define a language model with $\mathcal{V} = \{\text{the}, \text{a}, \text{dog}\}$, and

$p(x_1 \ldots x_n) = \gamma \times 0.5^n$

for any $x_1 \ldots x_n$ such that $x_i \in \mathcal{V}$ for $i = 1 \ldots (n-1)$, and $x_n = \mathrm{STOP}$, where $\gamma$ is some expression.

What should our definition of $\gamma$ be?

(Hint: recall that $\sum_{n=1}^{\infty} 0.5^n = 1$.)

$\gamma = \frac{1}{3^n}$
$\gamma = \frac{1}{3^{n-1}}$
$\gamma = 3^n$
$\gamma = 3^{n-1}$
$\gamma = 1$
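A short normalization check (added here as a sketch, not part of the original quiz): for each length $n$ there are $3^{n-1}$ admissible sentences, since $x_1 \ldots x_{n-1}$ each range over the three words in $\mathcal{V}$ and $x_n$ is fixed to STOP, so the total probability mass assigned by the model is

```latex
\sum_{n=1}^{\infty} 3^{\,n-1} \times \gamma \times 0.5^{\,n}
\;=\; \sum_{n=1}^{\infty} 0.5^{\,n} \;=\; 1
\qquad \text{when } \gamma = \frac{1}{3^{\,n-1}} .
```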


Question 6

Say we train a trigram HMM tagger on a training set with the following two sentences:

the dog saw the cat, D N V D N
the cat saw the saw, D N V D N

Assume that we estimate the parameters of the HMM with maximum-likelihood estimation (no smoothing).

Now assume that we have the sentence

$x_1 \ldots x_n =$ the cat saw the saw

what is the value for

$\max_{y_1 \ldots y_{n+1}} p(x_1 \ldots x_n, y_1 \ldots y_{n+1})$

in this case? (Please give your answer up to 3 decimal places.)

Answer for Question 6
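One possible hand computation (a sketch added here, not a certified answer key): with maximum-likelihood estimates, every tag-trigram transition seen in training has probability 1, and the only emission probabilities below 1 come from the N tag, which emits dog once, cat twice, and saw once out of four N occurrences. Assuming the best tag sequence for the test sentence is the D N V D N pattern seen in training, all transition factors equal 1 and the score reduces to the emission product:

```latex
\max_{y_1 \ldots y_{n+1}} p(x_1 \ldots x_n, y_1 \ldots y_{n+1})
= e(\text{the} \mid \mathrm{D}) \times e(\text{cat} \mid \mathrm{N}) \times e(\text{saw} \mid \mathrm{V})
  \times e(\text{the} \mid \mathrm{D}) \times e(\text{saw} \mid \mathrm{N})
= 1 \times 0.5 \times 1 \times 1 \times 0.25 = 0.125 .
```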


Question 7

Assume we have a bigram language model with

$\mathcal{V} = \{\text{the}, \text{a}\}$

$q(\text{a} \mid *) = 0.6$, $q(\text{the} \mid *) = 0.4$, $q(\text{a} \mid \text{a}) = 0.9$, $q(\text{STOP} \mid \text{a}) = 0.1$, $q(\text{the} \mid \text{the}) = 0.8$, $q(\text{STOP} \mid \text{the}) = 0.2$, all other parameter values equal to 0.

Now say we'd like to define a bigram HMM model which defines the same distribution over sentences as the language model. By this we mean the following. The bigram HMM defines a distribution over sentences $x_1 \ldots x_n$ paired with tag sequences $y_1 \ldots y_{n+1}$ as follows:

$p'(x_1 \ldots x_n, y_1 \ldots y_{n+1}) = \prod_{i=1}^{n+1} q'(y_i \mid y_{i-1}) \prod_{i=1}^{n} e'(x_i \mid y_i)$

(Note we use the notation $p'$, $q'$ and $e'$ to distinguish this from the distribution $p$ and parameters $q$ in the language model.)

The bigram HMM defines the same distribution over sentences as the language model if for any sentence $x_1 \ldots x_n$,

$p(x_1 \ldots x_n) = \sum_{y_1 \ldots y_{n+1}} p'(x_1 \ldots x_n, y_1 \ldots y_{n+1})$

where $p$ and $p'$ are the distributions under the language model and the bigram HMM respectively.

Our HMM will have a set of tags $\mathcal{S} = \{1, 2\}$, and a vocabulary $\mathcal{V} = \{\text{the}, \text{a}\}$. We define $q'(1 \mid *) = 0.6$.

In this question you should choose the parameters of the HMM so that it gives the same distribution over sentences as the language model given above. What should be the values for $q'(2 \mid *)$, $q'(1 \mid 1)$, $q'(2 \mid 1)$, $q'(\mathrm{STOP} \mid 1)$, $e'(\text{the} \mid 1)$, $e'(\text{the} \mid 2)$?

Write your answers in order in the box below, separated by spaces. For example, you could write

0.2 0.3 1 0 0.4 0.5
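One way to sanity-check a candidate parameter setting is to sum the HMM probability over all tag sequences for a few short sentences and compare against the language-model probability. The Python sketch below (added here, not part of the original quiz) does this by brute force; the values assigned to the primed parameters are assumptions for illustration only, built by letting tag 1 emit only "a" and tag 2 emit only "the" and copying the language-model transitions onto the tags.

```python
from itertools import product

# Language model parameters q(w | prev), keyed as (w, prev); missing keys mean 0.
q = {("a", "*"): 0.6, ("the", "*"): 0.4, ("a", "a"): 0.9, ("STOP", "a"): 0.1,
     ("the", "the"): 0.8, ("STOP", "the"): 0.2}

# Candidate HMM parameters (an assumed setting, for illustration only):
# tag 1 emits "a", tag 2 emits "the", transitions mirror the language model.
qp = {(1, "*"): 0.6, (2, "*"): 0.4, (1, 1): 0.9, ("STOP", 1): 0.1,
      (2, 2): 0.8, ("STOP", 2): 0.2}
ep = {("a", 1): 1.0, ("the", 2): 1.0}

def lm_prob(words):
    prob, prev = 1.0, "*"
    for w in words + ["STOP"]:
        prob *= q.get((w, prev), 0.0)
        prev = w
    return prob

def hmm_prob(words):
    # p'(x) = sum over tag sequences of prod q'(y_i|y_{i-1}) * prod e'(x_i|y_i)
    total = 0.0
    for tags in product([1, 2], repeat=len(words)):
        prob, prev = 1.0, "*"
        for w, t in zip(words, tags):
            prob *= qp.get((t, prev), 0.0) * ep.get((w, t), 0.0)
            prev = t
        total += prob * qp.get(("STOP", prev), 0.0)
    return total

for sent in [["a"], ["a", "a"], ["the", "the"]]:
    print(sent, lm_prob(sent), hmm_prob(sent))  # the two numbers should match
```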