## NPTEL Natural Language Processing Week 2 Assignment Answers 2024

1. According to Zipf’s law, which statement(s) is/are correct?

(i) A small number of words occur with high frequency.

(ii) A large number of words occur with low frequency.

a. Both (i) and (ii) are correct

b. Only (ii) is correct

c. Only (i) is correct

d. Neither (i) nor (ii) is correct

2. Consider the following corpus C1 of 4 sentences. What is the total count of unique bi-grams for which the likelihood will be estimated? Assume we do not perform any pre-processing.

today is Sneha’s birthday

she likes ice cream

she is also fond of cream cake

we will celebrate her birthday with ice cream cake

a. 24

b. 28

c. 27

d. 23
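
The count asked for in question 2 can be checked with a short script. This is a minimal sketch, assuming whitespace tokenization with no lowercasing or punctuation stripping (matching "no pre-processing"). The function name `unique_bigrams` and the `pad` flag are illustrative choices, and whether the sentence-boundary markers `<s>` and `</s>` are counted is an assumption the question does not state, so both variants are shown.

```python
def unique_bigrams(sentences, pad=False):
    """Return the set of distinct bigrams over whitespace-tokenized sentences.

    pad=True wraps each sentence in <s> ... </s> boundary markers first
    (an assumption; the question does not say whether they are used).
    """
    bigrams = set()
    for sentence in sentences:
        tokens = sentence.split()
        if pad:
            tokens = ["<s>"] + tokens + ["</s>"]
        # Pair every token with its successor.
        bigrams.update(zip(tokens, tokens[1:]))
    return bigrams


corpus_c1 = [
    "today is Sneha’s birthday",
    "she likes ice cream",
    "she is also fond of cream cake",
    "we will celebrate her birthday with ice cream cake",
]

print(len(unique_bigrams(corpus_c1)))            # 18 without boundary markers
print(len(unique_bigrams(corpus_c1, pad=True)))  # 24 with <s> and </s>
```

Only the padded variant matches one of the listed options, which suggests the question intends boundary markers to be included.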

3. A 3-gram model is a ___________ order Markov Model.

a. Two

b. Five

c. Four

d. Three

4. Which of these is/are valid Markov assumption(s)?

a. The probability of a word depends only on the current word.

b. The probability of a word depends only on the previous word.

c. The probability of a word depends only on the next word.

d. The probability of a word depends only on the current and the previous word.
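
The idea behind questions 3 and 4 can be written out as a worked equation. The notation here is an assumption, not taken from the assignment: $w_i$ is the $i$-th word and $k$ is the order of the Markov model. The chain rule factorizes a sentence exactly, and a $k$-th order Markov assumption truncates each history to the previous $k$ words:

$$
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
\approx \prod_{i=1}^{n} P(w_i \mid w_{i-k}, \dots, w_{i-1})
$$

An $N$-gram model conditions each word on the previous $N-1$ words, so it corresponds to $k = N - 1$. A bi-gram model ($k = 1$) conditions only on the previous word.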

5. For the string ‘mash’, identify which of the following sets of strings have a Levenshtein distance of 1.

a. smash, mas, lash, mushy, hash

b. bash, stash, lush, flash, dash

c. smash, mas, lash, mush, ash

d. None of the above
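
The options in question 5 can be checked with a standard dynamic-programming edit distance. This is a minimal sketch, assuming unit cost for insertion, deletion, and substitution (some course material charges 2 for a substitution, which would change the results). The function name `levenshtein` and the `options` dictionary are illustrative.

```python
def levenshtein(source, target):
    """Edit distance with unit-cost insert, delete, and substitute."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


options = {
    "a": ["smash", "mas", "lash", "mushy", "hash"],
    "b": ["bash", "stash", "lush", "flash", "dash"],
    "c": ["smash", "mas", "lash", "mush", "ash"],
}

for label, words in options.items():
    print(label, [levenshtein("mash", word) for word in words])
# a [1, 1, 1, 2, 1]
# b [1, 2, 2, 2, 1]
# c [1, 1, 1, 1, 1]
```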

6.

7.

8. Calculate **P(they play in a big garden)** assuming a bi-gram language model.

a. 1/8

b. 1/12

c. 1/24

d. None of the above
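
The training corpus for this question is not reproduced above, so the number itself cannot be recomputed here. The following is a minimal sketch of how a maximum-likelihood bi-gram sentence probability is typically computed. The function names and `toy_corpus` are hypothetical and are not the assignment's data, and padding each sentence with `<s>` and `</s>` is an assumption.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count bigrams and their left contexts over <s> ... </s> padded sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def mle_sentence_probability(sentence, context_counts, bigram_counts):
    """Product of count(prev, word) / count(prev) over the padded sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: the MLE estimate is undefined, treat as 0
        probability *= bigram_counts[(prev, word)] / context_counts[prev]
    return probability


# Hypothetical toy corpus, NOT the corpus from the assignment.
toy_corpus = ["a b", "a c"]
contexts, bigrams = train_bigram_counts(toy_corpus)
print(mle_sentence_probability("a b", contexts, bigrams))  # (2/2) * (1/2) * (1/1) = 0.5
```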

9. Considering the same model as in Question 7, calculate the perplexity of **`<s> they play in a big garden </s>`**.

a. 2.289

b. 1.426

c. 1.574

d. 2.178
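
Perplexity is the inverse probability of the sequence, normalized by the number of predicted tokens. The sketch below assumes the per-bigram conditional probabilities have already been read off the model. The function name `perplexity` and the example values are hypothetical, not taken from the assignment's corpus. Whether `</s>` counts toward the token total is a convention that varies between courses; here every probability in the list counts as one predicted token.

```python
import math


def perplexity(conditional_probs):
    """Perplexity = (product of probabilities) ** (-1 / N), computed in log space.

    conditional_probs holds one P(word | previous word) per predicted token.
    """
    n = len(conditional_probs)
    log_prob = sum(math.log(p) for p in conditional_probs)
    return math.exp(-log_prob / n)


# Hypothetical values for illustration only.
print(perplexity([0.5, 0.5]))         # approximately 2.0
print(perplexity([0.25, 0.25, 1.0]))  # 16 ** (1/3), approximately 2.52
```

Working in log space avoids floating-point underflow when the sentence is long and the probabilities are small.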

10. Assume that you are using a bi-gram language model with add-one smoothing. Calculate **P(they play in a beautiful garden)**.

a. 4.472 × 10^-6

b. 2.236 × 10^-6

c. 3.135 × 10^-6

d. None of the above
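
Add-one (Laplace) smoothing replaces each bi-gram estimate with (count(prev, word) + 1) / (count(prev) + V), where V is the vocabulary size, so unseen bi-grams receive a small non-zero probability. The sketch below is a hedged illustration: the function name and `toy_corpus` are hypothetical and are not the assignment's data. Counting `<s>` and `</s>` as part of V is an assumption, and other conventions give different numbers.

```python
from collections import Counter


def add_one_sentence_probability(sentence, corpus):
    """Bi-gram sentence probability with add-one smoothing."""
    context_counts, bigram_counts, vocabulary = Counter(), Counter(), set()
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        vocabulary.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    vocab_size = len(vocabulary)  # assumption: V includes <s> and </s>
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        # Unseen bigrams contribute 1 / (count(prev) + V) instead of 0.
        probability *= (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
    return probability


# Hypothetical toy corpus, NOT the corpus from the assignment.
toy_corpus = ["a b", "a c"]
print(add_one_sentence_probability("a b", toy_corpus))  # (3/7) * (2/7) * (2/6) = 2/49
print(add_one_sentence_probability("a d", toy_corpus))  # non-zero although "d" is unseen
```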

