## NPTEL Reinforcement Learning Week 2 Assignment Answers 2024

1. Which of the following statements is NOT true about Thompson Sampling or Posterior Sampling?

- After each sample is drawn, the q∗ distribution for that sampled arm is updated to be closer to the true distribution.
- Thompson sampling has been shown to generally give better regret bounds than UCB.
- In Thompson sampling, we do not need to eliminate arms each round to get good sample complexity.
- The algorithm requires that we use Gaussian priors to represent distributions over q∗ values for each arm.

Answer :-
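The last option is the one to examine closely: posterior sampling does not require Gaussian priors. As a minimal sketch (arm means, horizon, and seed are illustrative), here is Thompson sampling for Bernoulli rewards using conjugate Beta priors, with no Gaussian anywhere:

```python
import numpy as np

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: Gaussian priors are not required."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # Beta(1, 1) uniform prior over each arm's q* value
    beta = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        samples = rng.beta(alpha, beta)          # one posterior sample per arm
        arm = int(np.argmax(samples))            # play the arm with the largest sample
        reward = rng.random() < true_means[arm]  # Bernoulli reward
        alpha[arm] += reward                     # posterior update for the played arm only
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Note that only the played arm's posterior is updated each round, and no arm is ever eliminated, consistent with the first and third options.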


4. Which of the following is true about the Median Elimination algorithm?

- It is a regret minimizing algorithm.
- The probability that an ϵ_l-optimal arm of round l is eliminated is less than δ_l for that round.
- It is guaranteed to provide an ϵ-optimal arm at the end.
- Replacing ϵ with ϵ/2 doubles the sample complexity.

Answer :-
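To reason about the last option, recall that Median Elimination's per-round sample count scales as 1/ϵ_l², so halving ϵ quadruples (not doubles) the total samples. A small sketch of the standard schedule from Even-Dar et al. (ϵ_1 = ϵ/4, δ_1 = δ/2, then ϵ_{l+1} = 3ϵ_l/4, δ_{l+1} = δ_l/2, with the worse half of the arms eliminated each round) makes this concrete:

```python
import math

def median_elimination_schedule(eps, delta, n_arms):
    """Total sample count and round count for the Median Elimination
    (eps_l, delta_l) schedule: each round samples every surviving arm
    (4/eps_l^2) * ln(3/delta_l) times, then drops arms below the median."""
    eps_l, delta_l = eps / 4, delta / 2
    total, arms, rounds = 0.0, n_arms, 0
    while arms > 1:
        per_arm = (4 / eps_l**2) * math.log(3 / delta_l)
        total += arms * per_arm
        arms = math.ceil(arms / 2)          # eliminate the worse half
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2
        rounds += 1
    return total, rounds

total, rounds = median_elimination_schedule(0.5, math.exp(-1), 256)
```

With 256 arms the schedule runs for exactly 8 halving rounds, which connects to the next question.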

5. We need 8 rounds of median-elimination to get an (ϵ,δ) − PAC arm. Approximately how many samples would have been required using the naive (ϵ,δ) − PAC algorithm given (ϵ,δ) = (1/2, 1/e) ? (Choose the value closest to the correct answer)

- 15000
- 10000
- 500
- 20000

Answer :-
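One way to sanity-check the options: 8 halving rounds suggest k = 2⁸ = 256 arms (an assumption of this sketch), and the naive (ϵ, δ)-PAC algorithm samples every arm (2/ϵ²) ln(2k/δ) times and picks the best empirical mean. The arithmetic with (ϵ, δ) = (1/2, 1/e):

```python
import math

# Naive (eps, delta)-PAC: sample each of the k arms (2/eps^2) * ln(2k/delta)
# times. Assuming k = 2^8 = 256 from the 8 halving rounds of median elimination.
k = 2 ** 8
eps, delta = 0.5, math.exp(-1)
per_arm = (2 / eps**2) * math.log(2 * k / delta)  # = 8 * (ln 512 + 1)
total = k * per_arm
print(round(total))  # prints 14824, closest to the 15000 option
```

The exact constant in front varies by textbook, but the 1/ϵ² and ln(k/δ) scaling is what drives the answer toward the 15000 option.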


9. Suppose we are facing a non-stationary bandit problem. We want to use posterior sampling for picking the correct arm. What is the likely change that needs to be done to the algorithm so that it can adapt to non-stationarity?

- Update the posterior rarely.
- Randomly shift the posterior drastically from time to time.
- Keep adding a slight noise to the posterior to prevent its variance from going down quickly.
- No change is required.

Answer :-
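The intuition behind the noise-injection option: in a stationary problem the posterior variance shrinks over time and exploration dies out, so under non-stationarity the agent can get stuck on a stale arm. A minimal sketch, assuming Gaussian posteriors with unit observation noise (the `noise_var` value and reward function are illustrative, not tuned):

```python
import numpy as np

def nonstationary_gaussian_ts(rewards_fn, horizon, n_arms, noise_var=0.05, seed=0):
    """Gaussian posterior sampling with a little variance injected each step,
    so the posterior never collapses and the agent keeps exploring."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(n_arms)   # posterior means
    var = np.ones(n_arms)   # posterior variances
    for t in range(horizon):
        arm = int(np.argmax(rng.normal(mu, np.sqrt(var))))
        r = rewards_fn(arm, t)
        # standard Gaussian posterior update (unit observation noise)
        var_new = 1 / (1 / var[arm] + 1)
        mu[arm] = var_new * (mu[arm] / var[arm] + r)
        var[arm] = var_new
        var += noise_var    # inject noise: variance cannot shrink to zero
    return mu, var

# Illustrative non-stationary bandit: the best arm rotates every 500 steps.
mu, var = nonstationary_gaussian_ts(lambda a, t: float(a == (t // 500) % 3),
                                    horizon=1500, n_arms=3)
```

Because `var` is bumped up every step, each arm keeps a nonzero chance of producing the largest posterior sample, so the algorithm can re-discover an arm whose mean has drifted upward.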
