NPTEL Reinforcement Learning Week 3 Assignment Answers 2024

1. Which of the following is true for an MDP?

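  • Pr(s_{t+1}, r_{t+1} | s_t, a_t) = Pr(s_{t+1}, r_{t+1})
  • Pr(s_{t+1}, r_{t+1} | s_t, a_t, s_{t−1}, a_{t−1}, s_{t−2}, a_{t−2}, …, s_0, a_0) = Pr(s_{t+1}, r_{t+1} | s_t, a_t)
  • Pr(s_{t+1}, r_{t+1} | s_t, a_t) = Pr(s_{t+1}, r_{t+1} | s_0, a_0)
  • Pr(s_{t+1}, r_{t+1} | s_t, a_t) = Pr(s_t, r_t | s_{t−1}, a_{t−1})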
Answer :- 

2. The baseline in the REINFORCE update should not depend on which of the following (without voiding any of the steps in the proof of REINFORCE)?

  • r_{n−1}
  • r_n
  • Action taken (a_n)
  • None of the above
Answer :- 
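
The role of the baseline can be explored numerically. The sketch below is an illustration under stated assumptions, not part of the assignment: it assumes a softmax policy over three actions with made-up preference values, and computes the exact expectation of the term b · ∇ log π(a) for two hypothetical baselines, one constant and one that varies with the action.

```python
import math

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_pi(prefs, action):
    """Gradient of log pi(action) w.r.t. each preference: 1[a == k] - pi(k)."""
    probs = softmax(prefs)
    return [(1.0 if k == action else 0.0) - probs[k] for k in range(len(prefs))]

def expected_baseline_term(prefs, baseline):
    """Exact E_a[ baseline(a) * grad log pi(a) ] under the softmax policy."""
    probs = softmax(prefs)
    term = [0.0] * len(prefs)
    for a, p in enumerate(probs):
        g = grad_log_pi(prefs, a)
        for k in range(len(prefs)):
            term[k] += p * baseline(a) * g[k]
    return term

prefs = [0.2, -0.5, 1.0]  # hypothetical preferences, chosen only for illustration

# Baseline that ignores the action: its expected contribution vanishes.
constant = expected_baseline_term(prefs, lambda a: 3.7)

# Baseline that varies with the action: the contribution is generally nonzero,
# so the expected update would no longer match the baseline-free gradient.
action_dependent = expected_baseline_term(prefs, lambda a: float(a))

print(constant)
print(action_dependent)
```

Under these assumptions the first vector is zero up to floating-point error, while the second is not.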

3. In many supervised machine learning algorithms, such as neural networks, we rely on the gradient descent technique. However, in the policy gradient approach to bandit problems, we made use of gradient ascent. This discrepancy can mainly be attributed to the differences in

  • the objectives of the learning tasks
  • the parameters of the functions whose gradients are being calculated
  • the nature of the feedback received by the algorithms
Answer :- 
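
For intuition on ascent versus descent, the following minimal sketch uses a hypothetical one-dimensional objective J(θ) = −(θ − 2)², which is not from the course material. It shows that a gradient ascent step on J is the same update as a gradient descent step on the loss −J, so the two procedures trace the same parameter values.

```python
def grad_J(theta):
    """Gradient of the toy objective J(theta) = -(theta - 2)^2."""
    return -2.0 * (theta - 2.0)

def ascent_step(theta, lr):
    """Gradient ascent: move along the gradient to increase J."""
    return theta + lr * grad_J(theta)

def descent_step_on_loss(theta, lr):
    """Gradient descent on loss(theta) = -J(theta), whose gradient is -grad_J."""
    return theta - lr * (-grad_J(theta))

theta_ascent = 0.0
theta_descent = 0.0
for _ in range(100):
    theta_ascent = ascent_step(theta_ascent, lr=0.1)
    theta_descent = descent_step_on_loss(theta_descent, lr=0.1)

print(theta_ascent, theta_descent)
```

Both runs approach θ = 2, the maximizer of the toy objective.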

4. In the case of linear bandits, suppose we have 2 actions, a_1 and a_2. The policy π to be followed when encountering a state s is given by

[Image: policy definition for question 4]
Answer :- 

5. [Image: question 5]
Answer :- 

6. [Image: question 6]
Answer :- 

7. The actions in contextual bandits do not determine the next state, but typically do in full RL problems. True or false?

  • True
  • False
Answer :- 
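
The structural difference between the two settings can be sketched with two toy step functions. Everything here is hypothetical (two contexts or states, two actions, an invented reward rule); the only point is where the action enters the computation of the next context or state.

```python
import random

def contextual_bandit_step(context, action, rng):
    """One step of a toy contextual bandit."""
    reward = 1.0 if action == context else 0.0  # hypothetical reward rule
    next_context = rng.randrange(2)             # drawn without looking at the action
    return reward, next_context

def mdp_step(state, action, rng):
    """One step of a toy MDP with a deterministic transition."""
    reward = 1.0 if action == state else 0.0    # same hypothetical reward rule
    next_state = (state + action) % 2           # the action changes where we end up
    return reward, next_state

rng = random.Random(0)
print(contextual_bandit_step(0, 1, rng))
print(mdp_step(0, 1, rng))
```

In the bandit version the action affects only the reward; in the MDP version it also feeds into the transition.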

8. In a continuous action space environment, we can employ any value function-based algorithm to discover an optimal policy.

  • True
  • False
Answer :- 

9. [Image: question 9]
Answer :- 

10. In solving a multi-arm bandit problem using the policy gradient method, are we assured of converging to the optimal solution?

  • No
  • Yes
Answer :- 
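
As a concrete reference point, here is a minimal sketch of a policy gradient learner on a multi-arm bandit. It assumes a softmax policy over action preferences, Gaussian rewards with invented means, and a running-average reward baseline; the step size, step count, and seed are arbitrary choices. The sketch illustrates the update rule only and makes no claim about what the learner converges to.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def run_gradient_bandit(true_means, steps=2000, lr=0.1, seed=0):
    """Stochastic gradient ascent on expected reward for a softmax bandit policy."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)
    avg_reward = 0.0  # running average of rewards, used as the baseline
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(true_means[action], 1.0)  # assumed unit-variance noise
        avg_reward += (reward - avg_reward) / t
        for k in range(len(prefs)):
            indicator = 1.0 if k == action else 0.0
            # (reward - baseline) * d log pi(action) / d prefs[k]
            prefs[k] += lr * (reward - avg_reward) * (indicator - probs[k])
    return softmax(prefs)

final_probs = run_gradient_bandit([0.1, 0.5, 1.5])  # hypothetical arm means
print(final_probs)
```

Rerunning with different seeds, step sizes, or arm means is a simple way to see how much the learned policy varies from run to run.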