## NPTEL Reinforcement Learning Week 4 Assignment Answers 2024

1. State True/False

The state transition graph for any MDP is a directed acyclic graph.

- True
- False

Answer :-

2. Consider the following statements:

(i) The optimal policy of an MDP is unique.

(ii) We can determine an optimal policy for an MDP using only the optimal value function $v^*$, without accessing the MDP parameters.

(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function $q^*$, without accessing the MDP parameters.

Which of these statements are true?

- Only (ii)
- Only (iii)
- Only (i), (ii)
- Only (i), (iii)
- Only (ii), (iii)

Answer :-
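To make statement (iii) concrete: extracting a greedy policy from $q^*$ is just $\pi^*(s) = \arg\max_a q^*(s, a)$, which needs no transition probabilities, whereas acting greedily from $v^*$ alone requires a one-step lookahead through the model. A minimal sketch with assumed (made-up) q-values:

```python
# Assumed (made-up) optimal q-values for a toy 2-state, 2-action MDP.
qstar = {("s0", "left"): 1.2, ("s0", "right"): 3.4,
         ("s1", "left"): 0.7, ("s1", "right"): 0.5}

def greedy(s, actions=("left", "right")):
    # pi*(s) = argmax_a q*(s, a): no transition probabilities needed
    return max(actions, key=lambda a: qstar[(s, a)])

print(greedy("s0"), greedy("s1"))  # right left
```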

3. Which of the following is a benefit of using RL algorithms for solving MDPs?

- They do not require the state of the agent for solving an MDP.
- They do not require the action taken by the agent for solving an MDP.
- They do not require the state transition probability matrix for solving an MDP.
- They do not require the reward signal for solving an MDP.

Answer :-
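The model-free idea can be illustrated with TD(0) evaluation, which learns state values from sampled transitions alone; the two-state simulator below is an assumed toy environment, not part of the question:

```python
import random

# Model-free TD(0) evaluation on an assumed 2-state toy chain.
# The agent only sees sampled transitions (s, r, s'); the transition
# probabilities stay hidden inside step().
random.seed(0)
gamma, alpha = 0.9, 0.1
v = {0: 0.0, 1: 0.0}

def step(s):
    # environment simulator (its probabilities are hidden from the agent)
    s2 = random.choices([0, 1], weights=[0.7, 0.3] if s == 0 else [0.4, 0.6])[0]
    return (1.0 if s2 == 0 else 0.0), s2  # reward 1 for landing in state 0

s = 0
for _ in range(5000):
    r, s2 = step(s)
    v[s] += alpha * (r + gamma * v[s2] - v[s])  # TD(0) update from one sample
    s = s2
print(v)  # approximate state values learned from samples alone
```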

4. Consider the following equations:

(i) $v_\pi(s) = \mathbb{E}\left[\sum_{i=t}^{\infty} \gamma^{i-t} R_{i+t} \mid S_t = s\right]$

(ii) $q_\pi(s,a) = \sum_{s'} p(s' \mid s, a)\, v_\pi(s')$

(iii) $v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s,a)$

Which of the above are correct?

- Only (i)
- Only (i), (ii)
- Only (ii), (iii)
- Only (i), (iii)
- (i), (ii), (iii)

Answer :-
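Identity (iii) can be sanity-checked numerically; the 2-state, 2-action MDP and the stochastic policy below are assumptions for illustration only:

```python
# Numeric check of identity (iii): v_pi(s) = sum_a pi(a|s) * q_pi(s, a),
# on an assumed 2-state, 2-action MDP with gamma = 0.9.
gamma = 0.9
S, A = (0, 1), (0, 1)
# p[s][a] = list of (next_state, probability); r[s][a] = expected reward
p = {0: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)],           1: [(0, 0.2), (1, 0.8)]}}
r = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: -1.0}}
pi = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.6, 1: 0.4}}  # a stochastic policy

# iterative policy evaluation for v_pi
v = {s: 0.0 for s in S}
for _ in range(500):
    v = {s: sum(pi[s][a] * (r[s][a] + gamma * sum(pr * v[s2] for s2, pr in p[s][a]))
                for a in A) for s in S}

# q_pi from v_pi, then verify (iii)
q = {s: {a: r[s][a] + gamma * sum(pr * v[s2] for s2, pr in p[s][a]) for a in A}
     for s in S}
for s in S:
    assert abs(v[s] - sum(pi[s][a] * q[s][a] for a in A)) < 1e-6
print("identity (iii) holds on this toy MDP")
```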

5. State True/False

While solving MDPs with discounted rewards, the value of the discount factor $\gamma$ cannot affect the optimal policy.

- True
- False

Answer :-
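A toy example (not from the question) showing how $\gamma$ can flip the preferred action: one action pays 1 immediately, another pays 10 one step later:

```python
# Assumed toy choice: action "now" pays 1 immediately; action "later"
# pays 10 after one step. Their q-values are 1 and 10 * gamma.
def q(action, gamma):
    return 1.0 if action == "now" else 10.0 * gamma

for gamma in (0.05, 0.9):
    best = max(("now", "later"), key=lambda a: q(a, gamma))
    print(f"gamma={gamma}: best action is {best}")
# gamma=0.05 prefers "now" (1 > 0.5); gamma=0.9 prefers "later" (9 > 1)
```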

6. Consider the following statements for a finite MDP ($I$ is the identity matrix of dimensions $|S| \times |S|$, where $S$ is the set of all states, and $P_\pi$ is a stochastic matrix):

(i) MDP with stochastic rewards may not have a deterministic optimal policy.

(ii) There can be multiple optimal stochastic policies.

(iii) If $0 \le \gamma < 1$, then the rank of the matrix $I - \gamma P_\pi$ is equal to $|S|$.

(iv) If $0 \le \gamma < 1$, then the rank of the matrix $I - \gamma P_\pi$ is less than $|S|$.

Which of the above statements are true?

- Only (ii), (iii)
- Only (ii), (iv)
- Only (i), (iii)
- Only (i), (ii), (iii)

Answer :-
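The rank claim can be probed numerically: for $0 \le \gamma < 1$, $I - \gamma P_\pi$ is invertible, hence full rank. A determinant check on an assumed 2-state chain:

```python
# For a 2x2 stochastic matrix P and 0 <= gamma < 1, check that
# det(I - gamma * P) != 0, i.e. the matrix has full rank |S| = 2.
P = [[0.9, 0.1],
     [0.4, 0.6]]  # rows sum to 1 (an assumed example chain)

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

for gamma in [0.0, 0.5, 0.9, 0.99]:
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(2)]
         for i in range(2)]
    assert det2(A) != 0  # invertible => rank 2
print("I - gamma*P has full rank for every tested gamma < 1")
```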

7. Consider an MDP with 3 states A, B, C. From each state we can move to either of the other two states; i.e., from state A we can take two actions, leading to states B and C respectively. The rewards for the transitions are: r(A, B) = 2 (the reward for going from A to B), r(B, A) = 5, r(B, C) = 7, r(C, B) = 10, r(A, C) = 1, r(C, A) = 12. The discount factor is 0.7. Find the value function $[v_\pi(A), v_\pi(B), v_\pi(C)]$ for the policy given by $\pi(A) = C$ (if we are in state A, we choose the action that goes to C), $\pi(B) = A$, and $\pi(C) = B$.

- [10.2, 16.7, 20.2]
- [14.2, 16.5, 15.1]
- [15.9, 16.1, 21.3]
- [12.2, 6.2, 14.5]

Answer :-
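Questions like this can be checked with iterative policy evaluation, repeatedly applying $v(s) \leftarrow r(s, \pi(s)) + \gamma\, v(\pi(s))$; the sketch below uses the rewards and $\gamma$ given in the question:

```python
# Iterative policy evaluation for the deterministic policy
# pi(A)=C, pi(B)=A, pi(C)=B with gamma = 0.7 (values from the question).
gamma = 0.7
next_state = {"A": "C", "B": "A", "C": "B"}   # the policy pi
reward = {("A", "C"): 1, ("B", "A"): 5, ("C", "B"): 10}

v = {s: 0.0 for s in "ABC"}
for _ in range(200):  # repeated Bellman backups converge since gamma < 1
    v = {s: reward[(s, next_state[s])] + gamma * v[next_state[s]] for s in "ABC"}

print({s: round(val, 1) for s, val in v.items()})  # -> {'A': 15.9, 'B': 16.1, 'C': 21.3}
```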

8. Suppose x is a fixed point for the function A, y is a fixed point for the function B, and x = BA(x), where BA is the composition of B and A (i.e., BA(x) = B(A(x))). Consider the following statements:

(i) x is a fixed point for B

(ii) x = y

(iii) BA(y) = y

Which of the above must be true?

- Only (i)
- Only (ii)
- Only (i), (ii)
- (i), (ii), (iii)

Answer :-
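Chasing the definitions helps here: since $A(x) = x$, we get $BA(x) = B(A(x)) = B(x)$, so $x = BA(x)$ forces $B(x) = x$; the other two statements need not hold. One concrete instance, with assumed functions $A(t) = t/2$ and $B(t) = t^2$:

```python
# One concrete instance with assumed functions A(t) = t/2 and B(t) = t**2.
def A(t): return t / 2
def B(t): return t ** 2
def BA(t): return B(A(t))

x, y = 0.0, 1.0                                 # candidate fixed points
assert A(x) == x and BA(x) == x and B(y) == y   # the premises hold
assert B(x) == x                # (i) holds: x = BA(x) = B(A(x)) = B(x)
assert x != y                   # (ii) fails in this instance
assert BA(y) != y               # (iii) fails: BA(1) = B(0.5) = 0.25
print("premises hold; (ii) and (iii) fail here")
```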

9. Which of the following is not a valid norm function? ($x$ is a $D$-dimensional vector)

- $\max_{d \in \{1,\dots,D\}} |x_d|$
- $\sqrt{\sum_{d=1}^{D} x_d^2}$
- $\min_{d \in \{1,\dots,D\}} |x_d|$
- $\sum_{d=1}^{D} |x_d|$

Answer :-
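One way to test a candidate norm is against the norm axioms; for example, positive-definiteness requires $\|x\| = 0$ only for $x = 0$. A quick check of the $\min$ candidate on a nonzero vector:

```python
# Positive-definiteness check: a norm may be 0 only at the zero vector,
# but min_d |x_d| is 0 for the nonzero vector x = [1, 0].
def min_candidate(x):
    return min(abs(xd) for xd in x)

x = [1.0, 0.0]
print(min_candidate(x))  # 0.0, even though x != 0 -> not a valid norm
```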

10. Which of the following is a contraction mapping in any norm?

- $T([x_1, x_2]) = [0.5x_1,\ 0.5x_2]$
- $T([x_1, x_2]) = [2x_1,\ 2x_2]$
- $T([x_1, x_2]) = [2x_1,\ 3x_2]$
- $T([x_1, x_2]) = [x_1 + x_2,\ x_1 - x_2]$

Answer :-
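Recall that $T$ is a contraction if $\|T(x) - T(y)\| \le c\,\|x - y\|$ for some $c < 1$ and all $x, y$. The first candidate can be checked empirically (here in the Euclidean norm, over random pairs):

```python
import random

# Empirical check of ||T(x) - T(y)|| <= 0.5 * ||x - y|| (Euclidean norm)
# for the mapping T([x1, x2]) = [0.5 x1, 0.5 x2].
def T(x):
    return [0.5 * x[0], 0.5 * x[1]]

def dist(x, y):
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(2)]
    y = [random.uniform(-10, 10) for _ in range(2)]
    assert dist(T(x), T(y)) <= 0.5 * dist(x, y) + 1e-12
print("T shrinks every sampled pair by the factor 0.5")
```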