# NPTEL Reinforcement Learning Week 4 Assignment Answers 2024

1. State True/False
The state transition graph for any MDP is a directed acyclic graph.

• True
• False
`Answer :- `

2. Consider the following statements:
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function ($v_*$), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function ($q_*$), without accessing the MDP parameters.
Which of these statements are true?

• Only (ii)
• Only (iii)
• Only (i), (ii)
• Only (i), (iii)
• Only (ii), (iii)
`Answer :- `
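
The difference between statements (ii) and (iii) can be made concrete. Acting greedily with respect to $q_*$ only needs an argmax over actions in each state, whereas doing the same with $v_*$ needs a one-step lookahead through $p(s' \mid s, a)$ and the rewards. The sketch below is a minimal illustration; the states, actions, and q-values are made up and do not come from the assignment.

```python
# Hypothetical optimal action values q*(s, a) for a toy 2-state MDP.
# All names and numbers here are assumptions chosen for illustration.
q_star = {
    "s0": {"left": 1.0, "right": 2.5},
    "s1": {"left": 0.3, "right": 0.1},
}


def greedy_policy(q):
    """In each state, pick an action with the highest q-value.

    No transition probabilities or rewards are consulted.
    """
    return {state: max(actions, key=actions.get) for state, actions in q.items()}


policy = greedy_policy(q_star)
print(policy)
```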

3. Which of the following is a benefit of using RL algorithms for solving MDPs?

• They do not require the state of the agent for solving an MDP.
• They do not require the action taken by the agent for solving an MDP.
• They do not require the state transition probability matrix for solving an MDP.
• They do not require the reward signal for solving an MDP.
`Answer :- `

4. Consider the following equations:
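(i) $v_\pi(s) = \mathbb{E}\left[\sum_{i=t}^{\infty} \gamma^{i-t} R_{i+t} \mid S_t = s\right]$
(ii) $q_\pi(s,a) = \sum_{s'} p(s' \mid s,a)\, v_\pi(s')$
(iii) $v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s,a)$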
Which of the above are correct?

• Only (i)
• Only (i), (ii)
• Only (ii), (iii)
• Only (i), (iii)
• (i), (ii), (iii)
`Answer :- `

5. State True/False
While solving MDPs with discounted rewards, the value of the discount factor $\gamma$ cannot affect the optimal policy.

• True
• False
`Answer :- `

6. Consider the following statements for a finite MDP ($I$ is the identity matrix of dimensions $|S| \times |S|$, where $S$ is the set of all states, and $P_\pi$ is a stochastic matrix):
(i) An MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If $0 \le \gamma < 1$, then the rank of the matrix $I - \gamma P_\pi$ is equal to $|S|$.
(iv) If $0 \le \gamma < 1$, then the rank of the matrix $I - \gamma P_\pi$ is less than $|S|$.
Which of the above statements are true?

• Only (ii), (iii)
• Only (ii), (iv)
• Only (i), (iii)
• Only (i), (ii), (iii)
`Answer :- `

7. Consider an MDP with 3 states A, B, C. From each state, we can move to either of the other two states; e.g., from state A, we can take 2 actions that lead to states B and C respectively. The rewards for the transitions are: r(A, B) = 2 (the reward for going from A to B), r(B, A) = 5, r(B, C) = 7, r(C, B) = 10, r(A, C) = 1, r(C, A) = 12. The discount factor is 0.7. Find the value function $[v_\pi(A), v_\pi(B), v_\pi(C)]$ for the policy given by $\pi(A) = C$ (if we are in state A, we choose the action to go to C), $\pi(B) = A$, and $\pi(C) = B$.

• [10.2, 16.7, 20.2]
• [14.2, 16.5, 15.1]
• [15.9, 16.1, 21.3]
• [12.2, 6.2, 14.5]
`Answer :- `
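
One way to check the options in question 7 is iterative policy evaluation: repeatedly apply $v(s) \leftarrow r(s, \pi(s)) + \gamma\, v(\pi(s))$ until the values stop changing. The sketch below assumes transitions are deterministic (the action "go to X" always lands in X), as the question implies. The function name `evaluate` and the sweep count are illustrative choices, not part of the assignment.

```python
GAMMA = 0.7

# Deterministic policy from the question: A -> C, B -> A, C -> B.
policy = {"A": "C", "B": "A", "C": "B"}

# Rewards r(s, s') for the transitions this policy actually uses.
reward = {("A", "C"): 1.0, ("B", "A"): 5.0, ("C", "B"): 10.0}


def evaluate(policy, reward, gamma, sweeps=200):
    """Iterative policy evaluation with synchronous updates.

    Assumes each action deterministically leads to its named state.
    The fixed sweep count is an arbitrary choice; with gamma = 0.7
    the remaining error after 200 sweeps is negligible.
    """
    v = {s: 0.0 for s in policy}
    for _ in range(sweeps):
        v = {s: reward[(s, nxt)] + gamma * v[nxt] for s, nxt in policy.items()}
    return v


v_pi = evaluate(policy, reward, GAMMA)
rounded = [round(v_pi[s], 1) for s in "ABC"]
print(rounded)
```

The printed list can be compared directly against the four options. Solving the three linear equations by hand gives the same values.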

8. Suppose x is a fixed point for the function A, y is a fixed point for the function B, and x = BA(x), where BA is the composition of B and A. Consider the following statements:
(i) x is a fixed point for B
(ii) x = y
(iii) BA(y) = y
Which of the above must be true?

• Only (i)
• Only (ii)
• Only (i), (ii)
• (i), (ii), (iii)
`Answer :- `
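
For question 8, note that $x = A(x)$ and $x = B(A(x))$ together give $B(x) = B(A(x)) = x$. Whether the other two statements must hold can be probed with a small counterexample. The maps `A` and `B` below are hypothetical, chosen only so that the premises of the question are satisfied.

```python
def A(z):
    """Constant map (an assumed example): its only fixed point is 1."""
    return 1


def B(z):
    """Identity map (an assumed example): every point is a fixed point."""
    return z


x, y = 1, 2

checks = {
    "premise: A(x) == x": A(x) == x,
    "premise: B(y) == y": B(y) == y,
    "premise: B(A(x)) == x": B(A(x)) == x,
    "(i) B(x) == x": B(x) == x,
    "(ii) x == y": x == y,
    "(iii) B(A(y)) == y": B(A(y)) == y,
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```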

9. Which of the following is not a valid norm function? (x is a D dimensional vector)

• $\max_{d \in \{1,\dots,D\}} |x_d|$
• $\sqrt{\sum_{d=1}^{D} x_d^2}$
• $\min_{d \in \{1,\dots,D\}} |x_d|$
• $\sum_{d=1}^{D} |x_d|$
`Answer :- `
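
A norm must be zero only at the zero vector (positive definiteness), in addition to being absolutely homogeneous and satisfying the triangle inequality. The sketch below tests just the first property on one hand-picked vector. The function names and the test vector are assumptions for illustration, and passing this single check does not by itself prove that a function is a norm.

```python
import math


def max_abs(x):
    return max(abs(v) for v in x)


def euclid(x):
    return math.sqrt(sum(v * v for v in x))


def min_abs(x):
    return min(abs(v) for v in x)


def sum_abs(x):
    return sum(abs(v) for v in x)


candidates = {
    "max_abs": max_abs,
    "euclid": euclid,
    "min_abs": min_abs,
    "sum_abs": sum_abs,
}


def violates_definiteness(f, x):
    """True if f(x) == 0 even though x is not the zero vector."""
    return any(v != 0 for v in x) and f(x) == 0


x = [0.0, 1.0]  # non-zero vector with one zero component
results = {name: violates_definiteness(f, x) for name, f in candidates.items()}
print(results)
```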

10. Which of the following is a contraction mapping in any norm?

• $T([x_1, x_2]) = [0.5x_1, 0.5x_2]$
• $T([x_1, x_2]) = [2x_1, 2x_2]$
• $T([x_1, x_2]) = [2x_1, 3x_2]$
• $T([x_1, x_2]) = [x_1 + x_2, x_1 - x_2]$
`Answer :- `
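
A map $T$ is a contraction in a norm if $\|T(x) - T(y)\| \le c\,\|x - y\|$ for some $c < 1$ and all $x, y$. The sketch below computes this ratio for one hand-picked pair of points under three common norms. The points and helper names are assumptions. A ratio above 1 for any pair rules a map out, while a ratio below 1 on a single pair is only suggestive; for a uniform scaling $T(x) = cx$, homogeneity of norms gives $\|T(x) - T(y)\| = |c|\,\|x - y\|$ in every norm.

```python
import math


def t_half(x):
    return [0.5 * x[0], 0.5 * x[1]]


def t_double(x):
    return [2 * x[0], 2 * x[1]]


def t_mixed(x):
    return [2 * x[0], 3 * x[1]]


def t_sum_diff(x):
    return [x[0] + x[1], x[0] - x[1]]


maps = {"half": t_half, "double": t_double, "mixed": t_mixed, "sum_diff": t_sum_diff}

norms = {
    "l1": lambda v: sum(abs(c) for c in v),
    "l2": lambda v: math.sqrt(sum(c * c for c in v)),
    "linf": lambda v: max(abs(c) for c in v),
}


def ratio(T, norm, x, y):
    """Return ||T(x) - T(y)|| / ||x - y|| for one pair of points."""
    num = norm([a - b for a, b in zip(T(x), T(y))])
    den = norm([a - b for a, b in zip(x, y)])
    return num / den


# Arbitrary sample points (assumed for illustration).
x, y = [1.0, 2.0], [-3.0, 0.5]

ratios = {
    (m_name, n_name): ratio(T, norm, x, y)
    for m_name, T in maps.items()
    for n_name, norm in norms.items()
}

for key, value in ratios.items():
    print(key, round(value, 3))
```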