# NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers 2023

**1. Which step does Nesterov accelerated gradient descent perform before finding the update size?**

- Increase the momentum
- Estimate the next position of the parameters
- Adjust the learning rate
- Decrease the step size

Answer :- Estimate the next position of the parameters
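The look-ahead idea the question asks about can be sketched in a few lines. This is a minimal illustration, not NPTEL course code; the quadratic objective and the hyperparameters `lr` and `gamma` are arbitrary choices for the demo.

```python
import numpy as np

def nag_step(w, v, grad_fn, lr=0.1, gamma=0.9):
    """One step of Nesterov accelerated gradient descent.

    Before computing the update, NAG first estimates the next
    position of the parameters (the look-ahead point w - gamma*v)
    and evaluates the gradient there, not at the current w.
    """
    lookahead = w - gamma * v               # estimated next position
    v_new = gamma * v + lr * grad_fn(lookahead)
    return w - v_new, v_new

# Toy objective f(w) = w^2, whose gradient is 2w (illustrative only).
grad = lambda w: 2 * w
w, v = np.array([5.0]), np.zeros(1)
for _ in range(100):
    w, v = nag_step(w, v, grad)
```

Evaluating the gradient at the look-ahead point (rather than at `w`, as plain momentum does) lets NAG "brake" earlier when the momentum is about to overshoot.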

**2. Which parameter of vanilla gradient descent controls the step size in the direction of the gradient?**

- Learning rate
- Momentum
- Gamma
- None of the above

Answer :- Learning rate
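A quick sketch of how the learning rate alone sets the step size in vanilla gradient descent; the toy objective and the two learning-rate values are arbitrary choices for illustration.

```python
# Vanilla gradient descent: the learning rate (eta) is the only knob
# controlling how far each step moves along the gradient direction.
def gd(w, grad_fn, lr, steps):
    for _ in range(steps):
        w = w - lr * grad_fn(w)      # step = lr * gradient
    return w

# Toy objective f(w) = (w - 3)^2, with gradient 2(w - 3).
grad = lambda w: 2 * (w - 3.0)
w_small = gd(10.0, grad, lr=0.01, steps=50)  # small eta: slow progress
w_big   = gd(10.0, grad, lr=0.4,  steps=50)  # larger eta: much faster
```

With `lr=0.01` the iterate is still far from the minimum at 3 after 50 steps, while `lr=0.4` reaches it almost exactly.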

**3. What does the distance between two contour lines on a contour map represent?**

- The change in the output of the function
- The direction of the function
- The rate of change of the function
- None of the above

Answer :- The rate of change of the function

**4. Which of the following represents the contour plot of the function f(x, y) = x² − y?**

Answer :- The plot whose contour lines are the upward-opening parabolas y = x² − c (the level curves x² − y = c of f)

**5. What is the main advantage of using Adagrad over other optimization algorithms?**

- It converges faster than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is more memory-efficient than other optimization algorithms.
- It is less likely to get stuck in local optima than other optimization algorithms.

Answer :- It is less sensitive to the choice of hyperparameters (learning rate).
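Adagrad's reduced sensitivity to the learning rate comes from dividing each step by the accumulated squared gradients, so the effective per-parameter step shrinks automatically. A minimal sketch, assuming a toy quadratic objective and arbitrary hyperparameters:

```python
import numpy as np

def adagrad(w, grad_fn, lr=1.0, steps=200, eps=1e-8):
    """Adagrad sketch: effective learning rate lr / (sqrt(G) + eps)
    per parameter, where G accumulates squared gradients. Even a
    crude global lr works because the steps self-normalize."""
    G = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        G += g * g                          # accumulate squared gradients
        w = w - lr * g / (np.sqrt(G) + eps)
    return w

# Toy objective: quadratic with minimum at (1, -2), gradient 2(w - a).
grad = lambda w: 2 * (w - np.array([1.0, -2.0]))
w = adagrad(np.zeros(2), grad)
```

Note the trade-off: because `G` only grows, the effective learning rate decays monotonically, which is why later variants (RMSProp, Adam) use an exponential moving average instead.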

**6. We are training a neural network using the vanilla gradient descent algorithm. We observe that the change in weights is small in successive iterations. What are the possible causes of this phenomenon?**

- η is large
- ∇w is small
- ∇w is large
- η is small

Answer :- ∇w is small and η is small (the update η∇w is small if either factor is small)

**7. You are given labeled data, which we call X, where rows are data points and columns are features. One column has most of its values equal to 0. Which algorithm should we use here for faster convergence to the optimal value of the loss function?**

- NAG
- Adam
- Stochastic gradient descent
- Momentum-based gradient descent

Answer :- Adam (its adaptive per-parameter learning rates handle the sparse feature, giving rarely updated weights larger effective steps)

**8. What is the update rule for the ADAM optimizer?**

- w_t = w_{t−1} − lr · m_t / (√v_t + ε)
- w_t = w_{t−1} − lr · m_t
- w_t = w_{t−1} − lr · m_t / (v_t + ε)
- w_t = w_{t−1} − lr · v_t / (m_t + ε)

Answer :- w_t = w_{t−1} − lr · m_t / (√v_t + ε)
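The update rule w_t = w_{t−1} − lr · m_t / (√v_t + ε) can be written out directly. A minimal sketch with the standard bias-corrected moment estimates; the toy objective and the learning rate are illustrative choices only.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: w_t = w_{t-1} - lr * m_hat / (sqrt(v_hat) + eps),
    using bias-corrected first and second moment estimates."""
    m = b1 * m + (1 - b1) * g           # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * g * g       # second moment (squared gradients)
    m_hat = m / (1 - b1 ** t)           # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy objective f(w) = (w - 4)^2, gradient 2(w - 4).
grad = lambda w: 2 * (w - 4.0)
w, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    w, m, v = adam_step(w, grad(w), m, v, t)
```

Note that ε sits outside the square root here, matching the first option; some implementations place it inside, which behaves slightly differently for tiny v_t.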

**9. What is the advantage of using mini-batch gradient descent over batch gradient descent?**

- Mini-batch gradient descent is more computationally efficient than batch gradient descent.
- Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
- Mini batch gradient descent gives us a better solution.
- Mini-batch gradient descent can converge faster than batch gradient descent.

Answer :- Mini-batch gradient descent is more computationally efficient than batch gradient descent, and it can converge faster than batch gradient descent.
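The efficiency argument is easy to see in code: each update touches only `batch_size` rows instead of the full dataset, so many cheap updates happen per epoch. A sketch on hypothetical linear-regression data (the data, batch size, and learning rate are all arbitrary for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 3x + small noise.
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

def minibatch_gd(X, y, batch_size=32, lr=0.1, epochs=20):
    """Mini-batch gradient descent: each update uses only batch_size
    rows, so it is far cheaper per step than full-batch GD, while the
    gradient estimate is much less noisy than single-sample SGD."""
    w, n = 0.0, len(y)
    for _ in range(epochs):
        idx = rng.permutation(n)            # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            pred = w * X[b, 0]
            grad = 2 * np.mean((pred - y[b]) * X[b, 0])
            w -= lr * grad
    return w

w = minibatch_gd(X, y)
```

With 1000 rows and batch size 32, each epoch performs ~31 parameter updates for the cost of one full-batch gradient pass, which is where the faster convergence per unit of computation comes from.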

**10. Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters?**

- Momentum optimization
- Stochastic gradient descent
- Nesterov accelerated gradient descent
- Adagrad

Answer :- Nesterov accelerated gradient descent

| Course Name | Deep Learning – IIT Ropar |
| --- | --- |
| Category | NPTEL Assignment Answer |

