NPTEL Business Intelligence & Analytics Week 5 Assignment Answers 2024

1. In regression analysis, multicollinearity refers to:

  • The perfect linear relationship between the dependent and independent variables.
  • The presence of outliers in the dataset that affect the regression coefficients.
  • High intercorrelation among the independent variables, leading to unstable estimates of the regression coefficients.
  • The variance in the residuals of the regression model.
Answer :-
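
To make the idea of intercorrelated predictors concrete, here is a minimal sketch that measures how strongly two independent variables move together and converts that into a variance inflation factor (VIF). The function names and the sample data are hypothetical, and the shortcut VIF = 1 / (1 - r^2) only holds when there are exactly two predictors.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF for a model with exactly two predictors (assumption: no others)."""
    r = pearson_correlation(a, b)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors: x2 is almost a linear function of x1, x3 is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
x3 = [1.0, -1.0, 0.0, -1.0, 1.0]

vif_collinear = vif_two_predictors(x1, x2)
vif_uncorrelated = vif_two_predictors(x1, x3)
print(vif_collinear, vif_uncorrelated)
```

A VIF near 1 suggests the predictor shares little information with the other one, while a very large VIF signals that coefficient estimates are likely to be unstable. Thresholds such as 5 or 10 are common rules of thumb rather than fixed cutoffs.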

2. What type of data transformation technique scales data to a specific range, such as 0 to 1?

  • Database normalization
  • Aggregation
  • Smoothing techniques
  • Standardization/Normalization
Answer :-
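
Rescaling data to a fixed range such as 0 to 1, as described in this question, can be sketched as follows. This is an illustrative min-max implementation; the function name and the handling of constant input are assumptions, not part of the assignment.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale
        # (assumption: map everything to new_min in that case).
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


scaled = min_max_scale([10, 20, 30, 50])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```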

3. Which of the following statements about the coefficient of determination (R-squared) is true?

  • A higher R-squared value always indicates a lower model performance.
  • A higher R-squared value always indicates better model performance, regardless of the number of predictor variables.
  • R-squared ranges from 0 to 1 and represents the percentage of variation in the dependent variable explained by the independent variables.
  • R-squared can only take positive values and is unaffected by the presence of multicollinearity in the regression model.
Answer :-

4. What does Ordinary Least Squares (OLS) aim to minimize in the context of linear regression?

  • The sum of squared errors between the predicted and observed values of the dependent variable.
  • The sum of squared residuals between the predicted and observed values of the independent variable.
  • The total variance of the independent variables
  • The sum of squared errors between the predicted and observed values of the independent variable.
Answer :- 
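
As a rough illustration of what OLS does in the one-predictor case, the sketch below fits a line with the closed-form least-squares formulas and then checks that nudging the fitted coefficients makes the sum of squared errors larger. The data and function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
sse_fit = sum_squared_errors(x, y, b0, b1)
sse_shifted_slope = sum_squared_errors(x, y, b0, b1 + 0.1)
sse_shifted_intercept = sum_squared_errors(x, y, b0 + 0.1, b1)
print(b0, b1, sse_fit, sse_shifted_slope, sse_shifted_intercept)
```

Both perturbed lines give a larger sum of squared errors than the fitted line, which is the property that defines the least-squares solution.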

5. The coefficient of determination (R-squared) value of 0.98 in a regression model implies:

  • The model has a high level of multicollinearity
  • 98% of the variability in the dependent variable is explained by the independent variable.
  • The regression model is overfitting the data by 98%.
  • The residuals in the model are normally distributed with a z-value of 0.98.
Answer :- 
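
For reference, R-squared is commonly computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below applies that formula to made-up observed and predicted values; the names and numbers are illustrative only.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [3.2, 4.8, 7.1, 8.9]

r2 = r_squared(actual, predicted)
print(round(r2, 3))  # 0.995
```

With this formula, predictions that are worse than simply guessing the mean of the observed values produce a negative result, so the 0-to-1 range applies to a least-squares fit with an intercept evaluated on its own training data.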

6. Prediction error in a model refers to:

  • The difference between actual and predicted values.
  • The degree of overfitting in the model.
  • The number of features used in the model.
  • The variability of the target variable.
Answer :-

7. Which of the following statements is incorrect with regard to overfitting in a machine learning model?

  • The model is too simple to capture the underlying patterns in the data
  • The model performs well on training data but poorly on unseen data
  • The model fits the noise in the training data.
  • None of the above
Answer :- 

8. Underfitting in a machine learning model results in:

  • Low bias and high variance.
  • High bias and low variance.
  • High bias and high variance
  • Low bias and low variance.
Answer :- 

9. When should one focus on reducing bias in a machine learning model?

  • When the model performs well on the training data but poorly on test data
  • When the model shows high variability in predictions.
  • When the model consistently overfits the training data.
  • When the model doesn’t fit the data well and performs poorly in both explanatory and predictive terms.
Answer :- 

10. What is the bias-variance trade-off in machine learning?

  • Balancing the computational resources used in training with model accuracy
  • Aiming to minimize the difference between predicted and actual values in a model.
  • Finding the equilibrium between model complexity and its ability to generalize to unseen data.
  • Choosing the best algorithm that minimizes both bias and variance simultaneously.
Answer :-

11. Training error refers to:

  • Error calculated on the training dataset.
  • Error due to overfitting.
  • Error calculated on the testing dataset
  • Error due to underfitting.
Answer :- 

12. What does Leave-One-Out Cross-Validation (LOOCV) do?

  • It iteratively uses all but one sample as the test set and the remaining sample as the training set.
  • It divides the dataset into k subsets and uses each subset as the testing set in turn.
  • It creates a validation set from a small portion of the data.
  • It iteratively uses all but one sample as the training set and the remaining sample as the testing set.
Answer :- 
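
The mechanics of leave-one-out splitting can be sketched in a few lines. In the example below, each fold holds out exactly one sample and uses the rest to fit a deliberately trivial model that predicts the mean of the training targets. That baseline model, the function names, and the data are assumptions made for illustration.

```python
def loocv_splits(n_samples):
    """Yield (train_indices, test_indices) pairs, holding out one sample per fold."""
    for test_index in range(n_samples):
        train_indices = [i for i in range(n_samples) if i != test_index]
        yield train_indices, [test_index]


def loocv_mean_squared_error(y):
    """LOOCV error of a mean-only predictor (hypothetical baseline model)."""
    errors = []
    for train_idx, test_idx in loocv_splits(len(y)):
        # "Fit" the baseline on the training fold: predict the mean target.
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        held_out = y[test_idx[0]]
        errors.append((held_out - prediction) ** 2)
    return sum(errors) / len(errors)


splits = list(loocv_splits(4))
cv_error = loocv_mean_squared_error([1.0, 2.0, 3.0])
print(splits[0])  # ([1, 2, 3], [0])
print(cv_error)   # 1.5
```

With n samples this procedure fits the model n times, which is why it is usually reserved for small datasets or models that are cheap to refit.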

13. What is the primary purpose of cross-validation in machine learning?

  • To fit the model to the training data efficiently.
  • To evaluate the model’s performance on unseen data
  • To increase model complexity for better predictions.
  • To reduce the number of features in the dataset.
Answer :- 

14. What are the three sources of error in predicted Y in machine learning?

  • Measurement error, data preprocessing error, and feature selection error
  • Model complexity error, parameter tuning error, and overfitting error.
  • Reducible error due to inaccurate estimation of f, irreducible error due to randomness, and test data variation.
  • Training error, validation error, and testing error.
Answer :-

15. Which of the following statements most accurately distinguishes supervised learning from unsupervised learning in machine learning?

  • Supervised learning requires labelled data for training models to predict specific outcomes, while unsupervised learning uncovers patterns or structures in data without predefined outcomes.
  • Supervised learning primarily deals with clustering data points based on similarities, while unsupervised learning focuses on predicting future trends based on historical data.
  • Supervised learning utilizes human supervision to label data for analysis, while unsupervised learning relies on algorithms to classify data into distinct categories.
  • Supervised learning involves training models without any prior knowledge of the dataset, while unsupervised learning requires prior information about the characteristics of the data.
Answer :-