## NPTEL Business Intelligence & Analytics Week 5 Assignment Answers 2024

1. In regression analysis, multicollinearity refers to:

- The perfect linear relationship between the dependent and independent variables.
- The presence of outliers in the dataset that affect the regression coefficients.
- High intercorrelation among the independent variables, leading to unstable estimates of the regression coefficients.
- The variance in the residuals of the regression model.

Answer :-
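To make multicollinearity concrete, here is a minimal sketch that computes a variance inflation factor (VIF) for the special case of two predictors, where VIF reduces to 1 / (1 - r²) with r the Pearson correlation between them. The data and the helper names `pearson` and `vif_two_predictors` are illustrative assumptions, not part of the assignment.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical data: x2 is roughly 2 * x1, x3 is unrelated to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
x3 = [5, 1, 4, 2, 6, 3]

vif_collinear = vif_two_predictors(x1, x2)    # very large
vif_independent = vif_two_predictors(x1, x3)  # close to 1
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that coefficient estimates may be unstable. With more than two predictors, each VIF comes from regressing one predictor on all the others.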

2. What type of data transformation technique scales data to a specific range, such as 0 to 1?

- Database normalization
- Aggregation
- Smoothing techniques
- Standardization/Normalization

Answer :-
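As an illustration of scaling to a fixed range, the sketch below implements min-max scaling in plain Python. The function name `min_max_scale` and the sample values are assumptions made for this example. Note that z-score standardization, by contrast, does not bound values to a range.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]

scaled = min_max_scale([10, 20, 30, 50])
```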

3. Which of the following statements about the coefficient of determination (R-squared) is true?

- A higher R-squared value always indicates a lower model performance.
- A higher R-squared value always indicates better model performance, regardless of the number of predictor variables.
- R-squared ranges from 0 to 1 and represents the percentage of variation in the dependent variable explained by the independent variables.
- R-squared can only take positive values and is unaffected by the presence of multicollinearity in the regression model.

Answer :-
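The definition of R-squared as the share of variation explained can be checked directly from its formula, R² = 1 - SS_res / SS_tot. The following is a minimal sketch with made-up observed and predicted values. The name `r_squared` is chosen for this example.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]
score = r_squared(y_true, y_pred)
```

Here SS_tot is 20 and SS_res is 0.10, so about 99.5% of the variation in `y_true` is accounted for by the predictions.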

4. What does Ordinary Least Squares (OLS) aim to minimize in the context of linear regression?

- The sum of squared errors between the predicted and observed values of the dependent variable.
- The sum of squared residuals between the predicted and observed values of the independent variable.
- The total variance of the independent variables
- The sum of squared errors between the predicted and observed values of the independent variable.

Answer :-
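To show what OLS minimizes, the sketch below fits a one-predictor line with the closed-form least-squares formulas and then compares its sum of squared errors (SSE) against nearby coefficient choices. The data and the helper names `fit_ols` and `sse` are assumptions for illustration.

```python
def fit_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(x, y, intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data lying close to y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_ols(x, y)
best = sse(x, y, b0, b1)
# Any other line gives a larger SSE on the same data.
worse_intercept = sse(x, y, b0 + 0.1, b1)
worse_slope = sse(x, y, b0, b1 - 0.1)
```

The errors are measured on the dependent variable `y`, not on the predictor `x`.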

5. The coefficient of determination (R-squared) value of 0.98 in a regression model implies:

- The model has a high level of multicollinearity.
- 98% of the variability in the dependent variable is explained by the independent variable.
- The regression model is overfitting the data by 98%.
- The residuals in the model are normally distributed with a z-value of 0.98.

Answer :-

6. Prediction error in a model refers to:

- The difference between actual and predicted values.
- The degree of overfitting in the model.
- The number of features used in the model.
- The variability of the target variable.

Answer :-

7. Which of the following statements is incorrect with regard to overfitting in a machine learning model?

- The model is too simple to capture the underlying patterns in the data
- The model performs well on training data but poorly on unseen data
- The model fits the noise in the training data.
- None of the above

Answer :-

8. Underfitting in a machine learning model results in:

- Low bias and high variance.
- High bias and low variance.
- High bias and high variance
- Low bias and low variance.

Answer :-
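The contrast between underfitting and overfitting in the two questions above can be sketched with two deliberately extreme models: a mean-only predictor (too simple, high bias) and a 1-nearest-neighbour predictor that memorizes the training set (zero training error, noisier on new data). The data and all function names are assumptions chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def predict_mean(train_y, xs):
    """Underfit model: ignores x and always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return [mean_y for _ in xs]

def predict_1nn(train_x, train_y, xs):
    """Overfit-prone model: copies the target of the closest training point."""
    preds = []
    for x in xs:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        preds.append(train_y[nearest])
    return preds

# Hypothetical data: the true relationship is y = x, training targets carry noise.
train_x = [1, 2, 3, 4, 5]
train_y = [1.2, 1.8, 3.3, 3.7, 5.2]
test_x = [1.4, 2.6, 3.4, 4.6]
test_y = [1.4, 2.6, 3.4, 4.6]

mean_train = mse(train_y, predict_mean(train_y, train_x))
mean_test = mse(test_y, predict_mean(train_y, test_x))
nn_train = mse(train_y, predict_1nn(train_x, train_y, train_x))
nn_test = mse(test_y, predict_1nn(train_x, train_y, test_x))
```

The mean-only model has a large error on both sets, which is the signature of high bias. The 1-nearest-neighbour model has zero training error but a nonzero test error because it reproduces the noise in the training targets.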

9. When should one focus on reducing bias in a machine learning model?

- When the model performs well on the training data but poorly on test data
- When the model shows high variability in predictions.
- When the model consistently overfits the training data.
- When the model doesn’t fit the data well and has poor explanatory or predictive performance.

Answer :-

10. What is the bias-variance trade-off in machine learning?

- Balancing the computational resources used in training with model accuracy
- Aiming to minimize the difference between predicted and actual values in a model.
- Finding the equilibrium between model complexity and its ability to generalize to unseen data.
- Choosing the best algorithm that minimizes both bias and variance simultaneously.

Answer :-

11. Training error refers to:

- Error calculated on the training dataset.
- Error due to overfitting.
- Error calculated on the testing dataset
- Error due to underfitting.

Answer :-

12. What does Leave-One-Out Cross-Validation (LOOCV) do?

- It iteratively uses all but one sample as the test set and the remaining sample as the training set.
- It divides the dataset into k subsets and uses each subset as the testing set in turn.
- It creates a validation set from a small portion of the data.
- It iteratively uses all but one sample as the training set and the remaining sample as the testing set.

Answer :-
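The LOOCV procedure can be sketched in a few lines: each sample is held out once as the test set while the model is trained on all the others. The example below assumes a trivial mean-only model and made-up data. `leave_one_out_splits` and `loocv_mse_mean_model` are hypothetical names for this sketch.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) pairs, holding out one sample at a time."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]

def loocv_mse_mean_model(y):
    """LOOCV estimate of squared error for a model that predicts the training mean."""
    errors = []
    for train, test in leave_one_out_splits(len(y)):
        prediction = sum(y[i] for i in train) / len(train)
        errors.append((y[test[0]] - prediction) ** 2)
    return sum(errors) / len(errors)

splits = list(leave_one_out_splits(4))
cv_error = loocv_mse_mean_model([2.0, 4.0, 6.0, 8.0])
```

With n samples the model is refit n times, which is why LOOCV can be expensive on large datasets.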

13. What is the primary purpose of cross-validation in machine learning?

- To fit the model to the training data efficiently.
- To evaluate the model’s performance on unseen data
- To increase model complexity for better predictions.
- To reduce the number of features in the dataset.

Answer :-
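For comparison with LOOCV, here is a minimal sketch of how k-fold cross-validation partitions the data so that every sample is used for testing exactly once. It assumes no shuffling and uses the hypothetical helper name `k_fold_splits`.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_splits(6, 3))
```

Averaging the test-fold errors gives an estimate of how the model is likely to perform on data it has not seen. In practice the indices are usually shuffled first.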

14. What are the three sources of error in predicted Y in machine learning?

- Measurement error, data preprocessing error, and feature selection error
- Model complexity error, parameter tuning error, and overfitting error.
- Reducible error due to inaccurate estimation of f, irreducible error due to randomness, and test data variation.
- Training error, validation error, and testing error.

Answer :-
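The split into reducible and irreducible error, and the related bias-variance decomposition, can be written out as follows. This is a standard textbook formulation, assuming the model Y = f(X) + ε with E[ε] = 0 and ε independent of X. The notation is not taken from the assignment itself.

```latex
% Reducible vs. irreducible error, for a fixed estimate \hat{f} and fixed X:
\mathbb{E}\big[(Y - \hat{Y})^2\big]
  = \underbrace{\big[f(X) - \hat{f}(X)\big]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}

% Expected test error at a point x_0, averaged over training sets:
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \operatorname{Var}\big(\hat{f}(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
```

More flexible models tend to lower the bias term and raise the variance term, while Var(ε) sets a floor that no model can go below.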

15. Which of the following statements most accurately distinguishes supervised learning from unsupervised learning in machine learning?

- Supervised learning requires labelled data for training models to predict specific outcomes, while unsupervised learning uncovers patterns or structures in data without predefined outcomes.
- Supervised learning primarily deals with clustering data points based on similarities, while unsupervised learning focuses on predicting future trends based on historical data.
- Supervised learning utilizes human supervision to label data for analysis, while unsupervised learning relies on algorithms to classify data into distinct categories.
- Supervised learning involves training models without any prior knowledge of the dataset, while unsupervised learning requires prior information about the characteristics of the data.

Answer :-