IMT- CDL ASSIGNMENT-ITAE 002 TEST1

ASSIGNMENT

Question 1 of 30:
Generally, a low-complexity model has a _____ bias and a _____ variance.

a. High, low
b. Low, high
c. Low, low
d. High, high

Question 2 of 30:
A multiple regression model uses a _____ surface, such as a _____, to approximate the relationship between a continuous response (target) variable and a set of predictor variables.

a. Linear, plane or hyperplane
b. Non-linear, parabola or hyperbola
c. Both (a) and (b)
d. None of these

Question 3 of 30:
Which of the following is useful for finding relationships among different data attributes when no prior information is available?

a. Hypothesis testing
b. Exploratory data analysis
c. Model evaluation
d. None of these

Question 4 of 30:
Which of the following methods is least sensitive to the presence of outliers?

a. Standard deviation
b. Mean absolute deviation
c. Z-score
d. Inter quartile range
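
One way to explore this question empirically is to compute a few spread measures on a small sample, then replace one value with an extreme one and compare. The sketch below is only illustrative: the data values and the helper name spread_measures are made up for this example, the population standard deviation is used, and quartiles follow the default method of Python's statistics.quantiles. The z-score is left out because it is a per-observation score built from the mean and standard deviation rather than a single spread measure.

```python
import statistics

def spread_measures(values):
    """Return standard deviation, mean absolute deviation, and IQR."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)  # population standard deviation
    mean_abs_dev = statistics.fmean(abs(v - mean) for v in values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {"std_dev": std_dev, "mean_abs_dev": mean_abs_dev, "iqr": q3 - q1}

# Hypothetical sample, and the same sample with its largest value made extreme.
clean = [10, 12, 13, 14, 15, 16, 17, 18, 20]
with_outlier = clean[:-1] + [200]

before = spread_measures(clean)
after = spread_measures(with_outlier)
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

Comparing the "before" and "after" columns shows how strongly each measure reacts to a single extreme observation.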

Question 5 of 30:
For data mining in general, the data analyst has _____

a. An a priori hypothesis in mind that needs to be validated
b. No a priori hypothesis; the task is to find actionable inferences from the data
c. An existing model of the data, which needs to produce output for a new data set
d. None of these

Question 6 of 30:
The proportion of false positives and the proportion of false negatives are additive inverses of the proportion of _____ and the proportion of _____, respectively.

a. True positives, true negatives
b. True negatives, true positives
c. False positives, false negatives
d. None of these
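
The relationships among these proportions can be checked numerically from a confusion matrix. The sketch below uses hypothetical counts and a hypothetical helper name, classification_rates. "Additive inverse" is read here in the sense that the two rates in each pair sum to 1.

```python
import math

def classification_rates(tp, fp, tn, fn):
    """Compute the four standard rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # proportion of true positives
        "specificity": tn / (tn + fp),          # proportion of true negatives
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical counts, for illustration only.
r = classification_rates(tp=40, fp=10, tn=45, fn=5)

# Each pair of rates shares a denominator, so each pair sums to 1.
print(math.isclose(r["false_positive_rate"] + r["specificity"], 1.0))
print(math.isclose(r["false_negative_rate"] + r["sensitivity"], 1.0))
```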

Question 7 of 30:
In ANOVA for a continuous variable, as an extension of the two-sample t-test, if we have a three-fold partition of the data set, then it tests whether the _____ of the continuous variable is the same across the subsets of data.

a. mean
b. variance
c. error
d. None of these

Question 8 of 30:
Least squares regression works by choosing the unique regression line that____

a. Minimizes the sum of squared residuals over all the data points
b. Maximizes the sum of squared residuals over all the data points
c. Minimizes the sum of squared residuals over some data points
d. None of these
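
To see the criterion in action, the sketch below fits a simple regression line with the usual closed-form formulas and compares its sum of squared residuals against a slightly different line. The data points and the function names fit_line and sum_squared_residuals are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the simple least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared residuals over all the data points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
fitted_sse = sum_squared_residuals(xs, ys, b0, b1)
nudged_sse = sum_squared_residuals(xs, ys, b0 + 0.1, b1 - 0.05)
print(fitted_sse < nudged_sse)  # any other line gives a larger sum
```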

Question 9 of 30:
In ANOVA, the F statistic, F-data, is calculated as the ratio of

a. MSTR/MSTE
b. OMSE/MSTR
c. MSTR/MSE
d. OMSTE/MSE
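
As a worked illustration of one-way ANOVA, the sketch below computes the test statistic in its standard textbook form: the mean square treatment (between-sample variability) divided by the mean square error (within-sample variability). The three groups and the function name one_way_anova_f are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-sample variability: sum of squares treatment.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-sample variability: sum of squares error.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    mstr = sstr / (k - 1)        # mean square treatment
    mse = sse / (n_total - k)    # mean square error
    return mstr / mse

# Hypothetical three-fold partition with well-separated group means.
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
f_stat = one_way_anova_f(groups)
print(f"{f_stat:.1f}")  # 27.0
```

Here the group means (5, 8, 11) are far apart relative to the spread inside each group, so the statistic is large.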

Question 10 of 30:
In general, _____ the null hypothesis if the p-value is less than the level of significance α (a small preset value, say 0.05).

a. Accept
b. Reject
c. More testing required
d. None of these

Question 11 of 30:
For most of the real-world data, skewness is

a. Positive
b. Negative
c. Zero
d. None of these

Question 12 of 30:
If the false positive cost increases, then specificity should

a. decrease
b. increase
c. not be changed
d. none of these

Question 13 of 30:
A rule of thumb is to flag observations whose standardized residuals exceed _____ in absolute value as outliers.

a. 2
b. 3
c. 5
d. 1

Question 14 of 30:
The factor solutions provided by factor analysis are not invariant to_____

a. Transformations
b. Rotation
c. Scaling
d. None of these

Question 15 of 30:
Hypothesis testing with too many variables may result in

a. Underfitting the data
b. Overfitting the data
c. Perfect fitting the data
d. None of these

Question 16 of 30:
In which phase of CRISP-DM is the report generated?

a. Data understanding phase
b. Modeling phase
c. Evaluation phase
d. Deployment phase

Question 17 of 30:
Which of the following methods is best for binning numerical variables?

a. Equal width binning
b. Equal frequency binning
c. Binning by clustering
d. None of these

Question 18 of 30:
Generally, as the complexity of a model increases, it performs well on the training set but may result in _____ on test data.

a. Overfitting
b. Underfitting
c. Perfectly well
d. None of these

Question 19 of 30:
Communality represents the proportion of

a. Mean
b. Mode
c. Variance
d. Median

Question 20 of 30:
Predictive analytics is the process of

a. Just cleaning data
b. Just compressing data
c. Guessing about present output without any data
d. Information retrieval to make useful predictions about future outcomes

Question 21 of 30:
Most data mining algorithms search for patterns and structures among all the variables with respect to

a. Error
b. Model
c. Target
d. None of these

Question 22 of 30:
In ANOVA, to reject the null hypothesis, the F statistic F-data will be _____ when between-sample variability is much _____ than within-sample variability.

a. Large, greater
b. Small, greater
c. Small, lesser
d. Large, lesser

Question 23 of 30:
_____ is known as the standard error of the estimate.

a. RMSE (root mean square error)
b. MSE (mean square error)
c. MAE (mean absolute error)
d. None of these

Question 24 of 30:
_____ will treat all errors equally, whether outliers or not, and thereby avoid the problem of undue influence of outliers.

a. MAE (mean absolute error)
b. MSE (mean square error)
c. RMSE
d. None of these
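
The effect of a single large error on each metric can be examined directly. The sketch below uses a made-up list of prediction errors containing one extreme value; the helper names are illustrative, not from any particular library.

```python
import math

def mae(errors):
    """Mean absolute error: every error contributes in proportion to its size."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean square error: squaring magnifies large errors."""
    return sum(e * e for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error."""
    return math.sqrt(mse(errors))

# Hypothetical prediction errors; the last one is an outlier.
errors = [1, -1, 2, -2, 20]

print(f"MAE={mae(errors):.2f}  MSE={mse(errors):.2f}  RMSE={rmse(errors):.2f}")

# Share of each metric's total that comes from the outlier alone.
abs_share = abs(errors[-1]) / sum(abs(e) for e in errors)
sq_share = errors[-1] ** 2 / sum(e * e for e in errors)
print(f"outlier share: absolute={abs_share:.1%}, squared={sq_share:.1%}")
```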

Question 25 of 30:
In general, a user-defined composite is simply a _____ measure.

a. Homogeneous
b. Superposition
c. Linear
d. Non-linear

Question 26 of 30:
Extrapolation refers to estimates and predictions of the target variable made using the regression equation with values of the predictor variable outside the range of the values of _____ in the data set.

a. X
b. y
c. Both (a) and (b)
d. None of these

Question 27 of 30:
If the false positive cost increases, then___ should decrease.

a. Sensitivity
b. Specificity
c. Accuracy
d. None of these

Question 28 of 30:
As a general rule of thumb in PCA, the number of eigenvalues (and hence corresponding eigenvectors) to retain is determined by the size of each eigenvalue. Which threshold should be used?

a. Eigenvalues less than zero
b. Eigenvalues equal to zero
c. Eigenvalues less than one
d. Eigenvalues equal to or greater than one

Question 29 of 30:
A 95% confidence interval about the mean number of customer service calls for all customers indicates:

a. We are 95% confident that the population mean number of customer service calls for all customers falls between some range
b. We are 95% confident that the sample mean number of customer service calls for all customers falls between some range.
c. We are 5% confident that the population mean number of customer service calls for all customers falls between some range
d. None of these

Question 30 of 30:
For a flag variable, two-sample Z-tests are generally used for

a. Difference in means
b. Difference in proportions
c. Homogeneity of proportions
d. None of these
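
For reference, a two-sample Z-test on a flag (0/1) variable can be sketched as follows, using the pooled-proportion form found in most introductory texts. The counts are hypothetical, the function name two_sample_z_test_flag is made up for this example, and the p-value is two-sided.

```python
import math
from statistics import NormalDist

def two_sample_z_test_flag(x1, n1, x2, n2):
    """Z statistic and two-sided p-value for two samples of a flag variable.

    x1, x2 are the counts of records with the flag set; n1, n2 are sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical samples: 60 of 200 flagged versus 40 of 200 flagged.
z, p_value = two_sample_z_test_flag(60, 200, 40, 200)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The resulting p-value would then be compared with the chosen level of significance, such as 0.05.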