IMT-CDL ASSIGNMENT - ITAE 002 TEST 2

ASSIGNMENT

Question 1 of 30:
The sample mean is equal to the population mean:

a. Always
b. When sample represents population
c. Cannot Equal
d. None of these

Question 2 of 30:
A rule of thumb is to flag observations whose standardized residuals exceed ___ in absolute value as outliers.

a. 2
b. 3
c. 5
d. 1
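
A minimal sketch of how such a flagging rule could be applied to a simple linear regression. The function names, the sample data, and the cutoff passed in by the caller are all hypothetical; the residuals are standardized using the leverage of each point, which is one common definition.

```python
import math


def standardized_residuals(x, y):
    """Standardized residuals for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate: sqrt(SSE / (n - 2)).
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage of each observation in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]


def flag_outliers(x, y, threshold):
    """Indices whose standardized residual exceeds the caller's threshold."""
    scores = standardized_residuals(x, y)
    return [i for i, r in enumerate(scores) if abs(r) > threshold]


# Hypothetical data: a straight line with one value pushed far off it.
x = list(range(1, 11))
y = [2.0 * xi for xi in x]
y[4] += 20.0
print(flag_outliers(x, y, threshold=2.0))
```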

Question 3 of 30:
Assume data mining is applied to measure the performance of students in a girls' school. The variables include gender, age, percentage, residential area code, etc. Which variable can be removed from the list without affecting the result?

a. Gender
b. Age
c. Percentage
d. None of these

Question 4 of 30:
Which of the following is useful for finding relationships among different data attributes when a priori information is not available?

a. Hypothesis testing
b. Exploratory data analysis
c. Model evaluation
d. None of these

Question 5 of 30:
A 95% confidence interval for the mean number of customer service calls for all customers indicates:

a. We are 95% confident that the population mean number of customer service calls for all customers falls within some range.
b. We are 95% confident that the sample mean number of customer service calls for all customers falls within some range.
c. We are 5% confident that the population mean number of customer service calls for all customers falls within some range.
d. None of these
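
A minimal sketch of how a confidence interval for a mean can be computed, assuming a large-sample z-based interval (for small samples a t critical value would give a wider interval). The call counts below are made-up illustrative data, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds of a z-based interval for the mean."""
    n = len(sample)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


# Hypothetical customer service call counts for a sample of customers.
calls = [1, 0, 2, 1, 3, 1, 0, 2, 1, 4, 1, 2, 0, 1, 2, 1, 3, 0, 1, 2]
low, high = mean_confidence_interval(calls)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

The interval is centred on the sample mean and widens as the requested confidence level increases.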

Question 6 of 30:
In a regression model, changing the order in which the variables are entered into the model changes nothing except the

a. Sequential sum of squares
b. Sum of squares
c. Difference of squares
d. None of these

Question 7 of 30:
For a multinomial variable, the test generally used is a test for

a. Difference in means
b. Difference in proportions
c. Homogeneity of proportions
d. None of these

Question 8 of 30:
In real-world applications, data mining methods generally have widespread applicability and

a. Each method is distinct and there is no correlation between methods
b. There are multiple methods to be used for a real-world task
c. There are multiple methods to be used for a real-world task
d. All of these

Question 9 of 30:
Least squares regression works by choosing the unique regression line that ___

a. Minimizes the sum of squared residuals over all the data points
b. Maximizes the sum of squared residuals over all the data points
c. Minimizes the sum of squared residuals over some data points
d. None of these
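
A minimal sketch of a least squares fit for a single predictor, using the standard closed-form estimates of slope and intercept. The data and function names are hypothetical; the loop at the end compares the sum of squared residuals of the fitted line with that of a few nearby lines.

```python
def fit_line(x, y):
    """Closed-form least squares estimates (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared residuals of a candidate line over all data points."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data points.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)
best = sum_squared_residuals(x, y, b0, b1)

# Shifting the intercept or slope away from the fit changes the total.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    other = sum_squared_residuals(x, y, b0 + d0, b1 + d1)
    print(f"shift=({d0:+.1f}, {d1:+.1f})  fitted={best:.4f}  shifted={other:.4f}")
```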

Question 10 of 30:
For most of the real-world data, skewness is

a. Positive
b. Negative
c. Zero
d. None of these

Question 11 of 30:
In general, ___ the null hypothesis if the p-value is less than the level of significance α (a small preset value, say 0.05).

a. Accept
b. Reject
c. More tests required
d. None of these

Question 12 of 30:
The proportion of false positives and the proportion of false negatives are additive inverses of the proportion of ___ and the proportion of ___, respectively.

a. True positives, true negatives
b. True negatives, true positives
c. False positives, false negatives
d. None of these

Question 13 of 30:
A multiple regression model uses a ___ surface, such as a ___, to approximate the relationship between a continuous response (target) variable and a set of predictor variables.

a. Linear, plane or hyperplane
b. Non-linear, parabola or hyperbola
c. Both (a) and (b)
d. None of these

Question 14 of 30:
In ANOVA, the F-statistic F-data is calculated as the ratio of

a. MSTR/MSTE
b. OMSE/MSTR
c. MSTR/MSE
d. OMSTE/MSE
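
A minimal sketch of the standard one-way ANOVA computation, in which the mean square treatment (MSTR) is divided by the mean square error (MSE). The three groups are hypothetical data, and the function name is an assumption made for illustration.

```python
def one_way_anova_f(groups):
    """Return (MSTR, MSE, F) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variability (treatment sum of squares).
    sstr = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Within-group variability (error sum of squares).
    sse = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return mstr, mse, mstr / mse


# Hypothetical three-fold partition of a continuous variable.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
mstr, mse, f_data = one_way_anova_f(groups)
print(f"MSTR={mstr:.2f}  MSE={mse:.2f}  F={f_data:.2f}")
```

A large F value means the group means differ by much more than the variation within the groups would suggest.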

Question 15 of 30:
In which phase is the performance of the selected models tested?

a. Deployment
b. Evaluation
c. Modelling
d. Data preparation

Question 16 of 30:
In ANOVA for a continuous variable, as an extension of the two-sample t-test, if we have a three-fold partition of the data set, then it tests whether the ___ value of the continuous variable is the same across the subsets of data.

a. mean
b. variance
c. error
d. None of these

Question 17 of 30:
Predictive analytics is the process of

a. Just cleaning data
b. Just compressing data
c. Guessing about present output without any data
d. Information retrieval to make useful predictions about future outcomes

Question 18 of 30:
Generally in ANOVA, a small p-value and a large F-data value lead to ___

a. Reject null hypothesis
b. Accept null hypothesis
c. Reject alternate hypothesis
d. Accept alternate hypothesis

Question 19 of 30:
___ sample size is the only way to decrease the margin of error while maintaining a constant level of confidence.

a. Increasing
b. Decreasing
c. Keeping constant
d. None of these
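
A minimal sketch of how the margin of error depends on sample size, assuming the usual z-based formula E = z * sigma / sqrt(n) with a known standard deviation. The value of sigma and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / math.sqrt(n)


# Same confidence level and sigma, different sample sizes.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(10.0, n), 3))
```

With the confidence level held fixed, z does not change, so the margin varies only through the square root of n.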

Question 20 of 30:
Which of the following methods is least sensitive to the presence of outliers?

a. Standard deviation
b. Mean absolute deviation
c. Z-score
d. Interquartile range
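
A minimal sketch comparing several measures of spread on the same data with and without one extreme value. The data are hypothetical, and the quartiles use the default method of Python's `statistics.quantiles`; other quartile conventions give slightly different numbers.

```python
from statistics import quantiles, stdev


def mean_absolute_deviation(data):
    """Average absolute distance of the values from their mean."""
    centre = sum(data) / len(data)
    return sum(abs(v - centre) for v in data) / len(data)


def interquartile_range(data):
    """Distance between the third and first quartiles."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


# Hypothetical data; the second list replaces the largest value with 200.
clean = [10, 12, 13, 14, 15, 16, 17, 18, 20]
dirty = [10, 12, 13, 14, 15, 16, 17, 18, 200]

for name, measure in [("standard deviation", stdev),
                      ("mean absolute deviation", mean_absolute_deviation),
                      ("interquartile range", interquartile_range)]:
    print(f"{name}: clean={measure(clean):.2f}  with outlier={measure(dirty):.2f}")
```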

Question 21 of 30:
Generally, a low-complexity model has a ___ bias and a ___ variance

a. High, low
b. Low, high
c. Low, low
d. High, high

Question 22 of 30:
Sensitivity measures the ability of the model to classify a record ___, while specificity measures the ability to classify a record ___

a. Negatively, positively
b. Positively, negatively
c. Positively, positively
d. Negatively, negatively
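
A minimal sketch of how sensitivity and specificity are computed from actual and predicted labels, using the standard definitions TP / (TP + FN) and TN / (TN + FP). The labels (1 for the positive class, 0 for the negative class) and the function name are hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of actual positives found
    specificity = tn / (tn + fp)  # share of actual negatives found
    return sensitivity, specificity


# Hypothetical labels for ten records.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f}  specificity={spec:.2f}")
```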

Question 23 of 30:
Most data mining algorithms search for patterns and structures among all the variables with respect to

a. Error
b. Model
c. Target
d. None of these

Question 24 of 30:
Which of the following is used as standard error of estimate for linear regression models?

a. RMSE (root mean square error)
b. MSE (mean square error)
c. MAE (mean absolute error)
d. None of these

Question 25 of 30:
According to the minimum description length principle, the best representation (or description) of a model or body of data is the one that ___ the information required (in bits) to encode (i) the model and (ii) the exceptions to the model.

a. Minimize
b. Maximize
c. Neither minimize nor maximize
d. None of these

Question 26 of 30:
___ will treat all errors equally, whether outliers or not, and thereby avoid the problem of undue influence of outliers.

a. MAE (mean absolute error)
b. MSE (mean square error)
c. RMSE
d. None of these
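
A minimal sketch of the three error measures named in the options, computed on the same prediction errors with and without one large error. The error values are hypothetical.

```python
import math


def mae(errors):
    """Mean absolute error: every error counts in proportion to its size."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean square error: squaring weights large errors more heavily."""
    return sum(e * e for e in errors) / len(errors)


def rmse(errors):
    """Root mean square error, in the same units as the errors."""
    return math.sqrt(mse(errors))


# Hypothetical prediction errors; the second list contains one large error.
typical = [1.0, -1.0, 1.0, -1.0]
with_outlier = [1.0, -1.0, 1.0, -10.0]

for name, errs in [("typical", typical), ("with outlier", with_outlier)]:
    print(f"{name}: MAE={mae(errs):.2f}  MSE={mse(errs):.2f}  RMSE={rmse(errs):.2f}")
```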

Question 27 of 30:
As a general rule of thumb, the number of eigenvalues, and hence of corresponding eigenvectors, to retain in PCA depends on the magnitude of the eigenvalues. Which threshold should be used?

a. Eigen values less than zero
b. Eigen values equal to zero
c. Eigen values less than one
d. Eigen values equal or greater than one

Question 28 of 30:
For a flag variable, two-sample Z-tests are generally used for

a. Difference in means
b. Difference in proportions
c. Homogeneity of proportions
d. None of these

Question 29 of 30:
For a continuous variable, two-sample t-tests are generally used for

a. Difference in means
b. Difference in proportions
c. Homogeneity of proportions
d. None of these

Question 30 of 30:
It is to be stressed that model evaluation techniques should be performed on the ___ data set, rather than on the training set or on the data set as a whole.

a. Test
b. Verification
c. Training
d. None of these