DATA SCIENCE
- A variable whose measurement is done in terms such as height and weight is classified as a ______.
a) continuous variable
b) measuring variable
c) discrete variable
d) flowchart variable
- Qualitative data is also known as ______.
a) Numerical data
b) Categorical data
c) Discrete data
d) Continuous data
- Which of the following types of data has a natural order?
a) Ordinal data
b) Nominal data
c) Binary data
d) Continuous data
- Discrete data is based on counts and can take a ______ number of values.
a) Infinite b) simple c) complex d) finite
- ______ is also known as primary data, which is data collected from a source.
a) Ordinal data b) Ordinary data c) Existing data d) Raw data
- What is secondary data?
a) Ordinal data b) Unimportant data c) Existing data d) Ordinary data
- Age group (Young, Adult, Senior Citizen) is an example of ______.
a) Nominal data b) Discrete data c) Continuous data d) Ordinal data
- An example of discrete data is ______.
a) the number of children b) height of children c) weight of children d) behaviour of children
- XQuery is a functional query language used to retrieve information stored in ______ format.
a) HTML b) XML c) UML d) Jscript
- The XPath specification has ______ types of nodes.
a) Four b) Five c) Six d) Seven
- State True or False.
(i) Data visualization helps users in analyzing a small amount of data in a simpler way.
(ii) Data visualization makes complex data more accessible, understandable, and usable.
a) true, false b) false, true c) true, true d) false, false
- Data visualization is also an element of the broader ______.
a) deliver presentation architecture
b) data presentation architecture
c) dataset presentation architecture
d) data process architecture
- Which one of the following is the most basic and commonly used technique for EDA?
a) Line charts b) Scatter plots c) Population pyramids d) Area charts
- Which of the following is not a part of the data science process?
a) Discovery b) Model Planning c) Communication Building d) Operationalize
- Which of the following is not an application of data science?
a) Recommendation Systems b) Image & Speech Recognition c) Online Price Comparison d) Privacy Checker
- Amazon Web Services falls into which of the following cloud-computing categories?
a) Platform as a Service
b) Software as a Service
c) Infrastructure as a Service
d) Back-end as a Service
- Which of the following is the most important language for data science?
a) Java b) Ruby c) R d) HTML
- In XQuery, the ______ symbol precedes the variable name.
a) @ b) $ c) # d) *
- MongoDB supports cross-platform use and is written in the ______ language.
a) C++ b) Java c) R d) PHP
- MongoDB is a ______ database.
a) SQL b) NoSQL c) RDBMS d) Firebase
- Ridge Regression is used when data suffers from ______.
a) Collinearity b) Multicollinearity c) Regression d) Classification
- Joins are used for combining ______ product.
a) Vector b) Euler c) Scalar d) Cartesian
- ______ is the process of assigning storage, usually in the form of server disk drive space, in order to optimize the performance of a storage area network.
a) Storage Provisioning b) Data mining c) Storage assignment d) Data Warehousing
- Clustering comes under ______ learning.
a) Supervised b) Unsupervised c) Reinforcement d) Classification
- In ______, the distance between two clusters is defined as the shortest distance between two points, one from each cluster.
a) Single Linkage b) Complete Linkage c) Average Linkage d) Multiple Linkage
- In ______, the distance between two clusters is defined as the longest distance between two points, one from each cluster.
a) Single Linkage b) Complete Linkage c) Average Linkage d) Multiple Linkage
- ______ is the variability of model prediction for a given data point, or a value that tells us the spread of our data.
a) Variance b) Bias c) Underfitting d) Bug
- A rise in prices before a festival is an example of ______.
a) Cyclical variation b) Trend variation c) Irregular variation d) Seasonal variation
- Seasonal variations are ______.
a) Long term variation b) Short term variation c) Sudden variation d) Instant variation
- Time series data consists of ______ components.
a) three b) six c) five d) four
- The best-fitted trend line is the one for which the sum of squares of the residuals (errors) is ______.
a) maximum b) minimum c) negative d) 1
- ______ data is used to build a model.
a) Training b) Testing c) Validation d) Primary
- Which of the following is also called exploratory learning?
a) Supervised learning
b) Active learning
c) Unsupervised learning
d) Reinforcement learning
- Which of the following statements is true about a prediction problem?
a) The output attribute must be categorical.
b) The model is designed to determine future outcomes.
c) The output attribute must be numeric.
d) The model is designed to classify current behaviour.
- Decision Nodes are represented by ______.
a) Disks b) Squares c) Circles d) Triangles
- LASSO stands for ______.
a) Least Absolute Shrinkage and Selection Operator.
b) Low Attribute Shrinkage and Selection Operator.
c) Least Attribute Shrinkage and Selection Operator.
d) Low Absolute Shrinkage and Selection Operator.
- Another name for an input variable is ______.
a) random variable b) independent variable c) estimated variable d) dependent variable
- Data collected by someone else for some other purpose but utilized by the investigator for another purpose is called ______.
a) Primary data b) Secondary data c) Raw data d) First hand data
- There are ______ types of methods in data collection.
a) two b) four c) five d) six
- ______ is a graphical representation method used to depict groups of numerical data through their quartiles.
a) Histogram b) Box plot c) Scatter plot d) Line
- Agglomerative and Divisive are types of ______ algorithm.
a) Hierarchical Clustering b) Binary Classification c) Regression d) Multi-classification
- AIC is measured by the equation ______.
a) AIC = -2k + 2 b) AIC = 2LL + 2k c) AIC = -2LL + 2k d) AIC = 2k + 2
- ______ is caused by a hypothesis function that fits the available data but does not generalize well to predict new data.
a) Underfitting b) Overfitting c) Low variance d) Low bias
- ______ is a means of managing data that makes it more useful for users engaging in data discovery and analysis.
a) Data curation b) Data processing c) Data Munging d) Data mining
- A spreadsheet is an example of ______ data.
a) structured data b) unstructured data c) semi structured data d) half structured
- CouchDB is an example of a ______ database.
a) NoSQL b) RDBMS c) SQL d) DBMS
- SVM stands for ______.
a) Standalone Validate Machine.
b) Standalone Vector Machine.
c) Support Validate Machine.
d) Support Vector Machine.
- ______ is the process of dimensionality reduction by which a set of data is reduced to more manageable groups for processing.
a) Regression b) Feature Extraction c) Aggregating d) Feature Elimination
- PCA is used for ______.
a) dimensionality reduction
b) feature extraction
c) data augmentation
d) variance normalization
- State True or False.
(i) KNN can be used in both classification and regression.
(ii) KNN can be used in reinforcement learning.
a) True, False
b) True, True
c) False, False
d) False, True