Data Mining is the process of extracting usable information from a larger set of raw data, using methods drawn from machine learning, statistics, and database systems. It involves analyzing patterns in large batches of data with one or more software tools.

The following are the most important Data Mining techniques:
Prediction: This technique specifies the relationship between independent and dependent instances. For example, if we want to predict future profit from sales data, the sale acts as the independent instance, whereas the profit is the dependent instance. Accordingly, the profit associated with a given sale is predicted from the historical data on sales and profit.
Decision trees: This technique uses a tree structure in which the root acts as a condition/question having multiple answers. Each answer leads to specific data that helps in determining the final decision.
Clustering analysis: This technique automatically groups objects with similar characteristics into clusters. The clustering method discovers the classes from the data itself and then places each object in the class it fits best.
Sequential Patterns: This technique discovers similar or recurring patterns in transaction data or regular events. For example, customers’ historical data helps a brand identify the patterns in the transactions that happened in the past year.
Classification Analysis: This is a Machine Learning based method in which each item in a particular set is classified into predefined groups. It uses advanced techniques like linear programming, neural networks, decision trees, etc.
Association rule learning: This technique is used to create a pattern based on the items’ relationship in a single transaction.
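The prediction technique in the list above can be sketched as a least-squares line fit. This is only a minimal illustration: the sales and profit figures are made up, and the helper name fit_line is hypothetical rather than part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales is the independent instance,
# profit is the dependent instance.
sales = [100.0, 150.0, 200.0, 250.0]
profit = [20.0, 30.0, 40.0, 50.0]

slope, intercept = fit_line(sales, profit)

# Predict the profit for a sales figure that has not been seen yet.
predicted_profit = slope * 300.0 + intercept
print(round(predicted_profit, 2))
```

The sign of slope (the regression coefficient) indicates the direction of the relationship: a positive value means profit rises with sales.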
Data Mining MCQ Questions & Answers 2021
Which of the following association measures helps in identifying how frequently an item appears in a dataset?
Select the best answer from the given options:
a) Confidence
b) Lift
c) Support
Answer:-c) Support
The clustering process works on the __ measure.
Select the best answer from the given options:
a) Lift
b) Support
c) Confidence
d) Probability
e) Distance
Answer:-e) Distance
__ step of KDD process helps in identifying valuable patterns.
Select the best answer from the given options:
a) Pattern Evaluation
b) Knowledge Presentation
c) Data Mining
Answer:-a) Pattern Evaluation
__ aids in identifying associations, correlations, and frequent patterns in data.
Select the best answer from the given options:
a) Association Rule Mining
b) Classification
c) Clustering
Answer:-a) Association Rule Mining
Explanatory variable is a __.
Select the best answer from the given options:
a) Predictor Variable
b) Dependent Variable
c) None of the options
d) All the options
e) Response variable
Answer:-a) Predictor Variable
Response variable is a __.
Select the best answer from the given options:
a) Dependent Variable
b) Predictor Variable
c) Explanatory Variable
d) All the options
Answer:-a) Dependent Variable
Classification is a __ task.
Select the best answer from the given options:
a) Data Analysis
b) Data Transformation
c) Data Integration
d) Data Cleaning
Answer:-a) Data Analysis
__ term portrays the process of discovering small pieces from a large volume of raw material.
Select the best answer from the given options:
b) Data
c) Data Cleaning
d) Mining
Answer:-d) Mining
__ outlier significantly deviates based on the context selected.
Select the best answer from the given options:
a) Collective Outlier
b) Global Outlier
c) Contextual Outlier
d) None of the options
Answer:-c) Contextual Outlier
__ statistics provides inferences about a population.
Select the best answer from the given options:
a) Descriptive
b) Inferential
Answer:-b) Inferential
In Association Rules, the Antecedent and Consequent form a disjoint set.
Select the best answer from the given options:
a) True
b) False
Answer:-a) True
Classification predicts the value of __ variable.
Select the best answer from the given options:
a) Continuous
b) Categorical
Answer:-b) Categorical
Derived relationships in Association Rule Mining are represented in the form of __.
Select the best answer from the given options:
a) Charts
b) Decision Tree
c) All the options
d) Rules
Answer:-d) Rules
The science of collecting, interpreting, and analyzing data is known as __.
Select the best answer from the given options:
a) Statistics
b) Probability
c) Data Collection
d) Data Description
Answer:-a) Statistics
Descriptive statistics is used in __ datasets.
Select the best answer from the given options:
a) Sample
b) Population
c) All the options
Answer:-a) Sample
__ parameter of regression helps in identifying the direction of relationship between variables.
Select the best answer from the given options:
a) Measure of Discrepancy
b) Regression Coefficient
Answer:-b) Regression Coefficient
Which among the following is/are (an) outlier detection method(s)?
Select the best answer from the given options:
a) All the options
b) None of the options
c) Proximity-based approach
d) Clustering-based approach
e) Classification approach
f) Statistical approach
Answer:-a) All the options
__ stage of data science process helps in converting raw data into a machine-readable format.
Select the best answer from the given options:
a) Data Description
b) Data Cleaning
c) Exploratory Data Analysis
d) Data Gathering
Answer:-c) Exploratory Data Analysis
Inferential statistics is used in __ datasets.
Select the best answer from the given options:
a) Sample
b) Population
c) All the Options
Answer:-b) Population
Which of the following helps in measuring the dispersion range of the data?
Select the best answer from the given options:
a) Variance
b) None of the options
c) All the options
d) Standard Deviation
e) Range
f) Interquartile range
Answer:-c) All the options
Distance measure(s) used in clustering process of Numeric Dataset is/are __.
a) Minkowski
b) Hamming
c) All the options
d) Manhattan Distance
Answer:-c) All the options
Jaccard Index distance measure is used on __.
Select the best answer from the given options:
a) Numeric dataset
b) Non-numeric dataset
Answer:-b) Non-numeric dataset
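As a rough illustration of the distance measures named in the two questions above, the sketch below implements them with plain Python. The function names are hypothetical; Manhattan distance is written as the p = 1 case of Minkowski distance, and the Jaccard distance is taken as 1 minus the Jaccard index.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Manhattan distance is Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


print(minkowski([0, 0], [3, 4], 2))   # p = 2 gives Euclidean distance
print(manhattan([0, 0], [3, 4]))
print(hamming("karolin", "kathrin"))
print(jaccard_distance({"milk", "bread"}, {"milk", "eggs"}))
```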
Which of the following helps in measuring the central tendency of the dataset?
Choose the correct option from the list below:
a) Median
b) Mode
c) All the options
d) Mean
Answer:-c) All the options
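The central-tendency measures in this question, and the dispersion measures from the earlier question, can all be computed with Python's standard statistics module. The sample values below are made up purely for illustration.

```python
import statistics

values = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Dispersion
value_range = max(values) - min(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)       # square root of the sample variance
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                            # interquartile range

print(mean, median, mode)
print(value_range, variance, std_dev, iqr)
```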
__ association measure compares the confidence with the expected confidence.
Choose the correct option from the list below:
a) Lift
b) Confidence
c) Support
Answer:-a) Lift
Identify the Unsupervised Learning method.
Choose the correct option from the list below:
a) Classification
b) Clustering
c) Association Rule Mining
Answer:-b) Clustering
Regression can be used in predicting/forecasting Applications.
Select the best answer from the given options:
a) True
b) False
Answer:-a) True
Collective outlier significantly deviates from the entire dataset.
Select the best answer from the given options:
a) True
b) False
Answer:-b) False
What is KDD and its Process
Some people treat Data Mining as a synonym for Knowledge Discovery in Databases (KDD), while others consider Data Mining a vital step in the KDD process.
Below are the steps in the KDD process:
a) Data Cleaning – Noisy and inconsistent data is removed.
b) Data Integration – Data from diverse sources is unified.
c) Data Selection – The relevant data is retrieved.
d) Data Transformation – Data is transformed into appropriate forms.
e) Data Mining – Intelligent methods are applied to extract knowledge and patterns.
f) Pattern Evaluation – Valuable patterns are identified.
g) Knowledge Presentation – The extracted knowledge and the identified patterns are visualized and presented.
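The seven steps above can be sketched as a chain of small functions. Everything here is an assumption made for illustration: the two data sources, the record fields, and the "frequent item" pattern are hypothetical, and a real KDD pipeline would be far more involved.

```python
from collections import Counter

# Two hypothetical sources; one record has a missing (noisy) amount.
store_a = [{"customer": "c1", "item": "Milk", "amount": 3.0},
           {"customer": "c2", "item": "Bread", "amount": None}]
store_b = [{"customer": "c3", "item": "milk", "amount": 2.5},
           {"customer": "c4", "item": "MILK", "amount": 3.5},
           {"customer": "c5", "item": "eggs", "amount": 4.0}]


def clean(records):            # a) Data Cleaning
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):       # b) Data Integration
    return [r for source in sources for r in source]


def select(records):           # c) Data Selection
    return [r["item"] for r in records]


def transform(items):          # d) Data Transformation
    return [item.lower() for item in items]


def mine(items):               # e) Data Mining
    return Counter(items)


def evaluate(counts, min_count=2):   # f) Pattern Evaluation
    return {item: n for item, n in counts.items() if n >= min_count}


def present(patterns):         # g) Knowledge Presentation
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]


merged = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(transform(select(merged)))))
print(report)
```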
Identify the algorithm that works based on the concept of clustering.
Select the best answer from the given options:
a) K-Means
b) SVM
c) Decision Tree
Answer:-a) K-Means
__ step of classification contributes to the construction of learning model.
Select the best answer from the given options:
a) Classification Step
b) Learning Step
Answer:-b) Learning Step
Which process of KDD aids in unifying data from different sources?
Select the best answer from the given options:
a) Data Cleaning
b) Data Selection
c) Data Mining
d) Pattern Evaluation
e) Data Integration
Answer:-e) Data Integration
Additional Questions and Answers for Data Mining with Explanation:
Q1. Name the different Data Mining techniques and explain the scope of Data Mining.
The different Data Mining techniques are:
a) Prediction – It discovers the relationship between independent and dependent instances. For instance, when considering sales data, if you wish to predict the future profit, the sale acts as an independent instance, whereas the profit is the dependent instance. Accordingly, based on the historical data of sales and profit, the associated profit is the predicted value.
b) Classification analysis – In this ML-based method, each item in a particular set is classified into predefined groups. It uses advanced techniques like linear programming, neural networks, decision trees, etc.
c) Association rule learning – This method creates a pattern based on the relationship of the items in a single transaction.
d) Decision trees – The root of a decision tree functions as a condition/question having multiple answers. Each answer leads to specific data that helps in determining the final decision.
e) Sequential patterns – It refers to the pattern analysis used for discovering similar or recurring patterns in transaction data or regular events. For example, historical data of customers helps a brand to identify the patterns in the transactions that happened in the past year.
f) Clustering analysis – In this technique, objects having similar characteristics are automatically grouped into clusters. The clustering method discovers the classes from the data itself and then places each object in the class it fits best.
The scope of Data Mining is to:
a) Discover previously unknown patterns – Data Mining tools sweep through a broad and diverse range of databases to identify previously hidden trends. This is a pattern discovery process.
b) Predict trends and behaviours – Data Mining automates the process of identifying predictive information in large datasets/databases.
Q2. What are the different fields where data mining is used?
Data Mining is mainly used by big consumer-based companies that focus on retail, financial, communication, and marketing fields. It is used to get the consumer’s transactional data pattern to determine price, customer preferences, and product positioning, which later impact sales, customer satisfaction, and corporate profits.
The following are the most important areas where data mining is widely used:
Healthcare and Personal Grooming
Data mining has a significant impact in the field of healthcare. It uses data and analytics to identify the best practices that can improve care and reduce costs. Scientists use several Data Mining approaches like multi-dimensional databases, machine learning, soft computing, data visualization, statistics, etc., to make things easy for patients. Using Data Mining, we can predict the volume of patients in every category and make sure that the patients get the appropriate care at the right place and at the right time.
Market Basket Analysis
This modeling technique follows the theory that if you buy a specific group of items, you are more likely to buy another group of items. Using this technique, the retailer can understand the purchase behavior of a buyer and change the store’s layout according to the buyer’s needs.
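The idea behind market basket analysis can be illustrated with the three association measures from the quiz: support, confidence, and lift. The transactions below are made up, and the function names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """How often the consequent appears in baskets containing the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the expected confidence (support of the consequent)."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


# Rule under test: {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests the two item groups are bought together more often than chance would predict; a lift below 1, as in this toy data, suggests the opposite.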
Education & Training
Educational Data Mining is used to identify and predict students’ future learning behavior. If a student is studying a particular course, the institute can use Data Mining to estimate which related course they may apply for later. It also helps institutes decide what to teach and how to teach it: they can capture the students’ learning patterns and use them to develop teaching techniques.
Manufacturing Engineering
By using Data Mining tools, we can discover patterns in complex manufacturing processes. We can use these patterns to predict the product development time span, cost, and dependencies, among other tasks.
Fraud Detection
Data Mining can be used to build a fraud detection system that protects the information of all users. With Data Mining, we can classify data as fraudulent or non-fraudulent and build an algorithm that identifies whether a given record is fraudulent or not.
Customer Relationship Management
We can use Data Mining to maintain a proper relationship with a customer.
Some other areas where data mining is used:
- Intrusion Detection
- Lie Detection
- Customer Segmentation
- Financial Banking
- Corporate Surveillance
- Research Analysis
- Criminal Investigation
- Bioinformatics
Q3. What is clustering?
In Data Mining, clustering is the process of grouping abstract objects into classes of similar objects. A cluster of data objects is treated as one group: during analysis, the data is first partitioned into groups based on similarity, and the groups are then labelled. Cluster analysis is pivotal to Data Mining because good clustering methods scale to large datasets, handle high-dimensional data and different attribute types, cope with noisy data, and produce interpretable results.
Data clustering is used in several applications, including image processing, pattern recognition, fraud detection, and market research.
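To make the grouping step concrete, here is a minimal sketch of K-Means, the clustering algorithm named in the quiz, restricted to one-dimensional points for brevity. The data, the starting centroids, and the function name kmeans_1d are all assumptions; absolute difference is used as the distance measure.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny K-Means on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ended up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, final_clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(final_centroids)
print(final_clusters)
```

Note that no class labels are supplied: the two groups emerge from the distances alone, which is what makes clustering an unsupervised method.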
Q4. What do you understand by Data Purging?
Data Purging is a process used in database management systems to keep only relevant data in a database. It cleans out junk data by deleting rows and columns that hold unnecessary NULL values. It is essential because, before loading new data into the database, we need to purge the irrelevant data.
Purging the database frequently removes junk data that takes up a fair amount of storage and slows down the database. Data purging therefore becomes necessary when the database grows too large.
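A purge of this kind might look like the following sketch, which uses Python's built-in sqlite3 module with an in-memory database. The customers table and its columns are hypothetical, and "junk" is assumed here to mean rows in which every non-key column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [("Ann", "ann@example.com"), (None, None), ("Bob", None), (None, None)],
)

# Purge rows that carry no usable data (all non-key columns are NULL).
deleted = conn.execute(
    "DELETE FROM customers WHERE name IS NULL AND email IS NULL"
).rowcount
conn.commit()

remaining = conn.execute("SELECT name FROM customers ORDER BY id").fetchall()
print(deleted, remaining)
conn.close()
```

The row for Bob is kept because it is only partly empty; what counts as junk is a policy decision that depends on the database.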
Q5. What are the advantages and disadvantages of using the MOLAP storage model?
The term MOLAP stands for “Multidimensional Online Analytical Processing.” As the name shows, it is a multidimensional storage model. This storage model type stores the data in multidimensional cubes and not in the standard relational databases.
Advantages of using the MOLAP storage model:
It stores the data in multidimensional cubes, so the query performance is excellent.
The calculations are pre-generated when a cube is created.
Disadvantages of using the MOLAP storage model:
The most significant disadvantage of MOLAP is that it can store only a limited amount of data. Because the calculations are performed when the cube is generated, it cannot support a large amount of data.
It requires considerable skill to use.
It is not free. You have to pay the license cost associated with it.
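The trade-off described above, fast queries in exchange for calculations done at cube-generation time, can be illustrated with a toy cube held in a dictionary. This is not how any particular MOLAP product stores data; the fact rows, the "*" wildcard, and the build_cube helper are assumptions made for the sketch.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (year, region, product, sales amount).
facts = [
    ("2023", "north", "laptop", 1200.0),
    ("2023", "south", "laptop", 800.0),
    ("2023", "north", "phone", 500.0),
    ("2024", "north", "laptop", 1500.0),
]

ALL = "*"  # wildcard meaning "aggregated over this dimension"


def build_cube(rows):
    """Pre-compute the total for every combination of dimension values,
    including the wildcard, at cube-generation time."""
    cube = defaultdict(float)
    for year, region, item, amount in rows:
        for key in product((year, ALL), (region, ALL), (item, ALL)):
            cube[key] += amount
    return dict(cube)


cube = build_cube(facts)

# Queries are now plain lookups; nothing is summed at query time.
print(cube[("2023", ALL, ALL)])
print(cube[(ALL, "north", "laptop")])
print(cube[(ALL, ALL, ALL)])
```

Each fact row contributes to eight cells here (two choices per dimension), which hints at why the pre-computed cube grows quickly as data and dimensions are added.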
Q6. What are the advantages and disadvantages of using the HOLAP storage model?
The term HOLAP stands for “Hybrid Online Analytical Processing.” It is a combination of MOLAP and ROLAP. This hybrid storage model was built to overcome the limitations of the MOLAP and ROLAP storage models.
Advantages of using the HOLAP storage model:
It provides better accessibility in comparison to both ROLAP & MOLAP storage models.
Because of its cache facility, the querying is faster in this storage model.
The query performance is moderate. It is faster than ROLAP but slower than MOLAP.
Its cubes are smaller than MOLAP, so only precise data is fetched for processing.
It is best when data volume is expected to increase over time.
Its processing ability is higher as compared to ROLAP and MOLAP systems.
Disadvantages of using HOLAP storage model:
In this storage model, both ROLAP and MOLAP are combined to form HOLAP, so the data volume is large.
It occupies a lot of storage space, as it contains the data from relational databases and multidimensional databases.
Queries that have to reach into the relational store are slow.
It requires system processing whenever data is updated, inserted, or deleted in the database.
We need to update the cache whenever an update happens in the database associated with the stored queries and relational data.
Maintenance is complex in this storage model because it is updated quite often.
Conclusion
Data Mining is mainly used in the following fields:
Finance & Banking Sectors
Data Mining is very important in the finance & banking field because data extraction gives financial institutions information on loans and credit reports. It helps us build a model from historical customer data that distinguishes good credit from bad credit. It is also used to detect fraudulent credit card transactions, which protects the card owner.
Marketing & Retail
Marketing companies use Data Mining to create models based on the shopping history of their customers. By using this technique, they can sell profitable products to their targeted customers.
Increasing Brand Loyalty
Companies use Data Mining techniques in marketing campaigns after understanding their customers’ needs and habits. After getting the right information, the companies can quickly increase their brand loyalty.
Helps in Decision Making
Companies use Data Mining techniques to help them make marketing and business decisions. This technology makes the relevant information much easier to determine, and it also lets a company uncover what was previously unknown and unexpected.
To Predict Future Trends
Data Mining can be used to predict future trends by studying the data patterns for a long time. It can also help people to adopt behavioral changes.
Increase Company Revenue
Data mining technology involves collecting information on goods sold online. This can eventually reduce the cost of products and increase the company revenue.
Determining Customer Groups
Data Mining supports market analysis, so we can gauge customers’ responses directly. It also supplies the information needed to identify customer groups.
Increases Website Optimization
Data Mining can uncover all kinds of hidden information, which can help you optimize your website.