A data warehouse (DW) is a system for collecting and managing data in order to provide meaningful business insights.
It is commonly used to connect and analyze business data from heterogeneous sources.
The data warehouse is the core of a business intelligence (BI) system, which is built for data analysis and reporting.
A data warehouse system is also known by the following names:
Decision Support System (DSS)
Management Information System
Executive Information System
Business Intelligence Solution
Data may be:
General stages of Data Warehouse
Early on, organizations made relatively simple use of data warehousing. Over time, more sophisticated uses emerged.
The following are the general stages of use of the data warehouse (DWH):
Offline Operational Database
In this stage, data is simply copied from an operational system to another server. This way, loading, processing, and reporting on the copied data do not affect the operational system’s performance.
Offline Data Warehouse
Data in the data warehouse is regularly updated from the operational database. The data is mapped and transformed to meet the data warehouse’s objectives.
Real-time Data Warehouse
In this stage, the data warehouse is updated whenever a transaction takes place in the operational database, for example in an airline or railway booking system.
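The three stages can be contrasted with a minimal Python sketch. The booking records, field names, and the `transform` mapping are hypothetical and chosen only for illustration; a real warehouse would use a database and a scheduler rather than in-memory lists.

```python
import copy

# Hypothetical operational records; the field names are illustrative only.
operational_db = [
    {"booking_id": 1, "fare_cents": 12000, "status": "CONFIRMED"},
    {"booking_id": 2, "fare_cents": 8000, "status": "cancelled"},
]


def offline_copy(source):
    """Offline operational database: copy rows unchanged to another store."""
    return copy.deepcopy(source)


def transform(row):
    """Map an operational row onto an assumed warehouse layout."""
    return {
        "booking_id": row["booking_id"],
        "fare": row["fare_cents"] / 100,
        "status": row["status"].lower(),
    }


def batch_refresh(source):
    """Offline data warehouse: periodic refresh that transforms every row."""
    return [transform(row) for row in source]


class RealTimeWarehouse:
    """Real-time stage: the warehouse is updated as each transaction occurs."""

    def __init__(self):
        self.rows = []

    def on_transaction(self, row):
        self.rows.append(transform(row))


snapshot = offline_copy(operational_db)
warehouse = batch_refresh(operational_db)
live = RealTimeWarehouse()
live.on_transaction({"booking_id": 3, "fare_cents": 5000, "status": "CONFIRMED"})
```

The only difference between the last two stages in this sketch is when `transform` runs: on a schedule for the whole source, or once per incoming transaction.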
A data warehouse is a blend of technologies and components that supports the strategic use of data. It is the electronic storage of a large amount of business information, designed for query and analysis rather than transaction processing. Data warehousing is the process of transforming data into information and making it available to users in time to make a difference.
These Data Warehousing interview questions have been designed to acquaint you with the kind of questions you may encounter in an interview on the subject of data warehousing.
Question: Define data warehouse.
Answer: A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management’s decision-making process.
Question: What is the benefit of normalization?
Answer: Normalization helps in reducing data redundancy.
Question: List any five applications of data warehouse.
Answer: Some applications include financial services, banking services, consumer goods, retail sectors, and controlled manufacturing.
Question: What do OLAP and OLTP stand for?
Answer: OLAP is an acronym for Online Analytical Processing and OLTP is an acronym for Online Transaction Processing.
Question: What does subject-oriented data warehouse signify?
Answer: Subject-oriented signifies that the data warehouse stores information around a particular subject such as product, customer, or sales.
Question: What is the basic difference between a data warehouse and an operational database?
Answer: A data warehouse contains historical information that is made available for analysis of the business, whereas an operational database contains the current information that is required to run the business.
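As a rough illustration of this difference, the sketch below keeps only the current value in an "operational" dictionary while appending time-stamped snapshots to a "warehouse" list. The account identifier, balances, and dates are made up for the example.

```python
from datetime import date

operational = {}  # holds only the current balance per account
warehouse = []    # append-only history of time-stamped snapshots


def record_balance(account, balance, as_of):
    # The operational store overwrites the previous value.
    operational[account] = balance
    # The warehouse keeps every value together with its timestamp.
    warehouse.append({"account": account, "balance": balance, "as_of": as_of})


record_balance("A-1", 100, date(2024, 1, 1))
record_balance("A-1", 80, date(2024, 2, 1))
```

After both calls the operational store knows only the latest balance, while the warehouse can still answer what the balance was in January.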
Question: What is Data Warehousing?
Answer: Data warehousing is the process of constructing and using a data warehouse.
Question: List the processes that are involved in data warehousing.
Answer: Data warehousing involves data cleaning, data integration, and data consolidation.
Question: What do you mean by Data Extraction?
Answer: Data extraction means gathering data from multiple heterogeneous sources.
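A minimal sketch of extraction from two heterogeneous sources follows. The CSV and JSON payloads, their field names, and the common target layout (`id`, `name`, `amount`) are assumptions made for the example.

```python
import csv
import io
import json

# Two hypothetical sources that describe the same kind of record differently.
CSV_SOURCE = "id,name,amount\n1,Alice,10.5\n2,Bob,7\n"
JSON_SOURCE = '[{"customer_id": 3, "full_name": "Cara", "total": "4.25"}]'


def extract_csv(text):
    """Read CSV rows and convert them to the common layout."""
    return [
        {"id": int(r["id"]), "name": r["name"], "amount": float(r["amount"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def extract_json(text):
    """Read JSON records and rename their fields to the common layout."""
    return [
        {"id": int(r["customer_id"]), "name": r["full_name"], "amount": float(r["total"])}
        for r in json.loads(text)
    ]


# Integration: rows from both sources now share one structure.
rows = extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)
```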
Question: List the schemas that a data warehouse system can implement.
Answer: A data warehouse can implement a star schema, a snowflake schema, or a fact constellation schema.
Question: Out of star schema and snowflake schema, whose dimension table is normalized?
Answer: The snowflake schema. Its dimension tables are normalized, whereas a star schema keeps them denormalized.
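The difference can be sketched with Python's built-in `sqlite3` module. All table and column names below are hypothetical. The star version stores the category name directly in the product dimension, while the snowflake version moves it to a separate table, at the cost of one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: the product dimension is denormalized (category stored inline).
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT
);
-- Snowflake schema: the same dimension is normalized into two tables.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
-- One fact table holds the measure and the key to the dimension.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER,
    amount REAL
);
INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery'), (2, 'Desk', 'Furniture');
INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Furniture');
INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Desk', 20);
INSERT INTO fact_sales VALUES (1, 1, 2.0), (2, 1, 3.0), (3, 2, 150.0);
""")

# Star: one join from the fact table to the dimension.
star_totals = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p USING (product_id)
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

# Snowflake: a second join is needed to reach the category name.
snow_totals = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p USING (product_id)
    JOIN dim_category c USING (category_id)
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

Both queries return the same totals; only the shape of the dimension tables differs.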
Question: Define metadata.
Answer: Metadata is simply defined as data about data. In other words, metadata is the summarized data that leads us to the detailed data.
Question: Define the functions of a load manager.
Answer: A load manager extracts data from the source system, fast-loads the extracted data into a temporary data store, and performs simple transformations into a structure similar to the one in the data warehouse.
Question: What does the metadata repository contain?
Answer: The metadata repository contains the definition of the data warehouse, business metadata, operational metadata, data for mapping from the operational environment to the data warehouse, and the algorithms for summarization.
Question: How does a Data Cube help?
Answer: A data cube helps us represent data in multiple dimensions. The data cube is defined by dimensions and facts.
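One simple way to picture a data cube, assuming three hypothetical dimensions (product, region, quarter) and one fact (units sold), is a mapping from a tuple of dimension values to the aggregated measure:

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, units_sold).
facts = [
    ("Pen", "East", "Q1", 10),
    ("Pen", "East", "Q1", 5),
    ("Pen", "West", "Q2", 7),
    ("Desk", "East", "Q2", 1),
]

# Each cube cell is addressed by one value per dimension and holds the summed fact.
cube = defaultdict(int)
for product, region, quarter, units in facts:
    cube[(product, region, quarter)] += units
```

Dedicated OLAP engines store cubes far more compactly than this; the dictionary only shows how dimensions address a cell and facts fill it.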
Question: Define dimension.
Answer: Dimensions are the entities with respect to which an enterprise keeps records, for example product, customer, or time.
Question: Explain data mart.
Answer: A data mart contains a subset of organization-wide data that is valuable to a specific group within the organization. In other words, a data mart contains data specific to a particular group.
Question: List the functions of data warehouse tools and utilities.
Answer: The functions performed by data warehouse tools and utilities are data extraction, data cleaning, data transformation, data loading, and refreshing.
Question: What is Virtual Warehouse?
Answer: A view over an operational data warehouse is known as a virtual warehouse.
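The idea can be sketched with an SQL view in Python's built-in `sqlite3` module. The `orders` table and the view name `v_customer_totals` are hypothetical. The view stores no data of its own, so it always reflects the current contents of the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Alice', 20.0), (2, 'Bob', 5.0), (3, 'Alice', 2.5);
-- The view is computed from `orders` on demand; nothing is materialized.
CREATE VIEW v_customer_totals AS
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
""")


def totals():
    """Query the view, which is evaluated against the live table."""
    return conn.execute(
        "SELECT customer, total FROM v_customer_totals ORDER BY customer"
    ).fetchall()


before = totals()
conn.execute("INSERT INTO orders VALUES (4, 'Bob', 10.0)")
after = totals()
```

The trade-off is that every query against the view puts load on the operational tables.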
Question: List the phases involved in the data warehouse delivery process.
Answer: The stages are IT Strategy, Education, Business Case Analysis, Technical Blueprint, Build the Version, History Load, Ad Hoc Query, Requirement Evolution, Automation, and Extending Scope.
Question: List the functions performed by OLAP.
Answer: OLAP performs functions such as roll-up, drill-down, slice, dice, and pivot.
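Three of these operations can be sketched on a small in-memory cube. The dimension names, cities, and values are hypothetical, and the functions are a simplified illustration rather than how an OLAP server implements them. Drill-down is the reverse of roll-up (keeping more dimensions), and pivot only rotates the presentation, so neither gets its own function here.

```python
from collections import defaultdict

DIMS = ("product", "city", "quarter")

# Hypothetical cube: one cell per (product, city, quarter) with units sold.
cube = {
    ("Pen", "Delhi", "Q1"): 15,
    ("Pen", "Mumbai", "Q1"): 7,
    ("Pen", "Delhi", "Q2"): 4,
    ("Desk", "Delhi", "Q1"): 1,
    ("Desk", "Mumbai", "Q2"): 2,
}


def roll_up(cells, keep):
    """Aggregate away every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_(cells, dim, value):
    """Fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}


def dice(cells, criteria):
    """Keep cells whose values are allowed on two or more dimensions."""
    checks = [(DIMS.index(d), allowed) for d, allowed in criteria.items()]
    return {
        k: v for k, v in cells.items()
        if all(k[i] in allowed for i, allowed in checks)
    }


by_product = roll_up(cube, ["product"])
q1 = slice_(cube, "quarter", "Q1")
delhi_pens = dice(cube, {"product": {"Pen"}, "city": {"Delhi"}})
```

Calling `roll_up(cube, ["product", "city"])` instead of `roll_up(cube, ["product"])` corresponds to drilling down one level.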
Question: Define load manager.
Answer: A load manager performs the operations required for the extract and load process. The size and complexity of the load manager vary between specific solutions, from one data warehouse to another.
Question: Define a warehouse manager.
Answer: The warehouse manager is responsible for the warehouse management process. It consists of third-party system software, C programs, and shell scripts. The size and complexity of the warehouse manager vary between specific solutions.
Question: Define the functions of a warehouse manager.
Answer: The warehouse manager performs consistency and referential integrity checks; creates indexes, business views, and partition views against the base data; transforms and merges the source data in the temporary store into the published data warehouse; backs up the data in the data warehouse; and archives data that has reached the end of its captured life.
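As one example of these functions, a referential integrity check can be sketched as a search for fact rows whose foreign key has no matching dimension row. The table contents and the helper name `find_orphans` are hypothetical.

```python
# Hypothetical dimension (keyed by product_id) and fact rows.
dim_product = {1: "Pen", 2: "Desk"}
fact_sales = [
    {"sale_id": 100, "product_id": 1, "amount": 2.0},
    {"sale_id": 101, "product_id": 3, "amount": 9.0},  # no matching dimension row
    {"sale_id": 102, "product_id": 2, "amount": 150.0},
]


def find_orphans(facts, dimension_keys, fk):
    """Return fact rows whose foreign key `fk` is missing from the dimension."""
    return [row for row in facts if row[fk] not in dimension_keys]


orphans = find_orphans(fact_sales, dim_product.keys(), "product_id")
```

A warehouse manager would typically reject or quarantine such rows before publishing the data.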
Question: What is Summary Information?
Answer: Summary information is the area in the data warehouse where predefined aggregations are kept.
Question: What is the Query Manager responsible for?
Answer: Query Manager is responsible for directing the queries to the suitable tables.
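A toy version of this routing, assuming a hypothetical catalog of one detail table and two summary tables, picks the smallest table that still contains every dimension the query asks for:

```python
# Hypothetical catalog: table name -> dimensions available in that table.
TABLES = {
    "sales_detail": {"product", "store", "month"},
    "sales_by_product_month": {"product", "month"},
    "sales_by_month": {"month"},
}


def route(requested_dims):
    """Return the most aggregated table that can answer the query."""
    needed = set(requested_dims)
    candidates = [
        (len(dims), name) for name, dims in TABLES.items() if needed <= dims
    ]
    if not candidates:
        raise ValueError(f"no table can answer dimensions {sorted(needed)}")
    # Fewer dimensions means a coarser, usually smaller, summary table.
    return min(candidates)[1]
```

For instance, `route(["month"])` selects the coarsest summary, while `route(["store"])` falls back to the detail table because no summary keeps the store dimension.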
Question: What are the types of OLAP server?
Answer: There are four types of OLAP servers, namely Relational OLAP, Multidimensional OLAP, Hybrid OLAP, and Specialized SQL Servers.
Question: What is Normalization?
Answer: Normalization splits up the data into additional tables.
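A small sketch of that split, using made-up order rows: the flat table repeats each customer's city on every order, and normalizing it moves the city into a separate customer table so it is stored once.

```python
# Denormalized rows repeat the customer's city on every order.
orders_flat = [
    {"order_id": 1, "customer": "Alice", "city": "Pune", "amount": 20.0},
    {"order_id": 2, "customer": "Alice", "city": "Pune", "amount": 2.5},
    {"order_id": 3, "customer": "Bob", "city": "Agra", "amount": 5.0},
]

customers = {}  # one entry per customer, holding the city once
orders = []     # orders refer to the customer by name only

for row in orders_flat:
    customers[row["customer"]] = {"city": row["city"]}
    orders.append(
        {"order_id": row["order_id"], "customer": row["customer"], "amount": row["amount"]}
    )
```

Changing Alice's city now means updating a single entry instead of every one of her orders.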
Question: Which one is faster, Multidimensional OLAP or Relational OLAP?
Answer: Multidimensional OLAP is generally faster than Relational OLAP for queries, because it works on pre-aggregated multidimensional storage.
Question: How many dimensions are selected in dice operation?
Answer: For the dice operation, two or more dimensions are selected for a given cube.
Question: How many fact tables are there in a star schema?
Answer: There is only one fact table in a star schema.
Question: Which language is used for schema definition?
Answer: Data Mining Query Language (DMQL) is used for Schema Definition.
Question: What language is the base of DMQL?
Answer: DMQL is based on Structured Query Language (SQL).
Question: What are the reasons for partitioning?
Answer: Partitioning is done for various reasons, such as to ease management, to assist backup and recovery, and to enhance performance.
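The performance benefit can be illustrated with a minimal sketch that partitions hypothetical sales rows by month, so that a query for one month touches only that month's rows. The row layout and the `YYYY-MM` key format are assumptions for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows.
rows = [
    {"sale_date": date(2023, 1, 5), "amount": 10.0},
    {"sale_date": date(2023, 1, 20), "amount": 4.0},
    {"sale_date": date(2023, 2, 2), "amount": 7.5},
]


def partition_key(row):
    """Derive the partition name (YYYY-MM) from the sale date."""
    d = row["sale_date"]
    return f"{d.year}-{d.month:02d}"


partitions = defaultdict(list)
for row in rows:
    partitions[partition_key(row)].append(row)


def total_for_month(key):
    """Sum one month's sales by scanning only the matching partition."""
    return sum(r["amount"] for r in partitions.get(key, []))
```

The same layout helps management and backup, since an old month can be archived or restored as a unit.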
Question: How many dimensions are selected in Slice operation?
Answer: Only one dimension is selected for the slice operation.
Question: What kind of costs are involved in Data Marting?
Answer: Data marting involves hardware and software costs, network access costs, and time costs.
Key practices for implementing a Data Warehouse
- Never replace operational systems and reports.
- Decide a plan to test the consistency, accuracy, and integrity of the data.
- Don’t spend too much time on extracting, cleaning and loading data.
- The data warehouse must be well integrated, well defined and time stamped.
- While designing a data warehouse, make sure you use the right tool, stick to the life cycle, take care of data conflicts, and are ready to learn from your mistakes.
- Establish that data warehousing is a joint team project. You don’t want to create a data warehouse that is not useful to the end users.
- Ensure that all stakeholders, including business personnel, are involved in the data warehouse implementation process.
- Prepare a training plan for the end users.
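For the practice of testing consistency, accuracy, and integrity, a plan could start from a simple reconciliation check like the sketch below. The function name, the row layout, and the three checks are illustrative assumptions, not a complete test plan.

```python
def reconcile(source_rows, warehouse_rows, key, measure):
    """Compare source and warehouse rows and return a list of problems found."""
    problems = []

    # Consistency: both sides should hold the same number of rows.
    if len(source_rows) != len(warehouse_rows):
        problems.append("row count mismatch")

    # Integrity: every source key should be present in the warehouse.
    missing = {r[key] for r in source_rows} - {r[key] for r in warehouse_rows}
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    # Accuracy: the measure should add up to the same total on both sides.
    src_total = sum(r[measure] for r in source_rows)
    wh_total = sum(r[measure] for r in warehouse_rows)
    if abs(src_total - wh_total) > 1e-9:
        problems.append("measure total mismatch")

    return problems


# Hypothetical data: the second source row never reached the warehouse.
source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}]
loaded = source[:1]
issues = reconcile(source, loaded, "id", "amount")
```

An empty result means all three checks passed for the given load.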