ETL Process and Data Mart

ETL Tools

ETL process stands for E-Extract, T-Transform, and L- Load. It is a process of transferring data from source which is a database to destination which is a data warehouse. In this process the data is extracted from the source database, transformed into a format as required and then loaded to data warehouse destination. To carry out this process ETL tools are used.

The source databases are multiple and destination data warehouse is single, thus ETL process takes care of each movement and processing of data to load data in most integrated form in the data warehouse.

 

Let us understand each step of ETL process in deep detail

#. Extraction

This is the first step of EL process. In this step, data from desired data source which may be heterogeneous in nature is extracted for further processing. During this step the required data is not known thus large amount of data is pulled which is then sent for filtering process. The extraction process should be done carefully so that it doesn’t affect the original data source performance and response time. There are two types of extractions to choose from: 

Full Extraction: In this type, the data modifications are not tracked, so by extraction full data is returned. Incremental Extraction: This type of extraction identifies the modifications made to data. It keeps track of changes made to data since last extraction took place.

The type of extraction preferred by businesses depends on the business needs and destination data warehouse requirements.

 

#. Transformation

In this step some rules are applied to the extracted data to change the format as per the requirements. Some complex queries, joins, sorts, concatenations are applied to the database to fetch the records in ordered fashion.

 

#. Load

This is the last step of ETL process in which the formatted data is loaded into the data warehouse. The successfully loaded data ensures data integrity.

 

Data Marts

Data Marts are subset of the information content of data warehouse that supports the requirements of a particular department or business function. Data mart are often built and controlled by a single department within an enterprise. The data may or may not be sourced from an enterprise data warehouse. Data Marts are very popular in some solutions, where the choice of user access tool requires the use of data marting.

 

Why do we need Data Marts?

Using data marts improve end user response time, query performance. We can easily get information for single department such as Marketing, Finance, and HR etc. Creating data marts requires less cost in implementation in comparison with data warehouse. Data Marts are also helpful for significant decision making for organizations. These provide data in a form that matches the collective view of a group of user.

 

Types of Data Marts

There are 3 types of data marts:

#. Independent Data Marts:

Independent data marts focuses exclusively on one subject area and it is not designed in an enterprise context. Ex. Manufacturing, Finance, Sales.

#. Dependent Data Marts:

Dependent data marts data comes from a data warehouse.

#. Hybrid Data Marts:

Combined approach of above two is hybrid data marts. A hybrid data mart allows you to combine input from sources other than a data warehouse. This is useful in need of ad hoc integration, such as after a new group or product is added to the organization. 

ETT: Extraction-Transformation-Transportation

The main difference between independent and dependent data marts is how the data is populated in the data mart; that is, how you get data from of the sources and into the data mart. This step, called the Extraction-Transformation-Transportation (ETT) process, involves moving data from operational systems, filtering it, and loading it into the data mart.

 

Difference between Independent and Dependent Data Marts

Dependent Data Mart Independent Data Mart
The ETT process here is a process of identifying the right subset of data relevant to the chosen data mart subject and moving a copy of it It deals with all aspects of the ETT process. The work effort is as much as we do with a central data warehouse.
 The process is simplified because formatted and summarized (clean) data has already been loaded into the central data warehouse.   The number of sources are fewer and the amount of data associated with the data mart is less than the warehouse
Dependent data marts are usually built to achieve improved performance and availability, better control, and lower costs resulting from local access of data relevant to a specific department.   These are often driven by the need to have a solution within a shorter time  

Hybrid data marts simply combine the issues of independent and dependent data marts.

 

How to design a data mart

#. Designing

The design step first involves gathering the business and technical requirements, identifying data sources, selecting the appropriate subset of data and designing the logical and physical structure of the data mart.

 

#. Constructing

This step creates the physical database and storage structures, such as table spaces, associated with the data mart. Creation of the schema objects, such as tables and indexes defined in this design step. It determines how the setup of tables and accessibility be optimized.

 

#. Accessing

This step sets up an intermediate later called metalayer. This layer translates the database structures and objects into business terminology so that the interaction with the data mart is easier. This step also manages the business interfaces, data structures, summarized so that the queries are answered quickly and efficiently.

 

Managing

In this step, management tasks are performed. It provides secure access to the data. It manages the growth of the data. It optimizes the system performance. It ensures the availability of data during the system failures also

 

Challenges in Data Marts

  • The operating cost of data mart is high as well as redevelopment costs are high.
  • It is difficult to maintain multiple data marts.
  • Populating the data mart during Loading consumes time.
  • Running out of time window generally happens while populating data mart as lot of access tools are used.
  • The source data is different so loading into data mart using same access tools also poses a challenge.

⇓ Subscribe Us ⇓


If you are not regular reader of this website then highly recommends you to Sign up for our free email newsletter!! Sign up just providing your email address below:


 

Check email in your inbox for confirmation to get latest updates Software Testing for free.


  Happy Testing!!!
 
Data Warehousing Tutorial

Leave a Comment

Share This Post