Data Warehouse and It’s Architecture

Data warehouse adopts a 3 tier architecture. The bottom layer is called the warehouse database layer, the middle layer is the online analytical processing server (OLAP) while the topmost layer is the front end user interface layer. We will discuss the data warehouse architecture in detail here.

Multitier Architecture of Data warehouse

The image below shows the 3 tier architecture of data warehouse.

Multitier Architecture of Data warehouse

Let us discuss each of the layers in detail.

  1. Database Layer: The bottom-most layer comprises of the warehouse database layer. It is mostly the relational database system. The data from various external sources and operational databases is fed into this layer. Before feeding this data, preprocessing techniques are applied. The data is extracted, cleaned and transformed before loading the data in the database layer using back end tools. Also, the cleaned data is loaded and refreshed to update the data warehouse.

The extraction of data from external sources is done using gateways which generate SQL code. Example of the gateways is ODBC (Open Database Connection) and OLEDB (Object Linking and Embedding Database). The database layer also contains metadata. Metadata stores information about data warehouse.

  • OLAP Server: The OLAP server is either a relational online analytical processor (ROLAP) or Multidimensional OLAP (MOLAP). The ROLAP server converts the multidimensional data into relational operations while the MOLAP directly implements the multidimensional operations.
  • User Interface Layer:  This layer provides necessary tools for querying and reporting.

The Architecture of A Data Warehouse

  1. The basic architecture of data warehouse
Basic architecture of data warehouse

The image above shows a simple single tier architecture of a data warehouse. Various components of this architecture are:

  1. Data source: The operational systems are systems used for day- to day transactions. The data processing in these systems takes place in such a manner that data integrity is maintained. The data from these are operational data which contains a lot of information about the company. Some forms of operational data can be :
Basic architecture of data warehouse
  • Warehouse: Warehouse contains Metadata, Raw Data, and Summary Data.
    • Metadata: It describes data about other data and data structures such as objects, business rules, and processes. In a data warehouse, metadata defines the warehouse objects. It is used to generate scripts that build and populate the data warehouse. Metadata is also used to locate the contents of the data warehouse. The functions of metadata are explained by the image below:
Data warehouse metadata
  • Raw Data: It is unprocessed data from data sources. It is converted to information using selection, extraction, and organization of data. Example of raw data, the POS (point of sale) in supermarkets generate loads of data every day. It is raw data. This needs to be processed to show interesting results. The useful information generated from raw data can be used for predictive analysis technology.
    • Summary Data: This component is very important as it precomputes long computations in advance and stores itself. Summary tables’ stores aggregated and summarized data for optimal performance. 80% of the businesses use summarized data for decision making. This summarization happens in multidimensional space using one or more dimensions. Aggregation happens by combining a large amount of detailed data together. Example: Analyzing the accounts with 4 dimensions: customer, region, month and service.

Data Warehouse Architecture with Staging

Data warehouse architecture with Staging

The Staging area of the data warehouse is a temporary space where the data from sources are stored. This area is required in data warehouses for timing. The staging component performs the functions of consolidating data, cleaning data, aligning the data to correct place.

Data Warehouse Architecture with Staging and Data Mart

Data warehouse Architecture with Staging and Data Mart

The Data Mart is a subset of Data warehouse focusing on a single line of business. As data warehouse contains company-wide data of all department, data marts contain data of single departments such as Sales, Inventory, and Marketing, etc. Each department has owns a data mart including its hardware, software, and data. With Data Marts it is easier for departments to maintain their individual department data.

What are Data Warehouse Models?

 From an architecture point of view, the data warehouse has 3 models:

  1. Enterprise Data warehouse: This data warehouse constitutes data from all the departments of an organization. It spans the entire organization. This type of data warehouse integrates data from all operational systems and external sources. The data ranges from few gigabytes to hundreds of gigabytes and terabytes. It contains detailed information as well as summarized information. Implementation of this warehouse requires extensive modeling and many years of building. It is built on mainframes and parallel architecture platforms.
  2. Data Mart: Data Mart are subsets of a data warehouse that focus on a specific group. The scope of Data Mart is limited to particular subjects. These are implemented on Unix/Linux or Windows-based servers. Data marts implementation also requires complex business modeling but can be built in a few weeks. Data Marts are of 2 types:
    1. Dependent Data Mart: These are populated directly from the corporate data warehouses.
    1. Independent Data Mart: These are populated from operational systems or external data sources or directly from department data.
  3. Virtual Warehouse: A virtual warehouse is created by summarized views over operational databases. It is easier to build but requires additional capacity on operational database servers.

Data Warehouse Development

The data warehouses are developed with 2 approaches:

  1. Top-down Approach: This approach is systematic and minimizes integration issues. Maintaining consistency in the data model using this approach is a challenge. Also, this approach lacks flexibility and takes a long time to build and is costly.
  2. Bottom-up Approach: This approach design builds and deploys independent data marts. It is a flexible, low-cost solution but can create issues in integration when data from multiple data marts is sent to the data warehouse.
  3. Evolutionary Approach: This is an incremental and evolutionary model. In this model, a high-level data model is created in a span of 1-2 months. This high-level model showsa company-wide, consistent and integrated view of data from different departments. With this high-level model, the integration issues are reduced. Then independent data marts are implemented side by side data warehouse using the same data model. Thirdly, the distributed data marts can also be constructed by integrating different data marts. At last a multitier data warehouse is built which can then be used to populate dependent data marts. The image below shows the approach to data warehouse development.
Data warehouse development

Conclusion

A data warehouse is a subject oriented, non-volatile, time variant, integrated database giving information about the history and analysis of subject rather than transaction processing.  The data warehouse architecture has different components each playing an important role. Metadata keeps details about data warehouse data. Summarized data shows views of operational data for better performance. Data Marts are like data warehouses but focus only on a single subject. These are either populated directly from enterprise DW or externally from operational DB, flat file, etc.


⇓ Subscribe Us ⇓


If you are not regular reader of this website then highly recommends you to Sign up for our free email newsletter!! Sign up just providing your email address below:


 

Check email in your inbox for confirmation to get latest updates Software Testing for free.


  Happy Testing!!!
 
Data Warehousing Tutorial

Leave a Comment

Share This Post