Thursday, January 21, 2010

Design aspects of data staging layer

Overview

You usually have two to three layers approach for data warehousing solution. On of the layer is called a "staging layer".

Data from various sources is staged here temporary till it is processed and transformed into data warehouse. Data in this layer can be relational in nature.
Different data flows of data staging
Pull mode data staging

Push mode data staging

There are two ways to flow data into the data staging layer, one is data pull mode and other is data push mode. What I mean by this is, in pull mode you define a process to read from the data source, either using a date range or reading from delta source. The process reads data and put it into the staging tables. In push mode, delta or changed data is transferred into the staging table by means middleware.

This was the different ways of data flow into data sharing. Let us see now different techniques you can use for this environment.

Different data staging techniques
Store and forward: In this technique, a data is stored in staging area and then used for transformation and loading into Data Warehouse environment, like ELT
Forward and Forward: In this technique, a data is directly read from ODS and directly will be inserted or updated in the Data Warehouse environment

Different types of data loads
Full data load: This is specially used for first time data load into data warehouse. Generally you fetch all the needed data from the source into staging tables. This is time consuming process and the processing time will gradually increase because of data growth rate in the source system.

Delta data load: This is specially used to extract the changed/new records.

Different ways to store data in staging
File: Data can be stored in a file which can then be transformed and loaded into data warehouse.
Database: Staging data can be stored in the database table(s) either permanently or for some time till it gets loaded into the Data Warehouse.

Conclusion
Before designing data staging layer one has to see these aspects of the design. What should be the data flow technique? What should be the data load technique? What should data storing technique?

No comments:

Post a Comment