
DW Architecture - Traditional vs Big Data Approach

  • DW Flow Architecture - Traditional

           This flow uses ETL tools such as Informatica and reporting tools such as OBIEE.
  1. Load data from the source OLTP systems into the staging area using an ETL process.
  2. Load dimensions using an ETL process.
  3. Cache the dimension keys so that the fact load can look them up.
  4. Load facts using an ETL process.
  5. Load aggregates using an ETL process.
  6. OBIEE connects to the DW for reporting.
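
As an illustration of steps 2 to 4 above, here is a minimal Python sketch of why the dimension keys are cached: the fact load swaps each natural key for the key the dimension load assigned, using an in-memory lookup. The function names, the column names (customer_id, customer_sk) and the -1 "unknown" key are assumptions made for this example; a tool like Informatica would do this with its own lookup transformations, not with this code.

```python
from itertools import count


def load_dimension(stage_rows, natural_key):
    """Step 2: give each distinct natural key a surrogate key ("sk")."""
    next_key = count(1)
    dimension = {}
    for row in stage_rows:
        nk = row[natural_key]
        if nk not in dimension:
            dimension[nk] = {"sk": next(next_key), **row}
    return dimension


def build_key_cache(dimension):
    """Step 3: cache natural key -> surrogate key for fast lookups."""
    return {nk: rec["sk"] for nk, rec in dimension.items()}


def load_facts(stage_facts, key_cache, natural_key, sk_column, unknown_key=-1):
    """Step 4: replace the natural key in each fact row with the cached key."""
    facts = []
    for row in stage_facts:
        fact = {k: v for k, v in row.items() if k != natural_key}
        # Facts that reference a missing dimension row get the unknown key.
        fact[sk_column] = key_cache.get(row[natural_key], unknown_key)
        facts.append(fact)
    return facts


# Hypothetical staged data, for illustration only.
stage_customers = [
    {"customer_id": "C1", "name": "Asha"},
    {"customer_id": "C2", "name": "Ben"},
]
stage_orders = [
    {"customer_id": "C2", "amount": 10},
    {"customer_id": "C9", "amount": 5},
]

customer_dim = load_dimension(stage_customers, "customer_id")
key_cache = build_key_cache(customer_dim)
order_facts = load_facts(stage_orders, key_cache, "customer_id", "customer_sk")
```

The cache is the reason dimensions are loaded before facts: every fact row needs a dimension key that already exists.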

  • DW Flow Hybrid Architecture - Big Data Integrated with Traditional Load Methods

     The hybrid architecture blends traditional ETL loads with big data processing techniques. Its key features are:
  1. ETL processes are used for low-volume structured data loads.
  2. Big data processing techniques are used for high-volume, unstructured, or semi-structured data loads.
  3. Source OLTP data is loaded into HDFS using Sqoop import.
  4. Dimension loads require update/insert processing, so they are loaded via ETL.
  5. Fact tables are loaded using MapReduce jobs.
  6. Aggregates are created using Hive queries (HiveQL).
  7. Aggregates are exported to the DW using Sqoop export.
  8. OBIEE connects to both the DW and the Hive tables for reporting.
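
To make steps 3, 6 and 7 above more concrete, here is a rough Python sketch that only builds and prints the command lines involved; it does not run them. The JDBC URLs, table names, HDFS paths and the aggregate query are all hypothetical. The flags shown (--connect, --table, --target-dir, --num-mappers, --export-dir, and hive -e) are standard Sqoop and Hive CLI options, but a real job would need more, such as credentials and field delimiters that match the Hive table. The MapReduce fact load in step 5 is left out.

```python
import shlex


def sqoop_import(jdbc_url, table, target_dir, mappers=4):
    """Step 3: argument list to copy one OLTP table into HDFS."""
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", str(mappers)]


def hive_aggregate(query):
    """Step 6: argument list to run one HiveQL statement with `hive -e`."""
    return ["hive", "-e", query]


def sqoop_export(jdbc_url, table, export_dir):
    """Step 7: argument list to push an HDFS directory into a DW table."""
    return ["sqoop", "export",
            "--connect", jdbc_url,
            "--table", table,
            "--export-dir", export_dir]


# Hypothetical connection strings, tables and paths, for illustration only.
OLTP_URL = "jdbc:mysql://oltp-host/sales"
DW_URL = "jdbc:oracle:thin:@dw-host:1521/dw"
AGG_QUERY = (
    "INSERT OVERWRITE TABLE sales_agg "
    "SELECT product_id, SUM(amount) FROM sales_fact GROUP BY product_id"
)

plan = [
    sqoop_import(OLTP_URL, "orders", "/data/stage/orders"),
    hive_aggregate(AGG_QUERY),
    sqoop_export(DW_URL, "SALES_AGG", "/user/hive/warehouse/sales_agg"),
]

# Print shell-quoted commands for review instead of executing them.
for command in plan:
    print(shlex.join(command))
```

Keeping each command as an argument list rather than a single string means the plan could later be passed to a scheduler or to subprocess without quoting problems.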





