Showing posts from 2015

Cloudera QuickStart virtual machines (VMs) Installation

Cloudera Distribution including Apache Hadoop (CDH) is the most popular Hadoop distribution currently available. CDH is 100% open source. The Cloudera QuickStart VMs include everything needed to try out a basic package-based CDH installation. This is useful for creating initial deployments for a proof of concept (POC) or for development.

Datawarehouse Bigdata Integration - Proof of Concept

The objective of this proof of concept project is to evaluate the feasibility of converting a traditional ETL architecture for data warehouse loads into a hybrid approach with big data integration. Refer to the following post for architectural details.

Proof of Concept - Project Plan

The POC project has a timeline of 4 weeks. The following activities are planned during this period:

DW Architecture - Traditional vs Bigdata Approach

DW Flow Architecture - Traditional

Using ETL tools like Informatica and reporting tools like OBIEE.

1. Load data from the source OLTP system to Stage using an ETL process.
2. Load dimensions using an ETL process.
3. Cache dimension keys.
4. Load facts using an ETL process.
5. Load aggregates using an ETL process.
6. OBIEE connects to the DW for reporting.
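
The traditional load order described here (stage, dimensions, key cache, facts, aggregates) can be illustrated with a minimal in-memory Python sketch. This is not how Informatica or OBIEE implement it; the table layout, column names, and function names below are all hypothetical and chosen only to show why dimension keys are cached before facts are loaded.

```python
from collections import defaultdict

# Hypothetical OLTP rows; the columns are illustrative assumptions.
source_orders = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 30.0},
]


def load_stage(rows):
    """Step 1: copy source OLTP rows into a staging area unchanged."""
    return [dict(row) for row in rows]


def load_dimension(stage):
    """Step 2: build a customer dimension with surrogate keys."""
    dimension = []
    for name in sorted({row["customer"] for row in stage}):
        dimension.append({"customer_key": len(dimension) + 1, "customer": name})
    return dimension


def cache_dimension_keys(dimension):
    """Step 3: cache natural key -> surrogate key for fast fact lookups."""
    return {row["customer"]: row["customer_key"] for row in dimension}


def load_facts(stage, key_cache):
    """Step 4: load facts, swapping natural keys for cached surrogate keys."""
    return [
        {
            "order_id": row["order_id"],
            "customer_key": key_cache[row["customer"]],
            "amount": row["amount"],
        }
        for row in stage
    ]


def load_aggregates(facts):
    """Step 5: pre-compute totals per customer key for reporting."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact["customer_key"]] += fact["amount"]
    return dict(totals)


def run_pipeline(rows):
    """Run the steps in the order the traditional flow describes."""
    stage = load_stage(rows)
    dimension = load_dimension(stage)
    key_cache = cache_dimension_keys(dimension)
    facts = load_facts(stage, key_cache)
    return load_aggregates(facts)


if __name__ == "__main__":
    # A reporting tool would query the aggregate table produced here.
    print(run_pipeline(source_orders))
```

The ordering matters because the fact load depends on the key cache, which in turn depends on the dimension load; a real ETL tool enforces the same dependency through workflow sequencing.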