Data Warehouse - Basics Fundamental Notes - Part1


Data warehouse which is build on RDMS is used for analyze the data and based on the nature of data and the analysis, people take the decisions. The objective is to bring intelligence analysis which lead to make the predictive analysis call right.

Following are few fundamental notes on data warehouse.

  • Now a days, the Data Lake is being used for Big Data analysis purpose and not limited only in RDMS stored area.

  • In particular, there are three key factors are exist in data warehouse world, and they are three V's - Volume, Velocity, and Variety.

  • In data warehouse, not only structured data maintained or analysis, the data warehouse system should able to handle unstructured data like video, audio, images as well. 

  • Centralize data warehouse using Single RDMS. 

  • Keep a note that data warehouse is NOT as Data mart. Dependent DataMart variation with Corporate Information Factory(CIF).

  • There are four rule of data warehouse
    • Integrated 
    • Subject Oriented
    • Time Variant
    • Non Volatile

  • ETL(Extract, Transform and Load) layer's objective is to  concentrate on 'E'(Extract) with minimal 'T'(Transform)

  • There are two types of staging layer(Extract operation) exist
    • Non-Persistent Staging Layer - Data In and Data Out
      • Advantages - 1) Less diskspace space; 2) Data move to next layer and keep the space empty
      • Disadvantages - 1) For rework(like data corruption or any major faulty) - data needs to bring again in staging layer from source system; 2) QA needs access to source system to validate staging layer data

    • Persistent Staging Layer - Data In and Data Stay
      • Advantages 1) Can rebuild user layer from staging layer at any time and do not need to bring data from source system; 2) For data quality insurance, quickly validate from staging with user access layer and no need to visit source system
      • Disadvantages 1) More disk space is required; 2) Need implementation of data governance like access for viewing old records


  • Types of ETL - 1) Initial ETL, 2) Incremental ETL

  • Incremental(refresh data) ETL pattern - a) Append, b) In-place update, c) Complete replacement, d)Rolling append

  • Data Transformation goals
    • Uniformity
    • Restructuring
  • Common Transformation Models
    • Data value unification
    • Data type and size unification
    • De-duplication
    • Dropping columns (vertical slicing)
    • Value based row filtering (horizontal slicing)
    • Correcting known errors


Comments

  1. This blog is very helpful and informative for this particular topic. I appreciate your effort that has been taken to write this blog for us. Microsoft Certified Power Platform Fundamentals

    ReplyDelete

Post a Comment

Popular posts from this blog

How to fix Azure DevOps error MSB4126

SharePoint Admin Center

How to create Custom Visuals in Power BI – Initial few Steps