ETL & Data Warehousing for Product Sales Analytics

Description

This project was developed as part of the Data Warehousing course at RIT Croatia. Working in a team, we designed and implemented a complete data mart for a fictitious company (TPC) to support sales analytics. The project covered the full ETL pipeline: we collected and cleaned raw data, applied normalization techniques, performed dimensional modeling, and loaded the data into a cloud-based Snowflake data warehouse.
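The star-schema layout described above can be sketched in miniature. This is a hedged illustration using Python's built-in SQLite (standing in for Snowflake); the table and column names (`fact_sales`, `dim_product`, `dim_date`) are invented for the example and are not the actual TPC data mart schema.

```python
import sqlite3

# Illustrative star schema: one fact table joined to two dimension tables.
# All names here are assumptions for the sketch, not the real project schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,      -- e.g. 20240115
    full_date TEXT,
    year      INTEGER,
    month     INTEGER
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    revenue     REAL
);
""")

cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.execute("INSERT INTO dim_date VALUES (20240115, '2024-01-15', 2024, 1)")
cur.execute("INSERT INTO fact_sales VALUES (1, 20240115, 3, 29.97)")

# Typical analytics query the schema is built for: revenue by category and month.
row = cur.execute("""
    SELECT p.category, d.year, d.month, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    GROUP BY p.category, d.year, d.month
""").fetchone()
print(row)  # ('Hardware', 2024, 1, 29.97)
```

The point of the star shape is that analytic queries stay simple: one join per dimension, then group and aggregate.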

The experience consolidated key concepts such as data normalization, Slowly Changing Dimensions (SCD Type 6), dimensional modeling (star schema), and end-user reporting requirements. It was a valuable hands-on exercise in moving from raw, unstructured source data to a robust, analytics-ready warehouse.
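SCD Type 6 combines Types 1, 2, and 3: each attribute change inserts a new row version (Type 2, via effective dates and a current flag) while a "current value" column is overwritten across all versions (the Type 1/3 aspect). A minimal sketch using Python's SQLite, with an invented `dim_customer` table that is not the project's actual dimension:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical customer dimension with SCD Type 6 bookkeeping columns.
cur.execute("""
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id    TEXT,     -- natural key
    city           TEXT,     -- value as of this row version (Type 2)
    current_city   TEXT,     -- latest value, synced on every row (Type 1/3)
    effective_from TEXT,
    effective_to   TEXT,     -- NULL = still open
    is_current     INTEGER
)""")

cur.execute("""INSERT INTO dim_customer
    (customer_id, city, current_city, effective_from, effective_to, is_current)
    VALUES ('C42', 'Zagreb', 'Zagreb', '2023-01-01', NULL, 1)""")

def change_city(cur, customer_id, new_city, change_date):
    """SCD Type 6 update: expire the open row, insert a new version,
    then overwrite current_city on every version of this customer."""
    cur.execute("""UPDATE dim_customer
                   SET effective_to = ?, is_current = 0
                   WHERE customer_id = ? AND is_current = 1""",
                (change_date, customer_id))
    cur.execute("""INSERT INTO dim_customer
        (customer_id, city, current_city, effective_from, effective_to, is_current)
        VALUES (?, ?, ?, ?, NULL, 1)""",
        (customer_id, new_city, new_city, change_date))
    cur.execute("UPDATE dim_customer SET current_city = ? WHERE customer_id = ?",
                (new_city, customer_id))

change_city(cur, 'C42', 'Dubrovnik', '2024-06-01')
rows = cur.execute("""SELECT city, current_city, is_current
                      FROM dim_customer ORDER BY customer_key""").fetchall()
print(rows)  # [('Zagreb', 'Dubrovnik', 0), ('Dubrovnik', 'Dubrovnik', 1)]
```

History is preserved (`city` on the expired row still says Zagreb), while the overwritten `current_city` lets reports filter or group by the customer's present attributes without joining to the latest version.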

Tools

  • Pentaho Data Integration (Kettle) – ETL process: data cleaning, transformation, loading

  • Snowflake – cloud-based data warehouse

  • SQL – table creation, queries, normalization, and schema management

  • Dimensional Modeling – Star schema, SCD Type 6

  • Normalization Techniques – applied before modeling phase

  • Team Collaboration – Git & shared documents
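The normalization step listed above can be illustrated with a toy Python sketch (the data is invented): repeated attributes in a flat extract are factored into their own table so each fact is stored exactly once before dimensional modeling begins.

```python
# Flat extract as it might arrive from the source system (invented rows).
raw_rows = [
    {"order_id": 1, "product": "Widget", "category": "Hardware", "qty": 3},
    {"order_id": 2, "product": "Widget", "category": "Hardware", "qty": 1},
]

# Product attributes repeat on every order row, so we factor them out
# into a separate products table keyed by product name.
products = {}
orders = []
for r in raw_rows:
    products.setdefault(r["product"],
                        {"name": r["product"], "category": r["category"]})
    orders.append({"order_id": r["order_id"],
                   "product": r["product"],   # foreign-key-style reference
                   "qty": r["qty"]})

print(len(products), len(orders))  # 1 2 — 'Widget' is now stored once
```

The same idea scales to real ETL: normalizing first removes update anomalies and duplication, and the clean tables then feed the denormalized star schema.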