ETL & Data Warehousing for Product Sales Analytics
Description
This project was developed as part of the Data Warehousing course at RIT Croatia. Working in a team, we designed and implemented a complete data mart for a fictitious company (TPC) to support sales analytics. The project covered the full ETL pipeline: we collected and cleaned raw data, applied normalization techniques, performed dimensional modeling, and loaded the data into a cloud-based Snowflake data warehouse.
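The cleaning step described above can be sketched in Snowflake SQL. This is a minimal, hypothetical example — the staging table `stg_sales` and its columns are illustrative assumptions, not the project's actual schema:

```sql
-- Sketch of a cleaning pass over a raw staging table loaded by Pentaho.
-- Table and column names are illustrative assumptions.
CREATE OR REPLACE TABLE stg_sales_clean AS
SELECT DISTINCT                                            -- drop exact duplicates
       TRIM(customer_name)                    AS customer_name,
       UPPER(country_code)                    AS country_code,
       TRY_TO_DATE(order_date, 'YYYY-MM-DD')  AS order_date,  -- unparseable dates become NULL
       TRY_TO_NUMBER(amount)                  AS amount        -- non-numeric values become NULL
FROM stg_sales
WHERE customer_name IS NOT NULL;               -- discard rows missing the key attribute
```

Snowflake's `TRY_TO_DATE` / `TRY_TO_NUMBER` functions are convenient here because they return NULL instead of failing the whole load when a single source row is malformed.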
The experience helped consolidate key concepts such as data normalization, Slowly Changing Dimensions (SCD Type 6), dimensional modeling (star schema), and end-user reporting requirements. It was a valuable hands-on exercise in moving from raw, unstructured data to a robust, analytics-ready warehouse.
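A star schema of the kind described above can be sketched as one fact table with foreign keys into dimension tables. The tables and columns below are illustrative, not the project's actual TPC schema:

```sql
-- Minimal star-schema sketch: additive measures in the fact table,
-- descriptive attributes in the dimensions. Names are hypothetical.
CREATE TABLE dim_date (
    date_sk     INTEGER PRIMARY KEY,   -- surrogate key
    full_date   DATE,
    year        SMALLINT,
    month       SMALLINT
);

CREATE TABLE dim_product (
    product_sk  INTEGER PRIMARY KEY,   -- surrogate key
    product_id  INTEGER,               -- natural key from the source system
    name        VARCHAR(100),
    category    VARCHAR(50)
);

CREATE TABLE fact_sales (
    date_sk     INTEGER REFERENCES dim_date(date_sk),
    product_sk  INTEGER REFERENCES dim_product(product_sk),
    quantity    INTEGER,
    amount      NUMBER(12,2)           -- additive measure
);
```

Surrogate keys decouple the warehouse from source-system identifiers, which is what makes Type 2 versioning (multiple rows per natural key) possible in the dimensions.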
Tools
Pentaho Data Integration (Kettle) – ETL process: data cleaning, transformation, loading
Snowflake – cloud-based data warehouse
SQL – for table creation, queries, normalization, and schema management
Dimensional Modeling – Star schema, SCD Type 6
Normalization Techniques – applied before the modeling phase
Team Collaboration – Git & shared documents
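SCD Type 6 combines Types 1, 2, and 3 in a single dimension. A sketch of handling one attribute change (a customer moving city) might look as follows — the `dim_customer` table, its columns, and the literal values are all hypothetical, shown only to illustrate the pattern:

```sql
-- SCD Type 6 sketch: customer 42's city changes to 'Zagreb'.
-- Assumes dim_customer(customer_sk, customer_id, customer_name,
--   current_city, city, previous_city, effective_from, effective_to, is_current).

-- Type 2: close out the currently active row
UPDATE dim_customer
SET effective_to = CURRENT_DATE, is_current = FALSE
WHERE customer_id = 42 AND is_current;

-- Type 2 + Type 3: insert a new current row, carrying the old
-- 'city' value into 'previous_city'
INSERT INTO dim_customer
    (customer_id, customer_name, current_city, city, previous_city,
     effective_from, effective_to, is_current)
SELECT customer_id, customer_name, 'Zagreb', 'Zagreb', city,
       CURRENT_DATE, NULL, TRUE
FROM dim_customer
WHERE customer_id = 42 AND effective_to = CURRENT_DATE;

-- Type 1: overwrite 'current_city' on every row for this customer,
-- so historical rows also expose the latest value
UPDATE dim_customer
SET current_city = 'Zagreb'
WHERE customer_id = 42;
```

The payoff is that a query can report facts either as-was (join on `city` for the row valid at the time) or as-is (use `current_city`), with `previous_city` available for before/after comparisons.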