Dimensional data modeling organizes data into fact tables (measuring business events like sales and clicks) and dimension tables (describing the context of those events: customers, products, dates). The arrangement of these tables relative to each other defines the schema shape, with two primary patterns: the Star Schema and the Snowflake Schema.

Star Schema

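In a Star Schema, a central fact table (e.g., orders) directly references multiple flat dimension tables (dim_customer, dim_product, dim_date). The name comes from the star-like shape of the diagram: the fact table sits at the center and the dimension tables radiate from it. The key characteristics follow from that layout: each dimension is a single denormalized table, and every dimension is exactly one join away from the fact table.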
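A minimal sketch of this layout, using Python's built-in sqlite3 module as a stand-in for a real warehouse engine. The table names come from the text; the column names (city, country, amount, and so on) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# In-memory database; columns and rows are illustrative, not from a real system.
star_conn = sqlite3.connect(":memory:")
star_conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT, country TEXT
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY, iso_date TEXT, year INTEGER
);
-- Fact table: one row per order, with a key into each flat dimension.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    amount      REAL
);
INSERT INTO dim_customer VALUES
    (1, 'Ada', 'London', 'UK'), (2, 'Luis', 'Madrid', 'Spain');
INSERT INTO dim_product VALUES
    (10, 'Keyboard', 'Hardware'), (11, 'Monitor', 'Hardware');
INSERT INTO dim_date VALUES (20240101, '2024-01-01', 2024);
INSERT INTO orders VALUES
    (100, 1, 10, 20240101, 50.0),
    (101, 1, 11, 20240101, 200.0),
    (102, 2, 10, 20240101, 50.0);
""")

# Revenue by country needs a single join: fact table to customer dimension.
star_result = star_conn.execute("""
    SELECT c.country, SUM(o.amount)
    FROM orders AS o
    JOIN dim_customer AS c ON c.customer_id = o.customer_id
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

print(star_result)
```

Because country lives directly on dim_customer, the query reaches it in one hop. The cost is that the country value is repeated on every customer row.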

Snowflake Schema

A Snowflake Schema normalizes dimension tables by splitting them into multiple related tables. For example, dim_customer might hold customer_id and address_id, with address details in a separate dim_address table and country details in a dim_country table. The result is a branching, snowflake-like structure in which dim_customer references dim_address, and dim_address would in turn reference dim_country.
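The same idea can be sketched with sqlite3, again as a stand-in engine. The chain dim_customer to dim_address to dim_country follows the text; the country_id link on dim_address, the other column names, and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory database; columns and rows are illustrative.
snow_conn = sqlite3.connect(":memory:")
snow_conn.executescript("""
CREATE TABLE dim_country (
    country_id INTEGER PRIMARY KEY, country_name TEXT
);
CREATE TABLE dim_address (
    address_id INTEGER PRIMARY KEY, city TEXT,
    country_id INTEGER REFERENCES dim_country(country_id)
);
-- The customer dimension no longer carries city or country itself.
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY, name TEXT,
    address_id INTEGER REFERENCES dim_address(address_id)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount      REAL
);
INSERT INTO dim_country VALUES (1, 'UK'), (2, 'Spain');
INSERT INTO dim_address VALUES (1, 'London', 1), (2, 'Madrid', 2);
INSERT INTO dim_customer VALUES (1, 'Ada', 1), (2, 'Luis', 2);
INSERT INTO orders VALUES (100, 1, 50.0), (101, 1, 200.0), (102, 2, 50.0);
""")

# Revenue by country now walks the whole branch: three joins instead of one.
snowflake_result = snow_conn.execute("""
    SELECT k.country_name, SUM(o.amount)
    FROM orders AS o
    JOIN dim_customer AS c ON c.customer_id = o.customer_id
    JOIN dim_address  AS a ON a.address_id  = c.address_id
    JOIN dim_country  AS k ON k.country_id  = a.country_id
    GROUP BY k.country_name
    ORDER BY k.country_name
""").fetchall()

print(snowflake_result)
```

Each country name is stored once, but every query that groups by country pays for the extra joins along the branch.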

The Modern Lakehouse Verdict

In modern columnar lakehouse environments with massively parallel processing (MPP) engines, the storage savings of a snowflake schema are negligible, because Parquet's columnar compression (dictionary and run-length encoding) already eliminates most of the redundancy in repeated dimension values. The join overhead, however, is measurable in distributed query execution, where each additional join can require moving data between nodes. Most practitioners therefore default to Star Schema for analytical layers and keep snowflake-style normalization only at the raw ingestion layer, where data contracts with source systems require a normalized structure.
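One way to picture that split is a transformation step that flattens normalized raw tables into a single star-style dimension. The sketch below uses sqlite3 purely as a stand-in; in a lakehouse this would be a job in whatever engine is in use. The raw_customer, raw_address, and raw_country table names and all columns are hypothetical.

```python
import sqlite3

# In-memory database; the raw_* tables mimic a normalized ingestion layer.
flat_conn = sqlite3.connect(":memory:")
flat_conn.executescript("""
CREATE TABLE raw_country  (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE raw_address  (address_id INTEGER PRIMARY KEY, city TEXT, country_id INTEGER);
CREATE TABLE raw_customer (customer_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
INSERT INTO raw_country  VALUES (1, 'UK'), (2, 'Spain');
INSERT INTO raw_address  VALUES (1, 'London', 1), (2, 'Madrid', 2);
INSERT INTO raw_customer VALUES (1, 'Ada', 1), (2, 'Luis', 2);

-- Pay the join cost once, at build time, to produce a flat dimension.
CREATE TABLE dim_customer AS
SELECT c.customer_id, c.name, a.city, k.country_name AS country
FROM raw_customer AS c
JOIN raw_address  AS a ON a.address_id = c.address_id
JOIN raw_country  AS k ON k.country_id = a.country_id;
""")

flat_rows = flat_conn.execute("""
    SELECT customer_id, name, city, country
    FROM dim_customer
    ORDER BY customer_id
""").fetchall()

print(flat_rows)
```

Analytical queries then join the fact table to dim_customer alone, while the normalized raw tables remain available for reconciling with the source systems.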
