The Kimball methodology, developed by Ralph Kimball and documented in "The Data Warehouse Toolkit" (first published in 1996, with subsequent editions), is one of the most widely practiced approaches to dimensional data modeling. Its core philosophy is that data warehouses (and, by extension, lakehouses) should be designed around business processes and organized so that business users can query them easily, rather than optimized only for storage efficiency.
Core Kimball Concepts
- Fact Tables: Store measurements of business events (one row per sale, one row per web click, one row per insurance claim). Facts contain quantitative measures (revenue, quantity, duration) and foreign keys to dimension tables.
- Dimension Tables: Contain the descriptive context of business events (customer name, product category, geographic region, date attributes). Kimball recommends denormalizing each dimension's attributes into a single flat table (a star schema), so that queries need no further joins within the dimension, as they would in a normalized snowflake design.
- Conformed Dimensions: Dimension tables shared consistently across multiple fact tables in the same data warehouse. A conformed dim_date table used by both the sales fact table and the marketing fact table enables cross-process analysis.
- Bus Architecture: The enterprise-wide collection of all fact tables connected by conformed dimensions, enabling consistent cross-functional analysis.
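The concepts above can be sketched with a small star schema. This is a minimal illustration, not a reference design: it uses Python's built-in sqlite3 module as a stand-in for a warehouse engine, and the table names dim_product and fact_marketing_spend, the column names, and the sample rows are all assumptions made for the example. It shows two fact tables sharing a conformed dim_date, and a cross-process query that aggregates each fact table to the month grain before joining the results.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension: shared by both fact tables below.
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,   -- e.g. 20240115
    full_date TEXT,
    month     TEXT
);
-- Denormalized dimension: category sits on the product row itself.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- Fact table: one row per sale, measures plus dimension keys.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);
-- Second fact table for a different business process.
CREATE TABLE fact_marketing_spend (
    date_key INTEGER REFERENCES dim_date(date_key),
    channel  TEXT,
    spend    REAL
);
INSERT INTO dim_date VALUES
    (20240115, '2024-01-15', '2024-01'),
    (20240210, '2024-02-10', '2024-02');
INSERT INTO dim_product VALUES
    (1, 'Espresso Machine', 'Appliances'),
    (2, 'Coffee Beans', 'Grocery');
INSERT INTO fact_sales VALUES
    (20240115, 1, 2, 400.0),
    (20240115, 2, 5, 50.0),
    (20240210, 2, 10, 100.0);
INSERT INTO fact_marketing_spend VALUES
    (20240115, 'email', 30.0),
    (20240115, 'search', 70.0),
    (20240210, 'email', 20.0);
""")

# Aggregate each fact table separately, then join on the conformed attribute.
rows = conn.execute("""
WITH sales AS (
    SELECT d.month AS month, SUM(f.revenue) AS revenue
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.month
),
spend AS (
    SELECT d.month AS month, SUM(f.spend) AS spend
    FROM fact_marketing_spend f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.month
)
SELECT sales.month, sales.revenue, spend.spend
FROM sales JOIN spend ON spend.month = sales.month
ORDER BY sales.month
""").fetchall()

for month, revenue, spend in rows:
    print(month, revenue, spend)
```

Aggregating each fact table before the join avoids multiplying rows when the two processes have different numbers of events per day, which is why the sketch does not join the two fact tables to each other directly.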
Kimball in Modern Lakehouses
The Kimball methodology applies directly to Apache Iceberg lakehouses. Fact tables become large Iceberg tables, typically partitioned by date because date is the most common query filter. Dimension tables are much smaller Iceberg tables, often small enough for a query engine to hold in memory when joining them to facts. dbt (data build tool) is a popular tool for implementing Kimball models in modern lakehouses: dbt models, written as SQL SELECT statements, transform Silver-layer Iceberg tables into Gold-layer Kimball fact and dimension tables.
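The Silver-to-Gold step can be sketched as follows. This is a hedged illustration under stated assumptions: sqlite3 stands in for the lakehouse query engine, the silver_orders table and its columns are hypothetical, and neither Iceberg partitioning nor dbt itself is involved. In a dbt project, each of the two SELECT statements would typically live in its own model file.

```python
import sqlite3

# In-memory SQLite database standing in for the lakehouse engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical Silver-layer table: cleaned, one row per order.
CREATE TABLE silver_orders (
    order_id        INTEGER,
    order_date      TEXT,
    customer_email  TEXT,
    customer_name   TEXT,
    customer_region TEXT,
    amount          REAL
);
INSERT INTO silver_orders VALUES
    (1, '2024-01-15', 'ana@example.com', 'Ana', 'EMEA', 120.0),
    (2, '2024-01-15', 'bo@example.com',  'Bo',  'APAC',  80.0),
    (3, '2024-02-10', 'ana@example.com', 'Ana', 'EMEA',  40.0);

-- Gold dimension: one row per customer with a generated surrogate key.
CREATE TABLE dim_customer (
    customer_key    INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_email  TEXT,
    customer_name   TEXT,
    customer_region TEXT
);
INSERT INTO dim_customer (customer_email, customer_name, customer_region)
SELECT DISTINCT customer_email, customer_name, customer_region
FROM silver_orders
ORDER BY customer_email;

-- Gold fact: one row per order, holding the measure and dimension keys.
CREATE TABLE fact_orders AS
SELECT
    o.order_id,
    CAST(REPLACE(o.order_date, '-', '') AS INTEGER) AS date_key,
    c.customer_key,
    o.amount
FROM silver_orders o
JOIN dim_customer c ON c.customer_email = o.customer_email;
""")

fact_rows = conn.execute(
    "SELECT order_id, date_key, customer_key, amount "
    "FROM fact_orders ORDER BY order_id"
).fetchall()

# A typical Gold-layer query: a measure grouped by a dimension attribute.
by_region = conn.execute("""
SELECT c.customer_region, SUM(f.amount)
FROM fact_orders f
JOIN dim_customer c ON c.customer_key = f.customer_key
GROUP BY c.customer_region
ORDER BY c.customer_region
""").fetchall()

print(fact_rows)
print(by_region)
```

The date_key column mirrors the date filter that the fact table would be partitioned on in Iceberg; here it is only an integer column, since SQLite has no partitioning.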

