Slowly Changing Dimensions (SCD) is a data warehousing concept describing how dimension table records (like customer, product, or employee records) change over time, and the strategies for managing those changes in a way that preserves historical analytical accuracy. The term was formalized by Ralph Kimball and remains foundational to dimensional data modeling in modern lakehouses.
The SCD Types
- Type 1 (Overwrite): The simplest approach. When a dimension attribute changes (a customer moves to a new city), the old value is overwritten with the new one. No history is preserved. Suitable for attributes where history is irrelevant (correcting a typo in a name).
- Type 2 (Add New Row): The most common approach. When an attribute changes, a new row is inserted with the new values, a new surrogate key, and effective start/end dates. The old row remains with its end date set. This preserves complete history, allowing "as-of" queries (what city did the customer live in when they made that purchase in 2023?).
- Type 3 (Add Column): Adds a "previous value" column alongside the current value column. Preserves only one level of history. Rarely used due to scalability limitations.
SCD Type 2 and Apache Iceberg
Implementing SCD Type 2 historically required complex ETL logic: identify changed rows, close existing records by setting end dates, and insert new rows with new effective dates. Iceberg's MERGE INTO statement simplifies this significantly. A single MERGE can atomically update existing rows (close them) and insert new rows (open new versions) in a single ACID transaction, eliminating the multi-step approach that was error-prone in traditional data lake implementations. Iceberg's time travel also provides an alternative to SCD Type 2 for some use cases: instead of maintaining separate rows for each version, queries can use AS OF TIMESTAMP to examine the dimension table's state at any past point in time.

