Data Vault 2.0 (DV2) is a data modeling methodology developed by Dan Linstedt that emphasizes auditability, resilience to source system changes, and parallel loading. It is particularly well suited to enterprise environments with strict regulatory requirements (banking, healthcare, government), where every piece of data must be traceable to its source system and every historical change must be preserved.
The Three Core DV2 Components
- Hubs: Contain business keys (the natural keys that identify entities in source systems, such as customer_id, order_id, or product_sku). Each Hub table stores only the business key, a hash key derived from the business key, the load date, and the record source. Hub rows are never updated once written; a new row is appended only when a business key appears for the first time.
- Satellites: Store the descriptive attributes of a Hub entity (or of a Link). When a customer's address changes, a new row is inserted in the customer satellite with the new address and a new load_date; the old row remains untouched. Satellites are insert-only, which preserves complete history.
- Links: Capture relationships between Hub entities (for example, a sale Link connects a customer Hub, a product Hub, and a store Hub). Like Hubs, Links are insert-only and never updated.
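The insert-only behavior of Hubs and Satellites described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (hash_key, load_hub, load_satellite), the in-memory dict and list standing in for tables, the "||" delimiter, the trim-and-upper-case normalization, the change-detection hash (hashdiff), and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic hash key from one or more business keys."""
    # Normalise (trim, upper-case) and join with a delimiter so that
    # ("AB", "C") and ("A", "BC") do not produce the same input string.
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_key: str, record_source: str,
             load_date: datetime) -> str:
    """Append the business key if it is new; existing rows are left as-is."""
    hk = hash_key(business_key)
    if hk not in hub:
        hub[hk] = {
            "business_key": business_key,
            "load_date": load_date,
            "record_source": record_source,
        }
    return hk


def load_satellite(sat: list, hk: str, attributes: dict,
                   record_source: str, load_date: datetime) -> bool:
    """Append a row only if the attributes differ from the latest row for hk."""
    # Hash of the sorted attributes, used purely for change detection.
    payload = repr(sorted(attributes.items())).encode("utf-8")
    hashdiff = hashlib.sha256(payload).hexdigest()

    latest = None
    for row in sat:
        if row["hash_key"] == hk and (
            latest is None or row["load_date"] > latest["load_date"]
        ):
            latest = row

    if latest is not None and latest["hashdiff"] == hashdiff:
        return False  # nothing changed, so nothing is written

    sat.append({
        "hash_key": hk,
        "load_date": load_date,
        "record_source": record_source,
        "hashdiff": hashdiff,
        "attributes": dict(attributes),
    })
    return True


if __name__ == "__main__":
    hub_customer, sat_customer = {}, []
    t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)

    hk = load_hub(hub_customer, "C-1001", "crm", t1)
    load_satellite(sat_customer, hk, {"address": "1 Main St"}, "crm", t1)
    load_satellite(sat_customer, hk, {"address": "9 Oak Ave"}, "crm", t2)
    # Both address versions are retained; the first row was never modified.
    print(len(hub_customer), len(sat_customer))
```

A Link hash key could be derived the same way by passing the business keys of all participating Hubs, for example hash_key("C-1001", "SKU-9", "STORE-3").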
Data Vault 2.0 and Apache Iceberg
Iceberg is a good fit as the physical storage layer for Data Vault models. The insert-only pattern of Hubs, Satellites, and Links maps naturally to Iceberg's append operations and does not require delete or update semantics. Iceberg's partition pruning and per-file column statistics let the engine skip irrelevant data files, which speeds up the lookup queries that Information Marts run against the raw vault to construct reporting views. dbtvault (since renamed AutomateDV), the open-source dbt package for DV2, automates DV2 model generation using templated SQL; whether it can write to Iceberg tables depends on the dbt adapter and query engine in use, so check the package's list of supported platforms.
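To make the pruning idea concrete, here is a toy Python model of how per-file minimum and maximum values for a column allow a reader to skip files that cannot contain matching rows. It is only an illustration of the principle: the DataFile class, the prune function, and the file paths are hypothetical and do not reflect Iceberg's actual API or metadata layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    min_load_date: str  # ISO-8601 dates compare correctly as strings
    max_load_date: str


def prune(files, lo, hi):
    """Keep only files whose [min, max] load_date range overlaps [lo, hi]."""
    return [
        f for f in files
        if f.max_load_date >= lo and f.min_load_date <= hi
    ]


# Hypothetical satellite data files, one per month of load_date.
files = [
    DataFile("sat_customer/2024-01.parquet", "2024-01-01", "2024-01-31"),
    DataFile("sat_customer/2024-02.parquet", "2024-02-01", "2024-02-29"),
    DataFile("sat_customer/2024-03.parquet", "2024-03-01", "2024-03-31"),
]

# A query filtering on load_date between 10 and 20 February 2024
# only needs to open the February file.
selected = prune(files, "2024-02-10", "2024-02-20")
print([f.path for f in selected])  # ['sat_customer/2024-02.parquet']
```

Because satellite rows arrive in load_date order and are never rewritten, files tend to cover narrow, non-overlapping date ranges, which is the situation in which this kind of range check eliminates the most work.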

