Data Vault 2.0 (DV2) is a data modeling methodology developed by Dan Linstedt that emphasizes auditability, resilience to source-system changes, and parallel loading. It is particularly well suited to enterprise environments with strict regulatory requirements (banking, healthcare, government), where every piece of data must be traceable to its source system and every historical change must be preserved.

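The Three Core DV2 Components

The three components are Hubs, Links, and Satellites. A Hub stores the unique business keys of a core entity, a Link records a relationship between two or more Hubs, and a Satellite holds the descriptive attributes of a Hub or Link together with their change history.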

Data Vault 2.0 and Apache Iceberg

Iceberg is a good fit as the physical storage layer for Data Vault models. Hubs, Links, and Satellites are loaded insert-only, which maps naturally onto Iceberg's append operations and requires no delete or update semantics. Iceberg's partition pruning and column statistics speed up the lookup queries that Information Marts run against the raw vault to build reporting views. dbtvault (since renamed AutomateDV), the open-source dbt package for DV2, supports Iceberg as a target platform, enabling automated DV2 model generation from templated SQL.
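The insert-only pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool loads a vault: plain in-memory lists stand in for append-only Iceberg tables, and the names `hash_key`, `hash_diff`, `load_hub`, `load_satellite`, and the column names are hypothetical. MD5 is used only because it is a common choice for DV2 hash keys; the sketch assumes business keys are normalized by trimming and upper-casing.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash the normalized business key parts into a fixed-length key."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hash_diff(attrs: dict) -> str:
    """Hash descriptive attributes in a stable column order to detect change."""
    payload = "||".join(f"{name}={attrs[name]}" for name in sorted(attrs))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_hub(hub_rows: list, staged_keys: list, record_source: str) -> int:
    """Append business keys not yet in the hub; never update or delete."""
    existing = {row["hub_hk"] for row in hub_rows}
    appended = 0
    for business_key in staged_keys:
        hk = hash_key(business_key)
        if hk in existing:
            continue  # key already known: the hub row is left untouched
        hub_rows.append({
            "hub_hk": hk,
            "business_key": business_key.strip().upper(),
            "load_ts": datetime.now(timezone.utc),
            "record_source": record_source,
        })
        existing.add(hk)
        appended += 1
    return appended


def load_satellite(sat_rows: list, hub_hk: str, attrs: dict, record_source: str) -> bool:
    """Append a new satellite row only when the attributes have changed."""
    latest = next((r for r in reversed(sat_rows) if r["hub_hk"] == hub_hk), None)
    hd = hash_diff(attrs)
    if latest is not None and latest["hash_diff"] == hd:
        return False  # unchanged: nothing is written
    sat_rows.append({
        "hub_hk": hub_hk,
        "hash_diff": hd,
        "load_ts": datetime.now(timezone.utc),
        "record_source": record_source,
        **attrs,
    })
    return True


# Lists standing in for append-only tables (an assumption of this sketch).
hub_customer: list = []
sat_customer: list = []

first_load = load_hub(hub_customer, ["C-001", "C-002"], "crm")
second_load = load_hub(hub_customer, ["c-002 ", "C-003"], "billing")

hk = hash_key("C-001")
wrote_initial = load_satellite(sat_customer, hk, {"city": "Oslo"}, "crm")
wrote_repeat = load_satellite(sat_customer, hk, {"city": "Oslo"}, "crm")
wrote_change = load_satellite(sat_customer, hk, {"city": "Bergen"}, "crm")
```

Every change shows up as a new row rather than a modification of an old one, which is why the history stays fully auditable and why append-only writes are enough. In a real pipeline the same "append what is new" step would typically be expressed as an `INSERT ... SELECT` with an anti-join against the target table.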
