dbt (data build tool) is an open-source command-line tool (dbt Core), also offered as a hosted cloud platform. It lets data engineers and analytics engineers transform raw data in the lakehouse into analytics-ready models using SQL, while applying software engineering practices such as version control, testing, documentation, and CI/CD. dbt has become the de facto standard for the "T" (transform) in modern ELT pipelines.

How dbt Works with Iceberg

dbt models are SQL SELECT statements that define how data should be transformed. dbt compiles each model, resolving its references to other models, and runs the result against a target database or lakehouse engine, materializing it as a table or view. Adapters such as dbt-dremio and dbt-spark let those models materialize as Iceberg tables. A dbt project for an Iceberg lakehouse might have:
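The compile-and-materialize flow described above can be illustrated with a small Python sketch. It resolves {{ ref('...') }} placeholders to table names and wraps the compiled SELECT in a Spark SQL CREATE OR REPLACE TABLE ... USING iceberg AS statement, which is roughly the shape of DDL that dbt-spark issues for a table model configured with file_format='iceberg'. The model names, the lakehouse catalog, and the RELATIONS mapping are hypothetical, and real dbt renders full Jinja templates rather than matching a regex.

```python
import re

# Hypothetical model-name -> Iceberg table mapping. Real dbt derives relation
# names from profiles.yml, dbt_project.yml, and each model's config.
RELATIONS = {
    "stg_orders": "lakehouse.staging.stg_orders",
    "stg_customers": "lakehouse.staging.stg_customers",
}

# Matches {{ ref('model_name') }} only; real dbt renders full Jinja instead.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'(\w+)'\s*\)\s*\}\}")


def compile_model(model_sql: str, relations: dict[str, str]) -> str:
    """Replace every {{ ref('name') }} with the table that model materializes as."""

    def resolve(match: re.Match) -> str:
        name = match.group(1)
        if name not in relations:
            raise KeyError(f"model not found: {name}")
        return relations[name]

    return REF_PATTERN.sub(resolve, model_sql)


def materialize_as_iceberg_table(target: str, compiled_sql: str) -> str:
    """Wrap a compiled SELECT in a Spark SQL CTAS that writes an Iceberg table."""
    return f"CREATE OR REPLACE TABLE {target}\nUSING iceberg\nAS\n{compiled_sql.strip()}"


if __name__ == "__main__":
    # Hypothetical fact model joining two staging models.
    fct_orders = """
    SELECT o.order_id, c.customer_name, o.amount
    FROM {{ ref('stg_orders') }} AS o
    JOIN {{ ref('stg_customers') }} AS c ON o.customer_id = c.customer_id
    """
    compiled = compile_model(fct_orders, RELATIONS)
    print(materialize_as_iceberg_table("lakehouse.marts.fct_orders", compiled))
```

Resolving references before execution is what lets dbt build a dependency graph and run models in the right order; the sketch only shows the text transformation, not the graph.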

dbt Testing for Data Quality

dbt includes four built-in generic data tests that run against a model after it is materialized: not_null (required columns contain no nulls), unique (primary keys have no duplicates), accepted_values (categorical columns contain only valid values), and relationships (foreign keys reference valid primary keys in dimension tables). Each test compiles to a SQL query against the Iceberg table that selects the rows violating the rule, and the test fails if any rows come back. When tests run as part of the dbt build command, a failing test causes downstream models to be skipped, preventing bad data from reaching BI consumers.
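Each built-in test boils down to a query that returns failing rows. The Python sketch below approximates those queries and runs them against an in-memory SQLite database standing in for the Iceberg tables. The SQL is a simplified assumption of what dbt generates (the real macros differ by dbt version and adapter), and the fct_orders and dim_customers tables, their data, and the test names are hypothetical.

```python
import sqlite3

# Simplified versions of the queries dbt's built-in generic tests compile to.
# Each returns the rows that VIOLATE the rule, so a test passes when it returns
# nothing. Real dbt SQL differs by version and adapter.


def not_null_sql(table: str, column: str) -> str:
    return f"SELECT {column} FROM {table} WHERE {column} IS NULL"


def unique_sql(table: str, column: str) -> str:
    return (
        f"SELECT {column}, COUNT(*) AS n_records FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1"
    )


def accepted_values_sql(table: str, column: str, values: list[str]) -> str:
    # Escape single quotes so each allowed value is a valid SQL string literal.
    allowed = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"SELECT {column} FROM {table} WHERE {column} NOT IN ({allowed})"


def relationships_sql(table: str, column: str, parent: str, parent_column: str) -> str:
    # Child keys with no matching parent key are orphans.
    return (
        f"SELECT c.{column} FROM {table} AS c "
        f"LEFT JOIN {parent} AS p ON c.{column} = p.{parent_column} "
        f"WHERE c.{column} IS NOT NULL AND p.{parent_column} IS NULL"
    )


def run_tests(conn: sqlite3.Connection, tests: dict[str, str]) -> dict[str, int]:
    """Execute each test query and count its failing rows."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in tests.items()}


def demo_connection() -> sqlite3.Connection:
    """Hypothetical stand-in tables; every fct_orders test finds exactly one violation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customers (customer_id INTEGER, region TEXT);
        INSERT INTO dim_customers VALUES (1, 'EMEA'), (2, 'AMER');
        CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, status TEXT);
        INSERT INTO fct_orders VALUES
            (10, 1, 'shipped'),
            (11, 2, 'pending'),
            (11, 3, 'returned'),
            (NULL, 1, 'shipped');
        """
    )
    return conn


def demo_tests() -> dict[str, str]:
    return {
        "not_null_fct_orders_order_id": not_null_sql("fct_orders", "order_id"),
        "unique_fct_orders_order_id": unique_sql("fct_orders", "order_id"),
        "accepted_values_fct_orders_status": accepted_values_sql(
            "fct_orders", "status", ["shipped", "pending", "cancelled"]
        ),
        "relationships_fct_orders_customer_id": relationships_sql(
            "fct_orders", "customer_id", "dim_customers", "customer_id"
        ),
        "accepted_values_dim_customers_region": accepted_values_sql(
            "dim_customers", "region", ["EMEA", "AMER", "APAC"]
        ),
    }


if __name__ == "__main__":
    results = run_tests(demo_connection(), demo_tests())
    failed = sorted(name for name, rows in results.items() if rows > 0)
    print("failing tests:", failed)
```

Expressing every test as "select the bad rows" keeps the pass/fail rule uniform: zero rows means pass, so the same runner works for any custom test written in the same style.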

dbt and Iceberg Incremental Models

dbt's incremental materialization is well suited to Iceberg. Rather than rebuilding an entire fact table every day, an incremental model processes only the records that are new or changed since the last run, usually selected by a filter on a timestamp column inside an is_incremental() block. With Iceberg's MERGE INTO support (used by the merge incremental strategy in the dbt-spark and dbt-dremio adapters), dbt can apply upserts atomically: changed records are updated and new ones inserted in a single ACID transaction, committed as one new table snapshot.
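A minimal Python sketch of the incremental merge idea, assuming a hypothetical orders model keyed on order_id with an updated_at column. incremental_batch mimics the high-water-mark filter that dbt models commonly place inside an is_incremental() block, merge applies MERGE INTO upsert semantics in memory, and merge_statement renders a simplified Spark SQL MERGE of the kind the merge strategy issues against an Iceberg table. It models the semantics only and does not talk to Spark, Dremio, or dbt.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Order:
    """Hypothetical orders row; column names are illustrative."""
    order_id: int
    status: str
    updated_at: datetime


def incremental_batch(source: list[Order], target: dict[int, Order]) -> list[Order]:
    """Mimic an is_incremental() filter: on later runs, keep only source rows
    newer than the latest updated_at already in the target."""
    if not target:  # first run: behave like a full build
        return list(source)
    high_water_mark = max(row.updated_at for row in target.values())
    return [row for row in source if row.updated_at > high_water_mark]


def merge(target: dict[int, Order], batch: list[Order]) -> dict[int, Order]:
    """Upsert keyed on order_id: update matched keys, insert new ones.

    Returns a new dict instead of mutating the target, loosely echoing how each
    Iceberg commit produces a new snapshot while the previous one stays readable.
    """
    keys = [row.order_id for row in batch]
    if len(keys) != len(set(keys)):
        # This sketch rejects duplicate keys outright; Iceberg's Spark MERGE
        # similarly fails when several source rows match the same target row.
        raise ValueError("batch contains duplicate unique_key values")
    merged = dict(target)
    for row in batch:
        merged[row.order_id] = row  # WHEN MATCHED -> update, else insert
    return merged


def merge_statement(target_table: str, source_view: str, unique_key: str) -> str:
    """Render a simplified Spark SQL MERGE for the hypothetical tables."""
    return (
        f"MERGE INTO {target_table} AS t\n"
        f"USING {source_view} AS s\n"
        f"ON t.{unique_key} = s.{unique_key}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    target = {
        1: Order(1, "pending", datetime(2024, 5, 1, 8)),
        2: Order(2, "pending", datetime(2024, 5, 1, 9)),
    }
    source = [
        Order(1, "pending", datetime(2024, 5, 1, 8)),   # already loaded: filtered out
        Order(2, "shipped", datetime(2024, 5, 2, 10)),  # changed: updated
        Order(3, "pending", datetime(2024, 5, 2, 11)),  # new: inserted
    ]
    batch = incremental_batch(source, target)
    for order_id, row in sorted(merge(target, batch).items()):
        print(order_id, row.status)
    print(merge_statement("lakehouse.marts.fct_orders", "incremental_batch", "order_id"))
```

The strict > comparison is a design choice: it avoids reprocessing rows that were already merged, but it can skip late rows that share the exact high-water-mark timestamp. That is why some projects use >= instead and rely on the merge's unique key to make reprocessing those rows harmless.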
