dbt (data build tool) is an open-source command-line tool and cloud platform that lets data engineers and analytics engineers transform raw data in the lakehouse into analytics-ready models using SQL, while applying software engineering practices such as version control, testing, documentation, and CI/CD. dbt has become the de facto standard for the "T" (transform) in modern ELT pipelines.
How dbt Works with Iceberg
dbt models are SQL SELECT statements that define how data should be transformed. dbt compiles these models and executes them against a target database or lakehouse engine, materializing the results as tables or views. The dbt-dremio and dbt-spark adapters allow dbt models to materialize as Iceberg tables. A dbt project for an Iceberg lakehouse might have:
- Staging models: Lightweight transformations of Bronze Iceberg tables (renaming columns, casting types, filtering invalid rows)
- Intermediate models: Business logic transformations combining multiple staging tables
- Mart models: Gold-layer Kimball fact and dimension tables consumed by BI tools
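To make the staging layer concrete, the following is a minimal plain-Python sketch of the three staging operations listed above (renaming columns, casting types, filtering invalid rows). In a real dbt project this logic would be a SQL SELECT statement; the table, column names, and sample rows here are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical raw rows as they might land in a Bronze table (illustrative only).
bronze_orders = [
    {"ORDER_ID": "1001", "CUST": "c-17", "AMT": "25.50", "ORDER_DT": "2024-03-01"},
    {"ORDER_ID": "1002", "CUST": "c-42", "AMT": "not-a-number", "ORDER_DT": "2024-03-01"},
    {"ORDER_ID": "", "CUST": "c-08", "AMT": "10.00", "ORDER_DT": "2024-03-02"},
]

# Assumed mapping from raw column names to cleaned staging names.
RENAMES = {
    "ORDER_ID": "order_id",
    "CUST": "customer_id",
    "AMT": "amount",
    "ORDER_DT": "order_date",
}


def stage_order(row):
    """Rename columns and cast types; return None for rows that fail validation."""
    renamed = {RENAMES[key]: value for key, value in row.items()}
    if not renamed["order_id"]:
        return None  # filter: a missing key makes the row invalid
    try:
        return {
            "order_id": int(renamed["order_id"]),
            "customer_id": renamed["customer_id"],
            "amount": float(renamed["amount"]),
            "order_date": date.fromisoformat(renamed["order_date"]),
        }
    except ValueError:
        return None  # filter: a value that cannot be cast makes the row invalid


stg_orders = [staged for staged in map(stage_order, bronze_orders) if staged is not None]
print(stg_orders)
```

Only the first sample row survives: the second has an amount that cannot be cast and the third has an empty key. Intermediate and mart models would then build on the cleaned output rather than on the raw table.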
dbt Testing for Data Quality
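dbt includes four built-in generic data tests that can run after model materializations: not_null (no nulls in required columns), unique (no duplicate values in a column such as a primary key), accepted_values (categorical columns contain only valid values), and relationships (foreign keys reference valid primary keys in dimension tables). Each test runs as a SQL query against the Iceberg tables that selects the violating rows, and the test fails if any rows are returned. Tests are executed with dbt test, or interleaved with model builds by dbt build, which skips models downstream of a failed test and so helps prevent bad data from reaching BI consumers. Note that dbt run on its own materializes models without running tests.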
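The four checks can be sketched in plain Python to show what each one looks for. This is not dbt's implementation (dbt compiles each test to SQL); the function signatures and sample tables are hypothetical, and the choice to skip nulls in every check except not_null is an assumption modeled on ordinary SQL comparison semantics. Each function returns the violations, mirroring the "fail if any rows come back" convention.

```python
from collections import Counter


def not_null(rows, column):
    """Rows where the column is null."""
    return [r for r in rows if r[column] is None]


def unique(rows, column):
    """Non-null values that occur more than once in the column."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def accepted_values(rows, column, values):
    """Rows whose non-null value is outside the allowed set."""
    return [r for r in rows if r[column] is not None and r[column] not in values]


def relationships(child_rows, column, parent_rows, parent_column):
    """Child rows whose non-null foreign key has no matching parent key."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return [
        r for r in child_rows
        if r[column] is not None and r[column] not in parent_keys
    ]


# Hypothetical dimension and fact tables with deliberate quality problems.
dim_customer = [{"customer_id": 1}, {"customer_id": 2}]
fct_orders = [
    {"order_id": 10, "customer_id": 1, "status": "shipped"},
    {"order_id": 11, "customer_id": 2, "status": "placed"},
    {"order_id": 11, "customer_id": 3, "status": "lost"},      # duplicate key, bad status, orphan FK
    {"order_id": None, "customer_id": 1, "status": "placed"},  # null key
]

failures = {
    "not_null(order_id)": not_null(fct_orders, "order_id"),
    "unique(order_id)": unique(fct_orders, "order_id"),
    "accepted_values(status)": accepted_values(
        fct_orders, "status", {"placed", "shipped", "returned"}
    ),
    "relationships(customer_id)": relationships(
        fct_orders, "customer_id", dim_customer, "customer_id"
    ),
}

for name, violations in failures.items():
    print(name, "FAIL" if violations else "PASS", violations)
```

With this sample data all four checks report at least one violation, which is the situation in which a build would stop before the mart reaches consumers.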
dbt and Iceberg Incremental Models
dbt's incremental materialization strategy is well suited to Iceberg. Rather than rebuilding an entire fact table daily, an incremental model processes only the records that are new or changed since the last run. With Iceberg's MERGE INTO support (used by the merge incremental strategy in the dbt-spark and dbt-dremio adapters), dbt can apply upserts atomically, updating changed records and inserting new ones in a single ACID transaction.
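The incremental pattern can be illustrated with a small plain-Python sketch: select only source rows newer than the target's high-water mark, then upsert them by key. This models the semantics only and is not how dbt or Iceberg implement MERGE INTO. The function name, the unique_key and updated_col parameters, and the sample rows are hypothetical, and the use of an updated_at timestamp as the change filter is an assumption.

```python
def incremental_merge(target, source, unique_key, updated_col):
    """Upsert source rows newer than the target's high-water mark.

    `target` maps key -> row. A new mapping is returned instead of mutating
    `target`, loosely mimicking an all-or-nothing commit.
    """
    # High-water mark: the latest change timestamp already in the target.
    high_water = max((row[updated_col] for row in target.values()), default=None)

    # Incremental filter: keep only rows changed since the last run.
    changed = [
        row for row in source
        if high_water is None or row[updated_col] > high_water
    ]

    merged = dict(target)
    stats = {"updated": 0, "inserted": 0}
    for row in changed:
        if row[unique_key] in merged:
            stats["updated"] += 1   # matched key: overwrite the existing row
        else:
            stats["inserted"] += 1  # unmatched key: add a new row
        merged[row[unique_key]] = row
    return merged, stats


# Hypothetical current state of the fact table and a fresh source extract.
target = {
    1: {"order_id": 1, "status": "placed", "updated_at": "2024-03-01"},
    2: {"order_id": 2, "status": "placed", "updated_at": "2024-03-02"},
}
source = [
    {"order_id": 1, "status": "placed", "updated_at": "2024-03-01"},   # unchanged
    {"order_id": 2, "status": "shipped", "updated_at": "2024-03-03"},  # changed
    {"order_id": 3, "status": "placed", "updated_at": "2024-03-03"},   # new
]

merged, stats = incremental_merge(target, source, "order_id", "updated_at")
print(stats)
print(merged)
```

One caveat of a strict greater-than filter on a timestamp is that late-arriving rows stamped at or before the high-water mark are skipped; real projects often widen the filter with a lookback window to compensate.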

