Iceberg ACID Transactions

One of the foundational promises of the data lakehouse is bringing data warehouse reliability to data lake storage. Apache Iceberg delivers on this by providing ACID guarantees for every table operation, even though the underlying storage is typically an object store such as Amazon S3, which offers no multi-object transactions of its own.

Atomicity

Atomicity means that an operation either completes entirely or fails entirely; there is no intermediate state. Iceberg achieves this through its metadata-centric design. When a writer adds data, it writes the new data files (Parquet, for example) and builds a new metadata tree of manifest files and a manifest list, culminating in a new JSON metadata file. The new data becomes visible to readers only when the Iceberg catalog successfully performs an atomic compare-and-swap (CAS) that points the table at the new metadata file. If a job fails halfway through writing data, the files it left behind are orphaned: no committed metadata references them, so readers never see them, and table maintenance can delete them later.
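The commit step described above can be sketched with a toy model. This is a minimal illustration, not Iceberg's implementation: InMemoryCatalog, compare_and_swap, the storage dictionary, and all file names are hypothetical stand-ins chosen for this example.

```python
import threading


class InMemoryCatalog:
    """Hypothetical stand-in for an Iceberg catalog: one metadata pointer."""

    def __init__(self, initial_metadata):
        self._lock = threading.Lock()
        self._current = initial_metadata

    def current(self):
        with self._lock:
            return self._current

    def compare_and_swap(self, expected, new):
        """Move the pointer to `new` only if it still equals `expected`."""
        with self._lock:
            if self._current != expected:
                return False
            self._current = new
            return True


# Toy "object store": metadata file name -> data files it references.
storage = {"v1.metadata.json": ["data-a.parquet"]}
catalog = InMemoryCatalog("v1.metadata.json")

# A failed job wrote this file but never committed metadata that references it.
orphaned_files = ["data-b.parquet"]

# A successful writer: write data, write new metadata, then swap the pointer.
base = catalog.current()
storage["v2.metadata.json"] = storage[base] + ["data-c.parquet"]
committed = catalog.compare_and_swap(base, "v2.metadata.json")

# A second swap based on the old pointer is refused; nothing changes.
stale_commit = catalog.compare_and_swap(base, "v3.metadata.json")

# Readers resolve data files only through the committed pointer.
visible_files = storage[catalog.current()]
```

In this sketch the orphaned file exists in storage but is unreachable from the committed metadata, which is why a reader never sees it.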

Consistency

Consistency ensures that a transaction brings the database from one valid state to another valid state. Iceberg enforces consistency through optimistic concurrency control. If two writers attempt to commit to the same table at the same time, only one compare-and-swap succeeds; the other writer's commit is rejected by the catalog because the table state it was based on has changed. The rejected writer must reconcile its changes against the new state before trying again. This prevents data corruption and lost updates.
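The reject-and-retry cycle can be illustrated with a small simulation. It is a sketch under stated assumptions: ToyTable, CommitConflict, and append_with_retry are hypothetical names invented for this example, and a real engine would also re-validate its changes against the refreshed state rather than simply reapplying them.

```python
class CommitConflict(Exception):
    """Raised when a commit is based on a stale table version."""


class ToyTable:
    """Hypothetical table: a version counter plus a tuple of data files."""

    def __init__(self):
        self.version = 0
        self.files = ()

    def commit(self, base_version, new_files):
        # Optimistic check: succeed only if nobody committed since we read.
        if base_version != self.version:
            raise CommitConflict(
                f"based on v{base_version}, table is at v{self.version}"
            )
        self.files = new_files
        self.version += 1


def append_with_retry(table, new_file, max_attempts=3):
    """Re-read the current state and retry the append after each conflict."""
    for attempt in range(1, max_attempts + 1):
        base_version, base_files = table.version, table.files
        try:
            table.commit(base_version, base_files + (new_file,))
            return attempt
        except CommitConflict:
            continue
    raise CommitConflict("gave up after repeated conflicts")


table = ToyTable()

# Both writers read the table at version 0.
base_a = (table.version, table.files)
base_b = (table.version, table.files)

# Writer A commits first and wins.
table.commit(base_a[0], base_a[1] + ("a.parquet",))

# Writer B's commit is now based on a stale version and is rejected.
try:
    table.commit(base_b[0], base_b[1] + ("b.parquet",))
    second_rejected = False
except CommitConflict:
    second_rejected = True

# Writer B refreshes its view of the table and retries successfully.
attempts = append_with_retry(table, "b.parquet")
```

Because writer B rebuilds its change on top of writer A's committed state, neither append is lost.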

Isolation

Isolation determines how and when the changes made by one transaction become visible to other users and systems. Iceberg provides snapshot isolation for readers. When a query engine begins reading a table, it requests the current snapshot from the catalog and then reads only the data files associated with that snapshot. Even if a massive ETL job is actively writing new data and committing new snapshots in the background, the reader is isolated from those commits and sees a consistent, unchanging view of the data as it existed when its query began.
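A reader pinning a snapshot can be modeled in a few lines. This is an illustrative sketch only: Snapshot, ToyTable, and the file names are hypothetical and do not correspond to any Iceberg library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: an id and the data files it references."""

    snapshot_id: int
    data_files: tuple


class ToyTable:
    """Hypothetical table that keeps every snapshot and a current pointer."""

    def __init__(self):
        self.snapshots = {1: Snapshot(1, ("a.parquet",))}
        self.current_snapshot_id = 1

    def current_snapshot(self):
        return self.snapshots[self.current_snapshot_id]

    def append(self, new_file):
        # A commit creates a new snapshot; existing snapshots are untouched.
        old = self.current_snapshot()
        new = Snapshot(old.snapshot_id + 1, old.data_files + (new_file,))
        self.snapshots[new.snapshot_id] = new
        self.current_snapshot_id = new.snapshot_id


table = ToyTable()

# The reader resolves the current snapshot once, when its query starts.
pinned = table.current_snapshot()

# An ETL job commits new data while the query is still running.
table.append("b.parquet")

# The running query keeps reading from its pinned snapshot.
reader_view = pinned.data_files

# A query that starts after the commit sees the new snapshot.
later_reader_view = table.current_snapshot().data_files
```

The running query never observes b.parquet because snapshots are immutable and the reader holds a reference to the one it started with.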

Durability

Durability guarantees that once a transaction is committed, it will remain committed even in the case of a system failure. Iceberg achieves this by relying on the durability of the underlying object storage layer (such as S3, ADLS, or GCS). Once the Parquet data files and JSON metadata files are written and the catalog pointer is updated, the files are stored redundantly by the cloud provider, typically across multiple availability zones depending on the storage class or redundancy option in use.
