Iceberg Optimistic Concurrency

Many traditional databases manage concurrent writers with pessimistic locking: a process that wants to update a table first requests a lock, and any other process attempting to write must wait until that lock is released. While safe, this creates severe bottlenecks in a data lakehouse, where large distributed compute engines (such as Spark clusters) need to write data simultaneously. Apache Iceberg addresses this with Optimistic Concurrency Control (OCC).

How Optimistic Concurrency Works

OCC operates on the assumption ("optimism") that conflicts are rare. Therefore, no writer places a lock on the data files or the table during the write phase. Here is how the process flows:

1. Read State: Writer A and Writer B both read the current state of the table, noting that the active metadata file is v1.json.
2. Write Data: Both writers independently do their heavy lifting, writing new Parquet data files to object storage. They do not block each other.
3. Prepare Commit: Writer A prepares a new metadata file, v2a.json, which points to its new data. Writer B prepares v2b.json.
4. Atomic Swap (The Check): Writer A finishes first and asks the catalog to swap the table pointer from v1.json to v2a.json. The catalog sees the table is still at v1.json, so the swap succeeds. The table is now at v2a.json.
5. Conflict and Retry: Writer B finishes milliseconds later and asks the catalog to swap from v1.json to v2b.json. The catalog rejects the request because the current state is now v2a.json.
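
The flow above boils down to a compare-and-swap on a single pointer. The following is a minimal sketch of that idea, not Iceberg's actual catalog API: the `ToyCatalog` class and its `current` and `swap` methods are hypothetical names invented for illustration, and the in-process lock stands in for whatever atomic operation a real catalog provides.

```python
import threading


class ToyCatalog:
    """Hypothetical in-memory catalog holding one table's metadata pointer."""

    def __init__(self, initial: str) -> None:
        self._current = initial
        # Guards only the brief pointer swap, never the data-writing phase.
        self._guard = threading.Lock()

    def current(self) -> str:
        with self._guard:
            return self._current

    def swap(self, expected: str, new: str) -> bool:
        """Move the pointer from `expected` to `new`; refuse if it has moved."""
        with self._guard:
            if self._current != expected:
                return False
            self._current = new
            return True


catalog = ToyCatalog("v1.json")

# Read State: both writers observe the same base metadata file.
base_a = catalog.current()
base_b = catalog.current()

# Write Data and Prepare Commit would happen here with no coordination.

# Atomic Swap: Writer A commits first, and the pointer still matches its base.
a_committed = catalog.swap(base_a, "v2a.json")

# Conflict: Writer B's expected base is now stale, so the swap is rejected.
b_committed = catalog.swap(base_b, "v2b.json")

print(a_committed, b_committed, catalog.current())  # True False v2a.json
```

The only serialized step is the pointer comparison itself, which is why the expensive write phase can proceed in parallel.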

Intelligent Retry Mechanism

Instead of throwing an error and failing Writer B's job entirely, Iceberg's client library initiates an intelligent retry. Writer B pulls the new v2a.json state and analyzes the changes made by Writer A.

If Writer A simply appended data to a completely different partition, Writer B realizes there is no logical conflict. It safely rebases its own metadata changes on top of v2a.json, creating v3.json, and successfully commits without needing to rewrite any of its Parquet data.
If there is a true conflict (e.g., both writers attempted to delete the exact same row), Iceberg fails Writer B's commit rather than silently overwriting Writer A's change, which preserves the table's ACID guarantees.
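
The two outcomes above can be sketched as a bounded retry loop. This is a simplified illustration under stated assumptions, not Iceberg's client code: `ToyTable`, `Commit`, `conflicts`, and `commit_with_retry` are hypothetical names, the conflict rule (overlapping row deletes) is deliberately naive, and the retry cap of 4 is illustrative. In Iceberg itself, retry limits are configured through table properties such as `commit.retry.num-retries`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical summary of one writer's pending metadata change."""
    writer: str
    appended_partitions: frozenset = frozenset()
    deleted_rows: frozenset = frozenset()


class ConflictError(Exception):
    """Raised when a pending change cannot be safely rebased."""


class ToyTable:
    """Hypothetical table whose version is the length of its commit history."""

    def __init__(self) -> None:
        self.history = [Commit("initial")]  # version 1, like v1.json

    def version(self) -> int:
        return len(self.history)

    def swap(self, expected_version: int, change: Commit) -> bool:
        # Single-threaded sketch; a real catalog performs this check atomically.
        if self.version() != expected_version:
            return False
        self.history.append(change)
        return True


def conflicts(mine: Commit, theirs: Commit) -> bool:
    # Toy rule (an assumption): only overlapping row deletes conflict.
    return bool(mine.deleted_rows & theirs.deleted_rows)


def commit_with_retry(table: ToyTable, base: int, change: Commit,
                      max_retries: int = 4) -> int:
    for _ in range(max_retries + 1):
        if table.swap(base, change):
            return table.version()
        # Swap rejected: refresh, then validate against commits made since base.
        latest = table.version()
        for other in table.history[base:latest]:
            if conflicts(change, other):
                raise ConflictError(f"{change.writer} conflicts with {other.writer}")
        base = latest  # rebase metadata only; data files are left untouched
    raise ConflictError("retries exhausted")


# Case 1: appends to different partitions, so Writer B rebases and commits.
table = ToyTable()
base = table.version()
table.swap(base, Commit("A", appended_partitions=frozenset({"2024-01-01"})))
new_version = commit_with_retry(
    table, base, Commit("B", appended_partitions=frozenset({"2024-01-02"})))

# Case 2: both writers delete the same row, so Writer B's commit fails.
table2 = ToyTable()
base2 = table2.version()
table2.swap(base2, Commit("A", deleted_rows=frozenset({42})))
try:
    commit_with_retry(table2, base2, Commit("B", deleted_rows=frozenset({42})))
    conflict_detected = False
except ConflictError:
    conflict_detected = True

print(new_version, conflict_detected)  # 3 True
```

In the first case Writer B lands at version 3 without touching its data files, mirroring the v3.json outcome described above. In the second, the validation step stops the commit before the pointer is ever swapped.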

Benefits for the Lakehouse

Optimistic concurrency lets Iceberg sustain high write throughput at scale. Data ingestion pipelines, machine learning training jobs, and user-initiated merges can all operate on the same tables concurrently, keeping cloud compute resources busy instead of waiting on the lock queues that distributed locking systems impose.
