Many traditional databases manage concurrent writers with pessimistic locking. When a process wants to update a table, it requests a lock, and any other process attempting to write must wait in a queue until the lock is released. This is safe, but it creates severe bottlenecks in a data lakehouse, where large distributed compute engines (such as Spark clusters) need to write data simultaneously. Apache Iceberg avoids this bottleneck by using Optimistic Concurrency Control (OCC).

How Optimistic Concurrency Works

OCC operates on the assumption ("optimism") that conflicts are rare. Therefore, no writer places a lock on the data files or the table during the write phase. Here is how the process flows:

  1. Read State: Writer A and Writer B both read the current state of the table, noting that the active metadata file is v1.json.
  2. Write Data: Both writers independently do their heavy lifting, writing new Parquet data files to object storage. They do not block each other.
  3. Prepare Commit: Writer A prepares a new metadata file, v2a.json, which points to its new data. Writer B prepares v2b.json.
  4. Atomic Swap (The Check): Writer A finishes first and asks the catalog to swap the table pointer from v1.json to v2a.json. The catalog sees that the table is still at v1.json, so the swap succeeds. The table is now at v2a.json.
  5. Conflict and Retry: Writer B finishes milliseconds later and asks the catalog to swap from v1.json to v2b.json. The catalog rejects the request because the table pointer no longer matches v1.json; the current state is now v2a.json.
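
The five steps above boil down to a compare-and-swap on the table pointer. The following is a minimal sketch of that idea, not Iceberg's actual catalog API: the InMemoryCatalog class and its current and swap methods are hypothetical names chosen for illustration, and a real catalog would enforce the atomic swap in a metastore or database rather than with an in-process lock.

```python
import threading


class InMemoryCatalog:
    """Toy stand-in for a catalog: tracks which metadata file is current.

    Hypothetical class for illustration only; not part of any Iceberg library.
    """

    def __init__(self, initial_metadata: str) -> None:
        self._current = initial_metadata
        # Guards only the brief pointer swap, never the data-writing phase.
        self._lock = threading.Lock()

    def current(self) -> str:
        with self._lock:
            return self._current

    def swap(self, expected: str, new: str) -> bool:
        """Compare-and-swap: move the pointer only if it still equals `expected`."""
        with self._lock:
            if self._current != expected:
                return False
            self._current = new
            return True


catalog = InMemoryCatalog("v1.json")

# Steps 1-3: both writers read v1.json, write data files, and prepare metadata.
base_a = catalog.current()
base_b = catalog.current()

# Step 4: Writer A commits first; the pointer is still at v1.json, so it succeeds.
a_committed = catalog.swap(expected=base_a, new="v2a.json")

# Step 5: Writer B's swap is rejected because the pointer has moved to v2a.json.
b_committed = catalog.swap(expected=base_b, new="v2b.json")

print(a_committed, b_committed, catalog.current())  # True False v2a.json
```

Note that the only synchronized operation is the pointer swap itself, which is why the expensive data-writing work in step 2 never blocks.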

Intelligent Retry Mechanism

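Instead of throwing an error and failing Writer B's job entirely, Iceberg's client library initiates an intelligent retry. Writer B pulls the new v2a.json state and analyzes the changes made by Writer A. If those changes do not conflict with Writer B's operation (for example, both writers only appended new files), Writer B rebuilds its metadata on top of v2a.json, reusing the data files it has already written, and attempts the swap again. If the changes do conflict (for example, Writer A rewrote or deleted files that Writer B's operation depends on), the commit fails and the job must be rerun against the new table state.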
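
One way to picture the retry is as a loop that, after a rejected swap, reloads the latest metadata, checks what changed since the writer's original read, and then either rebases and tries again or gives up. The sketch below is a simplified model under stated assumptions, not Iceberg's real validation logic: Metadata, Catalog, CommitConflict, commit_with_retry, the strict flag, and the partition-overlap rule are all hypothetical, and it runs single-threaded for clarity.

```python
from dataclasses import dataclass


class CommitConflict(Exception):
    """Hypothetical error: the commit cannot be safely rebased."""


@dataclass(frozen=True)
class Metadata:
    version: int
    files: frozenset    # data files that make up the table at this version
    touched: frozenset  # partitions changed by the commit that produced it


class Catalog:
    """Toy catalog that keeps every committed metadata version."""

    def __init__(self) -> None:
        self.history = [Metadata(1, frozenset(), frozenset())]

    def current(self) -> Metadata:
        return self.history[-1]

    def swap(self, expected: Metadata, new: Metadata) -> bool:
        # Succeeds only if nobody has committed since `expected` was read.
        if self.current().version != expected.version:
            return False
        self.history.append(new)
        return True


def commit_with_retry(catalog, base, new_files, partitions, strict, max_retries=4):
    """Commit already-written files, rebasing onto newer metadata when needed."""
    parent = base
    for _ in range(max_retries + 1):
        candidate = Metadata(parent.version + 1, parent.files | new_files, partitions)
        if catalog.swap(parent, candidate):
            return candidate
        # Swap rejected: reload the table and inspect commits made since `base`.
        parent = catalog.current()
        concurrent = [m for m in catalog.history if m.version > base.version]
        if strict and any(m.touched & partitions for m in concurrent):
            raise CommitConflict("a concurrent commit changed the same partitions")
    raise CommitConflict("retries exhausted")


catalog = Catalog()
base = catalog.current()  # every writer below starts from version 1

# Writer A appends to day=1 and commits first.
commit_with_retry(catalog, base, frozenset({"a.parquet"}), frozenset({"day=1"}), strict=False)

# Writer B appends to day=2: the first swap is rejected, the rebase succeeds.
b_result = commit_with_retry(
    catalog, base, frozenset({"b.parquet"}), frozenset({"day=2"}), strict=False
)

# Writer C rewrites day=1 based on stale state, so the retry must fail.
try:
    commit_with_retry(catalog, base, frozenset({"c.parquet"}), frozenset({"day=1"}), strict=True)
    conflict_raised = False
except CommitConflict:
    conflict_raised = True

print(b_result.version, sorted(b_result.files), conflict_raised)
```

The data files are written once and reused across attempts; only the small metadata object is rebuilt on each retry, which is what keeps retries cheap in this model.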

Benefits for the Lakehouse

Because writers never wait on one another during the expensive write phase and coordinate only at the brief atomic swap, optimistic concurrency lets Iceberg sustain high write throughput at scale. Data ingestion pipelines, machine learning training jobs, and user-initiated merges can all operate on the same tables concurrently, making full use of cloud compute resources without the queueing delays of a distributed locking system.
