Row-Level Deletes are the foundation of the Merge-on-Read strategy in Apache Iceberg, introduced formally in the V2 Specification. Because Parquet and ORC data files sitting in cloud object storage are fundamentally immutable, deleting a single row used to require rewriting the entire file. Row-level deletes solve this problem by leaving the original data file untouched and instead writing a small, auxiliary "delete file" that records which rows are no longer valid. Readers then apply the delete file when they scan the data.

Two Approaches to Deletion

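Iceberg supports two types of delete files, each optimized for a different phase of the data lifecycle:

Position Deletes identify a deleted row by the path of the data file that contains it and the row's position within that file. They cost more to write, because the writer must first locate the row, but they are cheap to apply at read time.

Equality Deletes identify deleted rows by column values (for example, every row where id equals 42). They are cheap to write, because the writer never has to read the existing data, but readers must match them against data rows at query time.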
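As a rough illustration of how a reader might apply the two kinds of delete file, here is a minimal in-memory sketch. It is a toy model, not the Iceberg API: the `PositionDelete` and `EqualityDelete` classes and the `read_file` helper are hypothetical names, and rows are plain dictionaries rather than Parquet records. The `file_path` and `pos` fields mirror the columns a position delete file carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionDelete:
    """Hypothetical model: marks one row by data file path and row position."""
    file_path: str
    pos: int


@dataclass(frozen=True)
class EqualityDelete:
    """Hypothetical model: marks every row whose column equals the value."""
    column: str
    value: object


def read_file(path, rows, position_deletes, equality_deletes):
    """Return the live rows of one data file after applying both delete types."""
    # Position deletes resolve to a cheap set lookup per row.
    deleted_positions = {d.pos for d in position_deletes if d.file_path == path}
    live = []
    for pos, row in enumerate(rows):
        if pos in deleted_positions:
            continue
        # Equality deletes require comparing column values for every row.
        if any(row.get(d.column) == d.value for d in equality_deletes):
            continue
        live.append(row)
    return live


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
result = read_file(
    "data/file-a.parquet",
    rows,
    [PositionDelete("data/file-a.parquet", 0)],  # drops the row at position 0
    [EqualityDelete("id", 3)],                   # drops any row with id == 3
)
print(result)
```

The data rows themselves are never modified; only the reader's view changes. The per-row comparison in the equality branch is the read-time cost that position deletes avoid.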

The Delete Lifecycle

In a mature lakehouse architecture, the two delete mechanisms are often used in tandem. A streaming ingestion pipeline continuously writes Equality Deletes to keep write latency as low as possible. In the background, a scheduled maintenance job (compaction) periodically reads the Equality Deletes and either rewrites the base data files to remove the rows permanently, or converts the Equality Deletes into Position Deletes to reduce the read-time penalty for analytical engines like Dremio or Trino.
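The two compaction outcomes described above can be sketched in a few lines. This is a simplified in-memory model under stated assumptions, not how any engine implements compaction: `to_position_deletes` and `rewrite_data_files` are hypothetical helper names, data files are a dictionary from path to a list of row dictionaries, and each equality delete is a `(column, value)` pair.

```python
def matches(row, equality_deletes):
    """True if any (column, value) equality delete applies to the row."""
    return any(row.get(col) == val for col, val in equality_deletes)


def to_position_deletes(data_files, equality_deletes):
    """Option 1: resolve equality deletes into (file_path, pos) pairs."""
    out = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if matches(row, equality_deletes):
                out.append((path, pos))
    return out


def rewrite_data_files(data_files, equality_deletes):
    """Option 2: produce new data files with the deleted rows removed."""
    return {
        path: [row for row in rows if not matches(row, equality_deletes)]
        for path, rows in data_files.items()
    }


data_files = {
    "data/file-a.parquet": [{"id": 1}, {"id": 2}],
    "data/file-b.parquet": [{"id": 3}, {"id": 1}],
}
equality_deletes = [("id", 1)]

position_deletes = to_position_deletes(data_files, equality_deletes)
compacted = rewrite_data_files(data_files, equality_deletes)
print(position_deletes)
print(compacted)
```

Either way, the expensive value matching is done once by the maintenance job instead of by every subsequent query.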

Impact on Upserts (Updates)

It is important to understand that in Iceberg, an "update" to a row is not a distinct operation. An update is executed as a row-level delete of the old record plus the addition of a new data file containing the updated record. Because both changes land in a single atomic commit, the end user sees what looks like an in-place update.
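A minimal sketch of this delete-plus-insert pattern follows. The `ToyTable` and `Snapshot` classes are hypothetical and exist only for illustration; they are not part of any Iceberg library. The sketch assumes the V2 rule that an equality delete applies only to data files with a strictly lower sequence number, which is why the delete does not remove the replacement row written in the same commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical immutable table state."""
    data_files: tuple        # entries of (sequence_number, rows)
    equality_deletes: tuple  # entries of (sequence_number, column, value)


class ToyTable:
    def __init__(self):
        self.current = Snapshot((), ())
        self.next_seq = 1

    def commit(self, new_rows=(), deletes=()):
        """Apply inserts and deletes together, then swap the snapshot once."""
        seq = self.next_seq
        data = self.current.data_files
        if new_rows:
            data = data + ((seq, tuple(new_rows)),)
        dels = self.current.equality_deletes + tuple(
            (seq, col, val) for col, val in deletes
        )
        # Readers see either the old snapshot or the new one, never a mix.
        self.current = Snapshot(data, dels)
        self.next_seq += 1

    def update(self, key_column, key_value, new_row):
        """An update is a delete of the old row plus an insert, in one commit."""
        self.commit(new_rows=[new_row], deletes=[(key_column, key_value)])

    def scan(self):
        snap = self.current
        live = []
        for file_seq, rows in snap.data_files:
            for row in rows:
                # A delete only affects files written in earlier commits.
                deleted = any(
                    del_seq > file_seq and row.get(col) == val
                    for del_seq, col, val in snap.equality_deletes
                )
                if not deleted:
                    live.append(row)
        return live


table = ToyTable()
table.commit(new_rows=[{"id": 1, "status": "new"}, {"id": 2, "status": "new"}])
table.update("id", 1, {"id": 1, "status": "shipped"})
updated_rows = table.scan()
print(updated_rows)
```

After the update, a scan returns the untouched row for id 2 and the replacement row for id 1, while the original row for id 1 is still physically present in the first data file.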
