Iceberg Row-Level Deletes

Row-level deletes are the foundation of the Merge-on-Read strategy in Apache Iceberg and were introduced in version 2 of the table format specification. Because Parquet and ORC data files in cloud object storage are immutable, deleting a single row under the Copy-on-Write approach requires rewriting the entire file that contains it. Row-level deletes avoid that rewrite by leaving the original data file untouched and writing a small auxiliary "delete file" that marks which rows are no longer valid.

Two Approaches to Deletion

Iceberg supports two different types of delete files, each optimizing for a different phase of the data lifecycle:

Equality Deletes (Optimized for Writing): When a streaming engine like Flink receives a Change Data Capture (CDC) event, it often does not know where the record physically resides in the data lake. An equality delete simply records the logical condition (e.g., "delete rows where order_id = 999"). The writer never has to locate the row, so the write is cheap, but the cost shifts to the read engine, which must check that condition against every row it scans in the data files the delete applies to.
Position Deletes (Optimized for Reading): A position delete records the full path of the data file and the ordinal position of the deleted row within that file. When an engine like Spark executes a batch `MERGE` statement, it scans the data first, identifies the locations of the affected rows, and writes a position delete file. During a query, the read engine skips the listed positions directly, with no per-row predicate to evaluate.
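
The read-side difference between the two delete types can be illustrated with a small in-memory model. This is only a sketch, not Iceberg's reader: the class names, the helper functions, and the file path are hypothetical, and it ignores the partition and sequence-number scoping a real engine applies when deciding which delete files affect which data files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqualityDelete:
    column: str    # column the predicate is evaluated on
    value: object  # rows whose column equals this value are deleted


@dataclass(frozen=True)
class PositionDelete:
    file_path: str  # data file that holds the deleted row
    pos: int        # 0-based ordinal of the row inside that file


def read_with_equality_deletes(rows, deletes):
    """Every scanned row has to be compared against the delete predicates."""
    doomed = {(d.column, d.value) for d in deletes}
    return [
        row for row in rows
        if not any((col, val) in doomed for col, val in row.items())
    ]


def read_with_position_deletes(file_path, rows, deletes):
    """Rows are skipped by ordinal; their contents are never inspected."""
    skip = {d.pos for d in deletes if d.file_path == file_path}
    return [row for pos, row in enumerate(rows) if pos not in skip]


# Hypothetical data file and contents, for illustration only.
DATA_FILE = "s3://bucket/orders/data-0001.parquet"
ROWS = [
    {"order_id": 998, "status": "shipped"},
    {"order_id": 999, "status": "pending"},
    {"order_id": 1000, "status": "pending"},
]

by_equality = read_with_equality_deletes(ROWS, [EqualityDelete("order_id", 999)])
by_position = read_with_position_deletes(
    DATA_FILE, ROWS, [PositionDelete(DATA_FILE, 1)]
)
print(by_equality == by_position)
```

Both paths return the same surviving rows. The difference is where the work happens: the equality path compares row values during the scan, while the position path only checks row ordinals that the writer already resolved.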

The Delete Lifecycle

In a mature lakehouse architecture, the two delete mechanisms are often used together. A streaming ingestion pipeline continuously writes Equality Deletes to keep write latency as low as possible. In the background, a scheduled maintenance job (compaction) periodically reads the accumulated Equality Deletes and either rewrites the base data files to remove the rows permanently or converts the Equality Deletes into Position Deletes, which reduces the read-time penalty for analytical engines like Dremio or Trino.
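
The two options open to the maintenance job can be sketched in plain Python. This is a toy model under stated assumptions, not an Iceberg maintenance API: the function names, the file names, and the representation of data files as a dictionary of row lists are all hypothetical.

```python
def _matches(row, doomed):
    """True if any (column, value) pair of the row is targeted by a delete."""
    return any((col, val) in doomed for col, val in row.items())


def convert_to_position_deletes(data_files, equality_deletes):
    """Resolve logical predicates into (file_path, pos) pairs, once, at maintenance time."""
    doomed = set(equality_deletes)
    return sorted(
        (path, pos)
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if _matches(row, doomed)
    )


def rewrite_data_files(data_files, equality_deletes):
    """Alternative: produce new file contents with the deleted rows physically removed."""
    doomed = set(equality_deletes)
    return {
        path: [row for row in rows if not _matches(row, doomed)]
        for path, rows in data_files.items()
    }


# Hypothetical table state: two data files plus one pending equality delete.
files = {
    "data-0001.parquet": [{"order_id": 998}, {"order_id": 999}],
    "data-0002.parquet": [{"order_id": 999}, {"order_id": 1000}],
}
pending = [("order_id", 999)]

position_deletes = convert_to_position_deletes(files, pending)
compacted = rewrite_data_files(files, pending)
print(position_deletes)
print(compacted)
```

Either way, the predicate is evaluated once by the maintenance job instead of on every query. Conversion keeps the data files as they are and only changes the form of the delete, while a rewrite removes the need for delete files on those data files altogether.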

Impact on Upserts (Updates)

In Iceberg, an "update" to a row is not a distinct primitive operation. Under Merge-on-Read, an update is executed as a row-level delete of the old record plus the addition of a new data file containing the updated record. Because both changes land in a single atomic commit, readers see either the old version of the row or the new one, never both or neither, which creates the illusion of an in-place update for the end user.
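
A minimal sketch of that delete-plus-insert pattern, assuming a toy table whose state is a single immutable snapshot that is swapped in one step. `Snapshot`, `ToyTable`, and the file names are hypothetical and do not correspond to Iceberg classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    data_files: tuple = ()                     # ((path, (row, ...)), ...)
    position_deletes: frozenset = frozenset()  # {(path, pos), ...}


class ToyTable:
    def __init__(self, snapshot):
        self.current = snapshot

    def update(self, key, value, changes, new_path):
        """Delete matching live rows and add their new versions in one commit."""
        snap = self.current
        deletes = set(snap.position_deletes)
        new_rows = []
        for path, rows in snap.data_files:
            for pos, row in enumerate(rows):
                if row.get(key) == value and (path, pos) not in deletes:
                    deletes.add((path, pos))             # invalidate the old version
                    new_rows.append({**row, **changes})  # stage the new version
        if not new_rows:
            return  # nothing matched, so nothing is committed
        # One assignment publishes both changes together: the "atomic commit".
        self.current = Snapshot(
            data_files=snap.data_files + ((new_path, tuple(new_rows)),),
            position_deletes=frozenset(deletes),
        )

    def scan(self):
        """Return the live rows: everything not masked by a position delete."""
        snap = self.current
        return [
            row
            for path, rows in snap.data_files
            for pos, row in enumerate(rows)
            if (path, pos) not in snap.position_deletes
        ]


table = ToyTable(Snapshot(data_files=((
    "data-0001.parquet",
    ({"order_id": 999, "status": "pending"},
     {"order_id": 1000, "status": "pending"}),
),)))
before = table.current
table.update("order_id", 999, {"status": "shipped"}, "data-0002.parquet")
rows = table.scan()
print(rows)
```

The original data file is never modified. The old row is still physically present in `data-0001.parquet` but is masked by the delete, and the previous snapshot held in `before` still describes the table as it was prior to the update.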
