Merge-on-Read (MoR) is a data update strategy made possible by row-level delete files, which were introduced in version 2 of the Apache Iceberg table spec. It is designed to solve a fundamental problem in data lakes: how to handle frequent updates and deletes without rewriting an entire multi-megabyte Parquet file for every small change.
How Merge-on-Read Works
Under a MoR strategy, when a row is updated or deleted, the original data file containing that row is left untouched. Instead, Iceberg writes a new, small "delete file" that marks the row as deleted, either by its position in a specific data file (a position delete) or by the column values that identify it (an equality delete). If the operation is an update, Iceberg writes the delete file to invalidate the old row and writes a new data file containing the new version of the row.
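The write path described above can be sketched as a toy in-memory model. This is only an illustration, not Iceberg's implementation or API: the `Table` class, its method names, and the file names are all hypothetical, and deletes are modeled as simple (file name, row position) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory stand-in for a MoR table (not an Iceberg API)."""
    # data file name -> list of rows (each row is a dict)
    data_files: dict = field(default_factory=dict)
    # (data file name, row position) pairs, as a position delete would record
    position_deletes: list = field(default_factory=list)

    def append(self, rows):
        """Write rows to a new data file and return its name."""
        name = f"data-{len(self.data_files)}.parquet"
        self.data_files[name] = list(rows)
        return name

    def delete_where(self, predicate):
        """Record matching live rows as deleted; base files are never modified."""
        already = set(self.position_deletes)
        marked = [
            (name, pos)
            for name, rows in self.data_files.items()
            for pos, row in enumerate(rows)
            if (name, pos) not in already and predicate(row)
        ]
        self.position_deletes.extend(marked)
        return marked

    def update_where(self, predicate, change):
        """An update is a delete of the old row plus a new data file."""
        marked = self.delete_where(predicate)
        new_rows = [change(dict(self.data_files[n][p])) for n, p in marked]
        return self.append(new_rows) if new_rows else None


table = Table()
base = table.append([{"id": 1, "qty": 1}, {"id": 2, "qty": 2}])
new_file = table.update_where(lambda r: r["id"] == 2, lambda r: {**r, "qty": 20})
print(table.data_files[base])   # original file still holds both rows
print(table.position_deletes)   # [('data-0.parquet', 1)]
print(table.data_files[new_file])  # [{'id': 2, 'qty': 20}]
```

Note that the update never touches `data-0.parquet`: the only new bytes written are one delete entry and one small data file.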
The "Merge" happens at read time. When a query engine reads the table, it must read the base data files and the associated delete files, merging them on the fly to produce the correct, current state of the table. It effectively subtracts the deleted rows from the base data before returning the result to the user.
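As a rough sketch of that read-time subtraction, the function below filters out every row whose (file, position) pair appears in a delete file. The function name `merge_on_read` and the sample data are made up for illustration, and the sketch ignores details a real engine must handle, such as equality deletes and sequence numbers.

```python
def merge_on_read(data_files, position_deletes):
    """Return the live rows: base data minus rows named in delete files.

    data_files: dict mapping a data file name to its list of rows.
    position_deletes: iterable of (data file name, row position) pairs.
    """
    deleted = set(position_deletes)
    return [
        row
        for name, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (name, pos) not in deleted
    ]


# Hypothetical table state after one update to the row with id 2.
data_files = {
    "data-0.parquet": [{"id": 1, "qty": 1}, {"id": 2, "qty": 2}],
    "data-1.parquet": [{"id": 2, "qty": 20}],
}
position_deletes = [("data-0.parquet", 1)]

print(merge_on_read(data_files, position_deletes))
# [{'id': 1, 'qty': 1}, {'id': 2, 'qty': 20}]
```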
When to Use Merge-on-Read
MoR optimizes for write performance at the expense of read performance. It suits workloads where write latency must be kept low, such as:
- Change Data Capture (CDC): Applying frequent, row-level updates from a transactional database (like Postgres or MySQL) into the lakehouse.
- Streaming Ingestion: Pipelines that need to upsert data continuously without blocking to rewrite large files.
- High-Frequency Updates: Workloads where the cost of write amplification (rewriting a 500MB file to change one 50-byte row) is unacceptable.
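To put a number on the write amplification in the last example, the short calculation below divides the bytes physically written by the bytes logically changed. The 500MB and 50-byte figures come from the example above (using decimal megabytes); the 4,000 bytes assumed for a MoR write (one small delete file plus one small data file) is a made-up figure for comparison only, since real sizes depend on file format overhead.

```python
def write_amplification(bytes_written, bytes_changed):
    """Ratio of bytes physically written to bytes logically changed."""
    return bytes_written / bytes_changed


ROW_BYTES = 50                      # size of the changed row (from the example)
FILE_BYTES = 500 * 1000**2          # 500MB data file, decimal megabytes
ASSUMED_MOR_BYTES = 4_000           # hypothetical: delete file + new data file

cow = write_amplification(FILE_BYTES, ROW_BYTES)
mor = write_amplification(ASSUMED_MOR_BYTES, ROW_BYTES)

print(f"rewrite whole file: {cow:,.0f}x")   # rewrite whole file: 10,000,000x
print(f"merge-on-read:      {mor:,.0f}x")   # merge-on-read:      80x
```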
The Need for Compaction
The trade-off of MoR is read amplification. As updates accumulate, the number of delete files grows. At read time, the query engine must apply all these delete files, which slows down query execution. To maintain analytical performance, MoR tables require regular compaction. A scheduled maintenance job periodically reads the base files and their delete files, physically merges them, and writes out fresh, clean data files, which resets the read penalty. The old data and delete files drop out of the current table snapshot and can be physically removed once the snapshots that still reference them are expired.
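Compaction can be sketched in the same toy style: apply the deletes once, write the surviving rows to a fresh file, and start again with no delete files. The names `live_rows` and `compact`, the output file name, and the sample data are hypothetical; a real job would also handle snapshots, file sizing, and partitioning, none of which is modeled here.

```python
def live_rows(data_files, position_deletes):
    """Rows visible to a reader: base data minus position-deleted rows."""
    deleted = set(position_deletes)
    return [
        row
        for name, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (name, pos) not in deleted
    ]


def compact(data_files, position_deletes, target_name="compacted-0.parquet"):
    """Physically merge data and delete files into one clean data file.

    Returns the new (data_files, position_deletes) state; the inputs are
    left unmodified, much as old files stay on disk until they are cleaned up.
    """
    return {target_name: live_rows(data_files, position_deletes)}, []


# Hypothetical table state: two data files and two accumulated deletes.
data_files = {
    "data-0.parquet": [{"id": 1, "qty": 1}, {"id": 2, "qty": 2}],
    "data-1.parquet": [{"id": 2, "qty": 20}, {"id": 3, "qty": 3}],
}
position_deletes = [("data-0.parquet", 1), ("data-1.parquet", 1)]

new_files, new_deletes = compact(data_files, position_deletes)
print(new_files)    # {'compacted-0.parquet': [{'id': 1, 'qty': 1}, {'id': 2, 'qty': 20}]}
print(new_deletes)  # []
```

After compaction a reader sees the same rows as before, but no longer has any delete files to apply.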



