Compaction is an essential table maintenance operation in Apache Iceberg. Over time, as data is ingested (especially via streaming or micro-batches) and updated, the physical layout of an Iceberg table can degrade, leading to slower query times. Compaction restructures this data in the background to restore optimal performance.

Why is Compaction Necessary?

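There are two primary reasons a table requires compaction: an accumulation of many small data files (the small file problem), which adds per-file overhead to every read, and an accumulation of delete files, which the query engine must apply on the fly whenever it reads the affected data.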

Compaction Strategies

Iceberg supports different strategies for rewriting data files during compaction, allowing engineers to balance maintenance cost against read performance:

1. Bin-pack Compaction

The simplest and fastest strategy. Bin-packing takes multiple small data files and combines them into fewer, target-sized data files (e.g., merging ten 25MB files into one 250MB file). It does not change the order of the data within the files. It is the cheapest way to solve the small file problem.
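The grouping step can be illustrated with a toy planner. This is a minimal sketch of the bin-packing idea only, not Iceberg's actual planner; the function name `bin_pack` and the sizes are assumptions made for illustration.

```python
def bin_pack(file_sizes_mb, target_mb):
    """Greedily group input file sizes so each group stays at or under target_mb."""
    groups = []
    current, current_size = [], 0
    for size in file_sizes_mb:
        # Close the current output file once adding this input would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical input: ten 25 MB files and a 250 MB target, as in the example above.
small_files = [25] * 10
plan = bin_pack(small_files, target_mb=250)
print(len(plan), [sum(group) for group in plan])
```

Each group in `plan` stands for one rewritten output file. Rows are copied in the order they are read, which is why bin-packing leaves the sort order untouched.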

2. Sort Compaction

A more expensive, but highly effective strategy. Sort compaction physically reorders the rows as it rewrites the files, based on specific columns (e.g., sorting a sales table by customer_id). This groups similar values into the same files. As a result, the column-level min/max statistics stored in Iceberg's manifest files cover narrow, mostly non-overlapping ranges. When a query filters by that column, the engine can compare the filter against those ranges and skip every file that cannot contain a match, which can sharply reduce the amount of data read. Multi-column clustering techniques (like Z-ordering) can be used to optimize for queries that filter on several columns at once.
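The effect of sorting on min/max pruning can be seen in a small simulation. This is a toy model under stated assumptions: each "file" is a Python list of customer_id values, and the helper names `split_into_files` and `files_to_scan` are hypothetical, not Iceberg APIs.

```python
def split_into_files(values, rows_per_file):
    """Chunk a sequence of values into fixed-size 'files'."""
    return [values[i:i + rows_per_file] for i in range(0, len(values), rows_per_file)]


def files_to_scan(files, wanted):
    """Count files whose [min, max] range could contain the wanted value."""
    return sum(1 for f in files if min(f) <= wanted <= max(f))


# Hypothetical data: 100 distinct customer_id values in a scattered arrival order.
customer_ids = [(i * 37) % 100 for i in range(100)]

unsorted_files = split_into_files(customer_ids, rows_per_file=10)
sorted_files = split_into_files(sorted(customer_ids), rows_per_file=10)

before = files_to_scan(unsorted_files, wanted=42)
after = files_to_scan(sorted_files, wanted=42)
print(before, after)
```

In the unsorted layout every file's range spans nearly the whole domain, so none can be skipped. After sorting, only the file whose range covers the filter value has to be read.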

Applying Deletes

Regardless of the strategy used, the compaction process reads the data files it has selected for rewriting, applies any pending equality or position delete files to them, and writes out fresh data files (typically Parquet) that no longer contain the deleted rows. After the compaction job commits, the query engine no longer has to apply those deletes on the fly when reading the rewritten data.
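The two kinds of delete can be sketched as follows. This is a simplified illustration, assuming one data file held as a list of rows: position deletes are modeled as row offsets within that file, and equality deletes as a set of customer_id values. It ignores details such as sequence numbers, and `apply_deletes` is a hypothetical helper, not an Iceberg API.

```python
def apply_deletes(rows, deleted_positions, deleted_ids):
    """Return the rows that survive both position and equality deletes."""
    return [
        row
        for pos, row in enumerate(rows)
        # Position delete: drop the row at this offset in the file.
        # Equality delete: drop any row whose customer_id matches a deleted value.
        if pos not in deleted_positions and row["customer_id"] not in deleted_ids
    ]


# Hypothetical contents of one data file.
data_file = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": 20},
    {"customer_id": 3, "amount": 30},
    {"customer_id": 2, "amount": 40},
]

rewritten = apply_deletes(data_file, deleted_positions={0}, deleted_ids={2})
print(rewritten)
```

A reader that has not yet benefited from compaction performs this filtering on every scan; compaction performs it once and persists the result.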

In modern lakehouse architectures, compaction is often handled automatically by managed services or automated control planes (like Dremio's autonomous table optimization or AWS Glue), freeing data engineers from writing manual maintenance scripts.
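For tables not covered by such a service, a manual maintenance job on Spark typically calls Iceberg's `rewrite_data_files` procedure. The sketch below only builds the SQL text for that call so it can be inspected without a cluster; the catalog name `my_catalog`, the table `db.sales`, the helper `rewrite_data_files_sql`, and the option value are assumptions, and the resulting statement would be passed to a Spark session configured with an Iceberg catalog.

```python
def rewrite_data_files_sql(catalog, table, strategy="binpack", sort_order=None, options=None):
    """Build a CALL statement for Iceberg's rewrite_data_files Spark procedure."""
    args = [f"table => '{table}'", f"strategy => '{strategy}'"]
    if sort_order:
        # Only meaningful with the sort strategy.
        args.append(f"sort_order => '{sort_order}'")
    if options:
        pairs = ", ".join(f"'{key}', '{value}'" for key, value in options.items())
        args.append(f"options => map({pairs})")
    return f"CALL {catalog}.system.rewrite_data_files({', '.join(args)})"


# Hypothetical catalog and table names; 268435456 bytes is 256 MiB.
statement = rewrite_data_files_sql(
    "my_catalog",
    "db.sales",
    strategy="sort",
    sort_order="customer_id ASC NULLS LAST",
    options={"target-file-size-bytes": "268435456"},
)
print(statement)
```

Omitting `sort_order` and leaving `strategy` at `binpack` yields the cheaper bin-pack rewrite described earlier.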
