One of the most persistent operational challenges in managing a data lakehouse is the Small File Problem. While the issue affects every system built on distributed storage (raw data lakes and Delta Lake included), Apache Iceberg calls for its own mitigation strategies.
Causes of the Small File Problem
The small file problem occurs when a table accumulates thousands (or millions) of tiny Parquet files, often just a few kilobytes or megabytes each, instead of a small number of optimally sized files (typically between 128 MB and 512 MB). This usually happens due to:
- Streaming Ingestion: Real-time pipelines (like Apache Flink or Spark Structured Streaming) commit micro-batches every few seconds or minutes to keep data latency low. Every commit that contains data adds at least one new Parquet file, and usually one per writer task and partition touched.
- Over-Partitioning: If a table is partitioned by a high-cardinality column (e.g., partitioning by `hour` when there is very little data per hour), even a massive batch job will write hundreds of tiny, fragmented files.
- Frequent CDC Updates: Continuous updates and deletes in a Merge-on-Read table create many small position or equality delete files.
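The first two causes above come down to simple arithmetic, which the following minimal sketch tries to make concrete. The function names and all workload numbers (commit interval, writer count, daily volume) are hypothetical, and the model assumes each writer task emits one file per partition it touches on every commit, which is a simplification of real engine behavior.

```python
SECONDS_PER_DAY = 86_400


def files_per_day(commit_interval_s: int, writers: int, partitions_per_commit: int) -> int:
    """Rough count of data files a streaming job writes per day.

    Assumption: every commit makes each writer task emit one file
    for each partition it touches.
    """
    commits_per_day = SECONDS_PER_DAY // commit_interval_s
    return commits_per_day * writers * partitions_per_commit


def avg_file_size_mb(volume_gb: float, file_count: int) -> float:
    """Average file size in MB when volume_gb is spread evenly over file_count files."""
    return volume_gb * 1024 / file_count


# Streaming: one commit per minute, 4 writer tasks, 1 partition touched per commit.
streaming_files = files_per_day(commit_interval_s=60, writers=4, partitions_per_commit=1)
streaming_avg_mb = avg_file_size_mb(50, streaming_files)  # 50 GB ingested per day

# Over-partitioning: a 10 GB batch spread over 30 days of hourly partitions,
# assuming at least one file per partition.
hourly_partitions = 24 * 30
batch_avg_mb = avg_file_size_mb(10, hourly_partitions)

print(f"streaming: {streaming_files} files/day, ~{streaming_avg_mb:.1f} MB each")
print(f"hourly partitions: {hourly_partitions} files, ~{batch_avg_mb:.1f} MB each")
```

Under these assumed numbers, both workloads land far below the 128 MB lower bound mentioned above, even though neither involves a small amount of data overall.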
Why is it a Problem?
Small files severely degrade query performance for two main reasons:
- Metadata Bloat: Iceberg must track every single data file in its manifest tree. If a table consists of 100,000 tiny files instead of 100 large ones, the metadata itself grows large, and the query engine can spend seconds reading and filtering manifest files during planning before the actual data scan begins.
- I/O Overhead: Cloud object storage (like Amazon S3) performs best when reading large, sequential blocks of data. Each read is a separate request with a fixed cost, so fetching a 10 KB file incurs almost the same per-request overhead as fetching a 128 MB file. Reading thousands of small files therefore leads to network bottlenecks and throttled requests.
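The I/O argument can be illustrated with a back-of-the-envelope cost model: total scan time is a fixed per-request cost multiplied by the number of files, plus the transfer time for the bytes themselves. The sketch below is only a rough illustration; the 20 ms request latency and 100 MB/s throughput are assumed values, not measurements of any particular object store, and the model ignores the parallelism a real engine would apply.

```python
def scan_seconds(
    n_files: int,
    total_mb: float,
    request_latency_s: float = 0.02,   # assumed fixed cost per object request
    throughput_mb_s: float = 100.0,    # assumed sustained read throughput
) -> float:
    """Serial scan time: per-request overhead plus raw transfer time."""
    return n_files * request_latency_s + total_mb / throughput_mb_s


TOTAL_MB = 10 * 1024  # the same 10 GB of data in both layouts

many_small = scan_seconds(n_files=100_000, total_mb=TOTAL_MB)  # ~100 KB per file
few_large = scan_seconds(n_files=80, total_mb=TOTAL_MB)        # 128 MB per file

print(f"100,000 small files: ~{many_small:.0f} s")
print(f"80 large files:      ~{few_large:.0f} s")
```

The transfer term is identical in both cases; the difference comes entirely from the per-request term, which is why parallelism reduces the wall-clock gap but does not remove the extra requests or the risk of throttling.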
How to Solve It
Data engineers solve the small file problem primarily through Compaction (Rewriting Data Files). A scheduled job (e.g., the Spark SQL procedure `CALL catalog.system.rewrite_data_files('my_table')`) reads the small, fragmented files and rewrites them into fewer, optimally sized Parquet files. Additionally, setting a target file size through table properties (e.g., `write.target-file-size-bytes`) and avoiding overly granular partition strategies keeps the problem from recurring too quickly.
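Conceptually, compaction is a bin-packing exercise: group small files so that each group adds up to roughly the target size, then rewrite each group as one file. The sketch below uses a simple first-fit-decreasing heuristic to show the idea. It is a simplified illustration and not Iceberg's actual implementation; `plan_compaction` and the file sizes are hypothetical.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[float], target_mb: float = 512.0) -> List[List[float]]:
    """Group files into bins whose combined size does not exceed target_mb.

    Each returned bin represents the input files that would be rewritten
    into a single output file. Uses first-fit on sizes sorted descending.
    """
    bins: List[List[float]] = []
    for size in sorted(file_sizes_mb, reverse=True):
        for group in bins:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            # No existing bin has room, so start a new output file.
            bins.append([size])
    return bins


small_files = [8.0] * 1000  # 1,000 files of 8 MB each
plan = plan_compaction(small_files, target_mb=512.0)

print(f"{len(small_files)} input files -> {len(plan)} output files")
```

With a 512 MB target, the thousand 8 MB files collapse into a handful of outputs, which shrinks both the number of manifest entries and the number of object-store requests a scan needs.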



