Every snapshot in an Apache Iceberg table is associated with exactly one Manifest List. If the underlying Manifest Files are the detailed index of the data, the Manifest List is the "table of contents." It sits in the middle of Iceberg's metadata hierarchy, acting as a crucial optimization layer between the high-level JSON metadata file and the granular Avro manifest files.
Purpose and Structure
Like manifest files, the manifest list is stored in the Avro format. Its primary purpose is to list all the manifest files that make up its snapshot. However, it is more than a list of URIs: it also carries summary statistics about the contents of each manifest file.
For each manifest file it references, the manifest list records:
- The path and length of the manifest file.
- The `partition_spec_id` that the manifest file was written under.
- Whether the manifest tracks `DATA` files or `DELETES` files.
- Statistics on the number of added, existing, and deleted files within that manifest.
- Partition Summaries: The upper and lower bounds for the partition columns of all the data files contained within that manifest.
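To make the shape of one entry concrete, here is a minimal Python sketch of the fields listed above. The field names loosely follow the Iceberg table spec, but the types are simplified assumptions for illustration: real bounds are serialized bytes rather than integers, the content type is stored as an integer code rather than a string, and the path shown is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSummary:
    """Summary of one partition field across all files in a manifest."""
    contains_null: bool
    lower_bound: Optional[int]  # simplified: the real format stores serialized bytes
    upper_bound: Optional[int]


@dataclass
class ManifestListEntry:
    """One row of a manifest list, describing a single manifest file."""
    manifest_path: str
    manifest_length: int          # size of the manifest file in bytes
    partition_spec_id: int        # partition spec the manifest was written under
    content: str                  # "DATA" or "DELETES" (simplified to a string here)
    added_files_count: int
    existing_files_count: int
    deleted_files_count: int
    partitions: List[FieldSummary] = field(default_factory=list)


# Hypothetical entry for a manifest holding data from 2022 and 2023.
entry = ManifestListEntry(
    manifest_path="s3://example-bucket/db/events/metadata/manifest-0.avro",
    manifest_length=8192,
    partition_spec_id=0,
    content="DATA",
    added_files_count=120,
    existing_files_count=0,
    deleted_files_count=0,
    partitions=[FieldSummary(contains_null=False, lower_bound=2022, upper_bound=2023)],
)

total_files = (
    entry.added_files_count + entry.existing_files_count + entry.deleted_files_count
)
```

The `partitions` list holds one summary per partition field, which is why a planner can evaluate a partition filter without opening the manifest itself.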
Manifest Pruning
The partition summaries are what make the manifest list critical for performance. When a query planner receives a query with a partition filter (e.g., year = 2024), it first reads the single manifest list for the snapshot. It looks at the partition lower and upper bounds for each manifest file listed.
If a manifest file's partition summary indicates it only contains data from 2022 and 2023, the query planner can completely ignore that manifest file. It does not need to download the manifest file, nor does it need to scan the thousands of individual data file entries inside it. This process is called manifest pruning.
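The pruning check itself is a range-overlap test. The sketch below is a simplified illustration, not Iceberg's actual planner code: `ManifestSummary` and `prune_manifests` are hypothetical names, and it assumes a single integer partition column (`year`) with an equality filter.

```python
from typing import List, NamedTuple


class ManifestSummary(NamedTuple):
    """Hypothetical, simplified view of one manifest list entry."""
    path: str
    year_lower: int  # lower bound of the `year` partition values in the manifest
    year_upper: int  # upper bound of the `year` partition values in the manifest


def prune_manifests(manifests: List[ManifestSummary], year: int) -> List[ManifestSummary]:
    """Keep only manifests whose [lower, upper] range could contain `year`."""
    return [m for m in manifests if m.year_lower <= year <= m.year_upper]


manifests = [
    ManifestSummary("manifest-0.avro", 2022, 2023),  # cannot match year = 2024
    ManifestSummary("manifest-1.avro", 2023, 2024),  # may contain 2024 data
    ManifestSummary("manifest-2.avro", 2024, 2024),  # contains only 2024 data
]

kept = prune_manifests(manifests, 2024)
print([m.path for m in kept])  # ['manifest-1.avro', 'manifest-2.avro']
```

Note that the bounds only prove a manifest cannot match. A surviving manifest such as `manifest-1.avro` still has to be opened so its individual data file entries can be checked.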
By sparing the engine from downloading and parsing irrelevant metadata, the manifest list helps keep query planning fast (often well under a second) even for tables containing petabytes of data spread across millions of files and thousands of manifests.



