In the Apache Iceberg architecture, the Manifest File is the workhorse of query planning. It is the lowest level of the metadata hierarchy, sitting just above the actual Parquet, ORC, or Avro data files. Higher-level metadata (such as the Manifest List, which tracks the manifest files that make up a snapshot) only summarizes groups of files. The Manifest File provides the granular, file-by-file detail needed to decide exactly which object storage blobs the compute engine must read.
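To make the hierarchy concrete, here is a minimal Python sketch of a planner walking from the manifest list down to individual data files. The class names (`ManifestList`, `ManifestFile`, `DataFileRef`) and the function `files_to_read` are hypothetical. Real manifest lists and manifests are Avro files with many more fields, and a real planner prunes at each level instead of collecting every file.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, heavily simplified model of the Iceberg metadata hierarchy.

@dataclass
class DataFileRef:
    file_path: str  # absolute URI of a Parquet/ORC/Avro data file

@dataclass
class ManifestFile:
    manifest_path: str
    entries: List[DataFileRef] = field(default_factory=list)  # file-by-file detail

@dataclass
class ManifestList:
    manifests: List[ManifestFile] = field(default_factory=list)  # one per manifest

def files_to_read(manifest_list: ManifestList) -> List[str]:
    """Walk manifest list -> manifests -> data files and return the file URIs."""
    paths: List[str] = []
    for manifest in manifest_list.manifests:
        for entry in manifest.entries:
            paths.append(entry.file_path)
    return paths

snapshot = ManifestList(manifests=[
    ManifestFile("s3://bucket/meta/m1.avro",
                 [DataFileRef("s3://bucket/data/a.parquet"),
                  DataFileRef("s3://bucket/data/b.parquet")]),
    ManifestFile("s3://bucket/meta/m2.avro",
                 [DataFileRef("s3://bucket/data/c.parquet")]),
])
print(files_to_read(snapshot))
# ['s3://bucket/data/a.parquet', 's3://bucket/data/b.parquet', 's3://bucket/data/c.parquet']
```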
Structure and Format
Manifest files are stored in the Avro format. Avro was chosen because it supports fast, schema-based sequential reads, which makes it well suited to letting the query planner scan through thousands of file entries quickly. Each entry in a manifest file describes a single file. A manifest tracks either data files or, since format version 2, delete files; a single manifest never mixes the two.
For each data file, the manifest records:
- Status: Whether the file was ADDED in the current snapshot, EXISTING from a previous snapshot, or DELETED in the current snapshot.
- File location: The exact absolute URI of the Parquet/ORC file in object storage.
- Format and Metrics: The file format, file size in bytes, and the total record count.
- Partition Data: The specific partition values associated with the file.
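The entry layout described above can be modeled roughly as follows. This is a simplified sketch, not Iceberg's actual classes. The field names mirror those in the Iceberg spec (`status`, `file_path`, `file_format`, `file_size_in_bytes`, `record_count`, `partition`) and the status codes follow the spec (0 = EXISTING, 1 = ADDED, 2 = DELETED). `ManifestEntry` and `live_files` are hypothetical helpers, and the real format nests the file fields inside a `data_file` struct rather than flattening them.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List

class Status(IntEnum):
    # Integer codes used by manifest_entry.status in the Iceberg spec
    EXISTING = 0
    ADDED = 1
    DELETED = 2

@dataclass
class ManifestEntry:
    # Hypothetical flattened entry; real manifests nest these in data_file
    status: Status
    file_path: str                 # absolute URI of the data file
    file_format: str               # file format name, e.g. "parquet"
    file_size_in_bytes: int
    record_count: int
    partition: Dict[str, object]   # partition values for this file (simplified)

def live_files(entries: List[ManifestEntry]) -> List[str]:
    """Files that are part of the snapshot: ADDED or EXISTING, never DELETED."""
    return [e.file_path for e in entries if e.status != Status.DELETED]

entries = [
    ManifestEntry(Status.EXISTING, "s3://bucket/sales/day=1/a.parquet", "parquet", 1_048_576, 10_000, {"day": 1}),
    ManifestEntry(Status.ADDED, "s3://bucket/sales/day=2/b.parquet", "parquet", 2_097_152, 20_000, {"day": 2}),
    ManifestEntry(Status.DELETED, "s3://bucket/sales/day=1/old.parquet", "parquet", 524_288, 5_000, {"day": 1}),
]
print(live_files(entries))
# ['s3://bucket/sales/day=1/a.parquet', 's3://bucket/sales/day=2/b.parquet']
```

An entry marked DELETED records a removal made by the current snapshot, so only ADDED and EXISTING entries describe files the snapshot still contains.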
Column-Level Statistics for Data Skipping
The most critical role of the manifest file is storing column-level statistics. For each column in the data file, keyed by the column's field ID, the manifest tracks:
- lower_bounds and upper_bounds (min and max values)
- value_counts (total values, including nulls and NaNs)
- null_value_counts and nan_value_counts
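The following sketch shows how a writer might compute these statistics for one column of one file. It rests on simplifying assumptions: values are plain Python numbers, bounds are returned as typed values rather than serialized bytes, results are not keyed by field ID, and features such as the table's metrics configuration or string-bound truncation are not modeled. `column_metrics` is a hypothetical helper. Following the spec, the value count includes nulls and NaNs, while the bounds consider only non-null, non-NaN values.

```python
import math
from typing import Dict, List, Optional, Union

Number = Union[int, float]

def _is_nan(v: Optional[Number]) -> bool:
    return isinstance(v, float) and math.isnan(v)

def column_metrics(values: List[Optional[Number]]) -> Dict[str, object]:
    """Compute manifest-style statistics for one column of one data file.

    value_count counts every value (nulls and NaNs included); the bounds
    cover only non-null, non-NaN values and are None if none exist.
    """
    nulls = sum(1 for v in values if v is None)
    nans = sum(1 for v in values if _is_nan(v))
    comparable = [v for v in values if v is not None and not _is_nan(v)]
    return {
        "value_count": len(values),
        "null_value_count": nulls,
        "nan_value_count": nans,
        "lower_bound": min(comparable) if comparable else None,
        "upper_bound": max(comparable) if comparable else None,
    }

print(column_metrics([12.5, None, 980.0, float("nan"), 40.0]))
# {'value_count': 5, 'null_value_count': 1, 'nan_value_count': 1, 'lower_bound': 12.5, 'upper_bound': 980.0}
```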
These statistics are central to Iceberg's performance. When a user runs a query like SELECT * FROM sales WHERE amount > 1000, the query engine reads the manifest file, looks up the upper_bounds value for the amount column, and skips any data file whose maximum amount is less than or equal to 1000, because no row in that file can satisfy the filter. It does this without ever downloading or opening the Parquet files. This process, known as data skipping or file pruning, can drastically reduce I/O and speed up query execution.
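The pruning decision for amount > 1000 can be sketched as follows. `FileStats`, `may_match_greater_than`, and `prune` are hypothetical names, and for readability the bounds are keyed by column name and already decoded, whereas Iceberg keys them by field ID and stores them as bytes. When a bound is missing, the sketch keeps the file, since without statistics the planner cannot prove the file is irrelevant.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FileStats:
    file_path: str
    # Hypothetical: column name -> decoded maximum value for this file
    upper_bounds: Dict[str, float] = field(default_factory=dict)

def may_match_greater_than(f: FileStats, column: str, literal: float) -> bool:
    """Return False only when no row in the file can satisfy column > literal."""
    upper = f.upper_bounds.get(column)
    if upper is None:
        return True  # no statistics: the file must be read to be safe
    return upper > literal

def prune(files: List[FileStats], column: str, literal: float) -> List[str]:
    """Keep only the files that might contain matching rows."""
    return [f.file_path for f in files if may_match_greater_than(f, column, literal)]

files = [
    FileStats("a.parquet", {"amount": 750.0}),   # max 750: skipped
    FileStats("b.parquet", {"amount": 1000.0}),  # max 1000: skipped, 1000 > 1000 is false
    FileStats("c.parquet", {"amount": 4200.0}),  # max 4200: must be read
    FileStats("d.parquet"),                      # no stats: must be read
]
print(prune(files, "amount", 1000))
# ['c.parquet', 'd.parquet']
```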
Handling Type Evolution
Iceberg stores lower and upper bounds as serialized bytes rather than typed values. This is a deliberate design choice to support schema evolution. If a column's type is promoted (for example, from a 32-bit integer to a 64-bit long), the query engine infers the original type based on the length of the serialized bytes (4 bytes vs 8 bytes) to decode the historical bounds correctly, ensuring that data skipping continues to work across schema changes.
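A minimal sketch of that decoding step, assuming Iceberg's binary single-value serialization, in which an int is stored as 4 little-endian bytes and a long as 8 little-endian bytes. `decode_long_bound` is a hypothetical helper, not an Iceberg API.

```python
import struct

def decode_long_bound(raw: bytes) -> int:
    """Decode a bound for a column whose current type is long (64-bit).

    Bounds written while the column was still an int are 4 bytes long;
    bounds written after promotion are 8 bytes. Both are little-endian.
    """
    if len(raw) == 4:
        return struct.unpack("<i", raw)[0]  # written before int -> long promotion
    if len(raw) == 8:
        return struct.unpack("<q", raw)[0]  # written as a long
    raise ValueError(f"unexpected bound length: {len(raw)}")

old_bound = struct.pack("<i", -42)            # serialized while the column was int
new_bound = struct.pack("<q", 5_000_000_000)  # serialized after promotion to long
print(decode_long_bound(old_bound), decode_long_bound(new_bound))
# -42 5000000000
```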



