In the Apache Iceberg architecture, the Manifest File is the workhorse of query planning. It is the lowest level of the metadata hierarchy, sitting just above the actual Parquet, ORC, or Avro data files. While higher-level metadata (like the Manifest List) tracks the set of manifests that make up a snapshot, the Manifest File provides the granular, file-by-file detail needed to decide exactly which objects in storage the compute engine has to read.

Structure and Format

Manifest files are stored in the Avro format. Avro is row-oriented and schema-based, which supports fast sequential reads and lets the query planner scan through thousands of file entries quickly. Each entry in a manifest file represents either a data file or a delete file (delete files were introduced in Spec v2). A single manifest holds one kind or the other, never a mix of both.

For each data file, the manifest records the file path, the file format, the partition values, the record count, and the file size in bytes.
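As a rough sketch, the per-file portion of a manifest entry can be modeled as below. The field names follow the data_file struct in the Iceberg table spec, but this is a simplified subset: the DataFile class, the flat dictionary used for the partition, and the example path and numbers are illustrative assumptions, not part of any Iceberg library.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataFile:
    """Simplified subset of the per-file fields tracked in a manifest entry."""

    file_path: str                  # full URI of the data file
    file_format: str                # "PARQUET", "ORC", or "AVRO"
    partition: Dict[str, object]    # partition values (a struct in the real spec)
    record_count: int               # number of rows in the file
    file_size_in_bytes: int         # total file size
    # Column statistics are keyed by field ID, not by column name.
    value_counts: Dict[int, int] = field(default_factory=dict)
    null_value_counts: Dict[int, int] = field(default_factory=dict)
    lower_bounds: Dict[int, bytes] = field(default_factory=dict)
    upper_bounds: Dict[int, bytes] = field(default_factory=dict)


# Hypothetical entry; the path and numbers are made up for illustration.
example = DataFile(
    file_path="s3://bucket/warehouse/sales/data/00000-0-example.parquet",
    file_format="PARQUET",
    partition={"sale_date": "2024-01-15"},
    record_count=120_000,
    file_size_in_bytes=8_400_000,
)
```

Keying the statistics by field ID rather than by name is what lets them survive a column rename.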

Column-Level Statistics for Data Skipping

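The most critical role of the manifest file is storing column-level statistics. For the columns in the data file, the manifest tracks value counts, null value counts, and lower and upper bounds (the value_counts, null_value_counts, lower_bounds, and upper_bounds fields).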

These statistics are the key to Iceberg's performance. When a user runs a query like SELECT * FROM sales WHERE amount > 1000, the query engine scans the manifest file, reads the upper_bounds for the amount column, and skips any data file whose maximum amount is 1000 or less, since no row in such a file can satisfy the predicate. It does this without ever downloading or opening the Parquet files. This process, known as data skipping or file pruning, drastically reduces I/O and speeds up query execution.
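The pruning decision for a greater-than predicate can be sketched in a few lines. This is a simplified illustration, not Iceberg's actual planner: the FileStats class and the function names are hypothetical, the bounds are shown already decoded and keyed by column name (real manifests key them by field ID and store serialized bytes), and the file names are made up. A file can be skipped only when its upper bound is not greater than the threshold. A file with no recorded bound has to be kept.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileStats:
    """Hypothetical, already-decoded view of one manifest entry."""

    file_path: str
    upper_bounds: Dict[str, float] = field(default_factory=dict)


def may_match_greater_than(stats: FileStats, column: str, threshold: float) -> bool:
    """Return False only when the file provably has no row with column > threshold."""
    upper = stats.upper_bounds.get(column)
    if upper is None:
        # No statistics recorded for this column: the file cannot be ruled out.
        return True
    return upper > threshold


def plan_scan(files: List[FileStats], column: str, threshold: float) -> List[str]:
    """Keep only the paths of files that might contain matching rows."""
    return [f.file_path for f in files if may_match_greater_than(f, column, threshold)]


manifest = [
    FileStats("a.parquet", {"amount": 500.0}),    # max below threshold: skipped
    FileStats("b.parquet", {"amount": 1000.0}),   # max equals threshold: skipped
    FileStats("c.parquet", {"amount": 2500.0}),   # may contain matches: kept
    FileStats("d.parquet"),                       # no stats: kept
]

to_read = plan_scan(manifest, "amount", 1000)
print(to_read)
```

Pruning is conservative by design: a kept file may still contain no matching rows, but a skipped file never does.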

Handling Type Evolution

Iceberg stores lower and upper bounds as serialized bytes rather than typed values. This is a deliberate design choice to support schema evolution. If a column's type is promoted (for example, from a 32-bit integer to a 64-bit long), the query engine infers the original type based on the length of the serialized bytes (4 bytes vs 8 bytes) to decode the historical bounds correctly, ensuring that data skipping continues to work across schema changes.
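A minimal sketch of that length-based decoding follows, assuming the little-endian layout the Iceberg spec uses for int and long bounds. The function name decode_long_bound is hypothetical and the example values are made up. Real implementations handle many more types.

```python
import struct


def decode_long_bound(raw: bytes) -> int:
    """Decode a bound for a column whose current type is long.

    Bounds written while the column was still an int are 4 bytes wide;
    bounds written after promotion to long are 8 bytes wide.
    """
    if len(raw) == 4:
        # Written as a 32-bit signed int, little-endian.
        return struct.unpack("<i", raw)[0]
    if len(raw) == 8:
        # Written as a 64-bit signed long, little-endian.
        return struct.unpack("<q", raw)[0]
    raise ValueError(f"unexpected bound length: {len(raw)} bytes")


# Bound written before promotion (int) and after promotion (long).
old_bound = struct.pack("<i", 1000)
new_bound = struct.pack("<q", 5_000_000_000)

print(decode_long_bound(old_bound), decode_long_bound(new_bound))
```

Both values come back as ordinary integers, so the pruning comparison does not need to know which schema version wrote the file.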
