The Metadata File is the root of an Apache Iceberg table. Stored in JSON format, this file contains the comprehensive definition of the table. When a query engine asks the Catalog for a table, the Catalog responds with the URI of the table's current metadata file. From there, the query engine has everything it needs to plan and execute the query.

What is in the Metadata File?

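A new metadata file is created every time the table state changes, whether data is added, the schema is altered, or a table property is modified. The file contains several critical arrays and identifiers, including the table UUID and location, the list of schemas with the current schema ID, the partition specs, the table properties, and the list of snapshots with the current snapshot ID.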
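To make the shape of the file concrete, here is a minimal Python sketch that parses a heavily trimmed, hand-written metadata document and resolves the current schema and current snapshot. The field names (`format-version`, `table-uuid`, `schemas`, `current-schema-id`, `snapshots`, `current-snapshot-id`, `manifest-list`) follow the Iceberg table spec, but every value (UUID, bucket, paths, IDs) is invented for illustration, and the helpers `current_schema` and `current_snapshot` are hypothetical names, not part of any Iceberg library. A real metadata file carries many more fields.

```python
import json

# Hand-written, heavily trimmed metadata document. Field names follow the
# Iceberg table spec; every value (UUID, paths, IDs) is invented.
METADATA_JSON = """
{
  "format-version": 2,
  "table-uuid": "00000000-0000-0000-0000-000000000000",
  "location": "s3://example-bucket/warehouse/db/events",
  "current-schema-id": 1,
  "schemas": [
    {"schema-id": 0, "type": "struct", "fields": [
      {"id": 1, "name": "id", "required": true, "type": "long"}]},
    {"schema-id": 1, "type": "struct", "fields": [
      {"id": 1, "name": "id", "required": true, "type": "long"},
      {"id": 2, "name": "ts", "required": false, "type": "timestamp"}]}
  ],
  "current-snapshot-id": 2002,
  "snapshots": [
    {"snapshot-id": 2001, "timestamp-ms": 1700000000000,
     "manifest-list": "s3://example-bucket/warehouse/db/events/metadata/snap-2001.avro"},
    {"snapshot-id": 2002, "timestamp-ms": 1700000600000,
     "manifest-list": "s3://example-bucket/warehouse/db/events/metadata/snap-2002.avro"}
  ]
}
"""


def current_schema(metadata: dict) -> dict:
    """Return the schema whose schema-id matches current-schema-id."""
    wanted = metadata["current-schema-id"]
    return next(s for s in metadata["schemas"] if s["schema-id"] == wanted)


def current_snapshot(metadata: dict) -> dict:
    """Return the snapshot whose snapshot-id matches current-snapshot-id."""
    wanted = metadata["current-snapshot-id"]
    return next(s for s in metadata["snapshots"] if s["snapshot-id"] == wanted)


metadata = json.loads(METADATA_JSON)
schema = current_schema(metadata)
snapshot = current_snapshot(metadata)

column_names = [f["name"] for f in schema["fields"]]
print(column_names)               # columns of the current schema
print(snapshot["manifest-list"])  # where an engine would look next
```

Older schemas and snapshots stay in the arrays alongside the current ones; the `current-*` identifiers select which entry is live.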

The Atomic Commit Process

The metadata file is the linchpin of Iceberg's ACID transactions. Because each metadata file contains the entire table state and is never modified after it is written, a commit operation in Iceberg simply means writing a new JSON metadata file and asking the Catalog to atomically swap its pointer from the old file to the new file.

If two writers try to commit at the same time, they both read the current metadata file, create their own new metadata files, and attempt the atomic swap. The Catalog enforces Optimistic Concurrency Control: the first writer succeeds, while the second writer's swap fails. The second writer must then re-read the newly updated metadata, re-apply its changes, and try again. This ensures that the table state never becomes corrupted or inconsistent.
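The conflict described above can be sketched with a toy, in-memory catalog. Everything here is an assumption made for illustration: `InMemoryCatalog`, `CommitFailed`, `commit`, the `files` dict standing in for object storage, and the `rows` counter standing in for table state are hypothetical and are not the API of any real Iceberg catalog. The only thing the sketch tries to capture is the compare-and-swap on the pointer and the retry that follows a failed swap.

```python
import threading


class CommitFailed(Exception):
    """Raised when the catalog pointer no longer matches the writer's base."""


class InMemoryCatalog:
    """Hypothetical stand-in for a catalog: one table, one pointer."""

    def __init__(self, location: str):
        self._lock = threading.Lock()
        self._current = location

    def current(self) -> str:
        with self._lock:
            return self._current

    def swap(self, expected: str, new: str) -> None:
        # Atomic compare-and-swap: succeed only if nobody committed since
        # the caller read `expected`.
        with self._lock:
            if self._current != expected:
                raise CommitFailed(f"expected {expected}, found {self._current}")
            self._current = new


def commit(catalog, files, writer, added_rows, max_attempts=5) -> int:
    """Read the current state, write a new file, swap; retry on conflict."""
    for attempt in range(1, max_attempts + 1):
        base = catalog.current()
        new_location = f"{writer}-attempt{attempt}.metadata.json"
        files[new_location] = {"rows": files[base]["rows"] + added_rows}
        try:
            catalog.swap(base, new_location)
            return attempt
        except CommitFailed:
            continue  # someone else won; re-read and re-apply
    raise CommitFailed(f"{writer} gave up after {max_attempts} attempts")


# Simulated object store holding immutable metadata files.
files = {"v1.metadata.json": {"rows": 0}}
catalog = InMemoryCatalog("v1.metadata.json")

# Writers A and B both read the same base before either commits.
base_a = catalog.current()
base_b = catalog.current()
files["a.metadata.json"] = {"rows": files[base_a]["rows"] + 10}
files["b.metadata.json"] = {"rows": files[base_b]["rows"] + 5}

catalog.swap(base_a, "a.metadata.json")  # A wins the race
try:
    catalog.swap(base_b, "b.metadata.json")  # B's base is now stale
    conflict = False
except CommitFailed:
    conflict = True

# B re-reads the latest metadata, re-applies its change, and retries.
attempts = commit(catalog, files, "b", added_rows=5)
final_rows = files[catalog.current()]["rows"]
print(conflict, attempts, final_rows)
```

Because B rebuilds its new file from A's committed state rather than from the stale base, the final state reflects both writers' changes. The orphaned `b.metadata.json` is never referenced by the pointer, so it has no effect on readers.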
