Every committed write operation against an Apache Iceberg table produces a new snapshot: an immutable record of the complete set of data files that constitute the table at that point in time. Snapshots are the foundation of Iceberg's time travel capability, its safe concurrent read semantics, and its rollback mechanism. They are not copies of the data; they are references to the data files. Each snapshot points to a manifest list in the metadata layer, the manifest list points to manifest files, and the manifest files list the data files.

Each snapshot has a unique snapshot ID (a long integer), a timestamp recording when the commit was completed, a summary of what changed (files added, files deleted, number of records affected), and a reference to the snapshot's manifest list. The table metadata file keeps the sequence of all snapshots that have not yet been expired, which forms the table's retained write history.
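The fields listed above can be pictured with a small in-memory model. This is only a sketch: the class names `Snapshot` and `TableMetadata`, the bucket path, and the sample values are hypothetical, and the code does not read or write real Iceberg metadata. The summary keys are modeled on the ones Iceberg writes, but treat them as illustrative here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for one snapshot entry in the table metadata."""
    snapshot_id: int    # unique long integer
    timestamp_ms: int   # commit completion time, epoch milliseconds
    summary: dict       # what changed in this commit
    manifest_list: str  # location of this snapshot's manifest list


@dataclass
class TableMetadata:
    """Toy stand-in for the metadata file: an ordered snapshot history."""
    snapshots: list = field(default_factory=list)
    current_snapshot_id: Optional[int] = None

    def commit(self, snapshot: Snapshot) -> None:
        # A commit appends a new snapshot; earlier entries are never modified.
        self.snapshots.append(snapshot)
        self.current_snapshot_id = snapshot.snapshot_id


metadata = TableMetadata()
metadata.commit(Snapshot(
    snapshot_id=1001,
    timestamp_ms=1_700_000_000_000,
    summary={"operation": "append", "added-data-files": "2", "added-records": "500"},
    manifest_list="s3://example-bucket/tbl/metadata/snap-1001.avro",  # made-up path
))
metadata.commit(Snapshot(
    snapshot_id=1002,
    timestamp_ms=1_700_000_060_000,
    summary={"operation": "delete", "deleted-data-files": "1", "deleted-records": "200"},
    manifest_list="s3://example-bucket/tbl/metadata/snap-1002.avro",  # made-up path
))

# The write history is simply the ordered list of retained snapshots.
history = [(s.snapshot_id, s.summary["operation"]) for s in metadata.snapshots]
```

After the two commits, `history` lists both snapshots in commit order and `current_snapshot_id` points at the most recent one.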

Snapshot Isolation for Concurrent Reads

Iceberg provides snapshot isolation for reads: every read operation is tied to the snapshot that is current at the moment the query starts. If a writer commits a new snapshot while a long-running query is in progress, the query continues reading from its original snapshot, and the data files added by the writer are invisible to it. Because snapshots are immutable, this eliminates dirty reads and non-repeatable reads within a query without any locking. Snapshot isolation is a weaker guarantee than full serializability, but it is sufficient to give every query a consistent view of the table.
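The isolation behavior described above can be illustrated with a toy in-memory model. The classes `ToyTable` and `ToyScan` and the file names are hypothetical and exist only for this sketch; a real engine resolves the snapshot through the catalog and metadata files rather than a Python dictionary.

```python
class ToyTable:
    """Toy model: each snapshot ID maps to an immutable set of data file paths."""

    def __init__(self):
        self._files_by_snapshot = {0: frozenset()}
        self.current_snapshot_id = 0

    def commit_append(self, new_files):
        # Writers never mutate an existing snapshot; they publish a new one.
        new_id = self.current_snapshot_id + 1
        current = self._files_by_snapshot[self.current_snapshot_id]
        self._files_by_snapshot[new_id] = current | frozenset(new_files)
        self.current_snapshot_id = new_id
        return new_id

    def files_at(self, snapshot_id):
        return self._files_by_snapshot[snapshot_id]

    def start_scan(self):
        # A reader resolves the current snapshot once, when the query starts.
        return ToyScan(self, self.current_snapshot_id)


class ToyScan:
    """A running query pinned to the snapshot it started with."""

    def __init__(self, table, snapshot_id):
        self._table = table
        self.snapshot_id = snapshot_id

    def files(self):
        return sorted(self._table.files_at(self.snapshot_id))


table = ToyTable()
table.commit_append(["a.parquet", "b.parquet"])

running_query = table.start_scan()    # pinned to snapshot 1
table.commit_append(["c.parquet"])    # concurrent writer publishes snapshot 2

files_seen_by_running_query = running_query.files()
files_seen_by_new_query = table.start_scan().files()
```

The running query still sees only `a.parquet` and `b.parquet`, while a query started after the second commit also sees `c.parquet`. No lock is involved, because the reader only ever dereferences an immutable snapshot.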

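This property matters significantly for AI agent workloads. An agent whose analysis runs as a single long query sees a consistent view of the table throughout, even if a pipeline is writing new data concurrently. When a multi-step investigation spans several separate queries over several minutes, each new query would otherwise resolve whatever snapshot is current at that moment, so the agent should record the snapshot ID at the start and read that same snapshot in every step. With the snapshot pinned, the agent does not need to worry about table state changing mid-investigation.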

Snapshot-Based Operations

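Several Iceberg operations work at the snapshot level, including time travel queries against an earlier snapshot, rollback of the table to an earlier snapshot, and expiration of old snapshots with expire_snapshots.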
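Two such operations, time travel by timestamp and rollback, reduce to lookups over the snapshot history. The sketch below assumes a simplified log of (timestamp, snapshot ID) pairs with made-up values; the helper names `snapshot_as_of` and `rollback_to` are hypothetical and are not Iceberg APIs.

```python
from bisect import bisect_right

# (timestamp_ms, snapshot_id) pairs in commit order; the values are made up.
snapshot_log = [(1000, 11), (2000, 12), (3000, 13)]


def snapshot_as_of(log, timestamp_ms):
    """Return the ID of the latest snapshot committed at or before timestamp_ms."""
    timestamps = [ts for ts, _ in log]
    idx = bisect_right(timestamps, timestamp_ms)
    if idx == 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return log[idx - 1][1]


def rollback_to(log, target_id):
    """Return the new current snapshot ID after a rollback.

    Rolling back only moves the 'current' pointer to an existing snapshot;
    no data files are rewritten in this model.
    """
    known_ids = {snapshot_id for _, snapshot_id in log}
    if target_id not in known_ids:
        raise ValueError(f"snapshot {target_id} is not in the table history")
    return target_id


as_of_2500 = snapshot_as_of(snapshot_log, 2500)   # between the 2nd and 3rd commits
as_of_3000 = snapshot_as_of(snapshot_log, 3000)   # exactly at the 3rd commit
current_after_rollback = rollback_to(snapshot_log, 11)
```

Both operations are cheap because they only change which snapshot is read or marked current. They also only work while the target snapshot is still retained in the table history.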

Snapshot Cost Considerations

Snapshots accumulate over time. A table that receives 10 pipeline writes per day will accumulate roughly 300 snapshots in a 30-day month. Each snapshot maintains references to all the data files active in that snapshot, so a data file that has been logically deleted or replaced stays in object storage for as long as any retained snapshot still references it. Only when expire_snapshots runs and removes old snapshots can the data files referenced exclusively by those snapshots be deleted. Retention windows of 7 to 30 days are a common choice, balancing debugging flexibility against storage cost.
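The accumulation arithmetic and the rule for which files become deletable can be sketched as follows. This is a simplified model under stated assumptions: the function name `plan_expiration`, the day-based timestamps, and the sample data are hypothetical, and the real expire_snapshots procedure works on manifests and has more options than shown here.

```python
def plan_expiration(files_by_snapshot, commit_day, today, retention_days):
    """Return (expired snapshot IDs, data files that are safe to delete).

    A file is deletable only if no retained snapshot references it.
    The newest snapshot is always retained so the table stays readable.
    """
    cutoff = today - retention_days
    newest = max(commit_day, key=commit_day.get)
    retained = {sid for sid, day in commit_day.items()
                if day >= cutoff or sid == newest}
    expired = set(files_by_snapshot) - retained

    still_referenced = set().union(*(files_by_snapshot[s] for s in retained))
    expired_refs = set().union(*(files_by_snapshot[s] for s in expired))
    return expired, expired_refs - still_referenced


# 10 writes per day over a 30-day month, one snapshot per write.
snapshots_per_month = 10 * 30

# Made-up history: snapshot 3 replaced file "a" with file "d".
files_by_snapshot = {
    1: {"a", "b"},
    2: {"a", "b", "c"},
    3: {"b", "c", "d"},
}
commit_day = {1: 0, 2: 10, 3: 20}

expired, deletable = plan_expiration(
    files_by_snapshot, commit_day, today=25, retention_days=7
)
```

With a 7-day window on day 25, snapshots 1 and 2 expire. Only file `a` becomes deletable, because files `b` and `c` are still referenced by the retained snapshot 3.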
