To understand Apache Iceberg, you have to understand the specific engineering problem it was built to solve. Before Iceberg, the de facto standard for tracking large datasets on distributed storage was Apache Hive. Hive defined tables using a directory-based layout: a table was simply a folder in a filesystem (like HDFS or S3), and partitions were subfolders.
This folder-based abstraction worked passably for on-premise Hadoop clusters, but it collapsed at scale on cloud object storage. Object stores like S3 are not true filesystems; they are key-value stores. Running a command like `LIST /sales/year=2026/` across millions of files is computationally expensive and incredibly slow. Worse, because Hive had no concept of atomic, file-level transactions, if a job failed halfway through writing to a folder, readers would suddenly start pulling corrupt, partial data. There was no isolation, no consistency, and no safe way to mutate data concurrently.
Netflix, managing petabytes of operational data, hit the physical limits of the Hive architecture. Their solution was to abandon folder-based tracking entirely and build a format that tracks data at the file level. That project became Apache Iceberg.
The Core Abstraction: Metadata over Folders
Apache Iceberg is an open table format for huge analytic datasets. It provides a layer of metadata that sits between your compute engines (like Spark, Trino, or Dremio) and your raw data files (typically Parquet, ORC, or Avro) stored in object storage.
Instead of defining a table by asking "What files are in this folder?", Iceberg defines a table by reading a strict hierarchy of metadata files that maintain an exact list of every single data file that currently belongs to the table. This architectural shift enables warehouse-grade features on open data lakes.
1. The Metadata Tree
The Iceberg architecture is defined by a tree of metadata. When a query engine reads an Iceberg table, it traverses this tree from the top down:
- The Catalog: The highest level. The catalog (such as Apache Polaris, Project Nessie, or AWS Glue) holds a pointer to the current Metadata JSON file for the table. It is the central concurrency bottleneck that ensures atomic swaps.
- Metadata JSON: This file defines the current state of the table. It contains the schema, the partition spec, and an array of historical snapshots, along with an explicit pointer to the current snapshot.
- Manifest Lists: A snapshot points to a Manifest List. This is an Avro file that lists all the Manifest Files that make up the current state of the table. Crucially, it includes partition bounds for each manifest, allowing the engine to skip reading manifests that don't match the query filter.
- Manifest Files: Also Avro files, these track the actual data files (Parquet). They store column-level min/max statistics, file sizes, and row counts, enabling massive data skipping before the engine ever touches a Parquet file.
Key Architectural Benefits
Optimistic Concurrency and ACID Transactions
Because Iceberg tracks state via immutable metadata files, writers never modify existing files. When an ingestion job runs, it writes new Parquet data files, creates new Manifest files, and generates a new Metadata JSON file representing the proposed new state.
To commit, the writer asks the Catalog to atomically swap the table's pointer from the old Metadata JSON to the new one. If two writers attempt to commit at the exact same millisecond, Iceberg uses optimistic concurrency control: the first writer succeeds, and the second writer catches an exception, reads the new current state, and retries its commit on top of the new state. This guarantees serializable isolation without locking.
Schema Evolution
In the Hive era, changing a column name or type often required rewriting the entire historical dataset—an expensive and dangerous operation. Iceberg implements true schema evolution. Columns are tracked by unique IDs, not by their names. If you rename a column, Iceberg simply updates the mapping in the Metadata JSON. Old data files are read seamlessly using the new schema definition, requiring zero data rewrites.
Hidden Partitioning
Partitioning data by date (e.g., extracting the month from a timestamp) is standard practice. Historically, users had to provide the exact partition column in their SQL queries (e.g., `WHERE date >= '2026-01-01' AND partition_month = '01'`). If the user forgot the partition clause, the engine would trigger a devastating full-table scan.
Iceberg introduces Hidden Partitioning. The table's partition spec is defined in the metadata, not the user's queries. A user simply queries the timestamp column (`WHERE timestamp >= '2026-01-01'`), and Iceberg automatically translates this filter to prune files based on the underlying partition layout. If the data engineering team later changes the partition granularity from monthly to daily (Partition Evolution), the metadata handles the transition seamlessly across both old and new data, and the user's queries do not need to change.
Iceberg in the Agentic Lakehouse
As organizations shift towards Agentic Analytics—where AI agents autonomously query enterprise data to answer complex business questions—Iceberg becomes a mandatory architectural component.
AI agents require reliable, deterministic access to data. They cannot navigate corrupt Hive partitions or deal with locking issues caused by concurrent writes. By standardizing on Apache Iceberg, agents interact with a consistent, transactional metadata layer. When combined with a semantic layer (like Dremio) and an open catalog (like Apache Polaris), Iceberg provides the governed, high-performance foundation required for autonomous AI execution over the data lake.