Iceberg Schema Evolution

Iceberg schema evolution is the ability to alter a table's schema (add, drop, rename, or reorder columns, or promote column data types) without rewriting existing data files. Historical data stays readable under the new schema, although queries that reference a renamed or dropped column by its old name must be updated. This is one of Iceberg's most operationally significant features, addressing a painful limitation of older data lake approaches where many schema changes required a full table rewrite.

The mechanism that makes safe schema evolution possible is Iceberg's use of stable, immutable column IDs. Rather than resolving columns by name or by position, Iceberg assigns each column a unique integer ID when the column is created. This ID never changes, even if the column is renamed, and it is not reused after the column is dropped. When the query engine reads an old Parquet file in which the column with ID 7 was written as price, and the current schema names column ID 7 unit_price, the engine matches the column by ID rather than by name. Old data files remain readable without modification.
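The ID-based lookup can be illustrated with a minimal sketch in plain Python. This is a model of the idea, not Iceberg's reader code: the `Field` class, the `read_row` function, and the sample values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    field_id: int  # stable ID, assigned once
    name: str      # display name, free to change


def read_row(file_fields, file_row, table_schema):
    """Project one stored row onto the current table schema by field ID."""
    # Map each field ID recorded in the file to its position in the stored row.
    position_by_id = {f.field_id: i for i, f in enumerate(file_fields)}
    result = {}
    for field in table_schema:
        pos = position_by_id.get(field.field_id)
        # A field the file does not contain (added later) reads as None.
        result[field.name] = file_row[pos] if pos is not None else None
    return result


# File written while field 7 was still called "price".
old_file_fields = [Field(1, "order_id"), Field(7, "price")]
old_row = (1001, 19.99)

# Current schema: field 7 has been renamed to "unit_price".
current_schema = [Field(1, "order_id"), Field(7, "unit_price")]

projected = read_row(old_file_fields, old_row, current_schema)
print(projected)
```

The name stored in the old file is never consulted; only the ID links the stored value to the current column name.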

Supported Evolution Operations

Add column: A new column is added to the schema and receives a new ID. Existing data files return null for the new column (or the column's default value, if one is defined; column defaults are a Spec v3 feature). No data files are rewritten.
Drop column: A column is removed from the current schema. Existing data files that contain the dropped column are still readable; the dropped column's data is simply ignored. No data files are rewritten.
Rename column: A column's name changes while its ID stays the same, so both old and new data files continue to be read correctly. Queries that still use the old name will fail until they are updated, but the data itself is intact.
Reorder columns: The logical order of columns in the schema can be changed. Because Iceberg resolves columns in data files by ID rather than by position, reordering does not require rewriting files.
Type promotion: Certain type-widening operations are safe: int to long, float to double, and increasing a decimal's precision while keeping its scale unchanged. These can be applied without rewriting data. Narrowing (long to int) is not supported because it can cause data loss.
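The operations above can be modeled as metadata-only changes in a short Python sketch. It is a simplified illustration, not the Iceberg or PyIceberg API: the `Column` and `Schema` classes, the method names, and the string type notation are assumptions made for this example.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Column:
    field_id: int
    name: str
    col_type: str


# Widening promotions named in the text; decimals are checked separately.
SAFE_PROMOTIONS = {("int", "long"), ("float", "double")}
_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def can_promote(old: str, new: str) -> bool:
    """Return True when changing `old` to `new` is a safe widening."""
    if (old, new) in SAFE_PROMOTIONS:
        return True
    o, n = _DECIMAL.fullmatch(old), _DECIMAL.fullmatch(new)
    if o and n:
        # Precision may grow, but the scale must stay the same.
        return int(o.group(2)) == int(n.group(2)) and int(n.group(1)) > int(o.group(1))
    return False


class Schema:
    def __init__(self, columns, last_id):
        self.columns = list(columns)
        self.last_id = last_id  # highest ID ever assigned; IDs are not reused

    def add(self, name, col_type):
        self.last_id += 1
        self.columns.append(Column(self.last_id, name, col_type))

    def drop(self, name):
        self.columns = [c for c in self.columns if c.name != name]

    def rename(self, old, new):
        self.columns = [replace(c, name=new) if c.name == old else c for c in self.columns]

    def move_first(self, name):
        # Stable sort: the named column moves to the front, the rest keep their order.
        self.columns.sort(key=lambda c: c.name != name)

    def promote(self, name, new_type):
        current = next(c for c in self.columns if c.name == name)
        if not can_promote(current.col_type, new_type):
            raise ValueError(f"unsafe promotion: {current.col_type} to {new_type}")
        self.columns = [replace(c, col_type=new_type) if c.name == name else c for c in self.columns]


schema = Schema(
    [Column(1, "id", "int"), Column(2, "price", "float"), Column(3, "note", "string")],
    last_id=3,
)
schema.add("region", "string")        # receives a fresh ID (4)
schema.drop("note")                   # ID 3 is retired, not recycled
schema.rename("price", "unit_price")  # ID 2 is unchanged
schema.move_first("region")
schema.promote("id", "long")

summary = [(c.field_id, c.name, c.col_type) for c in schema.columns]
print(summary)
```

Every operation touches only the in-memory schema object, which mirrors the point of the list above: no step needs access to the data files.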

Required Fields and Default Values

Iceberg schemas have tracked whether each field is required or optional since the first version of the spec. Column default values arrived later, in Spec v3: a field can carry an initial default, which applies to rows written before the column existed, and a write default, which applies when a writer does not supply a value. When a column is added with an initial default, it can be declared required, and old data files that do not contain the column return the default at read time. This allows pipeline code to treat the column as non-nullable even for historical data that predates the column's addition.
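The read-time behavior described above can be sketched in plain Python. The `Field` class, its `default` attribute, and `read_with_defaults` are hypothetical names for illustration, not an Iceberg API, and stored rows are modeled as dictionaries keyed by field ID.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str
    required: bool
    default: Optional[Any] = None  # value used for files that lack this field


def read_with_defaults(file_values, schema):
    """Build a row for the current schema, filling defaults for missing fields."""
    row = {}
    for field in schema:
        if field.field_id in file_values:
            value = file_values[field.field_id]
        else:
            # The file predates the column, so fall back to its default.
            value = field.default
        if field.required and value is None:
            raise ValueError(f"required field {field.name!r} has no value")
        row[field.name] = value
    return row


schema = [
    Field(1, "order_id", required=True),
    Field(2, "currency", required=True, default="USD"),  # added later
]

old_file_row = {1: 1001}            # written before "currency" existed
new_file_row = {1: 1002, 2: "EUR"}  # written after the column was added

old_result = read_with_defaults(old_file_row, schema)
new_result = read_with_defaults(new_file_row, schema)
print(old_result, new_result)
```

Downstream code sees a non-null `currency` in both rows, even though only the newer file stores it.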

Schema Evolution for AI Agents

Safe schema evolution is particularly valuable in AI agent contexts where the agent may be querying tables that change over time. If the underlying table schema changes (a column is renamed or a new field is added), Iceberg's evolution guarantees that historical data remains queryable under the new schema. An agent that caches a table's schema for context should still verify schema currency against the catalog before generating SQL for production queries.
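One way an agent might check schema currency before generating SQL is sketched below. Everything here is an assumption for illustration: `FakeCatalog` stands in for a real catalog client, and `current_schema_id`, `load_schema`, `SchemaCache`, and `build_select` are hypothetical names rather than any library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaSnapshot:
    schema_id: int
    columns: tuple


class FakeCatalog:
    """Stand-in for a catalog client (hypothetical interface)."""

    def __init__(self):
        self.schemas = {"sales.orders": SchemaSnapshot(0, ("order_id", "price"))}
        self.loads = 0  # counts full schema loads, to show the cache working

    def current_schema_id(self, table):
        return self.schemas[table].schema_id

    def load_schema(self, table):
        self.loads += 1
        return self.schemas[table]


class SchemaCache:
    def __init__(self, catalog):
        self._catalog = catalog
        self._cached = {}

    def fresh_schema(self, table):
        # Compare the cached schema ID with the catalog's current one.
        current_id = self._catalog.current_schema_id(table)
        cached = self._cached.get(table)
        if cached is None or cached.schema_id != current_id:
            cached = self._catalog.load_schema(table)
            self._cached[table] = cached
        return cached


def build_select(cache, table, wanted):
    """Generate SQL only after validating column names against a fresh schema."""
    schema = cache.fresh_schema(table)
    missing = [c for c in wanted if c not in schema.columns]
    if missing:
        raise KeyError(f"columns not in current schema: {missing}")
    return f"SELECT {', '.join(wanted)} FROM {table}"


catalog = FakeCatalog()
cache = SchemaCache(catalog)
first_sql = build_select(cache, "sales.orders", ["order_id", "price"])

# The table owner renames "price" to "unit_price", producing a new schema ID.
catalog.schemas["sales.orders"] = SchemaSnapshot(1, ("order_id", "unit_price"))

try:
    build_select(cache, "sales.orders", ["order_id", "price"])
    stale_rejected = False
except KeyError:
    stale_rejected = True

second_sql = build_select(cache, "sales.orders", ["order_id", "unit_price"])
print(first_sql, stale_rejected, second_sql)
```

Comparing schema IDs keeps the check cheap: the full schema is reloaded only when the ID has changed, and a query built from a stale column name is rejected before it reaches the engine.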
