A data lake storing raw Parquet files in S3 is just organized file storage. Without a table format layer, there are no ACID transaction guarantees, no consistent view of the data for concurrent readers and writers, no schema enforcement, and no efficient mechanism for time travel or rollback. Open table formats are the specification layer that adds all of these capabilities on top of the raw data files, without moving the files into a proprietary system.

The three major open table formats are Apache Iceberg, Delta Lake, and Apache Hudi. All three solve the same core problem: giving object-storage data files the reliability properties of a traditional database. Each took a different architectural approach and ended up with different strengths.

The Problem They Solve

Before open table formats, data lakes suffered from predictable problems. Without transaction control, a writer crashing midway through an update left the table in a partially written, inconsistent state. Without a versioning mechanism, there was no way to query the data as it existed at a point in the past. Schema changes required either full table rewrites or careful backward-compatibility management by hand. Partition management required analysts to know the physical partition structure and include it in their queries to avoid full scans.

Each of these problems has a real operational cost. Inconsistent table state after failed writes causes downstream pipeline failures. Lack of time travel makes data quality issues harder to debug, because the current data cannot be compared with an earlier state of the table. Manual schema management creates fragile pipelines that break on upstream changes. Open table formats address all of these through metadata management above the Parquet layer.
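The idea of metadata management above the data files can be illustrated with a toy model. The sketch below is not the layout of Iceberg, Delta Lake, or Hudi. It is a minimal illustration that assumes JSON files stand in for Parquet and a local directory stands in for object storage, and `ToyTable`, `write_data_file`, `commit`, and `demo` are hypothetical names. It shows how immutable snapshot files plus one atomically replaced pointer give readers a consistent view, hide files left behind by a crashed writer, and allow reads of older versions.

```python
import json
import os
import tempfile
from pathlib import Path


class ToyTable:
    """Toy metadata layer: immutable snapshot files plus one swapped pointer.

    Hypothetical illustration only; real table formats use richer metadata.
    """

    def __init__(self, root):
        self.root = Path(root)
        (self.root / "data").mkdir(parents=True, exist_ok=True)
        (self.root / "metadata").mkdir(exist_ok=True)
        self.pointer = self.root / "metadata" / "current.json"

    def current_version(self):
        # Version 0 means "empty table, nothing committed yet".
        if not self.pointer.exists():
            return 0
        return json.loads(self.pointer.read_text())["version"]

    def _snapshot_path(self, version):
        return self.root / "metadata" / f"v{version}.json"

    def files(self, version=None):
        """List the data files visible at a version (default: latest)."""
        version = self.current_version() if version is None else version
        if version == 0:
            return []
        return json.loads(self._snapshot_path(version).read_text())["files"]

    def write_data_file(self, name, rows):
        """Stage a data file. Readers cannot see it until it is committed."""
        (self.root / "data" / name).write_text(json.dumps(rows))
        return name

    def commit(self, new_files, expected_version):
        """Publish staged files as a new snapshot.

        The version check is a simplified optimistic-concurrency guard and is
        NOT race-free here; real systems rely on an atomic compare-and-swap
        provided by a catalog or the storage layer.
        """
        if self.current_version() != expected_version:
            raise RuntimeError("table changed since it was read; retry commit")
        version = expected_version + 1
        snapshot = {
            "version": version,
            "files": self.files(expected_version) + list(new_files),
        }
        self._snapshot_path(version).write_text(json.dumps(snapshot))
        # Write the new pointer to a temp file, then swap it in atomically.
        fd, tmp_name = tempfile.mkstemp(dir=str(self.pointer.parent))
        with os.fdopen(fd, "w") as handle:
            json.dump({"version": version}, handle)
        os.replace(tmp_name, self.pointer)
        return version

    def read(self, version=None):
        """Read all rows visible at a version (time travel when version is set)."""
        rows = []
        for name in self.files(version):
            rows.extend(json.loads((self.root / "data" / name).read_text()))
        return rows


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        table = ToyTable(tmp)
        first = table.write_data_file("part-0001.json", [{"id": 1}, {"id": 2}])
        v1 = table.commit([first], expected_version=0)

        # Simulate a writer that crashes after writing data but before commit.
        table.write_data_file("part-0002.json", [{"id": 3}])
        after_crash = table.read()

        third = table.write_data_file("part-0003.json", [{"id": 4}])
        v2 = table.commit([third], expected_version=v1)
        return {
            "v1": v1,
            "v2": v2,
            "after_crash": after_crash,
            "latest": table.read(),
            "as_of_v1": table.read(version=v1),
        }
```

In this model the orphaned `part-0002.json` never appears in any snapshot, so readers never see it, and reading with `version=v1` returns the table as it existed before the second commit. Rollback would amount to pointing `current.json` back at an earlier snapshot.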

The Three Main Implementations

The Convergence Trend

By 2025, Apache Iceberg had achieved broad enough adoption that Delta Lake and Hudi both added compatibility layers for reading Iceberg-format tables or exposing their tables through Iceberg-compatible metadata. The industry is converging on Iceberg's catalog interface (the Iceberg REST Catalog specification) as the standard for engine-to-catalog communication, even for tables stored in other formats.
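To make engine-to-catalog communication concrete, the sketch below builds the request path an engine might use to load a table's metadata from an Iceberg REST Catalog. It assumes the load-table route has the shape `/v1/{prefix}/namespaces/{namespace}/tables/{table}` and that multi-level namespaces are joined with the 0x1F unit separator before URL encoding. Treat both as assumptions to verify against the OpenAPI definition of the catalog in use. `load_table_path` is a hypothetical helper, not part of any client library.

```python
from urllib.parse import quote

# Assumption: multi-level namespaces are joined with the unit separator (0x1F).
NAMESPACE_SEPARATOR = "\x1f"


def load_table_path(namespace_levels, table, prefix=None):
    """Build the path of a load-table request (hypothetical helper).

    namespace_levels: list of namespace parts, e.g. ["analytics", "events"]
    table: table name
    prefix: optional catalog-specific prefix segment
    """
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace_levels), safe="")
    parts = ["v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", encoded_namespace, "tables", quote(table, safe="")]
    return "/" + "/".join(parts)
```

An engine would send a GET request to this path on the catalog host and receive the table's current metadata in the response. Because the request shape is the same regardless of which engine issues it, the catalog can act as the shared entry point for every engine.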
