A data lake storing raw Parquet files in S3 is just organized file storage. Without a table format layer, there are no ACID transaction guarantees, no consistent view of the data for concurrent readers and writers, no schema enforcement, and no efficient mechanism for time travel or rollback. Open table formats are a specification layer that adds all of these capabilities on top of the raw data files without moving the files into a proprietary system.
The three major open table formats are Apache Iceberg, Delta Lake, and Apache Hudi. All three solve the same core problem: giving object-storage data files the reliability properties of a traditional database. Each took a different architectural approach and ended up with different strengths.
The Problem They Solve
Before open table formats, data lakes suffered from predictable problems. Without transaction control, a writer crashing midway through an update left the table in a partially written, inconsistent state. Without a versioning mechanism, there was no way to query the data as it existed at a point in the past. Schema changes required either full table rewrites or careful backward-compatibility management by hand. Partition management required analysts to know the physical partition structure and include it in their queries to avoid full scans.
Each of these problems has a real operational cost. Inconsistent table state after failed writes causes downstream pipeline failures. Lack of time travel makes debugging data quality issues harder. Manual schema management creates fragile pipelines that break on upstream changes. Open table formats address all of these through metadata management above the Parquet layer.
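The metadata idea can be sketched in a few lines. This is a toy model, assuming a local directory stands in for object storage and a Delta-style log of numbered JSON commit files records which data files each version adds and removes. `ToyTableLog`, `CommitConflict`, and the `_log` layout are hypothetical names; real formats add schemas, statistics, checkpoints, and more careful conflict detection. The sketch shows how a single atomic commit step yields all-or-nothing writes, a consistent snapshot for readers, and time travel by replaying the log only up to an earlier version.

```python
from __future__ import annotations

import json
import os
import tempfile


class CommitConflict(Exception):
    """Another writer already committed the version this writer tried to create."""


class ToyTableLog:
    """Hypothetical, simplified transaction log: one JSON file per committed version.

    Data files are written first; they only become part of the table once a
    commit file referencing them exists. A crash before the commit leaves
    orphaned files that no reader will ever see.
    """

    def __init__(self, root: str) -> None:
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version: int) -> str:
        # Zero-padded names keep lexical order identical to version order.
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self) -> int:
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def commit(self, read_version: int, add: list[str], remove: list[str]) -> int:
        """Try to publish version read_version + 1 (optimistic concurrency)."""
        new_version = read_version + 1
        try:
            # Mode "x" fails if the file already exists, standing in for the
            # atomic put-if-absent the storage layer must provide. (Locally, a
            # crash mid-write could leave a partial file; this toy ignores that.)
            with open(self._path(new_version), "x") as f:
                json.dump({"add": add, "remove": remove}, f)
        except FileExistsError:
            raise CommitConflict(f"version {new_version} already committed") from None
        return new_version

    def snapshot(self, as_of: int | None = None) -> set[str]:
        """Replay commits 0..as_of to get the live data files (time travel)."""
        end = self.latest_version() if as_of is None else as_of
        live: set[str] = set()
        for version in range(end + 1):
            with open(self._path(version)) as f:
                entry = json.load(f)
            live -= set(entry["remove"])
            live |= set(entry["add"])
        return live


with tempfile.TemporaryDirectory() as root:
    log = ToyTableLog(root)
    v0 = log.commit(-1, add=["part-000.parquet"], remove=[])
    v1 = log.commit(v0, add=["part-001.parquet"], remove=["part-000.parquet"])
    print(log.snapshot())           # {'part-001.parquet'}
    print(log.snapshot(as_of=v0))   # {'part-000.parquet'}
    try:
        # A second writer that also read version 0 loses the race.
        log.commit(v0, add=["part-002.parquet"], remove=[])
    except CommitConflict as exc:
        print("conflict:", exc)     # conflict: version 1 already committed
```

Nothing in the Parquet files themselves changes in this model; correctness comes entirely from which commit files exist and in what order.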
The Three Main Implementations
- Apache Iceberg is governed by the Apache Software Foundation. Its architecture uses a hierarchical metadata tree (snapshots pointing to manifest lists, which point to manifest files, which list data files). It was designed for engine neutrality from the start, and as of 2024-2025 it is supported by more engines than any other format, including Spark, Flink, Dremio, Trino, Snowflake, BigQuery, and others. Its hidden partitioning (queries filter on ordinary columns such as a timestamp, and Iceberg derives the partition values) and partition evolution (changing the partition scheme without rewriting existing data) are significant usability advantages.
- Delta Lake is governed by the Linux Foundation, with Databricks as the primary contributor. It uses a transaction log (the `_delta_log` directory) of JSON commit files and periodic Parquet checkpoint files to track table state. It integrates most deeply with Apache Spark and the Databricks platform. Delta Lake's UniForm feature allows Delta tables to be read by Iceberg-compatible engines.
- Apache Hudi is governed by the Apache Software Foundation. It has the strongest support for record-level upserts and deletes, making it a common choice for Change Data Capture pipelines that apply individual record changes at high frequency. Hudi's timeline-based architecture and flexible indexing strategies are distinct from Iceberg's snapshot approach.
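The Iceberg metadata hierarchy and hidden partitioning described above can be modeled as plain data. This is a simplified sketch, not Iceberg's actual file formats: `DataFile`, `Manifest`, `Snapshot`, `day_transform`, and `plan_scan` are hypothetical names, and real Iceberg also keeps per-manifest partition summaries and per-file column statistics so planners can skip whole manifests and files. The sketch shows two things: a query filters only on a timestamp while the planner prunes by the derived day value, and an older snapshot remains scannable because it still points at the manifests it had when it was committed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DataFile:
    path: str
    partition_day: date  # stored partition value, i.e. day(event_ts)


@dataclass(frozen=True)
class Manifest:
    path: str
    data_files: tuple[DataFile, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: tuple[Manifest, ...]  # the manifest list names the manifests


def day_transform(ts: datetime) -> date:
    """Hidden partition transform: derive the partition value from a timestamp."""
    return ts.date()


def plan_scan(snapshot: Snapshot, start: datetime, end: datetime) -> list[str]:
    """Walk snapshot -> manifest list -> manifests -> data files.

    The caller filters on the event timestamp only; the planner converts that
    range into partition values, so no partition column appears in the query.
    """
    lo, hi = day_transform(start), day_transform(end)
    return [
        data_file.path
        for manifest in snapshot.manifest_list
        for data_file in manifest.data_files
        if lo <= data_file.partition_day <= hi
    ]


m1 = Manifest("m1.avro", (
    DataFile("d1.parquet", date(2024, 5, 1)),
    DataFile("d2.parquet", date(2024, 5, 2)),
))
m2 = Manifest("m2.avro", (DataFile("d3.parquet", date(2024, 5, 3)),))
current = Snapshot(snapshot_id=1, manifest_list=(m1, m2))
older = Snapshot(snapshot_id=0, manifest_list=(m1,))  # earlier snapshot reuses m1

window = (datetime(2024, 5, 2, 8, 0), datetime(2024, 5, 3, 23, 0))
print(plan_scan(current, *window))  # ['d2.parquet', 'd3.parquet']
print(plan_scan(older, *window))    # ['d2.parquet']
```

Reusing unchanged manifests across snapshots is what keeps time travel cheap in this model: a new snapshot only writes metadata for what changed.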
The Convergence Trend
By 2025, Apache Iceberg had achieved broad enough adoption that Delta Lake and Hudi both added compatibility layers for reading Iceberg-format tables or exposing their tables through Iceberg-compatible metadata. The industry is converging on Iceberg's catalog interface (the Iceberg REST Catalog specification) as the standard for engine-to-catalog communication, even for tables stored in other formats.
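To make the catalog's role concrete: in Iceberg, the catalog's central job is to hold each table's pointer to its current metadata file and to swap that pointer atomically when a writer commits; the REST Catalog specification standardizes how engines perform those operations over HTTP. The sketch below is an in-memory stand-in built on that assumption. `InMemoryCatalog`, its method names, and the metadata paths are hypothetical illustrations, not the REST Catalog's actual endpoints or payloads.

```python
from __future__ import annotations

import threading


class CommitFailed(Exception):
    """The table's metadata pointer moved since this writer loaded it."""


class InMemoryCatalog:
    """Hypothetical catalog: table identifier -> current metadata file location."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pointers: dict[str, str] = {}

    def create_table(self, identifier: str, metadata_location: str) -> None:
        with self._lock:
            if identifier in self._pointers:
                raise ValueError(f"{identifier} already exists")
            self._pointers[identifier] = metadata_location

    def load_table(self, identifier: str) -> str:
        # An engine reads this pointer, then reads the metadata file itself
        # (schema, snapshots, partition specs) directly from object storage.
        with self._lock:
            return self._pointers[identifier]

    def commit(self, identifier: str, expected: str, new: str) -> None:
        """Compare-and-swap: succeed only if nobody committed since `expected`."""
        with self._lock:
            current = self._pointers[identifier]
            if current != expected:
                raise CommitFailed(f"expected {expected}, found {current}")
            self._pointers[identifier] = new


catalog = InMemoryCatalog()
catalog.create_table("sales.orders", "orders/metadata/v1.metadata.json")

# Two engines (say, a Spark job and a Flink job) load the same version.
seen_by_a = catalog.load_table("sales.orders")
seen_by_b = catalog.load_table("sales.orders")

catalog.commit("sales.orders", seen_by_a, "orders/metadata/v2.metadata.json")
try:
    catalog.commit("sales.orders", seen_by_b, "orders/metadata/v2b.metadata.json")
except CommitFailed as exc:
    # Engine B must reload, rebase its changes, and retry.
    print("retry needed:", exc)
print(catalog.load_table("sales.orders"))  # orders/metadata/v2.metadata.json
```

Because every engine funnels commits through the same compare-and-swap, engines from different vendors can safely write to one table; that shared contract is what a common catalog interface buys.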



