Cloud-Native Data Warehouse

The Cloud-Native Data Warehouse represents the first generation of truly cloud-built analytical platforms. Unlike on-premise warehouses (Teradata, Oracle) that were lifted-and-shifted to virtual machines in the cloud, cloud-native warehouses were architected from the ground up for cloud infrastructure. They separated compute from storage internally, offered pay-per-query or pay-per-second billing, eliminated all cluster management from the user's operational responsibility, and delivered dramatically better price-performance ratios than the systems they replaced.

Snowflake (founded 2012, generally available 2015) and Google BigQuery (launched 2011) pioneered this category. Both implemented disaggregated storage with elastic compute, handled all infrastructure scaling automatically, and offered a familiar SQL interface with no specialized operational skills required. Their success ended the era of on-premise warehouse appliances for most enterprise analytics use cases.

The Architecture

Cloud-native data warehouses store data in proprietary columnar formats in cloud object storage. This is a form of compute-storage separation, but the storage format is controlled by the warehouse vendor. Users load data in (via COPY commands or native connectors), the warehouse stores it in its proprietary format, and queries run against that proprietary store. Data is accessible from outside the warehouse only through the warehouse's own APIs, export features, or (more recently) native open format support like Snowflake's Iceberg Tables feature.

The query engines in these platforms are mature, highly optimized MPP systems. Snowflake's virtual warehouse concept allows multiple isolated compute clusters to query the same storage concurrently, which was a significant workload isolation innovation. BigQuery's slot-based resource model abstracts individual machines entirely, allocating compute capacity to query execution in a way that is invisible to the user.

Where the Open Lakehouse Fits

The primary limitations of cloud-native data warehouses are vendor lock-in and cost at scale for unstructured or semi-structured data. All data must live inside the warehouse's storage layer to be queryable at full performance. This creates a compulsion to move all data into the warehouse, which is expensive for large raw datasets that only a fraction of queries actually touch, and it makes it difficult to use open-source ML and AI tools that work natively with files in object storage.

The open Data Lakehouse (Apache Iceberg on S3 with Dremio as the query engine) retains the cloud-native warehouse's compute-storage separation and elastic scaling model while replacing the proprietary storage format with open Parquet files. Data stored in Iceberg format is readable by any engine that supports the Iceberg spec, with no vendor lock-in. This openness comes with slightly higher operational responsibility for table management (compaction, snapshot expiration) that the proprietary warehouses handle automatically.