Zero-ETL

Every data pipeline is a failure point. It has a maintenance burden, a latency tax, and a storage cost for the intermediate copy it creates. The Zero-ETL philosophy challenges the assumption that data must be moved into the analytical system before it can be queried. Instead, it asks: can the query engine reach the data where it already lives?

Zero-ETL is not a single technology. It is a design philosophy realized through two main technical approaches, federated query and native database replication, plus a lakehouse-specific variant built on Change Data Capture.

Approach 1: Federated Query

In the federated query approach, the analytical execution engine (such as Dremio) maintains live connectors to operational databases (PostgreSQL, MySQL, MongoDB), SaaS applications (Salesforce, Marketo), and other cloud data stores. When an analyst or AI agent queries one of these sources, the engine translates the SQL into the source system's native query language and pushes the work down to the source: only the columns named in the query and the rows matching its predicate are retrieved, and the results are returned directly. No staging copy is created.
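The pushdown step described above can be sketched in a few lines. This is a minimal illustration, not how Dremio or any other engine is implemented: an in-memory SQLite database stands in for the operational source, and the `orders` table and the helper names `build_pushdown_sql` and `federated_lookup` are hypothetical.

```python
import sqlite3


def build_pushdown_sql(table, columns, predicate):
    """Build the narrowed query sent to the source system.

    Only the requested columns are selected (projection pushdown) and the
    filter is evaluated by the source (predicate pushdown). Identifiers are
    assumed to be trusted in this sketch; values go through bind parameters.
    """
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table} WHERE {predicate}"


def federated_lookup(conn, table, columns, predicate, params):
    """Run the pushed-down query on the live source; no staging copy is made."""
    sql = build_pushdown_sql(table, columns, predicate)
    return conn.execute(sql, params).fetchall()


# Hypothetical operational database, simulated with in-memory SQLite.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, total REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "acme", "open", 120.0),
        (2, "acme", "shipped", 75.5),
        (3, "globex", "open", 310.0),
    ],
)

rows = federated_lookup(
    source,
    table="orders",
    columns=["id", "total"],
    predicate="customer = ? AND status = ?",
    params=("acme", "open"),
)
print(rows)  # [(1, 120.0)]
```

Only two of the four columns and one of the three rows leave the source, which is what keeps this pattern cheap for small lookups.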

This approach is ideal for low-volume lookups against operational systems where data freshness within seconds matters. It is not appropriate for large-scale aggregations over billions of rows in a source database, as the query load would interfere with the operational system's performance.

Approach 2: Native Database Integrations

Several cloud database vendors offer native Zero-ETL integrations that automatically and continuously sync data into an analytical destination without requiring a separate ETL pipeline. Amazon Aurora's native Zero-ETL integration with Amazon Redshift is a well-known example. The database engine itself manages the replication, eliminating the ETL middleware layer entirely while still providing a copy in the analytical system for heavy aggregation workloads.
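As a rough sketch of what setting up such an integration can look like from code, the snippet below prepares a request for the `create_integration` operation of the boto3 RDS client. The ARNs, the integration name, and the helper function names are placeholders invented for illustration. The actual call is kept in a separate function because it needs AWS credentials and correctly configured source and target resources.

```python
def build_integration_request(source_arn, target_arn, name):
    """Assemble the required parameters for RDS create_integration."""
    return {
        "SourceArn": source_arn,      # the Aurora cluster to replicate from
        "TargetArn": target_arn,      # the Redshift target to replicate into
        "IntegrationName": name,
    }


def create_zero_etl_integration(request, region):
    """Submit the request. Requires boto3, credentials, and real resources."""
    import boto3  # imported lazily so the builder above runs without boto3

    rds = boto3.client("rds", region_name=region)
    return rds.create_integration(**request)


# Placeholder ARNs: substitute the identifiers of real resources.
request = build_integration_request(
    source_arn="arn:aws:rds:us-east-1:123456789012:cluster:example-aurora-cluster",
    target_arn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/example-namespace-id",
    name="orders-to-redshift",
)
print(sorted(request))  # ['IntegrationName', 'SourceArn', 'TargetArn']
```

Once an integration like this is active, replication is managed by the database service itself, so there is no pipeline code to schedule or monitor.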

Zero-ETL and the Lakehouse

The lakehouse context adds a third Zero-ETL mechanism: streaming changes from operational databases directly into Iceberg tables via Change Data Capture (CDC). Tools like Debezium stream database change events to Kafka, and a Flink job applies those events to an Iceberg table with a lag typically measured in seconds. The operational source never needs to run an export; the change stream flows continuously into the lakehouse. AI agents querying the Iceberg table see data that is seconds old instead of waiting for a nightly batch pipeline to run.
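The core of the CDC path is applying a stream of change events to a keyed table. The sketch below shows that logic in plain Python, assuming events shaped like the Debezium envelope (`op`, `before`, `after`). A dictionary stands in for the Iceberg table, and `apply_change_event` is a hypothetical helper; a real deployment would use Kafka and a Flink job as described above.

```python
def apply_change_event(table, event, key="id"):
    """Apply one Debezium-style change event to a table keyed by primary key.

    op codes: "c" = create, "r" = snapshot read, "u" = update, "d" = delete.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        table[row[key]] = row  # upsert the latest row image
    elif op == "d":
        table.pop(event["before"][key], None)  # remove the deleted row
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical change stream for an orders table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "open"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "open"}},
    {"op": "u", "before": {"id": 1, "status": "open"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "d", "before": {"id": 2, "status": "open"}, "after": None},
]

lakehouse_table = {}  # stand-in for the Iceberg table
for event in events:
    apply_change_event(lakehouse_table, event)

print(lakehouse_table)  # {1: {'id': 1, 'status': 'shipped'}}
```

Because each event carries the full row image, the destination converges on the source state without ever querying the operational database.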
