ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) both move data from source systems into an analytical destination. The difference is where the transform step sits, and that ordering has architectural consequences for pipeline complexity, cost, and data quality.
ETL: Transform Before Loading
In the ETL pattern, data is extracted from the source, cleaned and transformed by a dedicated middleware engine (traditionally an on-premises ETL tool such as Informatica or IBM DataStage), and only then loaded into the destination database in its final, clean form. The destination database only ever sees pre-processed data.
The primary problem with ETL is that the raw source data is discarded once the transformation step has consumed it. If a business requirement changes and analysts need a field that was filtered out three months ago, that data never reached the destination and cannot be recovered from it. Re-running the pipeline from source may not be possible if the source system does not retain history. ETL has also traditionally required expensive dedicated middleware servers to supply the transformation compute.
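The data-loss problem can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: an in-memory SQLite database stands in for the destination, and the source rows, the `transform` function, and the `coupon_code` field are all hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical source records; field names are illustrative only.
SOURCE_ROWS = [
    {"order_id": 1, "amount": " 19.50 ", "coupon_code": "SPRING"},
    {"order_id": 2, "amount": "5.00", "coupon_code": None},
]


def transform(row):
    """Middleware step: clean the amount and keep only the fields that
    today's requirements ask for. coupon_code is dropped here."""
    return (row["order_id"], float(row["amount"].strip()))


def run_etl(conn, rows):
    """Load only the already-transformed rows into the destination."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [transform(r) for r in rows]
    )
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the destination
run_etl(warehouse, SOURCE_ROWS)

# The destination knows nothing about the fields the transform removed.
columns = [col[1] for col in warehouse.execute("PRAGMA table_info(orders)")]
print(columns)  # ['order_id', 'amount']
```

If analysts later ask which orders used a coupon, nothing in `warehouse` can answer the question. The only option is to go back to the source, assuming it still holds those rows.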
ELT: Load Raw, Transform in Place
ELT inverts the order. Data is extracted from the source and loaded immediately into the analytical destination in raw form, with no transformation applied during transit. Transformation logic then runs inside the destination system using the destination's own compute engine. In a lakehouse context, this means raw Parquet files land in the bronze tier of an Iceberg lakehouse, and transformation jobs (dbt models or Spark jobs, for example) run on a query engine such as Dremio or Spark to produce the silver and gold tier tables.
ELT became practical when cloud compute became cheap enough that running transformations inside a massively parallel query engine cost less than maintaining dedicated middleware servers. Because the raw data stays in the bronze tier, re-processing with updated business logic remains possible for as long as that raw data is retained.
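The same scenario under ELT might look like the sketch below. It assumes an in-memory SQLite database as a stand-in for the destination engine; the table names `bronze_orders` and `silver_orders` and the helper `build_silver` are hypothetical. The point is only that the transform is SQL executed by the destination, so it can be re-run against the untouched raw rows when requirements change.

```python
import sqlite3

lake = sqlite3.connect(":memory:")  # stand-in for the destination engine

# Load: raw values land untouched, every source field kept as text.
lake.execute(
    "CREATE TABLE bronze_orders (order_id TEXT, amount TEXT, coupon_code TEXT)"
)
lake.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?)",
    [("1", " 19.50 ", "SPRING"), ("2", "5.00", None)],
)


def build_silver(conn, select_sql):
    """Transform in place: the destination engine executes the SQL."""
    conn.execute("DROP TABLE IF EXISTS silver_orders")
    conn.execute(f"CREATE TABLE silver_orders AS {select_sql}")


# Version 1 of the business logic ignores coupons entirely.
build_silver(lake, """
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount
    FROM bronze_orders
""")

# Version 2: the requirement changes, so rebuild from the same raw rows.
build_silver(lake, """
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           (coupon_code IS NOT NULL) AS used_coupon
    FROM bronze_orders
""")

rows = lake.execute(
    "SELECT order_id, amount, used_coupon FROM silver_orders ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 19.5, 1), (2, 5.0, 0)]
```

No re-extraction from the source was needed for version 2. The new column is derived from raw rows that were already sitting in the destination.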
The Lakehouse and ELT
The Data Lakehouse is built for ELT. The Medallion Architecture (bronze, silver, gold tiers as separate Iceberg table namespaces) is a direct implementation of ELT principles. Engineers load raw data once, then run successive transformation queries to produce higher-quality derived tables. At each tier, the output is an Iceberg table that is independently queryable, auditable via snapshots, and accessible to any compatible query engine.
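A toy version of the tier chain is sketched below, again in Python with SQLite. Attached in-memory databases named `bronze`, `silver`, and `gold` stand in for the three Iceberg namespaces; the `page_views` data and the cleaning rules are assumptions made up for the illustration, and a real lakehouse would run comparable SQL through its own engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Attached in-memory databases stand in for the three table namespaces.
for tier in ("bronze", "silver", "gold"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {tier}")

# Bronze: raw rows exactly as delivered, including defects.
conn.execute(
    "CREATE TABLE bronze.page_views (user_id TEXT, page TEXT, duration_ms TEXT)"
)
conn.executemany(
    "INSERT INTO bronze.page_views VALUES (?, ?, ?)",
    [
        ("u1", "/home", "1200"),
        ("u1", "/home", "1200"),    # duplicate delivery from the source
        ("u2", "/pricing", "bad"),  # unparseable duration
        ("u2", "/home", "800"),
    ],
)

# Silver: deduplicate and keep rows whose duration parses to a positive integer.
conn.execute("""
    CREATE TABLE silver.page_views AS
    SELECT DISTINCT user_id, page, CAST(duration_ms AS INTEGER) AS duration_ms
    FROM bronze.page_views
    WHERE CAST(duration_ms AS INTEGER) > 0
""")

# Gold: a business-level aggregate derived only from silver.
conn.execute("""
    CREATE TABLE gold.views_per_page AS
    SELECT page, COUNT(*) AS views, AVG(duration_ms) AS avg_duration_ms
    FROM silver.page_views
    GROUP BY page
""")

# Every tier remains independently queryable.
counts = {
    "bronze": conn.execute("SELECT COUNT(*) FROM bronze.page_views").fetchone()[0],
    "silver": conn.execute("SELECT COUNT(*) FROM silver.page_views").fetchone()[0],
    "gold": conn.execute("SELECT COUNT(*) FROM gold.views_per_page").fetchone()[0],
}
gold_rows = conn.execute("SELECT * FROM gold.views_per_page").fetchall()
print(counts)     # {'bronze': 4, 'silver': 2, 'gold': 1}
print(gold_rows)  # [('/home', 2, 1000.0)]
```

The raw data is loaded once. Each later tier is a query over the tier before it, so the defective bronze rows stay available for inspection even though silver and gold exclude them.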
For AI agents, the tiered layout means the raw source data stays available for retrieval alongside the curated analytical assets. An agent investigating a data quality anomaly can query the bronze tier to examine the exact raw records that produced a suspicious gold-tier aggregate, using Iceberg time travel to inspect the state of both tables at the moment the anomaly occurred, provided the relevant snapshots have not been expired.
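The investigation pattern can be modelled with a toy snapshot log. To be clear, the `SnapshotTable` class below is a hypothetical pure-Python stand-in and not the Iceberg API; in Spark SQL against Iceberg tables, the comparable lookup is expressed with a `TIMESTAMP AS OF` or `VERSION AS OF` clause. The timestamps and rows are invented for the example.

```python
from bisect import bisect_right


class SnapshotTable:
    """Toy append-only snapshot log (hypothetical, not the Iceberg API)."""

    def __init__(self):
        self._times = []   # commit timestamps, strictly increasing
        self._states = []  # immutable table state at each commit

    def commit(self, ts, rows):
        if self._times and ts <= self._times[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._times.append(ts)
        self._states.append(tuple(rows))

    def as_of(self, ts):
        """Return the state from the latest snapshot at or before ts."""
        idx = bisect_right(self._times, ts)
        if idx == 0:
            raise LookupError("no snapshot at or before the requested time")
        return self._states[idx - 1]


bronze = SnapshotTable()
gold = SnapshotTable()

bronze.commit(100, [("o1", 20.0)])
gold.commit(110, [("total", 20.0)])
bronze.commit(200, [("o1", 20.0), ("o2", -999.0)])  # bad record arrives
gold.commit(210, [("total", -979.0)])               # aggregate goes wrong
bronze.commit(300, [("o1", 20.0), ("o2", 15.0)])    # source later corrected
gold.commit(310, [("total", 35.0)])

# An agent sees the suspicious total at time 250 and inspects both tiers
# as they stood at that moment, not as they are now.
anomaly_ts = 250
gold_then = gold.as_of(anomaly_ts)
suspects = [row for row in bronze.as_of(anomaly_ts) if row[1] < 0]
print(gold_then)  # (('total', -979.0),)
print(suspects)   # [('o2', -999.0)]
```

Querying only the current state would hide the problem, because the bad record has since been corrected. Reading both tables as of the same point in time is what ties the suspicious aggregate back to the raw row that caused it.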



