Ephemeral compute clusters are compute resources that exist only for the duration of a specific workload. Rather than maintaining always-on infrastructure, an ephemeral cluster is provisioned at job start, performs its work, and is completely terminated after completion. This model is central to modern lakehouse cost optimization because the open Iceberg storage layer decouples data persistence from compute lifetime.

Why Ephemerality Is Possible

Ephemeral clusters are only viable because of compute-storage separation. If data were stored on the compute nodes (the "shared-everything" model of traditional databases), destroying the nodes would destroy the data. With Iceberg on S3, the data persists durably in object storage regardless of whether any compute engine is running. A Spark cluster can be spun up for a 2-hour nightly ETL job and terminated immediately after, with all results safely written to Iceberg. The next Spark cluster, started a week later, finds all the data exactly where the previous cluster left it.

Managed Ephemeral Compute

Cloud services like AWS Glue, Amazon EMR Serverless, Azure HDInsight on demand, and Databricks SQL Serverless all implement ephemeral compute transparently. Users submit jobs or queries, and the platform provisions compute, executes the work, and scales to zero automatically. The user pays only for actual compute-seconds consumed, with no idle costs.

Ephemeral Compute and Apache Iceberg

Iceberg's ACID commit model is critical for ephemeral compute safety. If an ephemeral cluster fails mid-job (for example, a spot instance interruption on AWS), Iceberg's uncommitted snapshot is simply discarded. The table remains in its last-committed, consistent state. No data corruption occurs, and the job can be safely re-run on a new ephemeral cluster without risk of duplicate writes or partial data.

Master the Agentic Lakehouse

Architecting an Apache Iceberg Lakehouse

Architecting an Apache Iceberg Lakehouse

Buy on Manning
The AI Lakehouse

The AI Lakehouse

Buy on Amazon