Cross-Cloud Data Sharing

Cross-cloud data sharing is the capability to query data that resides in one cloud provider's object storage (e.g., AWS S3) from a query engine running in a different cloud (e.g., Azure or GCP), without copying the data between clouds. This is increasingly important as organizations adopt multi-cloud strategies, acquire companies that run on different clouds, or partner with organizations on other cloud platforms.

Apache Iceberg as the Cross-Cloud Bridge

Apache Iceberg's architecture is cloud-agnostic by design. Iceberg tables in S3 use the same table format as Iceberg tables in ADLS Gen2 or Google Cloud Storage, and the Iceberg REST Catalog API standardizes how an engine connects to a catalog regardless of cloud provider. This means a Dremio Cloud engine on AWS can query an Iceberg table cataloged in a GCS-backed Polaris deployment, and a Spark cluster on Azure can read from an S3-backed Iceberg catalog, both over standard Iceberg REST Catalog connectivity. The main practical challenges are cross-cloud data egress costs and latency. These make cross-cloud queries best suited to lightweight, infrequent analytical use cases rather than high-volume operational pipelines, which should co-locate data and compute.
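As a rough illustration of the two points above, the following Python sketch shows how a client might build connection properties for an Iceberg REST catalog and estimate the egress charge for a cross-cloud scan. It assumes the PyIceberg library for the catalog call. The catalog URI, warehouse name, token, table identifier, and per-GiB rate are hypothetical placeholders, not values from any real deployment or price list.

```python
from __future__ import annotations


def rest_catalog_properties(uri: str, warehouse: str, token: str) -> dict[str, str]:
    """Build connection properties for an Iceberg REST catalog.

    The keys follow PyIceberg's REST catalog configuration; the values
    passed in are placeholders chosen by the caller.
    """
    return {
        "type": "rest",          # use the REST catalog implementation
        "uri": uri,              # REST endpoint, may live in another cloud
        "warehouse": warehouse,  # warehouse / catalog name on the server
        "token": token,          # bearer token for authentication
    }


def open_remote_table(properties: dict[str, str], identifier: str):
    """Load a table through the REST catalog (requires PyIceberg and network access)."""
    # Imported lazily so the other helpers work without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("remote", **properties)
    return catalog.load_table(identifier)


def estimate_egress_usd(bytes_scanned: int, usd_per_gib: float) -> float:
    """Estimate the egress charge for reading bytes_scanned across clouds.

    usd_per_gib is an assumption supplied by the caller; real rates vary
    by provider, region, and volume tier.
    """
    if bytes_scanned < 0 or usd_per_gib < 0:
        raise ValueError("bytes_scanned and usd_per_gib must be non-negative")
    return bytes_scanned / 2**30 * usd_per_gib


# Hypothetical endpoint and credentials, for illustration only.
props = rest_catalog_properties(
    uri="https://catalog.example.com/api/catalog",
    warehouse="analytics",
    token="<access-token>",
)

# Hypothetical scan size and rate: 50 GiB at an assumed 0.10 USD per GiB.
estimated_cost = estimate_egress_usd(50 * 2**30, usd_per_gib=0.10)
```

Calling `open_remote_table(props, "sales.orders")` with real values would return a PyIceberg table object whose scans read data files directly from the remote object store. Running the estimate first, with the provider's published rate, is one simple way to decide whether a query is cheap enough to run cross-cloud or whether the data should be co-located with compute.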
