Federated analytics is the ability of a query engine to execute a single SQL query that spans multiple, physically separate data sources and return a unified result set, as if all the data lived in one database. Unlike data consolidation approaches (such as ETL into a central warehouse), the data stays in place at each source and is queried directly.
The Federated Query Lifecycle
When a federated query arrives at an engine like Dremio or Trino, the optimizer identifies which portions of the query reference which data sources. It then:
- Pushes source-specific filter predicates down to each individual source in that source's native form (for example, a SQL WHERE clause sent to PostgreSQL, manifest-based file pruning for an Iceberg table, or a filtered REST call to a SaaS API)
- Retrieves only the filtered result sets from each source across the network
- Performs the join, aggregation, or union inside the query engine itself, using the filtered results from all sources
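The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Dremio or Trino are implemented: two independent in-memory SQLite databases stand in for the remote sources, and the table names, columns, and the `federated_units_in_stock` function are all hypothetical.

```python
import sqlite3

# Two independent in-memory databases stand in for separate sources
# (assumption: in practice these might be PostgreSQL and an Iceberg table).
inventory_src = sqlite3.connect(":memory:")
inventory_src.execute("CREATE TABLE inventory (sku TEXT, on_hand INTEGER)")
inventory_src.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("A1", 5), ("B2", 0), ("C3", 12)],
)

sales_src = sqlite3.connect(":memory:")
sales_src.execute("CREATE TABLE sales (sku TEXT, year INTEGER, units INTEGER)")
sales_src.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A1", 2023, 40), ("A1", 2024, 55), ("B2", 2024, 10), ("C3", 2022, 7)],
)


def federated_units_in_stock(min_year):
    """Return (sku, on_hand, units_sold) for in-stock SKUs sold since min_year."""
    # Step 1: push each source-specific filter down to its own source.
    in_stock = inventory_src.execute(
        "SELECT sku, on_hand FROM inventory WHERE on_hand > 0"
    ).fetchall()
    recent_sales = sales_src.execute(
        "SELECT sku, units FROM sales WHERE year >= ?", (min_year,)
    ).fetchall()

    # Step 2: only the filtered rows have been retrieved from each source.

    # Step 3: aggregate and join centrally (a simple hash join on sku).
    units_by_sku = {}
    for sku, units in recent_sales:
        units_by_sku[sku] = units_by_sku.get(sku, 0) + units
    return sorted(
        (sku, on_hand, units_by_sku[sku])
        for sku, on_hand in in_stock
        if sku in units_by_sku
    )


result = federated_units_in_stock(2023)
print(result)
```

With the sample rows above, only SKU A1 is both in stock and sold since 2023, so the call returns a single joined row. A real optimizer makes the same split automatically from one SQL statement rather than from hand-written per-source queries.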
Use Cases
Federated analytics enables several patterns that previously required substantial ETL investment or were impractical altogether:
- Operational + Historical Joins: Joining live OLTP data (current inventory from PostgreSQL) with historical Iceberg lakehouse data (sales trends from the past 3 years) in a single query.
- Cross-Cloud Analytics: Querying data that lives in AWS S3, Azure Data Lake, and Google Cloud Storage simultaneously without copying data between clouds.
- SaaS Data Integration: Joining Salesforce CRM data with your internal Iceberg revenue tables without building a nightly ETL pipeline.
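From the user's side, the operational + historical pattern comes down to one SQL statement that names tables from more than one source. The sketch below imitates that with SQLite's `ATTACH DATABASE`, which lets a single connection query several separate database files. It is only a stand-in for a federated engine; the aliases `oltp` and `lake`, the tables, and the sample rows are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def build_sources(directory):
    """Create two separate SQLite files standing in for two data sources."""
    oltp_path = os.path.join(directory, "oltp.db")
    lake_path = os.path.join(directory, "lake.db")

    oltp = sqlite3.connect(oltp_path)
    oltp.execute("CREATE TABLE inventory (sku TEXT, on_hand INTEGER)")
    oltp.executemany("INSERT INTO inventory VALUES (?, ?)", [("A1", 5), ("B2", 0)])
    oltp.commit()
    oltp.close()

    lake = sqlite3.connect(lake_path)
    lake.execute("CREATE TABLE sales (sku TEXT, year INTEGER, units INTEGER)")
    lake.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("A1", 2023, 40), ("A1", 2024, 55), ("B2", 2024, 10)],
    )
    lake.commit()
    lake.close()
    return oltp_path, lake_path


# One statement that references both sources by their (hypothetical) aliases.
FEDERATED_SQL = """
SELECT i.sku, i.on_hand, SUM(s.units) AS units_sold
FROM oltp.inventory AS i
JOIN lake.sales AS s ON s.sku = i.sku
WHERE s.year >= 2023
GROUP BY i.sku, i.on_hand
ORDER BY i.sku
"""


def run_federated_query():
    with tempfile.TemporaryDirectory() as directory:
        oltp_path, lake_path = build_sources(directory)
        engine = sqlite3.connect(":memory:")
        # Attach each source file under its own alias; no data is copied.
        engine.execute("ATTACH DATABASE ? AS oltp", (oltp_path,))
        engine.execute("ATTACH DATABASE ? AS lake", (lake_path,))
        rows = engine.execute(FEDERATED_SQL).fetchall()
        engine.close()
        return rows


rows = run_federated_query()
print(rows)
```

In an engine such as Trino or Dremio the qualified names would refer to configured catalogs or sources rather than attached files, but the query has the same shape: current inventory and historical sales joined in place, with no pipeline between them.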
Federated Analytics and AI Agents
Federated analytics is particularly valuable for agentic AI systems. An AI agent asked "why did US revenue decline last quarter?" needs CRM data, financial records, and marketing attribution data that typically live in separate systems. A federated query engine gives the agent a single SQL interface to all of them, so its reasoning loop can issue one query instead of coordinating separate calls to each system and merging the results itself.
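As a rough sketch of what that single interface could look like, the example below exposes one `run_sql` tool to an agent. Everything here is hypothetical: SQLite with three attached in-memory databases stands in for the federated engine, and the `crm`, `finance`, and `marketing` sources, their tables, and the sample figures are invented for illustration.

```python
import sqlite3


def make_engine():
    """Build a stand-in engine with three attached sources (all hypothetical)."""
    engine = sqlite3.connect(":memory:")
    for name in ("crm", "finance", "marketing"):
        engine.execute(f"ATTACH DATABASE ':memory:' AS {name}")
    engine.execute("CREATE TABLE crm.accounts (account_id INTEGER, region TEXT)")
    engine.execute(
        "CREATE TABLE finance.revenue (account_id INTEGER, quarter TEXT, amount REAL)"
    )
    engine.execute(
        "CREATE TABLE marketing.spend (region TEXT, quarter TEXT, amount REAL)"
    )
    engine.executemany(
        "INSERT INTO crm.accounts VALUES (?, ?)",
        [(1, "US"), (2, "US"), (3, "EU")],
    )
    engine.executemany(
        "INSERT INTO finance.revenue VALUES (?, ?, ?)",
        [(1, "Q1", 100.0), (2, "Q1", 80.0), (1, "Q2", 70.0),
         (2, "Q2", 60.0), (3, "Q2", 50.0)],
    )
    engine.executemany(
        "INSERT INTO marketing.spend VALUES (?, ?, ?)",
        [("US", "Q1", 30.0), ("US", "Q2", 10.0)],
    )
    return engine


def run_sql(engine, query):
    """The one tool the agent calls: SQL text in, rows as dictionaries out."""
    cursor = engine.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# A query an agent might generate to compare US revenue and marketing spend.
AGENT_QUERY = """
SELECT r.quarter, SUM(r.amount) AS revenue, m.amount AS marketing_spend
FROM finance.revenue AS r
JOIN crm.accounts AS a ON a.account_id = r.account_id
JOIN marketing.spend AS m ON m.region = a.region AND m.quarter = r.quarter
WHERE a.region = 'US'
GROUP BY r.quarter, m.amount
ORDER BY r.quarter
"""

answer = run_sql(make_engine(), AGENT_QUERY)
print(answer)
```

The agent only has to produce one SQL string and read one list of rows. Without a federated layer, the same question would need three separate lookups plus join logic inside the agent itself.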

