A traditional SQL engine takes a structured query and returns a result set. That contract has not changed since the 1970s. An AI Query Engine extends this contract in both directions: it accepts unstructured natural language as input and can return synthesized, contextually enriched output. The query engine remains the execution workhorse, but its interface and output capabilities are substantially broader.

The term is sometimes used loosely to describe any database that integrates with an LLM. In practice, a genuine AI Query Engine satisfies a more specific set of engineering criteria.

Native AI Functions

A core requirement of an AI Query Engine is the ability to invoke AI models directly within SQL. Dremio exposes three native AI SQL functions that can be called inline within a standard SELECT statement.

These functions embed AI processing into SQL without requiring a separate Python pipeline. An analyst can write a single SELECT customer_id, AI_CLASSIFY(support_notes, 'sentiment') FROM support_tickets and receive classification results alongside structured columns, all within one query execution. Dremio also ships a built-in AI agent and an MCP (Model Context Protocol) server that allows external AI assistants to connect to the lakehouse directly and query it using the semantic layer.
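The pattern of calling an AI function inline in a SELECT can be sketched locally. The example below is a minimal illustration, not Dremio's implementation: it uses Python's built-in sqlite3 module and registers a toy keyword matcher under the name AI_CLASSIFY as a stand-in for a real model call. The keyword list and the classify_tickets helper are assumptions made up for this sketch.

```python
import sqlite3

# Hypothetical stand-in for a model call: a keyword lookup, NOT a real classifier.
NEGATIVE_WORDS = {"broken", "refund", "angry", "late"}


def ai_classify(text, task):
    """Toy replacement for an engine-side AI function; only handles 'sentiment'."""
    if task != "sentiment":
        raise ValueError(f"unsupported task: {task}")
    words = set(text.lower().split())
    return "negative" if words & NEGATIVE_WORDS else "positive"


def classify_tickets(rows):
    """Load rows into an in-memory table and classify them inline in one SELECT."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call it like a built-in.
    conn.create_function("AI_CLASSIFY", 2, ai_classify)
    conn.execute(
        "CREATE TABLE support_tickets (customer_id INTEGER, support_notes TEXT)"
    )
    conn.executemany("INSERT INTO support_tickets VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer_id, AI_CLASSIFY(support_notes, 'sentiment') "
        "FROM support_tickets ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return result
```

The point of the sketch is the shape of the query: the classification column is produced by the same statement that returns the structured column, with no separate pipeline step between them.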

Schema-Aware Context Provision

When an AI agent sends a natural language question to the query engine, the engine must be able to self-describe its schema landscape. A schema-aware AI Query Engine exposes a catalog introspection API that returns table descriptions, column types, sample values, and business metadata tags. The AI agent reads this context before generating SQL, dramatically reducing the likelihood of a hallucinated table name or column reference.
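As a rough sketch of what catalog introspection gives an agent, the example below reads table names, column types, and sample values from an in-memory SQLite database, renders them as prompt context, and checks referenced table names against the catalog. SQLite stands in for a real lakehouse catalog here, and the function names describe_catalog, render_context, and unknown_tables are hypothetical, chosen only for illustration.

```python
import sqlite3


def describe_catalog(conn, sample_rows=2):
    """Return {table: [{"column", "type", "samples"}, ...]} for every table."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = []
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, name, col_type, *_rest in info:
            samples = [
                row[0]
                for row in conn.execute(
                    f'SELECT "{name}" FROM "{table}" LIMIT ?', (sample_rows,)
                )
            ]
            columns.append({"column": name, "type": col_type, "samples": samples})
        catalog[table] = columns
    return catalog


def render_context(catalog):
    """Render the catalog as compact text an agent could read before writing SQL."""
    lines = []
    for table, columns in catalog.items():
        cols = ", ".join(f"{c['column']} {c['type']}" for c in columns)
        lines.append(f"TABLE {table} ({cols})")
    return "\n".join(lines)


def unknown_tables(referenced, catalog):
    """Return referenced table names that do not exist in the catalog."""
    return sorted(set(referenced) - set(catalog))
```

A check like unknown_tables can run on the table names in generated SQL before execution, so a hallucinated reference is rejected early instead of surfacing as a runtime error.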

Federated Execution

An AI Query Engine in the lakehouse context rarely reads from a single source. It federates queries across Apache Iceberg tables in object storage, live operational databases, and cloud data warehouses simultaneously. The AI agent does not need to know where each dataset physically lives. It queries through the unified catalog, and the engine routes the sub-queries to the appropriate source, merges the results, and returns a single coherent response.
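The route-then-merge flow can be sketched in a few lines. In the toy example below, two separate in-memory SQLite databases play the role of independent sources, and a small catalog class maps logical table names to whichever connection holds them. The FederatedCatalog class, the table names, and the join helper are all assumptions for illustration. A real engine would also push down filters and plan joins far more carefully.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create one independent in-memory database acting as a data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


class FederatedCatalog:
    """Maps logical table names to the source connection that holds them."""

    def __init__(self):
        self._sources = {}

    def register(self, table, conn):
        self._sources[table] = conn

    def scan(self, table, columns):
        # Route the sub-query to whichever source owns the table.
        conn = self._sources[table]
        col_list = ", ".join(columns)
        return conn.execute(f"SELECT {col_list} FROM {table}").fetchall()


def orders_with_customer_names(catalog):
    """Run one sub-query per source, then hash-join the results on customer_id."""
    names = dict(catalog.scan("customers", ["customer_id", "name"]))
    orders = catalog.scan("orders", ["order_id", "customer_id", "total"])
    return sorted(
        (order_id, names[customer_id], total)
        for order_id, customer_id, total in orders
        if customer_id in names
    )
```

The caller only names tables. Which connection serves each one is decided inside scan, which is the property the unified catalog provides to the agent.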

High-Throughput Protocol Support

AI agents often need low-latency, high-throughput data access, and row-oriented JDBC connections become a bottleneck at scale. Modern AI Query Engines expose Apache Arrow Flight SQL endpoints. Arrow Flight transfers data in the Arrow columnar binary format over gRPC, which avoids row-by-row serialization and can deliver large result sets to an AI agent's Python process far faster than JDBC, in some cases by an order of magnitude. This protocol support is what makes large-scale agentic data analysis feasible within a reasonable response window.
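To make the idea of a columnar binary batch concrete, here is a deliberately simplified sketch using only Python's struct module. It is not Arrow's IPC format and not the Flight protocol. The layout (a row count followed by one contiguous buffer per column) and the function names are assumptions chosen for illustration.

```python
import struct


def encode_columnar(ids, scores):
    """Pack each column as one contiguous little-endian buffer after a row count."""
    if len(ids) != len(scores):
        raise ValueError("columns must have equal length")
    n = len(ids)
    header = struct.pack("<I", n)               # 4-byte unsigned row count
    id_buffer = struct.pack(f"<{n}q", *ids)     # n signed 64-bit integers
    score_buffer = struct.pack(f"<{n}d", *scores)  # n 64-bit floats
    return header + id_buffer + score_buffer


def decode_columnar(payload):
    """Read both columns back with one unpack call per column."""
    (n,) = struct.unpack_from("<I", payload, 0)
    ids = list(struct.unpack_from(f"<{n}q", payload, 4))
    scores = list(struct.unpack_from(f"<{n}d", payload, 4 + 8 * n))
    return ids, scores
```

Because each column is a fixed-width contiguous buffer, the receiver decodes a whole column in one step instead of parsing values row by row. That is the general property columnar transports rely on, although real formats add schemas, null handling, and variable-width types.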
