AI Query Engine

A traditional SQL engine takes a structured query and returns a result set. That contract has not changed since the 1970s. An AI Query Engine extends this contract in both directions: it accepts unstructured natural language as input and can return synthesized, contextually enriched output. The query engine remains the execution workhorse, but its interface and output capabilities are substantially broader.

The term is sometimes used loosely to describe any database that integrates with an LLM. In practice, a genuine AI Query Engine satisfies a more specific set of engineering criteria.

Native AI Functions

A core requirement of an AI Query Engine is the ability to invoke AI models directly within SQL. Dremio exposes three native AI SQL functions that can be called inline within a standard SELECT statement:

AI_GENERATE - generates structured output from unstructured inputs, such as extracting fields from free text or documents
AI_CLASSIFY - classifies input data into categories based on a prompt, enabling sentiment tagging, topic labeling, or intent detection directly in SQL
AI_COMPLETE - generates text responses, summaries, or descriptions from input data using an LLM endpoint

These functions embed AI processing into SQL without requiring a separate Python pipeline. An analyst can write a single statement, SELECT customer_id, AI_CLASSIFY(support_notes, 'sentiment') FROM support_tickets, and receive classification results alongside structured columns, all within one query execution. Dremio also ships a built-in AI agent and an MCP (Model Context Protocol) server that allows external AI assistants to connect to the lakehouse directly and query it using the semantic layer.
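The shape of that single-query result can be sketched locally. The example below is a minimal illustration, not Dremio itself: it uses Python's built-in sqlite3 module and registers a hypothetical keyword-based stub under the name AI_CLASSIFY, so the two-argument call simply mirrors the statement quoted above and should not be read as the documented signature of the real function.

```python
import sqlite3


def classify_stub(text, task):
    """Hypothetical stand-in for an LLM-backed classifier (keyword rules only)."""
    if task != "sentiment":
        return "unknown"
    lowered = (text or "").lower()
    if any(word in lowered for word in ("refund", "broken", "angry")):
        return "negative"
    if any(word in lowered for word in ("thanks", "great", "love")):
        return "positive"
    return "neutral"


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Register the stub as a 2-argument SQL function so it can be called inline.
    conn.create_function("AI_CLASSIFY", 2, classify_stub)
    conn.execute(
        "CREATE TABLE support_tickets (customer_id INTEGER, support_notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO support_tickets VALUES (?, ?)",
        [
            (1, "Device arrived broken, I want a refund"),
            (2, "Thanks, the fix worked great"),
            (3, "Asking about invoice dates"),
        ],
    )
    # Classification output is returned next to an ordinary structured column.
    rows = conn.execute(
        "SELECT customer_id, AI_CLASSIFY(support_notes, 'sentiment') AS sentiment "
        "FROM support_tickets ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer_id, sentiment in run_demo():
        print(customer_id, sentiment)
```

In a real engine the function body would call a model endpoint rather than match keywords; the point of the sketch is only that the classification arrives as one more column in the result set.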

Schema-Aware Context Provision

When an AI agent sends a natural language question to the query engine, the engine must be able to describe its own schema landscape. A schema-aware AI Query Engine exposes a catalog introspection API that returns table descriptions, column types, sample values, and business metadata tags. The AI agent reads this context before generating SQL, which reduces the likelihood of a hallucinated table name or column reference.
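One way to picture this flow is a sketch in which catalog metadata is rendered into prompt context and the generated SQL is checked against the catalog before it runs. Everything here is an assumption made for illustration: the TableInfo and ColumnInfo classes, the render_context and unknown_tables helpers, and the regex-based table extraction (which is deliberately naive and not a SQL parser) are hypothetical and not part of any specific engine's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    name: str
    dtype: str
    description: str = ""
    samples: list = field(default_factory=list)


@dataclass
class TableInfo:
    name: str
    description: str
    columns: list


def render_context(tables):
    """Turn catalog metadata into plain text an agent can read before writing SQL."""
    lines = []
    for table in tables:
        lines.append(f"TABLE {table.name}: {table.description}")
        for col in table.columns:
            sample = f" e.g. {col.samples[:2]}" if col.samples else ""
            lines.append(f"  {col.name} {col.dtype} - {col.description}{sample}")
    return "\n".join(lines)


def unknown_tables(sql, tables):
    """Return table names referenced after FROM/JOIN that the catalog does not know."""
    known = {table.name.lower() for table in tables}
    referenced = re.findall(
        r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", sql, flags=re.IGNORECASE
    )
    return [name for name in referenced if name.lower() not in known]


CATALOG = [
    TableInfo(
        name="support_tickets",
        description="One row per customer support ticket",
        columns=[
            ColumnInfo("customer_id", "INTEGER", "Customer identifier", [1, 2, 3]),
            ColumnInfo("support_notes", "TEXT", "Free-text notes from the agent"),
        ],
    )
]

if __name__ == "__main__":
    print(render_context(CATALOG))
    print(unknown_tables("SELECT * FROM support_tickets JOIN customers c ON 1=1", CATALOG))
```

A check like unknown_tables gives the agent a cheap way to reject or regenerate a query that references a table the catalog never described.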

Federated Execution

An AI Query Engine in the lakehouse context rarely reads from a single source. It federates queries across Apache Iceberg tables in object storage, live operational databases, and cloud data warehouses simultaneously. The AI agent does not need to know where each dataset physically lives. It queries through the unified catalog, and the engine routes the sub-queries to the appropriate source, merges the results, and returns a single coherent response.
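The route-then-merge pattern can be sketched in a few lines. This is a toy model under stated assumptions: two in-memory SQLite databases stand in for an object-store table and an operational database, and the CATALOG mapping, the source names "lake" and "crm", and the helper functions are all hypothetical. A real engine would plan and push down sub-queries far more carefully.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an isolated in-memory database that plays the role of one source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


def build_sources():
    lake = make_source(
        "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 40.0), (1, 10.0), (2, 25.0)],
    )
    crm = make_source(
        "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Lin")],
    )
    return {"lake": lake, "crm": crm}


# Unified catalog: the caller names a table, never a physical location.
CATALOG = {"orders": "lake", "customers": "crm"}


def run_subquery(sources, table, sql):
    """Route a sub-query to whichever source the catalog says owns the table."""
    return sources[CATALOG[table]].execute(sql).fetchall()


def revenue_by_customer(sources):
    totals = dict(
        run_subquery(
            sources,
            "orders",
            "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
        )
    )
    names = dict(
        run_subquery(sources, "customers", "SELECT customer_id, name FROM customers")
    )
    # Merge step: join the two partial results on customer_id in the engine.
    return sorted((names[cid], total) for cid, total in totals.items() if cid in names)


if __name__ == "__main__":
    print(revenue_by_customer(build_sources()))
```

The aggregation is pushed to the source that owns the orders table, and only the small per-customer totals cross into the merge step, which is the same reasoning a federated planner applies at larger scale.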

High-Throughput Protocol Support

AI agents often require low-latency, high-throughput data access that traditional row-oriented JDBC connections struggle to deliver at scale. Modern AI Query Engines expose Apache Arrow Flight SQL endpoints. Arrow Flight uses columnar binary serialization over gRPC, so large result sets can reach an AI agent's Python process without row-by-row conversion, often substantially faster than an equivalent JDBC result set. This protocol support is what makes large-scale agentic data analysis practically feasible within a reasonable response window.
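To make the columnar idea concrete, here is a toy illustration using only the Python standard library. It is not Arrow and not Flight: the encode_columnar and decode_columnar helpers are hypothetical, and the batch format (a row count plus one contiguous byte buffer per column) is a simplification chosen to show why a receiver can load a whole column at once instead of converting values one row at a time.

```python
from array import array


def encode_columnar(rows):
    """Pack (id, score) rows into one contiguous binary buffer per column."""
    ids = array("q", (row[0] for row in rows))      # 64-bit signed integers
    scores = array("d", (row[1] for row in rows))   # 64-bit floats
    return len(rows), ids.tobytes(), scores.tobytes()


def decode_columnar(batch):
    """Rebuild typed columns from the buffers with one bulk copy per column."""
    row_count, id_bytes, score_bytes = batch
    ids = array("q")
    ids.frombytes(id_bytes)
    scores = array("d")
    scores.frombytes(score_bytes)
    if not (len(ids) == len(scores) == row_count):
        raise ValueError("column lengths do not match the declared row count")
    return ids, scores


if __name__ == "__main__":
    rows = [(1, 0.5), (2, 0.25), (3, 0.75)]
    ids, scores = decode_columnar(encode_columnar(rows))
    print(list(ids), list(scores))
```

Arrow adds a standardized memory layout, null handling, nested types, and streaming over gRPC on top of this basic idea, so the sketch should be read as intuition rather than as a description of the wire format.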
