The Agentic Lakehouse is not a single product but an architectural pattern composed of four distinct layers. Each layer addresses a specific capability gap that prevents traditional data lakehouses from serving autonomous AI agents effectively. The diagram below shows all four layers and how they interact.

Figure: Agentic Lakehouse Reference Architecture

Layer 0, AI Agents & Applications: LangChain Agent, Claude / GPT Agent, LlamaIndex Agent, Custom AI Workflow. Connects to Layer 1 via MCP / REST API.

Layer 1, Agent Interface (Model Context Protocol): MCP Server (Dremio), Arrow Flight SQL / REST, ODBC / JDBC (BI Tools). Connects to Layer 2 via governed SQL.

Layer 2, Semantic Layer + Governed Query Engine: Semantic Layer (metrics, entities, wikis), Query Engine (Dremio; RBAC, ABAC, Reflections), Governance Engine (row/column security, audit). Connects to Layer 3 via the Iceberg REST Catalog.

Layer 3, Apache Iceberg Open Storage (S3 / ADLS / GCS): Bronze Tables (raw, CDC, ingestion), Silver Tables (cleansed, validated, dbt), Gold Tables (Kimball, Data Vault, wide), Iceberg Catalog (Apache Polaris, Nessie, Glue).

Free to embed with attribution to agenticlakehouse.com

Understanding Each Layer

Layer 0: AI Agents & Applications

This layer holds the consumers of the Agentic Lakehouse: LLM-based agents (LangChain, LlamaIndex, Claude, GPT-based), custom agentic workflows, and BI tools. Each consumer interacts with the platform through the Agent Interface layer rather than directly with storage or the query engine.

Layer 1: Agent Interface (Model Context Protocol, MCP)

The Model Context Protocol provides structured, typed function-call APIs for agent-to-platform communication. Instead of generating raw SQL strings, agents call MCP functions (list_datasets, get_schema, execute_query) and receive typed responses. BI tools connect via ODBC/JDBC or Arrow Flight SQL.
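To make the function-call style concrete, here is a minimal Python sketch of the request payloads an agent might send. MCP tool invocations travel as JSON-RPC 2.0 messages with the method "tools/call"; the tool names come from the paragraph above, while the argument names ("dataset", "sql") and the table name sales.gold.orders are assumptions for illustration, not a documented Dremio schema.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request in a session needs a distinct id


def mcp_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request in the shape MCP uses."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names and table name, chosen only for illustration.
calls = [
    mcp_tool_call("list_datasets", {}),
    mcp_tool_call("get_schema", {"dataset": "sales.gold.orders"}),
    mcp_tool_call(
        "execute_query",
        {"sql": "SELECT region, SUM(amount) AS revenue "
                "FROM sales.gold.orders GROUP BY region"},
    ),
]

for call in calls:
    print(json.dumps(call))
```

The sketch only builds the messages; a real agent would send them over an MCP transport and read the typed result from the matching JSON-RPC response.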

Layer 2: Semantic Layer + Governed Query Engine

The semantic layer provides business context (metric definitions, entity mappings, column descriptions) that grounds agent reasoning in verified business logic. The query engine (Dremio) enforces RBAC, ABAC, row-level security, and column masking transparently at query time.
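The idea of enforcement "at query time" can be illustrated with a toy model in Python. This is not how Dremio implements governance; it is a sketch under the assumption that each role maps to a row filter plus a set of masked columns, applied to results before they reach the caller. The role names, column names, and sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Policy:
    """Toy policy: which rows a role may see and which columns are masked."""
    row_filter: Callable[[Row], bool]
    masked_columns: frozenset[str] = frozenset()


def apply_policy(rows: list[Row], policy: Policy) -> list[Row]:
    # Row-level security: drop rows the role is not allowed to see.
    visible = [row for row in rows if policy.row_filter(row)]
    # Column masking: replace sensitive values with a fixed placeholder.
    return [
        {col: ("***" if col in policy.masked_columns else val)
         for col, val in row.items()}
        for row in visible
    ]


# Hypothetical roles and data.
POLICIES = {
    "emea_analyst": Policy(
        row_filter=lambda row: row["region"] == "EMEA",
        masked_columns=frozenset({"email"}),
    ),
    "admin": Policy(row_filter=lambda row: True),
}

ORDERS = [
    {"order_id": 1, "region": "EMEA", "email": "a@example.com", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "email": "b@example.com", "amount": 80.0},
]


def run_as(role: str, rows: list[Row]) -> list[Row]:
    """Return the rows as the given role would see them."""
    return apply_policy(rows, POLICIES[role])


analyst_view = run_as("emea_analyst", ORDERS)
print(analyst_view)
```

Because the policy is applied by the engine rather than by the caller, the same query returns different results for different roles, and the agent does not need to know the rules exist.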

Layer 3: Apache Iceberg Open Storage

All data is stored as Apache Iceberg tables in cloud object storage (S3, ADLS, GCS). The Medallion architecture organizes data into Bronze (raw), Silver (cleansed), and Gold (analytical) layers. Apache Polaris, Project Nessie, or AWS Glue serve as the Iceberg catalog, providing the REST Catalog API that all engines and the governance layer use to discover and access tables.
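As a rough illustration of table discovery through the REST Catalog API, the Python sketch below builds the URL an engine would request to load a table. The path shape (/v1/{prefix}/namespaces/{namespace}/tables/{table}) and the use of the unit separator character for multi-level namespaces follow the Iceberg REST catalog specification as commonly deployed; the base URL, the prefix, and the choice to model the Gold layer as a namespace level are assumptions for illustration.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the ASCII unit separator (0x1F),
# which appears percent-encoded as %1F in the request path.
NAMESPACE_SEPARATOR = "\x1f"


def table_url(base_url: str, prefix: str, namespace: tuple[str, ...], table: str) -> str:
    """Build the 'load table' URL for an Iceberg REST catalog."""
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:  # the prefix is optional and catalog-specific
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", encoded_namespace, "tables", quote(table, safe="")]
    return "/".join(parts)


# Hypothetical catalog endpoint and names; "gold" is modeled as a namespace level.
url = table_url(
    "https://catalog.example.com/api/catalog",
    "lakehouse",
    ("sales", "gold"),
    "orders",
)
print(url)
```

A GET on this URL returns the table's metadata, including the location of its current metadata file, which is how different engines can resolve the same table from one catalog.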
