Apache Arrow Flight SQL is a client-server protocol for interacting with SQL databases. It is built on Apache Arrow Flight (an RPC framework built on gRPC) and the Apache Arrow columnar format. It was designed as an alternative to legacy connectivity APIs such as JDBC and ODBC for analytical workloads where data transfer performance is the primary bottleneck.
The Problem with JDBC and ODBC
JDBC and ODBC were designed decades ago for transactional, row-oriented databases. When a columnar query engine (like Spark or Dremio) returns results to a client, traditional drivers convert the data from the engine's internal columnar format into serialized row-based records. The client receives these records and then often converts them back into a columnar format for use with Python, R, or a BI tool. By some estimates, this round trip of serialization and deserialization can consume 60-90% of total transfer time on large result sets.
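The round trip described above can be illustrated with a toy sketch. It is not how any real driver is implemented: plain Python lists stand in for columnar buffers, and the function names `columns_to_rows` and `rows_to_columns` are hypothetical. It only shows that the data is pivoted twice to end up in the shape it started in.

```python
# Toy illustration only: plain Python lists stand in for real columnar buffers.
# The function names are hypothetical and do not come from any driver.

def columns_to_rows(columns):
    """Pivot column-oriented data into row tuples, as a row-based driver would."""
    names = list(columns)
    rows = list(zip(*(columns[name] for name in names)))
    return names, rows


def rows_to_columns(names, rows):
    """Pivot row tuples back into columns, as an analytical client often does."""
    if not rows:
        return {name: [] for name in names}
    return {name: list(values) for name, values in zip(names, zip(*rows))}


# Engine-side result, already columnar.
engine_result = {"id": [1, 2, 3], "amount": [9.5, 20.0, 3.25]}

# Step 1: the driver pivots columns into rows for transfer.
names, wire_rows = columns_to_rows(engine_result)

# Step 2: the client pivots the rows back into columns for analysis.
client_result = rows_to_columns(names, wire_rows)
```

Both pivots touch every value in the result set, which is the work a columnar transport avoids.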
How Flight SQL Works
Flight SQL keeps data in Apache Arrow's columnar format throughout the entire query lifecycle. When a client submits a SQL query, the server streams results back as Arrow record batches over gRPC. Because the format is standardized, the client can load these batches into pandas DataFrames, Polars DataFrames, or any other Arrow-compatible tool without a row-based intermediate conversion. Published benchmarks have reported throughput improvements on the order of 20-50x over JDBC for analytical result sets, though the actual gain depends on the workload.
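A minimal client-side sketch of this flow, assuming the ADBC Flight SQL driver for Python (the `adbc-driver-flightsql` package, plus `pyarrow`) is installed and a Flight SQL server is reachable. The helper names `flight_sql_uri` and `run_query` are hypothetical, and the host, port, and query in the usage note are placeholders. Authentication is omitted.

```python
def flight_sql_uri(host, port, use_tls=False):
    """Build a gRPC URI for a Flight SQL endpoint (hypothetical helper)."""
    scheme = "grpc+tls" if use_tls else "grpc"
    return f"{scheme}://{host}:{port}"


def run_query(uri, sql):
    """Run a SQL statement over Flight SQL and return the result as a pyarrow.Table."""
    # Imported lazily so this module still loads where the driver is not installed.
    import adbc_driver_flightsql.dbapi as flight_sql

    with flight_sql.connect(uri) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            # The result is assembled from Arrow record batches; no row objects are built.
            return cur.fetch_arrow_table()
```

Against a running server, a call such as `run_query(flight_sql_uri("localhost", 32010), "SELECT 1 AS answer")` would return an Arrow table, which can then be handed to pandas with `to_pandas()` or to Polars with `polars.from_arrow()`.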
Flight SQL in the Lakehouse
Arrow Flight SQL is natively supported by Dremio, allowing Python-based AI agents and data applications to submit SQL queries and receive results in Arrow format without a row-based conversion step. DuckDB supports attaching Flight SQL endpoints as "remote tables," enabling federated queries across multiple Flight SQL services from within a single DuckDB session. As AI agents increasingly need to query large lakehouse tables quickly, Flight SQL's combination of low latency and high throughput makes it a strong fit for agentic data workloads.

