Vectorized query execution is the CPU-level optimization technique that underpins the performance of modern analytical databases and query engines, including DuckDB, Apache DataFusion, Dremio, and Databricks Photon. To see why it matters, it helps to first look at how traditional, row-based execution models waste CPU resources.
The Problem with Row-Based Execution
Traditional database engines follow the "Volcano" (or "Iterator") model and process data one row at a time. To compute the SUM of a column across 100 million rows, the engine calls a "get next row" function 100 million times, extracting one value per call and adding it to a running total. Each call carries overhead, and the work done per call is too small to amortize it. Because each row mixes columns of different types, the CPU also keeps switching between code paths and loading data the query does not need, which makes poor use of the branch predictor, the instruction cache, and the data cache.
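The row-at-a-time pull loop described above can be sketched in a few lines of Python. This is an illustration only, not code from any real engine: the class names Scan and SumColumn, the three-column table, and the calls counter are all hypothetical and exist just to make the per-row call count visible.

```python
class Scan:
    """Leaf operator: hands out one row (a tuple) per next() call."""

    def __init__(self, rows):
        self._it = iter(rows)
        self.calls = 0  # counts how often the parent asked for a row

    def next(self):
        self.calls += 1
        return next(self._it, None)  # None signals end of input


class SumColumn:
    """Aggregate operator: pulls rows from its child one at a time."""

    def __init__(self, child, col_index):
        self.child = child
        self.col_index = col_index

    def execute(self):
        total = 0
        while True:
            row = self.child.next()      # one call per row
            if row is None:
                return total
            total += row[self.col_index]  # use one field, ignore the rest


# Hypothetical table: (id, name, amount)
rows = [(i, f"name-{i}", i * 2) for i in range(10_000)]
scan = Scan(rows)
row_total = SumColumn(scan, col_index=2).execute()
print(row_total, scan.calls)
```

The number of next() calls grows linearly with the number of rows (one per row, plus one to detect the end of input), and every call hands back a whole row even though only one field is used.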
How Vectorized Execution Works
Vectorized execution changes this model. Instead of processing one row at a time, the engine processes batches (vectors) of values from a single column. To compute a SUM, the engine takes a batch of, for example, 1,024 integer values from the target column, small enough to stay resident in the CPU's L1 or L2 cache, and adds them up in a tight, highly predictable loop. The per-call overhead is paid once per batch rather than once per row. Because the loop applies the same operation to values of the same type, it can also use SIMD (Single Instruction, Multiple Data) instructions, which operate on several values with a single instruction.
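A minimal sketch of the batch-at-a-time control flow, again in Python with hypothetical names (scan_batches, sum_column, BATCH_SIZE). Python does not expose SIMD directly, so this shows only the structure: one operator call per batch, with the inner loop delegated to the built-in sum over a typed array.

```python
from array import array

BATCH_SIZE = 1024  # assumed batch size, matching the example in the text


def scan_batches(column, batch_size=BATCH_SIZE):
    """Yield consecutive slices (vectors) of a single column."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def sum_column(column):
    """Sum a column batch by batch; returns (total, number_of_batches)."""
    total = 0
    batches = 0
    for batch in scan_batches(column):
        total += sum(batch)  # tight loop over same-typed values
        batches += 1
    return total, batches


# A column of 64-bit integers stored contiguously in memory.
values = array("q", range(100_000))
batch_total, batch_count = sum_column(values)
print(batch_total, batch_count)
```

Here 100,000 values are consumed in 98 batch-level calls instead of 100,000 row-level calls. In a native engine the inner loop would be compiled code over a fixed-width array, which is what lets the compiler or the engine author apply SIMD.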
Why Columnar Formats Enable Vectorization
This is why columnar file formats like Apache Parquet are so important to lakehouse performance. In Parquet, the values of a single column are stored contiguously within each row group. When the engine reads a column for analysis, it can decode a large block of homogeneous data into contiguous, same-type arrays, which is exactly the input vectorized execution needs. A row-based storage format (like CSV or traditional RDBMS heap files) forces the engine to read or skip over irrelevant columns to extract the values it needs, wasting I/O bandwidth and cache capacity.
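The difference between the two layouts can be made concrete with raw byte buffers. The sketch below assumes a hypothetical three-column table (id, price, qty) with fixed-width 8-byte fields and ignores the encoding and compression that Parquet applies on top of its columnar layout. It compares only how many bytes must be walked to read one column.

```python
import struct
from array import array

# Hypothetical row format: id (int64), price (float64), qty (int64).
ROW = struct.Struct("<qdq")  # 24 bytes per row, no padding

table = [(i, i * 0.5, i % 7) for i in range(1000)]

# Row-oriented buffer: all fields of one row sit next to each other.
row_buf = b"".join(ROW.pack(*r) for r in table)

# Column-oriented buffer: all qty values sit next to each other.
qty_col = array("q", (r[2] for r in table))


def qty_from_rows(buf):
    """Extract qty from the row buffer by stepping over every full row."""
    return [ROW.unpack_from(buf, off)[2] for off in range(0, len(buf), ROW.size)]


bytes_walked_rows = len(row_buf)                       # every row, all fields
bytes_walked_cols = qty_col.itemsize * len(qty_col)    # only the qty values

assert sum(qty_from_rows(row_buf)) == sum(qty_col)
print(bytes_walked_rows, bytes_walked_cols)
```

Both paths produce the same values, but the row layout walks three times as many bytes for this table, and the useful values are spread 24 bytes apart rather than packed back to back. The wider the table, the larger that gap becomes.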
Engines Using Vectorized Execution
Many modern analytical query engines have adopted vectorized execution: DuckDB uses it in its in-process engine, Apache DataFusion operates on Apache Arrow record batches, Dremio builds its execution runtime on Apache Arrow, and Databricks Photon provides a vectorized C++ execution engine for Spark workloads.

