Query pushdown is the query optimization technique of evaluating filter predicates and column projections as early as possible in the query execution pipeline, ideally at the point where data is read, to minimize the amount of data that flows through the system. It is one of the most impactful performance optimizations in data lakehouse architectures.

Predicate Pushdown

Consider the query SELECT name FROM orders WHERE country = 'US' AND year = 2024. Without predicate pushdown, a naive engine would read all rows from the orders table, ship them to the compute layer, and only then apply the country and year filters. With predicate pushdown, the engine pushes those filters down to the Parquet file reader, which can use metadata stored in the file, such as per-row-group min/max column statistics, to skip data that cannot contain matching rows.
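The idea can be illustrated with a small, self-contained Python sketch. This is a toy model, not Parquet's or any engine's actual API: the RowGroup class, the scan function, and the sample rows are all hypothetical, and the min/max statistics are computed on the fly, whereas a real Parquet file stores them in its metadata at write time.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group (hypothetical, for illustration)."""
    rows: list  # list of dicts, one dict per row

    def stats(self, column):
        # A real file stores min/max in its metadata; here we compute them on demand.
        values = [row[column] for row in self.rows]
        return min(values), max(values)


def may_match_eq(row_group, column, value):
    """True if `column = value` could hold for some row, judging by min/max only."""
    lo, hi = row_group.stats(column)
    return lo <= value <= hi


def scan(row_groups, equals):
    """Apply equality filters, skipping row groups whose statistics rule them out.

    `equals` maps column name -> required value.
    Returns (matching rows, number of row groups actually read).
    """
    matches, groups_read = [], 0
    for rg in row_groups:
        if not all(may_match_eq(rg, col, val) for col, val in equals.items()):
            continue  # skipped using statistics alone; no rows are read
        groups_read += 1
        matches.extend(
            row for row in rg.rows
            if all(row[col] == val for col, val in equals.items())
        )
    return matches, groups_read


row_groups = [
    RowGroup([{"name": "Ana", "country": "BR", "year": 2023},
              {"name": "Bo", "country": "CA", "year": 2023}]),
    RowGroup([{"name": "Cy", "country": "US", "year": 2024},
              {"name": "Di", "country": "US", "year": 2024}]),
    RowGroup([{"name": "Ed", "country": "DE", "year": 2024},
              {"name": "Flo", "country": "VN", "year": 2024}]),
]

# SELECT name FROM orders WHERE country = 'US' AND year = 2024
rows, groups_read = scan(row_groups, {"country": "US", "year": 2024})
names = [row["name"] for row in rows]
print(names, groups_read)
```

In this sample data the first row group is skipped outright because 'US' falls outside its country range. The third is still read, since 'US' lies between 'DE' and 'VN', even though none of its rows match: min/max statistics can only rule data out, never confirm a match.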

Projection Pushdown

Projection pushdown (also called column pruning) means reading only the columns the query needs from the Parquet file. Because Parquet stores data column by column, a query that references 3 of 100 columns needs to read only about 3% of the file's data bytes, assuming the columns are of similar size. Combined with predicate pushdown, a well-structured Iceberg table query can scan less than 0.1% of the underlying data to answer a specific analytical question.
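The 3% figure can be checked with a short Python sketch. It models a columnar file as a dictionary of independent column chunks; the column names, the row count, and the fixed 8-byte value width are assumptions made for illustration and do not reflect Parquet's real encoding or compression.

```python
# Toy columnar file: each column lives in its own contiguous chunk of bytes.
NUM_COLUMNS = 100
NUM_ROWS = 1_000
BYTES_PER_VALUE = 8  # assumption: every column holds fixed-width 8-byte values

column_chunks = {
    f"col_{i}": bytes(NUM_ROWS * BYTES_PER_VALUE) for i in range(NUM_COLUMNS)
}


def read_columns(chunks, wanted):
    """Return only the requested column chunks and the number of bytes touched."""
    selected = {name: chunks[name] for name in wanted}
    bytes_read = sum(len(chunk) for chunk in selected.values())
    return selected, bytes_read


total_bytes = sum(len(chunk) for chunk in column_chunks.values())
_, bytes_read = read_columns(column_chunks, ["col_0", "col_1", "col_2"])
fraction_read = bytes_read / total_bytes
print(f"read {bytes_read} of {total_bytes} bytes ({fraction_read:.0%})")
```

With equal-width columns, the fraction of bytes read equals the fraction of columns requested. In real files the ratio depends on each column's type, encoding, and compression, so a few wide string columns can account for far more than their share.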
