Query pushdown is the optimization technique of evaluating filter predicates and column projections as early as possible in the query execution pipeline, ideally at the point where data is read, to minimize the amount of data that flows through the system. It is one of the most impactful performance optimizations in data lakehouse architectures.
Predicate Pushdown
Consider the query SELECT name FROM orders WHERE country = 'US' AND year = 2024. Without predicate pushdown, a naive engine would read all rows from the orders table, ship them to the compute layer, and only then apply the country and year filters. With predicate pushdown, the engine pushes those filters down into scan planning and the Parquet file reader, where they are applied at three levels:
- Partition Pruning: If the table is partitioned by year, Iceberg's partition metadata eliminates every Parquet file that cannot contain data for year=2024 during scan planning, so those files are never opened.
- Row Group Pruning: Within the remaining Parquet files, the embedded column statistics (min/max per row group) allow the reader to skip any row group whose maximum country value is less than 'US' or whose minimum country value is greater than 'US'.
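- Page-Level Filtering: At the finest granularity, the Parquet page index stores min/max statistics per data page, so the reader can skip individual pages within a row group. In addition, Bloom filters, which Parquet stores per column chunk rather than per page, can confirm that a specific value (like a user ID) definitely does not exist in a row group's column without reading the actual data.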
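The first two pruning levels can be illustrated with a small, self-contained Python sketch. It is not Iceberg's or Parquet's actual API: the DataFile and RowGroup classes, the plan_scan function, and the file names are hypothetical stand-ins for the partition metadata and row-group statistics described above, and the table is assumed to be partitioned by year.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    """Hypothetical min/max statistics for the country column of one row group."""
    country_min: str
    country_max: str

@dataclass
class DataFile:
    """Hypothetical data file entry: a path, its partition value, and its row groups."""
    path: str
    year: int
    row_groups: list

def plan_scan(files, year, country):
    """Return (path, row group indexes) pairs that still have to be read."""
    plan = []
    for data_file in files:
        # Partition pruning: skip files whose partition value cannot match.
        if data_file.year != year:
            continue
        # Row group pruning: keep a row group only if the target value
        # lies inside its [min, max] range. A kept row group may still
        # contain no matching rows; a skipped one provably contains none.
        keep = [
            index
            for index, rg in enumerate(data_file.row_groups)
            if rg.country_min <= country <= rg.country_max
        ]
        if keep:
            plan.append((data_file.path, keep))
    return plan

files = [
    DataFile("orders-2023.parquet", 2023, [RowGroup("AR", "US")]),
    DataFile("orders-2024-a.parquet", 2024,
             [RowGroup("AR", "DE"), RowGroup("FR", "US"), RowGroup("VN", "ZA")]),
    DataFile("orders-2024-b.parquet", 2024, [RowGroup("AU", "CA")]),
]

plan = plan_scan(files, year=2024, country="US")
print(plan)
```

In this made-up layout only one row group of one file survives: the 2023 file is removed by its partition value, and the remaining row groups are removed because 'US' falls outside their min/max range. Statistics can only prove absence, so the surviving row group still has to be read and filtered row by row.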
Projection Pushdown
Projection pushdown (also called column pruning) means reading only the columns a query needs from the Parquet file. Because Parquet stores data column by column, a query that references 3 of 100 columns needs to read only about 3% of the file's data bytes, assuming the columns are of similar size. Combined with predicate pushdown, a query against a well-structured Iceberg table can scan a very small fraction of the underlying data, potentially less than 0.1%, to answer a specific analytical question.
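The arithmetic behind these figures can be worked through in a short Python sketch. All numbers here are illustrative assumptions rather than measurements: 100 equally sized columns, partition pruning that keeps 10% of the files, and row group pruning that keeps 25% of the remaining row groups. The function name projected_fraction is likewise hypothetical.

```python
def projected_fraction(column_sizes, needed):
    """Fraction of a file's column bytes that a projection has to read."""
    total_bytes = sum(column_sizes.values())
    read_bytes = sum(column_sizes[name] for name in needed)
    return read_bytes / total_bytes

# Assumption: 100 columns of identical on-disk size.
column_sizes = {f"col_{i}": 1_000_000 for i in range(100)}
needed = ["col_0", "col_1", "col_2"]

column_fraction = projected_fraction(column_sizes, needed)

# Assumptions: share of data that survives each pruning step.
partition_fraction = 0.10   # files left after partition pruning
row_group_fraction = 0.25   # row groups left after min/max pruning

scanned_fraction = column_fraction * partition_fraction * row_group_fraction
print(f"columns only: {column_fraction:.1%}")
print(f"combined:     {scanned_fraction:.3%}")
```

Under these assumptions the projection alone reads 3% of the bytes, and the combined scan touches roughly 0.075% of the table. Real tables deviate from this because column sizes differ widely (a free-text column can outweigh dozens of integer columns) and because pruning effectiveness depends on how the data is partitioned and sorted.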

