Query caching encompasses a family of techniques that store the results of expensive work so they can be reused when the same (or similar) queries arrive again, without re-reading and re-processing the underlying data. In a data lakehouse, caching operates at several layers of the query execution stack: final result sets, raw column data read from object storage, and the table metadata used during query planning.
Result Cache
The simplest form of caching is the result cache. When a query completes, the engine stores the complete result set (the returned rows and columns) in memory or on fast SSD storage. When the same query arrives again, the engine returns the cached result without reading any table data. Result caches are effective for dashboards that re-run identical SQL repeatedly (e.g., every time a dashboard tab is opened). The key challenge is cache invalidation: the engine must detect when an underlying Iceberg table has been updated and discard stale results. Because every Iceberg commit produces a new snapshot, comparing a table's current snapshot ID with the one a cached result was computed from is a direct way to detect staleness.
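A minimal sketch of snapshot-based invalidation, under stated assumptions: the engine records the snapshot ID it read from each input table, and a hypothetical `current_snapshot(table)` hook returns the catalog's current snapshot ID. `ResultCache` and `CachedResult` are illustrative names, not any engine's API, and entries are keyed on the exact (trimmed) SQL text; real engines may normalize queries differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[object, ...]


@dataclass
class CachedResult:
    rows: List[Row]
    # Snapshot ID of each input table that the query actually read.
    snapshots: Dict[str, int]


@dataclass
class ResultCache:
    # Hypothetical hook: returns a table's current snapshot ID from the catalog.
    current_snapshot: Callable[[str], int]
    entries: Dict[str, CachedResult] = field(default_factory=dict)

    def put(self, sql: str, rows: List[Row], snapshots: Dict[str, int]) -> None:
        self.entries[sql.strip()] = CachedResult(list(rows), dict(snapshots))

    def get(self, sql: str) -> Optional[List[Row]]:
        key = sql.strip()
        entry = self.entries.get(key)
        if entry is None:
            return None
        # Every commit to an input table creates a new snapshot ID, so any
        # mismatch means the cached rows may no longer reflect the table.
        if any(self.current_snapshot(t) != s for t, s in entry.snapshots.items()):
            del self.entries[key]
            return None
        return entry.rows


# Usage with an in-memory stand-in for the catalog.
catalog_snapshots = {"sales": 1}
result_cache = ResultCache(current_snapshot=lambda table: catalog_snapshots[table])

query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
result_cache.put(query, [("EU", 10), ("US", 7)], {"sales": 1})
first = result_cache.get(query)   # hit: returns the cached rows

catalog_snapshots["sales"] = 2    # a new Iceberg commit to `sales`
second = result_cache.get(query)  # miss: entry discarded as stale
```

Storing the snapshot IDs the query actually read, rather than looking them up when the result is inserted, avoids filing a result under a snapshot it was not computed from if a commit lands while the query is running.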
Columnar File Cache (C3 Pattern)
Dremio implements a more sophisticated caching layer in its C3 (Columnar Cloud Cache) system. Rather than caching query results, C3 caches frequently accessed raw Parquet column chunks from S3 on local NVMe SSDs attached to executor nodes. When a query references a previously read column chunk, C3 serves it from local NVMe instead of making a network round trip to S3, which is typically 10-100x faster depending on chunk size and access pattern. This columnar cache is more flexible than a result cache because it accelerates any query that reads cached column chunks, not only exact repeats of an earlier query.
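The read-through pattern behind a columnar file cache can be sketched as an LRU cache of byte ranges bounded by total size. This is a generic illustration, not Dremio's implementation: `ColumnChunkCache` and `fetch_remote` (a stand-in for a ranged S3 GET) are hypothetical, and an in-memory `OrderedDict` takes the place of files on NVMe.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# (object path, byte offset, length) of one column chunk within a Parquet file
ChunkKey = Tuple[str, int, int]


class ColumnChunkCache:
    """Read-through LRU cache of Parquet byte ranges, bounded by total bytes."""

    def __init__(self, capacity_bytes: int,
                 fetch_remote: Callable[[str, int, int], bytes]) -> None:
        self.capacity_bytes = capacity_bytes
        self.fetch_remote = fetch_remote  # hypothetical stand-in for a ranged S3 GET
        self.chunks: "OrderedDict[ChunkKey, bytes]" = OrderedDict()
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0

    def read(self, path: str, offset: int, length: int) -> bytes:
        key = (path, offset, length)
        if key in self.chunks:
            self.hits += 1
            self.chunks.move_to_end(key)  # mark as most recently used
            return self.chunks[key]
        self.misses += 1
        data = self.fetch_remote(path, offset, length)
        if len(data) <= self.capacity_bytes:  # never cache a chunk larger than the cache
            self.chunks[key] = data
            self.used_bytes += len(data)
            # Evict least recently used chunks until back under capacity.
            while self.used_bytes > self.capacity_bytes:
                _, evicted = self.chunks.popitem(last=False)
                self.used_bytes -= len(evicted)
        return data


# Usage with a fake object store that records every remote read.
remote_reads = []
objects = {"s3://bucket/sales/data/f1.parquet": bytes(range(256))}


def fake_ranged_get(path: str, offset: int, length: int) -> bytes:
    remote_reads.append((path, offset, length))
    return objects[path][offset:offset + length]


chunk_cache = ColumnChunkCache(capacity_bytes=128, fetch_remote=fake_ranged_get)
f1 = "s3://bucket/sales/data/f1.parquet"
a = chunk_cache.read(f1, 0, 64)  # miss: fetched from the "remote" store
b = chunk_cache.read(f1, 0, 64)  # hit: served from the local cache
```

Keying on a file path and byte range rather than on query text is what lets different queries share entries: any two queries that scan the same column chunk hit the same key. Because Iceberg never rewrites data files in place, a path plus byte range always identifies the same content.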
Metadata Caching
For Iceberg tables with thousands of Parquet files, query planning can itself add noticeable latency: the engine must resolve the table's current metadata file, read the manifest list of the current snapshot, and then read the manifests that track the data files. Many query engines keep recently accessed Iceberg metadata in memory, reducing the number of S3 object reads required just to plan a query. Iceberg's immutable metadata makes this caching safe: metadata files, manifest lists, and manifests are never modified after they are written, so a cached copy never goes stale. The only thing the engine must re-check is the catalog's pointer to the table's current metadata file, which changes whenever a new snapshot is committed.
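A sketch of a path-keyed metadata cache, assuming two hypothetical hooks: `read_object(path)` fetches an object from storage, and `current_metadata_location(table)` asks the catalog where the table's current metadata file lives. The JSON field names `current-snapshot-id`, `snapshots`, `snapshot-id`, and `manifest-list` follow the Iceberg table metadata format; real manifest lists are Avro, so this sketch returns them as unparsed bytes, and unlike a production cache it has no memory bound.

```python
import json
from typing import Callable, Dict


class IcebergMetadataCache:
    """Caches immutable Iceberg metadata objects by path.

    Metadata files, manifest lists, and manifests are never modified after they
    are written, so a cached entry needs no revalidation. Only the catalog's
    pointer to the current metadata file is looked up on every planning call.
    """

    def __init__(self, read_object: Callable[[str], bytes],
                 current_metadata_location: Callable[[str], str]) -> None:
        self.read_object = read_object  # hypothetical storage read hook
        self.current_metadata_location = current_metadata_location  # hypothetical catalog hook
        self.objects: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path not in self.objects:
            self.objects[path] = self.read_object(path)
        return self.objects[path]

    def current_manifest_list(self, table: str) -> bytes:
        # The pointer lookup is the only step that can observe a new commit.
        metadata = json.loads(self.get(self.current_metadata_location(table)))
        snapshot_id = metadata["current-snapshot-id"]
        snapshot = next(s for s in metadata["snapshots"]
                        if s["snapshot-id"] == snapshot_id)
        # Real manifest lists are Avro; this sketch returns the raw bytes.
        return self.get(snapshot["manifest-list"])


# Usage with a fake storage layer and catalog that record object reads.
storage_reads = []


def metadata_json(current: int) -> bytes:
    snapshots = [{"snapshot-id": i, "manifest-list": f"s3://wh/t/metadata/snap-{i}.avro"}
                 for i in range(1, current + 1)]
    return json.dumps({"current-snapshot-id": current, "snapshots": snapshots}).encode()


storage = {
    "s3://wh/t/metadata/v1.metadata.json": metadata_json(1),
    "s3://wh/t/metadata/v2.metadata.json": metadata_json(2),
    "s3://wh/t/metadata/snap-1.avro": b"manifest-list-1",
    "s3://wh/t/metadata/snap-2.avro": b"manifest-list-2",
}
catalog = {"db.t": "s3://wh/t/metadata/v1.metadata.json"}


def read_object(path: str) -> bytes:
    storage_reads.append(path)
    return storage[path]


meta_cache = IcebergMetadataCache(read_object, lambda table: catalog[table])
plan1 = meta_cache.current_manifest_list("db.t")  # reads metadata file + manifest list
plan2 = meta_cache.current_manifest_list("db.t")  # no storage reads: all cached
```

After a commit moves the catalog pointer to a new metadata file, only the new objects are fetched; the old entries remain valid but are simply no longer reached from the current pointer.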

