Sub-second analytics refers to the ability of a query engine to return results for complex analytical queries (aggregations, joins, filters across billions of rows) in less than one second. This was once considered achievable only by proprietary, fully managed cloud data warehouses like Snowflake or BigQuery. Modern open lakehouse architectures with purpose-built query engines like Dremio have made sub-second performance a reality over data kept in open table formats on object storage.
The Layers of Sub-Second Performance
Achieving sub-second latency on petabyte-scale data requires multiple complementary techniques working in concert:
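- Aggressive Data Skipping: Apache Iceberg's manifests record per-file column statistics (such as minimum and maximum values), and its manifest lists record per-manifest partition summaries, so the query planner can eliminate the vast majority of data files before the scan begins. A well-partitioned, Z-ordered table might let the planner skip 99% of files before a single byte of table data is read.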
- Columnar Caching: Engines like Dremio cache frequently accessed Parquet file chunks on local NVMe SSDs attached to the execution nodes. Repeated dashboard queries are then served from cache at local NVMe speed instead of paying S3 network round-trip latency on every read.
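- Vectorized Execution: Processing data in columnar batches, with tight loops that can use SIMD CPU instructions, dramatically increases per-core throughput compared to row-by-row processing, because per-row interpretation overhead is amortized across the whole batch.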
- Pre-aggregation (Reflections/MVs): For known, high-frequency BI queries, pre-computed Data Reflections or Materialized Views eliminate the need to scan raw data entirely, returning results from a small, optimized aggregate table.
- Massive Parallelism: Massively parallel processing (MPP) engines distribute the remaining scan work across many nodes at once, so even if 1% of the data must be read, that 1% is read in parallel across the entire cluster.
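The first and last techniques in the list above can be illustrated together with a toy model. The sketch below is a minimal, hypothetical Python illustration, not Iceberg's or Dremio's API: `DataFile`, `prune`, `scan`, and `count_matching` are invented names, each "file" keeps its rows in memory rather than in Parquet, and a thread pool on one machine stands in for the nodes of an MPP cluster. It shows the min/max overlap test a planner can apply to per-file statistics, followed by a parallel scan of only the files that survive pruning.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data-file entry in table metadata."""
    path: str
    min_ts: int  # lower bound of the 'ts' column recorded for this file
    max_ts: int  # upper bound of the 'ts' column recorded for this file
    rows: tuple  # the file's 'ts' values (a real engine would read Parquet)


def prune(files, lo, hi):
    """Data skipping: keep only files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


def scan(f, lo, hi):
    """Read one surviving file and count rows that actually satisfy lo <= ts <= hi."""
    return sum(1 for ts in f.rows if lo <= ts <= hi)


def count_matching(files, lo, hi, workers=4):
    """Prune using statistics only, then scan the survivors in parallel.

    Returns (matching_row_count, number_of_files_scanned).
    """
    survivors = prune(files, lo, hi)
    # Threads stand in for MPP worker nodes; each scans a share of the survivors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(lambda f: scan(f, lo, hi), survivors))
    return total, len(survivors)


if __name__ == "__main__":
    # 100 hypothetical files; file i holds ts values i*100 .. i*100+99.
    files = [
        DataFile(f"data/part-{i:05d}.parquet", i * 100, i * 100 + 99,
                 tuple(range(i * 100, i * 100 + 100)))
        for i in range(100)
    ]
    total, scanned = count_matching(files, 250, 349)
    print(total, scanned, len(files))  # prints: 100 2 100
```

In this run, 98 of the 100 files are skipped on statistics alone. The overlap test is conservative: it can keep a file that turns out to contain no matching rows, but it never discards a file that does, which is what makes skipping safe. Because of CPython's global interpreter lock, the threads here show how scan work is divided rather than delivering real CPU speedup.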
Why Sub-Second Matters
Research in UX design consistently shows that users perceive responses faster than 100 ms as instantaneous, while responses under 1 second keep their flow of thought uninterrupted. Dashboards and reports that load in under a second keep analysts engaged, whereas load times of 10-30 seconds exceed the roughly 10-second limit beyond which users tend to lose focus on the task. For AI agents querying lakehouses, sub-second latency enables real-time agentic decision loops, in which the agent can iterate rapidly over multiple analytical queries before taking action.

