Massively Parallel Processing (MPP) is the architectural pattern that major distributed query engines use to reach the scale needed for querying petabyte-scale data lakehouses. Instead of executing a query on a single, powerful server (the Symmetric Multi-Processing, or SMP, approach), an MPP engine distributes the work across tens, hundreds, or even thousands of independent compute nodes that run in parallel.

How MPP Query Execution Works

When a query like SELECT country, SUM(revenue) FROM global_sales GROUP BY country arrives at an MPP engine, a coordinator node parses the SQL and generates a distributed execution plan. This plan splits the data scan across the available worker nodes, with each worker responsible for reading a subset of the underlying Parquet files from object storage. Each worker computes partial aggregations (a local SUM per country) on its assigned data. In the simplest form of the plan, the workers send those partial results back to the coordinator, which merges them into the final result set. Many engines instead exchange the partial results between workers by grouping key so that the final merge is also parallel, and the coordinator only returns the finished rows to the client.
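
The two-phase aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine implements it: the function names partial_sum and merge_partials and the sample rows are hypothetical, and each "worker" is just a list processed in turn rather than a separate node.

```python
from collections import defaultdict


def partial_sum(rows):
    """Worker-side step: local SUM(revenue) per country for one data split."""
    totals = defaultdict(float)
    for country, revenue in rows:
        totals[country] += revenue
    return dict(totals)


def merge_partials(partials):
    """Final step: combine the per-worker partial sums into one result."""
    merged = defaultdict(float)
    for partial in partials:
        for country, subtotal in partial.items():
            merged[country] += subtotal
    return dict(merged)


# Hypothetical splits of global_sales, one list per worker.
splits = [
    [("US", 100.0), ("DE", 40.0), ("US", 25.0)],
    [("DE", 10.0), ("JP", 70.0)],
    [("US", 5.0), ("JP", 30.0)],
]

partials = [partial_sum(split) for split in splits]
result = merge_partials(partials)
print(result)
```

The approach works because SUM can be computed in pieces: each worker sends one small row per country rather than every input row, so the data moved over the network is far smaller than the data scanned.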

Shared-Nothing Architecture

Modern cloud-native MPP systems (like Dremio, Trino, and Presto) use a "shared-nothing" architecture where each worker node operates independently with its own CPU and memory. Nodes do not share state or memory with each other. They communicate only by shuffling intermediate results over the network, for example when a join or aggregation needs all rows with the same key to land on the same node. This architecture scales horizontally: adding more worker nodes increases throughput roughly linearly for scan-heavy queries, while queries dominated by shuffles or skewed keys tend to scale less cleanly.
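
A shuffle is commonly implemented by hash-partitioning rows on a key, so that every row with the same key is routed to the same worker. The sketch below illustrates that idea under simplifying assumptions: target_worker and shuffle are hypothetical names, the "network" is a list of in-memory inboxes, and CRC32 stands in for whatever hash function a real engine uses.

```python
import zlib


def target_worker(key, num_workers):
    """Pick the destination worker for a key.

    zlib.crc32 is used because it is stable across processes, unlike
    Python's built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


def shuffle(worker_outputs, num_workers):
    """Route every (key, value) row to the inbox of the worker that owns its key."""
    inboxes = [[] for _ in range(num_workers)]
    for rows in worker_outputs:
        for key, value in rows:
            inboxes[target_worker(key, num_workers)].append((key, value))
    return inboxes


# Hypothetical intermediate rows produced by three workers.
worker_outputs = [
    [("US", 125.0), ("DE", 40.0)],
    [("DE", 10.0), ("JP", 70.0)],
    [("US", 5.0), ("JP", 30.0)],
]

inboxes = shuffle(worker_outputs, num_workers=2)
for worker_id, rows in enumerate(inboxes):
    print(worker_id, rows)
```

Because the destination depends only on the key, no worker needs to know what any other worker holds, which is what lets the nodes stay shared-nothing. The cost is that a very frequent key sends a disproportionate share of rows to one worker.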

MPP in the Open Lakehouse

MPP is what makes querying petabyte-scale Apache Iceberg tables practical. A single node might take hours to scan 10 TB of Parquet files. An MPP engine can distribute that scan across 100 nodes, each scanning about 100 GB, and complete the same scan in minutes when the work is evenly balanced. Iceberg's metadata layer complements MPP beautifully: the Iceberg manifest files give the coordinator the precise file list and per-file statistics it needs to distribute the scan evenly and to skip irrelevant files before any worker reads a single byte.
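
To make the planning step concrete, here is a rough sketch of how a coordinator might use per-file statistics to skip files and then balance the remaining ones across workers. It does not use the Iceberg API: the DataFile record, the prune and assign functions, the file paths, sizes, and date ranges are all hypothetical, and the greedy largest-first balancing is just one simple strategy chosen for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one manifest entry with min/max column stats."""
    path: str
    size_bytes: int
    min_date: str  # ISO dates compare correctly as plain strings
    max_date: str


def prune(files, lo, hi):
    """Data skipping: keep only files whose date range overlaps [lo, hi]."""
    return [f for f in files if f.max_date >= lo and f.min_date <= hi]


def assign(files, num_workers):
    """Greedy balancing: hand each file, largest first, to the least-loaded worker."""
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    plan = {worker: [] for worker in range(num_workers)}
    for f in sorted(files, key=lambda f: f.size_bytes, reverse=True):
        load, worker = heapq.heappop(heap)
        plan[worker].append(f)
        heapq.heappush(heap, (load + f.size_bytes, worker))
    return plan


files = [
    DataFile("a.parquet", 400, "2024-01-01", "2024-01-31"),
    DataFile("b.parquet", 300, "2024-02-01", "2024-02-29"),
    DataFile("c.parquet", 300, "2024-03-01", "2024-03-31"),
    DataFile("d.parquet", 200, "2024-03-01", "2024-03-31"),
    DataFile("e.parquet", 200, "2024-02-15", "2024-03-10"),
]

# Query filter: dates between 2024-02-01 and 2024-03-31.
selected = prune(files, "2024-02-01", "2024-03-31")
plan = assign(selected, num_workers=2)
loads = {w: sum(f.size_bytes for f in fs) for w, fs in plan.items()}
print([f.path for f in selected], loads)
```

In this example the January file is dropped from the plan using metadata alone, and the four remaining files are split so that both workers receive the same number of bytes.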
