Originally developed at Facebook (now Meta) in 2012, Presto (specifically the PrestoDB project) is a distributed SQL query engine optimized for interactive analytic queries against data sources ranging from gigabytes to petabytes. In the modern data stack, Presto is often chosen to serve high-concurrency BI workloads directly on top of open table formats such as Apache Iceberg.
Presto in the Lakehouse Architecture
While engines like Apache Spark are optimized for heavy, fault-tolerant batch processing and ETL (Extract, Transform, Load), Presto was designed from the start for low-latency interactive queries. This makes it a natural complement within an interoperable lakehouse:
- Separation of Compute and Storage: Presto is purely a compute engine and does not store data itself. It connects to external catalogs (such as Hive Metastore or Apache Polaris) and executes SQL queries against data living in object storage (such as Amazon S3).
- Federated Querying: Presto can query data where it lives. A single SQL query can join an Apache Iceberg table with a live MySQL database, allowing analysts to perform complex, cross-system analytics without moving data.
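As a rough sketch of the federated pattern described above, the Python snippet below assembles a single Presto SQL statement that joins an Iceberg table with a MySQL table. The catalog names (`iceberg`, `mysql`) and every schema, table, and column name are hypothetical. In a real deployment they would match the catalogs configured on the cluster. The snippet only builds the query text and does not connect to anything.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(min_total: float) -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    # Hypothetical names: an Iceberg table in object storage and a MySQL table.
    orders = qualified("iceberg", "sales", "orders")
    customers = qualified("mysql", "crm", "customers")
    return (
        "SELECT c.customer_name, SUM(o.total) AS lifetime_total\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.customer_name\n"
        f"HAVING SUM(o.total) >= {float(min_total)}\n"
        "ORDER BY lifetime_total DESC"
    )


FEDERATED_QUERY = build_federated_query(1000)
print(FEDERATED_QUERY)
```

The resulting string could be submitted through any Presto client. The point of the sketch is that the catalog prefix on each table name is what routes that part of the query to a different connector, so the join itself is ordinary SQL.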
The Presto Native Engine (Velox)
A major milestone for Presto in the 2025/2026 era is the ongoing transition to the Presto Native Engine. Historically, Presto's execution layer ran entirely on the Java Virtual Machine (JVM), which added garbage-collection and memory-management overhead. By integrating with Meta's open-source Velox library, Presto is migrating query execution on its workers to heavily vectorized C++. The aim is higher throughput, better CPU efficiency, and a significantly smaller memory footprint when scanning very large Iceberg tables.
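To make "vectorized" concrete, here is a purely conceptual sketch in plain Python. It is not Velox's API and says nothing about real performance. It only contrasts evaluating a filter and aggregate one row at a time with applying each operator to a whole column batch before moving on to the next. The column names `price` and `qty` are made up for the illustration.

```python
from typing import Dict, List

Row = Dict[str, float]
ColumnBatch = Dict[str, List[float]]


def sum_row_at_a_time(rows: List[Row], threshold: float) -> float:
    """Evaluate the filter and the aggregate once per row."""
    total = 0.0
    for row in rows:
        if row["price"] > threshold:
            total += row["price"] * row["qty"]
    return total


def sum_vectorized(batch: ColumnBatch, threshold: float) -> float:
    """Apply each operator to a whole column batch before moving on."""
    price, qty = batch["price"], batch["qty"]
    # Step 1: the filter produces a selection of matching row positions.
    selected = [i for i, p in enumerate(price) if p > threshold]
    # Step 2: the projection runs over the selected positions only.
    revenue = [price[i] * qty[i] for i in selected]
    # Step 3: the aggregate consumes the projected column.
    return sum(revenue)


rows = [
    {"price": 5.0, "qty": 2.0},
    {"price": 20.0, "qty": 1.0},
    {"price": 50.0, "qty": 3.0},
]
# The same data laid out column by column instead of row by row.
batch = {
    "price": [r["price"] for r in rows],
    "qty": [r["qty"] for r in rows],
}
```

Both functions return the same result for the same data. In a native engine, the column-at-a-time shape is what lets tight loops over contiguous memory benefit from CPU caches and SIMD instructions, which interpreted Python cannot demonstrate.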
Presto vs. Trino
It is important to distinguish between the two projects that descend from the original Presto codebase:
- PrestoDB (Presto): The original project, now governed by the Presto Foundation and heavily driven by organizations operating at very large scale (like Meta and Uber). Its recent focus has been on raw execution performance via the Native Engine.
- Trino (formerly PrestoSQL): A fork started by Presto's original creators and renamed from PrestoSQL to Trino in 2020. Trino has a large open-source community, a wide array of connectors, and rapid feature adoption.
Both engines are first-class citizens in the Apache Iceberg ecosystem, so organizations can choose the engine that best fits their operational needs without locking their underlying data into either one.



