Photon is Databricks' proprietary, native C++ query execution engine that runs alongside Apache Spark on the Databricks Lakehouse Platform. Spark's traditional execution engine is JVM-based; Photon replaces the parts of the execution plan it supports with a native, vectorized C++ implementation that uses SIMD (Single Instruction, Multiple Data) CPU instructions. Existing SQL and application code runs unchanged, and the native execution path can deliver significant performance gains.

How Photon Works

When Photon is enabled on a Databricks cluster, Spark's query planner automatically identifies operations that Photon can accelerate, including SQL aggregations, joins, filtering, and Parquet reading. Photon processes data in large columnar batches, so a single SIMD instruction can operate on several values at once. This avoids the per-row overhead of JVM object instantiation and garbage collection that limits traditional Spark performance.
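To make the row-versus-batch distinction concrete, here is a toy sketch in plain Python. It is not Photon's implementation (Photon is native C++ and its internals are not shown here); the function names, the sample data, and `BATCH_SIZE` are all hypothetical and chosen only for illustration. It contrasts a row-at-a-time filter-and-sum with the same computation over contiguous, typed column batches, which is the data layout a vectorized engine relies on.

```python
from array import array

# Row-oriented layout: one Python object (a tuple) per record.
# Sample data is made up for this illustration.
rows = [(i, float(i % 7)) for i in range(10)]

BATCH_SIZE = 4  # hypothetical; real engines use much larger batches


def sum_filtered_row_at_a_time(rows, threshold):
    """Evaluate the filter and the aggregate one record at a time."""
    total = 0.0
    for _id, amount in rows:
        if amount > threshold:
            total += amount
    return total


def to_column_batches(rows, batch_size):
    """Split the 'amount' column into contiguous typed buffers (batches)."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield array("d", (amount for _id, amount in chunk))


def sum_filtered_batched(batches, threshold):
    """Evaluate the same filter and aggregate one column batch at a time.

    Each batch is a flat buffer of doubles. In a native engine, a loop
    over such a buffer is what gets compiled down to SIMD instructions;
    plain Python only mimics the layout, not the speed.
    """
    total = 0.0
    for amounts in batches:
        total += sum(a for a in amounts if a > threshold)
    return total


row_result = sum_filtered_row_at_a_time(rows, 2.0)
batch_result = sum_filtered_batched(to_column_batches(rows, BATCH_SIZE), 2.0)
```

Both functions compute the same answer; the difference is that the batched version touches one homogeneous column buffer per step instead of unpacking a separate object for every row, which is the property that lets a native engine apply one instruction to many values.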

Photon and Apache Iceberg (2026)

As of 2026, Photon accelerates Apache Iceberg workloads on Databricks.

The Databricks + Iceberg Value Proposition

Organizations that need Databricks' platform capabilities (collaborative notebooks, MLflow integration, Unity Catalog governance) but also require open-format Iceberg data for compatibility with external engines like Trino or Dremio can now use both simultaneously. Photon accelerates the Databricks-side queries, while the open Iceberg format ensures the same data is freely accessible to any other engine in the organization's data stack.
