Photon is Databricks' proprietary query execution engine, written in native C++, that runs alongside Apache Spark on the Databricks Lakehouse Platform. Spark's traditional execution engine is JVM-based; Photon replaces hot code paths within the execution plan with a native, vectorized C++ implementation that uses SIMD (Single Instruction, Multiple Data) CPU instructions. The result is a significant performance gain that requires no changes to user SQL or code.
How Photon Works
When Photon is enabled on a Databricks cluster, Spark's query planner automatically identifies the operations that Photon can accelerate, including SQL aggregations, joins, filtering, and Parquet reading. Photon processes data in large columnar batches, which lets modern CPUs apply a single instruction to multiple values at once. This avoids much of the per-row overhead of JVM object instantiation and garbage collection that limits traditional Spark performance.
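The difference between row-at-a-time and batch-at-a-time execution can be sketched in a few lines of Python. This is only an illustration of the data layout and loop shape described above, not Photon's implementation: the `Row` class, both function names, and the tiny `batch_size` are hypothetical, and pure Python does not emit SIMD instructions.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Row:
    """One record stored as its own heap object (row-at-a-time model)."""
    price: float
    quantity: int


def revenue_row_at_a_time(rows, min_quantity):
    """Visit one row object at a time: filter it, then accumulate."""
    total = 0.0
    for row in rows:
        if row.quantity >= min_quantity:
            total += row.price * row.quantity
    return total


def revenue_columnar(prices, quantities, min_quantity, batch_size=4):
    """Process fixed-size batches of contiguous column values.

    Each step runs a tight loop over a whole batch, which is the loop
    shape a native engine can map onto SIMD instructions.
    """
    total = 0.0
    for start in range(0, len(prices), batch_size):
        price_batch = prices[start:start + batch_size]
        qty_batch = quantities[start:start + batch_size]
        # Filter step: build a selection mask for the whole batch.
        mask = [q >= min_quantity for q in qty_batch]
        # Aggregation step: consume only the selected positions.
        total += sum(
            p * q for p, q, keep in zip(price_batch, qty_batch, mask) if keep
        )
    return total


# Hypothetical sample data, stored once per layout.
rows = [Row(2.5, 4), Row(10.0, 1), Row(1.0, 8), Row(4.0, 2), Row(3.0, 5)]
prices = array("d", (r.price for r in rows))        # contiguous doubles
quantities = array("q", (r.quantity for r in rows))  # contiguous 64-bit ints
```

Both functions return the same total for the same input; only the memory layout and the loop structure differ. In the columnar version each column lives in one contiguous buffer, so no per-row object has to be allocated or garbage collected.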
Photon and Apache Iceberg (2026)
As of 2026, Photon accelerates Apache Iceberg workloads on Databricks through several mechanisms:
- Vectorized Parquet Writing: Photon's native Parquet writer accelerates the ingestion and compaction of Iceberg tables, improving write throughput compared to the JVM-based writer.
- Iceberg v3 Integration: Databricks introduced support for Iceberg v3 features (including deletion vectors and row lineage) in April 2026. Deletion vectors enable merge-on-read semantics for frequent updates: a write records which row positions are deleted instead of rewriting the whole data file, which significantly reduces write amplification for CDC and MERGE operations on Iceberg tables.
- Predictive Optimization: Unity Catalog automatically handles compaction, file sizing, and Liquid Clustering for Iceberg tables, so Photon reads well-structured data files rather than many small or poorly clustered ones.
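To make the deletion-vector mechanism from the list above concrete, here is a minimal sketch of merge-on-read in Python. It is a toy model under stated assumptions, not the Iceberg or Databricks implementation: `DataFile`, `delete_where`, `scan`, and `apply_update` are hypothetical names, and a plain Python `set` stands in for the compressed bitmap a real table format would store.

```python
class DataFile:
    """Toy model of an immutable data file with a deletion vector."""

    def __init__(self, rows):
        self.rows = tuple(rows)          # written once, never rewritten
        self.deleted_positions = set()   # stand-in for a compressed bitmap

    def delete_where(self, predicate):
        """Mark matching rows as deleted without rewriting the file."""
        newly_deleted = 0
        for pos, row in enumerate(self.rows):
            if pos not in self.deleted_positions and predicate(row):
                self.deleted_positions.add(pos)
                newly_deleted += 1
        return newly_deleted

    def scan(self):
        """Merge-on-read: skip positions listed in the deletion vector."""
        return [
            row for pos, row in enumerate(self.rows)
            if pos not in self.deleted_positions
        ]


def apply_update(data_file, key, new_row, appended_rows):
    """Model an UPDATE as 'mark old row deleted' plus 'append new row'."""
    data_file.delete_where(lambda row: row["id"] == key)
    appended_rows.append(new_row)


# Hypothetical three-row file and one update to the row with id 2.
base = DataFile([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}])
delta = []
apply_update(base, 2, {"id": 2, "v": "B"}, delta)
current = base.scan() + delta
```

The update touches one entry in the deletion vector and appends one row; the three original rows in `base.rows` are never rewritten. That is the write-amplification saving, paid for by a small filtering cost at read time.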
The Databricks + Iceberg Value Proposition
Organizations that need Databricks' platform capabilities (collaborative notebooks, MLflow integration, Unity Catalog governance) but also require open-format Iceberg data for compatibility with external engines like Trino or Dremio can now use both simultaneously. Photon accelerates the Databricks-side queries, while the open Iceberg format ensures the same data is freely accessible to any other engine in the organization's data stack.

