The next-generation query engine for the lakehouse

Photon is a new execution engine on the Databricks Lakehouse platform that provides extremely fast query performance at low cost for SQL workloads, directly on your data lake. With Photon, most analytics workloads can meet or exceed data warehouse performance without actually moving any data into a data warehouse. Photon is compatible with Apache Spark™ APIs, so getting started is as easy as turning it on.

Faster query performance

Built for real-world applications, Photon provides best-in-class performance for your SQL workloads, directly on your data lake.

No code changes

Designed to be compatible with Apache Spark APIs, Photon will work with your existing code — no rewrite required.

Broad language support

Photon currently supports SQL workloads but will ultimately accelerate all your data use cases — from streaming to batch workloads — using SQL, Python, R, Scala and Java.

Why Photon?

Query performance on Databricks has steadily increased over the years, powered by Apache Spark and thousands of optimizations packaged into Databricks Runtime (DBR) releases. Photon, a new native vectorized engine written entirely in C++, provides an additional 2x speedup on the TPC-DS 1TB benchmark, and customers have observed average speedups of 2x–4x on their own workloads compared with the latest DBR versions.
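To make "vectorized" concrete, here is a minimal sketch contrasting row-at-a-time evaluation with batch-at-a-time evaluation over columns. It is an illustration only, not Photon's implementation (Photon is written in C++); the function names, the `qty` and `price` columns, and the batch size are all hypothetical.

```python
def sum_filtered_row_at_a_time(rows, threshold):
    """Row-oriented evaluation: inspect one full row per iteration."""
    total = 0.0
    for row in rows:
        if row["qty"] > threshold:
            total += row["price"]
    return total


def sum_filtered_vectorized(qty, price, threshold, batch_size=4):
    """Column-oriented evaluation: process fixed-size batches of each column.

    Each batch first produces a selection vector (positions that pass the
    filter), then aggregates only those positions from the price column.
    """
    total = 0.0
    for start in range(0, len(qty), batch_size):
        qty_batch = qty[start:start + batch_size]
        price_batch = price[start:start + batch_size]
        selected = [i for i, q in enumerate(qty_batch) if q > threshold]
        total += sum(price_batch[i] for i in selected)
    return total


# Hypothetical sample data, stored once as columns and once as rows.
qty = [1, 5, 3, 8, 2, 9]
price = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
rows = [{"qty": q, "price": p} for q, p in zip(qty, price)]

row_result = sum_filtered_row_at_a_time(rows, threshold=2)
vec_result = sum_filtered_vectorized(qty, price, threshold=2)
print(row_result, vec_result)
```

Both functions compute the same answer. In a native engine, the batch-oriented form is the one that lets tight loops over contiguous column data take advantage of modern CPUs.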

Use cases

SQL-based jobs

Accelerate large-scale production jobs built on SQL and Spark DataFrames.

IoT applications

Run time-series analysis faster with Photon than with Spark and the traditional Databricks Runtime.

Data privacy and compliance

Query petabyte-scale datasets to identify and delete records, without duplicating data, using Delta Lake, production jobs and Photon.

Loading data into Delta Lake and Parquet

Photon’s vectorized I/O speeds up data loads for Delta Lake and Parquet tables, lowering overall runtime and costs of data engineering jobs.

How does it work?

Best price/performance for analytics in the cloud

Written from the ground up in C++, Photon takes advantage of modern hardware for faster queries, providing up to 6x better price/performance compared to other cloud data warehouses — all natively on your data lake.

Works with your existing code and avoids vendor lock-in

Photon is designed to be compatible with the Apache Spark DataFrame and SQL APIs to ensure workloads run seamlessly without code changes. All you have to do to benefit from Photon is turn it on. Photon will seamlessly coordinate work and resources and transparently accelerate portions of your SQL and Spark queries. No tuning or user intervention required.
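As a rough sketch of what "turning it on" can look like when a cluster is created programmatically, the snippet below builds a cluster-creation payload that requests the Photon engine. It assumes the Databricks Clusters API's `runtime_engine` field; the helper name `photon_cluster_spec` is hypothetical, and the cluster name, runtime version and node type are placeholders to replace with values valid in your workspace.

```python
import json


def photon_cluster_spec(name, spark_version, node_type_id, num_workers):
    """Build a cluster-creation payload that requests the Photon engine.

    Assumption: the target API accepts a `runtime_engine` field whose
    value "PHOTON" selects Photon. All other values are placeholders.
    """
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "runtime_engine": "PHOTON",
    }


# Placeholder values for illustration only.
spec = photon_cluster_spec("photon-demo", "13.3.x-scala2.12", "i3.xlarge", 2)
print(json.dumps(spec, indent=2))
```

Nothing in the query code itself changes; the engine choice lives entirely in the cluster configuration, which is consistent with the "no code changes" claim above.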

Optimizing for all data use cases and workloads

While the new engine is designed to accelerate all workloads, during preview, Photon is focused on running SQL workloads faster, while reducing your total cost per workload. Ultimately, Photon will support all data and machine learning use cases as well.




Tech talk