Announcing General Availability of Databricks SQL
Today, we are thrilled to announce that Databricks SQL is Generally Available (GA)! This follows the announcement earlier this month about Databricks SQL’s world record-setting performance for data warehousing workloads, and adoption of standard ANSI SQL. With GA, you can expect the highest level of stability, support and enterprise-readiness from Databricks for mission-critical workloads on...
Introducing Data Profiles in the Databricks Notebook
Before a data scientist can write a report on analytics or train a machine learning (ML) model, they need to understand the shape and content of their data. This exploratory data analysis is iterative, with each stage of the cycle often involving the same basic techniques: visualizing data distributions and computing summary statistics like row...
Evolution of the SQL language at Databricks: ANSI standard by default and easier migrations from data warehouses
Today, we are excited to announce that Databricks SQL will use the ANSI standard SQL dialect by default. This follows the announcement earlier this month about Databricks SQL’s record-setting performance and marks a major milestone in our quest to support open standards. This blog post discusses how this update makes it easier to migrate your...
Now Generally Available: Simplify Data and Machine Learning Pipelines With Jobs Orchestration
We are excited to announce the general availability of Jobs orchestration, a new capability that lets Databricks customers easily build data and machine learning pipelines consisting of multiple, dependent tasks. Today, data pipelines are frequently defined as a sequence of dependent tasks to simplify some of their complexity. But they still demand heavy lifting from...
Introducing SQL User-Defined Functions
A user-defined function (UDF) is a means for a user to extend the native capabilities of Apache Spark™ SQL. SQL on Databricks has supported external user-defined functions written in the Scala, Java, Python and R programming languages since 1.3.0. While external UDFs are very powerful, they also come with a few caveats: Security. A UDF written...
Databricks Repos Is Now Generally Available – New ‘Files’ Feature in Public Preview
Thousands of Databricks customers have adopted Databricks Repos since its public preview and have standardized on it for their development and production workflows. Today, we are happy to announce that Databricks Repos is now generally available. Databricks Repos was created to solve a persistent problem for data teams: most tools used by data engineering/machine learning...
Databricks SQL: Delivering a Production SQL Development Experience on the Data Lake
Databricks SQL (DB SQL) is a simple and powerful SQL analytics platform for creating and sharing insights at a fraction of the cost of cloud data warehouses. Data analysts can either connect business intelligence (BI) tools of their choice to SQL endpoints, leverage the built-in analytics capabilities (SQL query editor, visualizations and dashboards), or some...
Part 1: Implementing CI/CD on Databricks Using Databricks Notebooks and Azure DevOps
The code discussed can be found here. This is the first part of a two-part series of blog posts that show how to configure and build end-to-end MLOps solutions on Databricks with notebooks and the Repos API. This post presents a CI/CD framework on Databricks, which is based on Notebooks. The pipeline integrates with the Microsoft Azure...
Announcing Public Preview of Low Shuffle Merge
Today, we are excited to announce the public preview of Low Shuffle Merge in Delta Lake, available on AWS, Azure, and Google Cloud. This new and improved MERGE algorithm is substantially faster and provides huge cost savings for our customers, especially with common use cases like updating a small number of rows in a given...
5 Steps to Implementing Intelligent Data Pipelines With Delta Live Tables
Many IT organizations are familiar with the traditional extract, transform and load (ETL) process: a series of steps defined to move and transform data from source to traditional data warehouses and data marts for reporting purposes. However, as organizations morph to become more and more data-driven, the vast and various amounts of data,...