In our last sprint, a new requirement came up on our project: we needed to expose tables from the Databricks catalog to an external service.
Normally this kind of sharing goes from Databricks to Databricks, so sharing with a consumer outside Databricks was new for this project.
The solution was Delta Sharing. Let’s talk about this protocol before I show you how we solved it.
In today’s data-driven world, secure and seamless data sharing between organizations and platforms is critical. Delta Sharing is an open protocol developed by Databricks to meet this need by enabling secure and efficient data sharing. The protocol allows data providers to share live data directly with consumers without the need for complex data pipelines or data replication.
Delta Sharing leverages the power of Delta Lake to ensure shared data is always up to date and consistent. It supports multiple data formats and integrates seamlessly with various data tools and platforms, making it a versatile solution for modern data collaboration.
In this article, we’ll explore the main features of Delta Sharing, its benefits, and how to start implementing it in a Databricks environment. Whether you are a data provider looking to share datasets, or a data consumer looking to easily access shared data, Delta Sharing offers powerful and scalable solutions to meet your needs.
Now, on to what we came here for.
First we have to create a share:
CREATE SHARE IF NOT EXISTS recipiente_share;
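Once it is created, we can list all existing shares with:
SHOW SHARES;
Note that a new share starts out empty: the tables we want to expose still have to be added to it with an ALTER SHARE ... ADD TABLE statement.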
Next, we create a recipient:
CREATE RECIPIENT IF NOT EXISTS BigQueryDataConsumer
COMMENT 'Delta Sharing with BigQuery';
We can list all existing recipients with:
SHOW RECIPIENTS;
The recipient then needs SELECT permission on the share:
GRANT SELECT
ON SHARE recipiente_share
TO RECIPIENT BigQueryDataConsumer;
After creating the recipient and granting it the necessary permissions, we can see its details:
DESCRIBE RECIPIENT BigQueryDataConsumer;
The output lists the recipient’s details, but in practice the most important field is “activation_link”.
This URL lets us download a credential file containing the token and the endpoint used to reach the shared tables.
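To make that file a bit more concrete: in the open Delta Sharing protocol it is a small JSON document, usually called a profile file. The sketch below parses one with the Python standard library and builds the table address that Delta Sharing clients expect. The endpoint, the token, the file name config.share, and the names my_schema and my_table are placeholders for illustration, not values from our project.

```python
import json

# Placeholder profile content. A real file downloaded from the activation
# link has the same fields, with real values.
SAMPLE_PROFILE = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<token>",
  "expirationTime": "2030-01-01T00:00:00.0Z"
}
"""

# Fields a client needs in order to talk to the sharing server.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def parse_profile(text: str) -> dict:
    """Parse profile JSON and check that the required fields are present."""
    profile = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in profile]
    if missing:
        raise ValueError(f"profile is missing fields: {missing}")
    return profile


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile-path>#<share>.<schema>.<table>' table address."""
    return f"{profile_path}#{share}.{schema}.{table}"


profile = parse_profile(SAMPLE_PROFILE)
print(profile["endpoint"])
# my_schema and my_table are hypothetical names for a shared table.
print(table_url("config.share", "recipiente_share", "my_schema", "my_table"))
```

The open-source delta-sharing Python connector accepts an address of this form, for example in delta_sharing.load_as_pandas, so once the file is saved locally this is typically all a Python consumer needs.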
We will use this information to connect different services.
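For services that have no ready-made connector, the token and endpoint can also be used directly, because Delta Sharing is a REST protocol. As a rough sketch, assuming a placeholder endpoint and token, this is how the request that lists the shares visible to a recipient could be built with the Python standard library. The helper name list_shares_request is made up for this example, and the request is only constructed here, not sent.

```python
import urllib.request


def list_shares_request(endpoint: str, bearer_token: str) -> urllib.request.Request:
    """Build the GET <endpoint>/shares request of the Delta Sharing protocol."""
    url = endpoint.rstrip("/") + "/shares"
    return urllib.request.Request(
        url,
        # The token from the credential file is sent as a bearer token.
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


# Placeholder values; in practice both come from the credential file.
req = list_shares_request("https://sharing.example.com/delta-sharing/", "<token>")
print(req.get_method(), req.full_url)

# With a real endpoint and token, urllib.request.urlopen(req) would send the
# request and the server would answer with a JSON list of shares.
```

Any service that can send an HTTPS request with an Authorization header can talk to the share this way, which is what makes the protocol usable outside Databricks.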
Thanks!