The Azure Data Share service simplifies sharing structured and unstructured data from the Azure cloud with partner organizations. Data from different Azure services (Storage accounts, Azure Data Lake Storage, Azure SQL Data Warehouse, and Azure SQL Database) can be shared with external vendors or organizations. Data from multiple sources, in any format and of any size, can be shared. The key highlights of the Azure Data Share service:
- No infrastructure setup or management (Serverless) required to enable the data share.
- The service uses Azure's underlying security measures.
- An interface for governing and managing data sharing.
- No limits on the data source or data size.
Terminology and concepts used in Azure Data Share
- Dataset: A dataset is the specific source data that is shared with others. A dataset can only include resources from one Azure data store. For example, a dataset can be an Azure blob container, a blob folder, a blob, an Azure Data Lake Storage (“ADLS”) Gen2 file system, a SQL table, or a SQL view.
- Data provider: The data provider typically owns the source data and wants to share it with consumers.
- Data consumer: The person or organization that receives the data from the data provider.
- Dataset Snapshot: A snapshot moves the data from the source to the destination Azure service. Data Share supports two types of snapshots: full and incremental.
- Share Subscription: It is created when a data consumer accepts the data shared by the provider.
- Invitation and Recipient: The data provider sends invitations to recipients to share the data. Once the recipient of an invitation accepts it, they become a data consumer.
The picture below illustrates the concepts of the Data Share service. Data Share offers two types of sharing: snapshot-based sharing and in-place sharing.
The snapshot-based offering moves the data from the data provider's subscription to the consumer's subscription. The data provider provisions a data share and sends invitations for it to the recipients. Data consumers receive the invitation to the data share via e-mail. Once a data consumer accepts the invitation, they can trigger a full snapshot of the shared data. This data is received into the data consumer's storage account. Data consumers can then receive regular, incremental updates to the data shared with them, so they always have the latest version.
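The snapshot flow described above can be sketched as a small in-memory simulation. This is only an illustration of the sequence (invite, accept, full snapshot, incremental snapshot); it does not call the Azure SDK, and the names `DataShare`, `ShareSubscription`, and `accept_invitation`, along with the file names and e-mail address, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataShare:
    """Hypothetical in-memory stand-in for a provider's data share."""
    name: str
    source: dict = field(default_factory=dict)   # file name -> version number
    invited: set = field(default_factory=set)    # recipient e-mail addresses

    def invite(self, email: str) -> None:
        # The provider sends an invitation to a recipient.
        self.invited.add(email)


@dataclass
class ShareSubscription:
    """Created when a recipient accepts an invitation and becomes a consumer."""
    share: DataShare
    target: dict = field(default_factory=dict)   # stands in for the consumer's storage

    def full_snapshot(self) -> int:
        # Copy everything from the source into the consumer's storage.
        self.target = dict(self.share.source)
        return len(self.target)

    def incremental_snapshot(self) -> int:
        # Copy only items that are new or changed since the last snapshot.
        changed = {name: version for name, version in self.share.source.items()
                   if self.target.get(name) != version}
        self.target.update(changed)
        return len(changed)


def accept_invitation(share: DataShare, email: str) -> ShareSubscription:
    # Only invited recipients can accept and become data consumers.
    if email not in share.invited:
        raise PermissionError(f"{email} has no invitation to {share.name}")
    return ShareSubscription(share)


share = DataShare("sales-share", source={"q1.csv": 1, "q2.csv": 1})
share.invite("consumer@example.com")
subscription = accept_invitation(share, "consumer@example.com")
copied_full = subscription.full_snapshot()

share.source["q2.csv"] = 2   # the provider updates one file
share.source["q3.csv"] = 1   # and adds another
copied_incremental = subscription.incremental_snapshot()
print(copied_full, copied_incremental)
```

In this sketch the incremental snapshot only picks up new or changed items; how the real service detects changes or handles deletions is not modeled here.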
With in-place sharing, the data can be shared without copying or moving it from the data provider. After the sharing relationship is established through the invitation flow, a symbolic link is created between the data provider's source data store and the data consumer's target data store. The data consumer can read and query the data in real time using its own data store. Changes to the source data store are available to the data consumer immediately. In-place sharing is currently available for Azure Data Explorer.
Please see the article for more information on the supported data stores in Azure Data Share.
The Azure Data Share service does not have to be available in your region to leverage the service. For example, suppose you have data stored in an Azure Storage account located in a region where Azure Data Share is unavailable. In that case, you can still leverage the service to share your data.
Security:
Data Share leverages Azure's underlying security services: data in transit is encrypted using TLS 1.2, and data is encrypted at rest. The service uses Azure managed identities to access the data stores and datasets used in data sharing.
Data Share supports storage accounts with the firewall turned on, provided the "Allow trusted Microsoft services" option is enabled on the storage account.
Role-based access control can be set at the Data Share resource level to ensure that only authorized users access it. The Owner and Contributor of a Data Share resource can share data, receive shares, and change existing shares. The Reader of a Data Share resource can view shares but cannot make changes.
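As a rough illustration of the role model described above, the mapping below encodes which actions each role may perform. The action names (`view`, `share`, `receive`, `modify`) and the `is_allowed` helper are hypothetical labels chosen for this sketch, not Azure identifiers.

```python
# Hypothetical action names; the capabilities mirror the roles described above.
ROLE_ACTIONS = {
    "Owner": {"view", "share", "receive", "modify"},
    "Contributor": {"view", "share", "receive", "modify"},
    "Reader": {"view"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role may perform the action; unknown roles get no access."""
    return action in ROLE_ACTIONS.get(role, set())


print(is_allowed("Contributor", "share"))   # True
print(is_allowed("Reader", "modify"))       # False
```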
Pricing:
Data Share charges for Dataset Snapshots and Snapshot Execution. A Dataset Snapshot is the operation that moves a dataset from its source to a destination. Snapshot Execution covers the resources required to move a dataset from the source to the destination.
| Type | Price |
| --- | --- |
| Dataset Snapshots | $0.05 per dataset-snapshot |
| Snapshot Execution | $0.50 per vCore-hour |
In addition, network data transfer charges apply: data ingress is free, and egress is billed depending on where the source and destination are located.
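To make the pricing concrete, here is a minimal cost estimate using the two rates listed above. The function name, its parameters, and the example workload (4 datasets, 30 snapshot runs, 6 vCore-hours in total) are assumptions made for illustration; network egress is left out because it depends on the source and destination regions.

```python
DATASET_SNAPSHOT_PRICE = 0.05   # USD per dataset-snapshot (rate quoted above)
VCORE_HOUR_PRICE = 0.50         # USD per vCore-hour (rate quoted above)


def estimate_snapshot_cost(datasets: int, snapshot_runs: int,
                           total_vcore_hours: float) -> float:
    """Estimate the Data Share charge in USD, excluding network egress."""
    # Each run snapshots every dataset once, so billable units = datasets * runs.
    dataset_snapshots = datasets * snapshot_runs
    cost = (dataset_snapshots * DATASET_SNAPSHOT_PRICE
            + total_vcore_hours * VCORE_HOUR_PRICE)
    return round(cost, 2)


# Hypothetical workload: 4 datasets, one snapshot per day for 30 days,
# consuming 6 vCore-hours of execution in total.
monthly_cost = estimate_snapshot_cost(datasets=4, snapshot_runs=30,
                                      total_vcore_hours=6)
print(f"{monthly_cost:.2f}")   # 9.00
```

Here 120 dataset-snapshots cost $6.00 and 6 vCore-hours cost $3.00, giving $9.00 before any egress charges.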
Santhosh has over 15 years of experience in the IT industry. He works as a Cloud Infrastructure Architect and has a wide range of expertise in Microsoft technologies, with a specialization in public and private cloud services for enterprise customers. His varied background includes work in cloud computing, virtualization, storage, networks, automation, and DevOps.