Manage and find data with Blob Index for Azure Storage—now in preview


Blob Index, a managed secondary index that lets you store multi-dimensional object attributes describing your data objects in Azure Blob storage, is now available in preview. Built on top of blob storage, Blob Index offers consistent reliability, availability, and performance for all your workloads. Blob Index provides native object management and filtering capabilities, which allow you to categorize and find data based on attribute tags set on the data.

Manage and find data with Blob Index

As datasets grow, finding specific related objects in a sea of data can be difficult and frustrating. Previously, you had to call the ListBlobs API to retrieve up to 5,000 records at a time in lexicographical order, parse through each page, and repeat until you found the blobs you wanted. Some users also resorted to maintaining a separate lookup table to find specific objects, but such tables can drift out of sync with the data, increasing cost, complexity, and frustration. Customers should not have to worry about data organization or index table management; they should be able to focus on building powerful applications to grow their business.

Blob Index alleviates the data management and querying problem with support for all blob types (Block Blob, Append Blob, and Page Blob). Blob Index is exposed through a familiar blob storage endpoint and APIs, allowing you to easily store and access both your data and classification indices on the same service to reduce application complexity.

To populate the blob index, you define key-value tag attributes on your data, either on new data during upload or on existing data already in your storage account. These blob index tags are stored alongside your underlying blob data. The blob indexing engine then automatically reads the new tags, indexes them, and exposes them in a user-queryable blob index. Using the Azure portal, REST APIs, or SDKs, you can then issue a FindBlobsByTags API call that specifies a set of criteria. Blob storage returns a filtered result set containing only the blobs that match those criteria.
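To make the populate-then-query flow concrete, here is a minimal Python sketch of a tag index, assuming a simple in-memory model. ToyBlobIndex, set_blob_tags, and find_blobs_by_tags are hypothetical names chosen for illustration, not the Azure SDK. The sketch keeps each blob's tags next to the blob and maintains an inverted index from (key, value) pairs to blob names, so an equality query becomes a set intersection rather than a full listing. Unlike the real service, which indexes new tags asynchronously, this sketch updates the index immediately.

```python
from collections import defaultdict


class ToyBlobIndex:
    """Hypothetical in-memory tag index; illustrates the idea, not Azure internals."""

    def __init__(self):
        self.tags = {}                 # blob name -> {tag key: tag value}
        self.index = defaultdict(set)  # (tag key, tag value) -> set of blob names

    def set_blob_tags(self, blob, tags):
        # Drop the blob's previous tags from the index, then record the new set.
        for pair in self.tags.get(blob, {}).items():
            self.index[pair].discard(blob)
        self.tags[blob] = dict(tags)
        # The real service indexes asynchronously; this sketch indexes immediately.
        for pair in self.tags[blob].items():
            self.index[pair].add(blob)

    def find_blobs_by_tags(self, **criteria):
        # Equality-only query: intersect the blob sets for each requested pair.
        candidates = [self.index.get(pair, set()) for pair in criteria.items()]
        return sorted(set.intersection(*candidates)) if candidates else []


idx = ToyBlobIndex()
idx.set_blob_tags("B1", {"Status": "Processed", "Source": "RAW"})
idx.set_blob_tags("B2", {"Status": "Unprocessed", "Source": "RAW"})
print(idx.find_blobs_by_tags(Status="Unprocessed", Source="RAW"))  # ['B2']
```

Re-tagging a blob first removes its old entries, so a query never returns a blob whose tags no longer match.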

The following scenario is an example of how Blob Index works:

  1. In a storage account container with a million blobs, a user uploads a new blob “B2” with the following blob index tags: < Status = Unprocessed, Quality = 8K, Source = RAW >.
  2. The blob and its blob index tags are persisted to the storage account, and the account's indexing engine adds the new tags to the blob index shortly afterward.
  3. Later on, an encoding application wants to find all unprocessed media files that are at least 4K resolution quality. It issues a FindBlobsByTags API call to find all blobs that match the following criteria: < Status = Unprocessed AND Quality >= 4K AND Source = RAW >.
  4. The blob index quickly returns just blob “B2,” the sole blob out of one million blobs that matches the specified criteria. The encoding application can quickly start its processing job, saving idle compute time and money.
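The query in step 3 can be modeled with a small evaluator. This is a sketch under stated assumptions: the expression format (double-quoted tag keys, single-quoted values, clauses joined with AND) follows the Blob Index filter syntax as we understand it, and tag values are compared as strings, which is why '8K' >= '4K' holds. parse_filter, matches, and the linear scan in find_blobs_by_tags are hypothetical helpers; the service answers such queries from its index rather than by scanning every blob.

```python
import re

# Hypothetical parser for a subset of the filter syntax:
#   "Key" OP 'value' [AND "Key" OP 'value' ...]
# Assumption: tag values compare as strings (lexicographically).
CLAUSE = re.compile(r"""\s*"([^"]+)"\s*(=|>=|<=|>|<)\s*'([^']*)'\s*""")

OPS = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
}


def parse_filter(expression):
    """Split on AND and turn each clause into a (key, op, value) tuple."""
    clauses = []
    for part in re.split(r"\s+AND\s+", expression.strip()):
        m = CLAUSE.fullmatch(part)
        if m is None:
            raise ValueError(f"unsupported clause: {part!r}")
        clauses.append(m.groups())
    return clauses


def matches(tags, clauses):
    # Every clause must hold; a blob missing a queried tag key never matches.
    return all(key in tags and OPS[op](tags[key], value) for key, op, value in clauses)


def find_blobs_by_tags(blobs, expression):
    clauses = parse_filter(expression)
    return [name for name, tags in blobs.items() if matches(tags, clauses)]


# Hypothetical container: many processed blobs plus two unprocessed ones.
blobs = {f"B{i}": {"Status": "Processed", "Quality": "4K", "Source": "RAW"}
         for i in range(3, 10_000)}
blobs["B1"] = {"Status": "Unprocessed", "Quality": "2K", "Source": "RAW"}
blobs["B2"] = {"Status": "Unprocessed", "Quality": "8K", "Source": "RAW"}

query = "\"Status\" = 'Unprocessed' AND \"Quality\" >= '4K' AND \"Source\" = 'RAW'"
print(find_blobs_by_tags(blobs, query))  # ['B2']
```

One consequence of string comparison is worth keeping in mind: '10K' sorts before '4K', so numeric-looking tags compare as expected only if they share the same width (for example, zero-padded values).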

 

Blob Index overview example.

Platform feature integrations with Blob Index

Blob Index not only helps you categorize, manage, and find your blob data but also provides integrations with other Blob service features, such as Lifecycle management.

Using the new blobIndexMatch filter, you can move data to cooler tiers or delete it based on the tags applied to your blobs. This makes your rules more granular, so that data is moved or deleted only if it matches your specified criteria.

The following sample lifecycle management policy applies to block blobs in the “videofiles” container and tiers objects to archive storage after one day, but only if the blobs match the blob index tags Status = ‘Processed’ and Source = ‘RAW’.

Lifecycle management rule with blobIndexMatch example.
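The policy described above can be sketched as a Python dict serialized to JSON. The field names (filters, blobTypes, prefixMatch, blobIndexMatch entries with name/op/value, and tierToArchive with daysAfterModificationGreaterThan) reflect the lifecycle management policy schema as we understand it and should be checked against the current documentation; the rule name is hypothetical. The rule_applies helper is a toy that only handles the == operator, included to show that every blobIndexMatch condition must hold before the rule acts on a blob.

```python
import json

# Field names follow the lifecycle policy schema as we understand it; verify before use.
policy = {
    "rules": [
        {
            "enabled": True,
            "name": "archive-processed-raw-videos",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["videofiles/"],
                    "blobIndexMatch": [
                        {"name": "Status", "op": "==", "value": "Processed"},
                        {"name": "Source", "op": "==", "value": "RAW"},
                    ],
                },
                "actions": {
                    "baseBlob": {
                        "tierToArchive": {"daysAfterModificationGreaterThan": 1}
                    }
                },
            },
        }
    ]
}


def rule_applies(rule, blob_path, blob_type, tags):
    """Toy filter check: blob type, path prefix, and all '==' tag conditions."""
    filters = rule["definition"]["filters"]
    return (
        blob_type in filters["blobTypes"]
        and any(blob_path.startswith(p) for p in filters["prefixMatch"])
        and all(tags.get(m["name"]) == m["value"] for m in filters["blobIndexMatch"])
    )


rule = policy["rules"][0]
print(rule_applies(rule, "videofiles/clip1.mp4", "blockBlob",
                   {"Status": "Processed", "Source": "RAW"}))    # True
print(rule_applies(rule, "videofiles/clip2.mp4", "blockBlob",
                   {"Status": "Unprocessed", "Source": "RAW"}))  # False
print(json.dumps(policy, indent=2))
```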

Lifecycle management integration with Blob Index is just the beginning. We will be adding more integrations with other blob platform features soon!

Conditional blob operations with Blob Index tags

In REST versions 2019-10-10 and higher, most blob service APIs now support a new conditional header, x-ms-if-tags, so that an operation succeeds only if the specified blob index tag condition is met. If the condition is not met, the operation fails and the blob is not modified. This helps ensure that data operations occur only on explicitly tagged blobs and can protect against inadvertent deletion or modification by multi-threaded applications.
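The following sketch models the x-ms-if-tags behavior with a hypothetical in-memory container (ToyContainer, set_tags, and PreconditionFailed are illustrative names, not SDK types). As a simplification, the condition is a dict of required tag values; the real header carries a tag filter expression such as "Status" = 'Processed'. A failed condition raises an exception standing in for the service's error response and leaves the blob untouched. The sketch is single-threaded and only shows the outcome of the check.

```python
class PreconditionFailed(Exception):
    """Stands in for the error a failed tag condition would produce."""


class ToyContainer:
    """Hypothetical in-memory container; not an Azure SDK type."""

    def __init__(self):
        self.blobs = {}  # blob name -> (data, tags)

    def put(self, name, data, tags):
        self.blobs[name] = (data, dict(tags))

    def set_tags(self, name, tags):
        data, _ = self.blobs[name]
        self.blobs[name] = (data, dict(tags))

    def delete(self, name, if_tags=None):
        # if_tags stands in for x-ms-if-tags as a dict of required key/value
        # pairs (simplification: the real header carries a filter expression).
        _, tags = self.blobs[name]
        if if_tags is not None and any(tags.get(k) != v for k, v in if_tags.items()):
            raise PreconditionFailed(f"tags on {name!r} do not satisfy {if_tags}")
        del self.blobs[name]  # condition met (or absent): perform the delete


store = ToyContainer()
store.put("B2", b"video bytes", {"Status": "Processed"})
# Another worker re-tags the blob before our cleanup job runs.
store.set_tags("B2", {"Status": "Reprocessing"})
try:
    store.delete("B2", if_tags={"Status": "Processed"})
except PreconditionFailed as err:
    print("delete skipped:", err)
print("B2" in store.blobs)  # True
```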

How to get started

To enroll in the Blob Index preview, submit a request to register this feature for your subscription by running the following PowerShell or CLI commands:

Register by using PowerShell

Register-AzProviderFeature -FeatureName BlobIndex -ProviderNamespace Microsoft.Storage

Register-AzResourceProvider -ProviderNamespace Microsoft.Storage

Register by using Azure CLI

az feature register --namespace Microsoft.Storage --name BlobIndex

az provider register --namespace 'Microsoft.Storage'

After your request is approved, any existing or new General-purpose v2 (GPv2) storage accounts in France Central and France South can use Blob Index. As with most previews, we recommend not using this feature for production workloads until it reaches general availability.

Build it, use it, and tell us about it!

Once you’re registered and approved for the preview, you can start leveraging all that Blob Index has to offer by setting tags on new or existing data, finding data based on tags, and setting rich lifecycle management policies with tag filters. For more information, please see Manage and find data on Azure Blob Storage with Blob Index.

Note that customers are charged for the total number of Blob Index tags within a storage account, averaged over the month. Requests to SetBlobTags, GetBlobTags, and FindBlobsByTags are charged according to their respective operation types. There is no cost for the indexing engine. See Block Blob pricing to learn more.
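As a rough illustration of the "averaged over the month" wording, the sketch below counts each key-value pair as one tag and averages daily snapshots of the account's total tag count. Both the per-pair counting and the one-snapshot-per-day granularity are assumptions, and the account contents and numbers are hypothetical; no prices are implied.

```python
def total_tags(blobs):
    """Total tag count for an account: one tag per key-value pair (assumption)."""
    return sum(len(tags) for tags in blobs.values())


def monthly_average(daily_totals):
    """Average of daily snapshots of the account's tag total (assumed granularity)."""
    if not daily_totals:
        raise ValueError("need at least one daily snapshot")
    return sum(daily_totals) / len(daily_totals)


# Hypothetical account: blob B2 from the scenario carries three tags.
account = {"B2": {"Status": "Unprocessed", "Quality": "8K", "Source": "RAW"}}
print(total_tags(account))  # 3

# Hypothetical 30-day month: 1M tags for 15 days, then 2M tags for 15 days.
month = [1_000_000] * 15 + [2_000_000] * 15
print(monthly_average(month))  # 1500000.0
```

Under these assumptions, doubling the tag count halfway through the month yields an average of 1.5 times the starting count, rather than the end-of-month total.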

We will continue to improve our feature capabilities and look forward to hearing your feedback on Blob Index or other features by email at BlobIndexPreview@microsoft.com. As a reminder, we love hearing all of your ideas and suggestions about Azure Storage, which you can post on the Azure Storage feedback forum.