
Azure Data Lake Storage Gen2 is a set of capabilities built on Azure Blob Storage for big data analytics. The Aspire Azure Data Lake Storage integration enables you to connect to Azure Data Lake Storage instances from your applications.

The Azure Data Lake Storage hosting integration models a Data Lake resource as a child of an Azure Storage resource. To add a Data Lake resource, install the 📦 Aspire.Hosting.Azure.Storage NuGet package in your AppHost project:

Aspire CLI — Add Aspire.Hosting.Azure.Storage package
aspire add azure-storage

The Aspire CLI is interactive; be sure to select the appropriate search result when prompted:

Aspire CLI — Example output prompt
Select an integration to add:
> azure-storage (Aspire.Hosting.Azure.Storage)
> Other results listed as selectable options...

In your AppHost project, call AddDataLake on an IResourceBuilder<AzureStorageResource> to add a Data Lake resource:

C# — AppHost.cs
var builder = DistributedApplication.CreateBuilder(args);
var storage = builder.AddAzureStorage("storage");
var dataLake = storage.AddDataLake("datalake");
builder.AddProject<Projects.ExampleProject>("exampleproject")
    .WithReference(dataLake);
builder.Build().Run();

The preceding code:

  • Adds an Azure Storage resource named storage.
  • Adds a Data Lake resource named datalake as a child of the storage resource.
  • Passes a reference to the Data Lake resource to the ExampleProject.

You can also add a Data Lake file system resource directly from the storage resource:

C# — AppHost.cs
var builder = DistributedApplication.CreateBuilder(args);
var storage = builder.AddAzureStorage("storage");
var dataLake = storage.AddDataLake("datalake");
var fileSystem = storage.AddDataLakeFileSystem("analytics", "analytics-data");
builder.AddProject<Projects.ExampleProject>("exampleproject")
    .WithReference(dataLake)
    .WithReference(fileSystem);
builder.Build().Run();

The AddDataLakeFileSystem method takes:

  • name: The resource name used in Aspire.
  • dataLakeFileSystemName (optional): The actual file system name in Azure. Defaults to the resource name if not specified.

The Data Lake resource is part of the Azure Storage resource, which is a subclass of AzureProvisioningResource. You can customize the generated Bicep by calling the ConfigureInfrastructure API on the storage resource, which lets you configure the storage SKU, access tier, tags, and other properties. The following example sets the SKU and adds a tag:

AppHost.cs
using Azure.Provisioning.Storage;
var builder = DistributedApplication.CreateBuilder(args);
var storage = builder.AddAzureStorage("storage")
    .ConfigureInfrastructure(infra =>
    {
        // Locate the generated storage account and adjust its properties.
        var storageAccount = infra.GetProvisionableResources()
            .OfType<StorageAccount>()
            .Single();
        storageAccount.Sku = new StorageSku { Name = StorageSkuName.PremiumLrs };
        storageAccount.Tags.Add("workload", "analytics");
    });
var dataLake = storage.AddDataLake("datalake");
builder.AddProject<Projects.ExampleProject>("exampleproject")
    .WithReference(dataLake);
builder.Build().Run();

For more information on customizing Azure Storage provisioning, see Azure Blob Storage: Customize provisioning infrastructure.

To get started with the Aspire Azure Data Lake Storage client integration, install the 📦 Aspire.Azure.Storage.Files.DataLake NuGet package in your client-consuming project:

.NET CLI — Add Aspire.Azure.Storage.Files.DataLake package
dotnet add package Aspire.Azure.Storage.Files.DataLake

In the Program.cs file of your client-consuming project, call AddAzureDataLakeServiceClient to register a DataLakeServiceClient for dependency injection:

builder.AddAzureDataLakeServiceClient("datalake");

You can then retrieve the DataLakeServiceClient instance using dependency injection:

public class ExampleService(DataLakeServiceClient client)
{
    // Use client...
}
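The following is a minimal sketch of what "Use client..." might contain. The hypothetical ExampleService lists the file systems in the storage account and returns a client scoped to a single file system. ListFileSystemNamesAsync and GetFileSystem are illustrative names and are not part of the integration. GetFileSystemsAsync and GetFileSystemClient are Azure SDK methods on DataLakeServiceClient.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

// Hypothetical service; DataLakeServiceClient is injected after
// builder.AddAzureDataLakeServiceClient("datalake") runs in Program.cs.
public class ExampleService(DataLakeServiceClient client)
{
    // Enumerates every file system in the account (sends requests to Azure).
    public async Task<List<string>> ListFileSystemNamesAsync(
        CancellationToken cancellationToken = default)
    {
        var names = new List<string>();
        await foreach (FileSystemItem item in client.GetFileSystemsAsync(
            cancellationToken: cancellationToken))
        {
            names.Add(item.Name);
        }
        return names;
    }

    // Creates a client for one file system. No request is sent until an
    // operation is invoked on the returned client.
    public DataLakeFileSystemClient GetFileSystem(string fileSystemName) =>
        client.GetFileSystemClient(fileSystemName);
}
```

Creating the scoped client is a local operation, so you can hand it to other components cheaply. Only the listing call costs a round trip to the account.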

You can also register a DataLakeFileSystemClient for accessing a specific file system:

builder.AddAzureDataLakeFileSystemClient("analytics");

You can then retrieve the DataLakeFileSystemClient instance using dependency injection:

public class ExampleService(DataLakeFileSystemClient client)
{
    // Use client...
}
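The following is a minimal sketch of writing a file through the injected DataLakeFileSystemClient. It assumes the file system already exists, for instance one declared in the AppHost with AddDataLakeFileSystem. WriteTextAsync and FileSystemName are hypothetical members. GetFileClient and UploadAsync are Azure SDK methods.

```csharp
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Files.DataLake;

// Hypothetical service; DataLakeFileSystemClient is injected after
// builder.AddAzureDataLakeFileSystemClient("analytics") runs in Program.cs.
public class ExampleService(DataLakeFileSystemClient client)
{
    // Name of the file system this service writes to.
    public string FileSystemName => client.Name;

    // Uploads UTF-8 text to the given path, replacing any existing file.
    public async Task WriteTextAsync(
        string path, string content, CancellationToken cancellationToken = default)
    {
        DataLakeFileClient file = client.GetFileClient(path);
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(content));
        await file.UploadAsync(stream, overwrite: true, cancellationToken: cancellationToken);
    }
}
```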

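Both client registration methods have keyed variants for registering multiple clients of the same type. The connection name passed to each keyed method also serves as its service key: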

builder.AddKeyedAzureDataLakeServiceClient("datalake1");
builder.AddKeyedAzureDataLakeServiceClient("datalake2");
builder.AddKeyedAzureDataLakeFileSystemClient("analytics");
builder.AddKeyedAzureDataLakeFileSystemClient("archive");
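Keyed clients are resolved by key rather than by type alone. The following minimal sketch assumes .NET 8 or later, which provides FromKeyedServicesAttribute. It also assumes each key is the name passed to the keyed registration method. ArchiveService is a hypothetical consumer.

```csharp
using Azure.Storage.Files.DataLake;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical consumer that needs one service client and one file system client.
// Each attribute value must match the key used at registration time.
public class ArchiveService(
    [FromKeyedServices("datalake1")] DataLakeServiceClient primary,
    [FromKeyedServices("archive")] DataLakeFileSystemClient archive)
{
    public DataLakeServiceClient Primary => primary;
    public DataLakeFileSystemClient Archive => archive;
}
```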

The Azure Data Lake Storage client integration supports multiple configuration approaches: connection strings, configuration providers, and inline delegates.

Provide the connection name when calling AddAzureDataLakeServiceClient:

builder.AddAzureDataLakeServiceClient("datalake");

The connection string is retrieved from the ConnectionStrings section. Two formats are supported:

Service URI (recommended):

{
  "ConnectionStrings": {
    "datalake": "https://{account_name}.dfs.core.windows.net/"
  }
}

When you use a service URI, the integration authenticates with DefaultAzureCredential.

For file system clients, include the file system name:

{
  "ConnectionStrings": {
    "analytics": "https://{account_name}.dfs.core.windows.net/;FileSystemName=analytics-data"
  }
}
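To make the file system form concrete, the following hypothetical helper splits such a string into its two parts. It is for illustration only and is not the parser the Aspire integration uses. It assumes the service-URI format only, optionally followed by ";FileSystemName=<name>". It does not handle the account-key format described next.

```csharp
#nullable enable
using System;

// Hypothetical helper for illustration only; NOT the Aspire integration's parser.
// Handles "<service URI>" or "<service URI>;FileSystemName=<name>".
public static class DataLakeConnectionInfo
{
    private const string FileSystemKey = "FileSystemName=";

    public static (Uri ServiceUri, string? FileSystemName) Parse(string connectionString)
    {
        Uri? serviceUri = null;
        string? fileSystemName = null;

        foreach (string part in connectionString.Split(
            ';', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
        {
            if (part.StartsWith(FileSystemKey, StringComparison.OrdinalIgnoreCase))
            {
                // Everything after "FileSystemName=" is the file system name.
                fileSystemName = part.Substring(FileSystemKey.Length);
            }
            else if (Uri.TryCreate(part, UriKind.Absolute, out Uri? uri))
            {
                // The account endpoint, e.g. https://myaccount.dfs.core.windows.net/
                serviceUri = uri;
            }
        }

        return (serviceUri ?? throw new FormatException("No service URI found."), fileSystemName);
    }
}
```

For a concrete account name, the analytics connection string above yields the account endpoint plus the file system name analytics-data. The datalake string yields the endpoint with no file system name.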

Azure Storage connection string:

{
  "ConnectionStrings": {
    "datalake": "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net"
  }
}

The integration loads settings from the Aspire:Azure:Storage:Files:DataLake configuration section:

{
  "Aspire": {
    "Azure": {
      "Storage": {
        "Files": {
          "DataLake": {
            "ServiceUri": "https://{account_name}.dfs.core.windows.net/",
            "DisableHealthChecks": false,
            "DisableTracing": false
          }
        }
      }
    }
  }
}

Configure settings programmatically:

builder.AddAzureDataLakeServiceClient(
    "datalake",
    settings => settings.DisableHealthChecks = true);

Configure client options:

builder.AddAzureDataLakeServiceClient(
    "datalake",
    configureClientBuilder: clientBuilder =>
        clientBuilder.ConfigureOptions(
            options => options.Diagnostics.ApplicationId = "myapp"));

By default, the integration adds a health check that verifies connectivity to Azure Data Lake Storage. The health check:

  • Is enabled when DisableHealthChecks is false (the default)
  • Integrates with the /health HTTP endpoint

The integration uses these log categories:

  • Azure.Core
  • Azure.Identity
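These categories should respond to standard Microsoft.Extensions.Logging filter rules like any other category. The following minimal sketch assumes you want only warnings and above from the Azure SDK. AddAzureSdkLogFilters is a hypothetical helper name. AddFilter is the standard ILoggingBuilder extension method.

```csharp
using Microsoft.Extensions.Logging;

// Hypothetical helper: raises the minimum level for the Azure SDK log
// categories so that messages below Warning are filtered out.
public static class AzureSdkLoggingExtensions
{
    public static ILoggingBuilder AddAzureSdkLogFilters(this ILoggingBuilder logging) =>
        logging
            .AddFilter("Azure.Core", LogLevel.Warning)
            .AddFilter("Azure.Identity", LogLevel.Warning);
}
```

In Program.cs, call builder.Logging.AddAzureSdkLogFilters();. Equivalent Logging:LogLevel entries for each category in configuration achieve the same result without code.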

The integration emits OpenTelemetry tracing activities:

  • Azure.Storage.Files.DataLake.DataLakeServiceClient
  • Azure.Storage.Files.DataLake.DataLakeFileSystemClient

The Azure SDK for Data Lake Storage doesn’t currently emit metrics.