Azure Data Lake Storage
Azure Data Lake Storage Gen2 is a set of capabilities built on Azure Blob Storage for big data analytics. The Aspire Azure Data Lake Storage integration enables you to connect to Azure Data Lake Storage instances from your applications.
Hosting integration
The Azure Data Lake Storage hosting integration models a Data Lake resource as a child of an Azure Storage resource. To add a Data Lake resource, install the 📦 Aspire.Hosting.Azure.Storage NuGet package in your AppHost project:
```shell
aspire add azure-storage
```

The Aspire CLI is interactive; be sure to select the appropriate search result when prompted:
```
Select an integration to add:

> azure-storage (Aspire.Hosting.Azure.Storage)
> Other results listed as selectable options...
```

For a file-based AppHost, add a package directive instead:

```csharp
#:package Aspire.Hosting.Azure.Storage@*
```

Or add a package reference to your project file:

```xml
<PackageReference Include="Aspire.Hosting.Azure.Storage" Version="*" />
```

Add Azure Data Lake resource
In your AppHost project, call AddDataLake on an IResourceBuilder&lt;AzureStorageResource&gt; to add a Data Lake resource:
```csharp
var builder = DistributedApplication.CreateBuilder(args);

var storage = builder.AddAzureStorage("storage");
var dataLake = storage.AddDataLake("datalake");

builder.AddProject<Projects.ExampleProject>()
       .WithReference(dataLake);

builder.Build().Run();
```

The preceding code:
- Adds an Azure Storage resource named `storage`.
- Adds a Data Lake resource named `datalake` as a child of the storage resource.
- Passes a reference to the Data Lake resource to the `ExampleProject`.
Add Azure Data Lake file system resource
You can also add a Data Lake file system resource directly from the storage resource:
```csharp
var builder = DistributedApplication.CreateBuilder(args);

var storage = builder.AddAzureStorage("storage");
var dataLake = storage.AddDataLake("datalake");
var fileSystem = storage.AddDataLakeFileSystem("analytics", "analytics-data");

builder.AddProject<Projects.ExampleProject>()
       .WithReference(dataLake)
       .WithReference(fileSystem);

builder.Build().Run();
```

The AddDataLakeFileSystem method takes:
- `name`: The resource name used in Aspire.
- `dataLakeFileSystemName` (optional): The actual file system name in Azure. Defaults to the resource name if not specified.
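As an illustration of the defaulting behavior, the two overloads can be sketched as follows (the resource names `logs` and `analytics` are hypothetical; `storage` is the builder from the example above):

```csharp
// One argument: the Aspire resource name "logs" is also used
// as the Azure file system name.
var logs = storage.AddDataLakeFileSystem("logs");

// Two arguments: the Aspire resource name "analytics" maps to the
// Azure file system "analytics-data".
var analytics = storage.AddDataLakeFileSystem("analytics", "analytics-data");
```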
Customize provisioning infrastructure
The Data Lake resource is part of the Azure Storage resource, which is a subclass of AzureProvisioningResource. You can customize the generated Bicep using the ConfigureInfrastructure API on the storage resource. For example, you can configure the storage SKU, access tier, and other properties:
```csharp
var builder = DistributedApplication.CreateBuilder(args);

var storage = builder.AddAzureStorage("storage")
    .ConfigureInfrastructure(infra =>
    {
        var storageAccount = infra.GetProvisionableResources()
                                  .OfType<StorageAccount>()
                                  .Single();

        storageAccount.Sku = new StorageSku { Name = StorageSkuName.PremiumLrs };
        storageAccount.Tags.Add("workload", "analytics");
    });

var dataLake = storage.AddDataLake("datalake");

builder.AddProject<Projects.ExampleProject>()
       .WithReference(dataLake);
```

For more information on customizing Azure Storage provisioning, see Azure Blob Storage: Customize provisioning infrastructure.
Client integration
To get started with the Aspire Azure Data Lake Storage client integration, install the 📦 Aspire.Azure.Storage.Files.DataLake NuGet package in your client-consuming project:
```shell
dotnet add package Aspire.Azure.Storage.Files.DataLake
```

For a file-based app, add a package directive instead:

```csharp
#:package Aspire.Azure.Storage.Files.DataLake@*
```

Or add a package reference to your project file:

```xml
<PackageReference Include="Aspire.Azure.Storage.Files.DataLake" Version="*" />
```

Add Azure Data Lake service client
In the Program.cs file of your client-consuming project, call AddAzureDataLakeServiceClient to register a DataLakeServiceClient for dependency injection:
```csharp
builder.AddAzureDataLakeServiceClient("datalake");
```

You can then retrieve the `DataLakeServiceClient` instance using dependency injection:
```csharp
public class ExampleService(DataLakeServiceClient client)
{
    // Use client...
}
```

Add Azure Data Lake file system client
You can also register a DataLakeFileSystemClient for accessing a specific file system:
```csharp
builder.AddAzureDataLakeFileSystemClient("analytics");
```

You can then retrieve the `DataLakeFileSystemClient` instance using dependency injection:
```csharp
public class ExampleService(DataLakeFileSystemClient client)
{
    // Use client...
}
```

Keyed services
Both client methods have keyed variants for registering multiple clients:
```csharp
builder.AddKeyedAzureDataLakeServiceClient("datalake1");
builder.AddKeyedAzureDataLakeServiceClient("datalake2");
```
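Keyed registrations are resolved with the `[FromKeyedServices]` attribute from Microsoft.Extensions.DependencyInjection; a minimal sketch using the names registered above:

```csharp
using Microsoft.Extensions.DependencyInjection;

public class ExampleService(
    [FromKeyedServices("datalake1")] DataLakeServiceClient primary,
    [FromKeyedServices("datalake2")] DataLakeServiceClient secondary)
{
    // The key passed to the attribute must match the name used at registration.
}
```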
```csharp
builder.AddKeyedAzureDataLakeFileSystemClient("analytics");
builder.AddKeyedAzureDataLakeFileSystemClient("archive");
```

Configuration
The Azure Data Lake Storage client integration supports multiple configuration approaches.
Use a connection string
Provide the connection name when calling AddAzureDataLakeServiceClient:
```csharp
builder.AddAzureDataLakeServiceClient("datalake");
```

The connection string is retrieved from the `ConnectionStrings` configuration section. Two formats are supported:
Service URI (recommended):
```json
{
  "ConnectionStrings": {
    "datalake": "https://{account_name}.dfs.core.windows.net/"
  }
}
```

When using a service URI, `DefaultAzureCredential` is used for authentication.
For file system clients, include the file system name:
```json
{
  "ConnectionStrings": {
    "analytics": "https://{account_name}.dfs.core.windows.net/;FileSystemName=analytics-data"
  }
}
```

Azure Storage connection string:
```json
{
  "ConnectionStrings": {
    "datalake": "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net"
  }
}
```

Use configuration providers
The integration loads settings from the Aspire:Azure:Storage:Files:DataLake configuration section:
```json
{
  "Aspire": {
    "Azure": {
      "Storage": {
        "Files": {
          "DataLake": {
            "ServiceUri": "https://{account_name}.dfs.core.windows.net/",
            "DisableHealthChecks": false,
            "DisableTracing": false
          }
        }
      }
    }
  }
}
```

Use inline delegates
Configure settings programmatically:
```csharp
builder.AddAzureDataLakeServiceClient(
    "datalake",
    settings => settings.DisableHealthChecks = true);
```

Configure client options:
```csharp
builder.AddAzureDataLakeServiceClient(
    "datalake",
    configureClientBuilder: clientBuilder => clientBuilder.ConfigureOptions(
        options => options.Diagnostics.ApplicationId = "myapp"));
```

Client integration health checks
By default, the integration adds a health check that verifies connectivity to Azure Data Lake Storage. The health check:
- Is enabled when `DisableHealthChecks` is `false` (the default).
- Integrates with the `/health` HTTP endpoint.
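The `/health` endpoint itself is mapped by your application, not by this integration; in the standard Aspire starter templates that happens via the `AddServiceDefaults` and `MapDefaultEndpoints` extension methods generated in the ServiceDefaults project. A minimal sketch, assuming those template-generated methods:

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.AddServiceDefaults();                       // from the Aspire ServiceDefaults template project
builder.AddAzureDataLakeServiceClient("datalake");  // registers the client plus its health check

var app = builder.Build();

app.MapDefaultEndpoints();  // maps /health and /alive (development only, by default)

app.Run();
```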
Observability and telemetry
Logging
The integration uses these log categories:
- `Azure.Core`
- `Azure.Identity`
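These categories can be tuned with standard .NET logging configuration. For example, to reduce credential and pipeline chatter, raise their minimum level in appsettings.json (a sketch; the level values are illustrative):

```json
{
  "Logging": {
    "LogLevel": {
      "Azure.Core": "Warning",
      "Azure.Identity": "Warning"
    }
  }
}
```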
Tracing
The integration emits OpenTelemetry tracing activities:
- `Azure.Storage.Files.DataLake.DataLakeServiceClient`
- `Azure.Storage.Files.DataLake.DataLakeFileSystemClient`
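If you configure OpenTelemetry yourself rather than through the Aspire service defaults, you can subscribe to these activity sources explicitly. A sketch assuming the OpenTelemetry.Extensions.Hosting package:

```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        // Source names match the activity sources listed above.
        .AddSource("Azure.Storage.Files.DataLake.DataLakeServiceClient")
        .AddSource("Azure.Storage.Files.DataLake.DataLakeFileSystemClient"));
```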
Metrics
The Azure SDK for Data Lake Storage doesn’t currently emit metrics.