Azure Container Apps GPU Support - 2025 Features
Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA features).
Overview
Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.
Key 2025 Features (Build Announcements)
- Serverless GPU (GA)
-
Automatic scaling: Scale GPU workloads based on demand
-
Scale-to-zero: Pay only when GPU is actively used
-
Per-second billing: Granular cost control
-
Optimized cold start: Fast initialization for AI models
-
Reduced operational overhead: No infrastructure management
- Dedicated GPU (GA)
-
Consistent performance: Dedicated GPU resources
-
Simplified AI deployment: Easy model hosting
-
Long-running workloads: Ideal for training and continuous inference
-
Multiple GPU types: NVIDIA A100, T4, and more
- Dynamic Sessions with GPU (Early Access)
-
Sandboxed execution: Run untrusted AI-generated code
-
Hyper-V isolation: Enhanced security
-
GPU-powered Python interpreter: Handle compute-intensive AI workloads
-
Scale at runtime: Dynamic resource allocation
- Foundry Models Integration
-
Deploy AI models directly: During container app creation
-
Ready-to-use models: Pre-configured inference endpoints
-
Azure AI Foundry: Seamless integration
- Workflow with Durable Task Scheduler (Preview)
-
Long-running workflows: Reliable orchestration
-
State management: Automatic persistence
-
Event-driven: Trigger workflows from events
- Native Azure Functions Support
-
Functions runtime: Run Azure Functions in Container Apps
-
Consistent development: Same code, serverless execution
-
Event triggers: All Functions triggers supported
- Dapr Integration (GA)
-
Service discovery: Built-in DNS-based discovery
-
State management: Distributed state stores
-
Pub/sub messaging: Reliable messaging patterns
-
Service invocation: Resilient service-to-service calls
-
Observability: Integrated tracing and metrics
Creating Container Apps with GPU
Basic Container App with Serverless GPU
Create Container Apps environment
az containerapp env create
--name myenv
--resource-group MyRG
--location eastus
--logs-workspace-id <workspace-id>
--logs-workspace-key <workspace-key>
Create Container App with GPU
az containerapp create
--name myapp-gpu
--resource-group MyRG
--environment myenv
--image myregistry.azurecr.io/ai-model:latest
--cpu 4
--memory 8Gi
--gpu-type nvidia-a100
--gpu-count 1
--min-replicas 0
--max-replicas 10
--ingress external
--target-port 8080
Production-Ready Container App with GPU
az containerapp create
--name myapp-gpu-prod
--resource-group MyRG
--environment myenv
\
Container configuration
--image myregistry.azurecr.io/ai-model:latest
--registry-server myregistry.azurecr.io
--registry-identity system
\
Resources
--cpu 4
--memory 8Gi
--gpu-type nvidia-a100
--gpu-count 1
\
Scaling
--min-replicas 0
--max-replicas 20
--scale-rule-name http-scaling
--scale-rule-type http
--scale-rule-http-concurrency 10
\
Networking
--ingress external
--target-port 8080
--transport http2
--exposed-port 8080
\
Security
--registry-identity system
--env-vars "AZURE_CLIENT_ID=secretref:client-id"
\
Monitoring
--dapr-app-id myapp
--dapr-app-port 8080
--dapr-app-protocol http
--enable-dapr
\
Identity
--system-assigned
Container Apps Environment Configuration
Environment with Zone Redundancy
az containerapp env create
--name myenv-prod
--resource-group MyRG
--location eastus
--logs-workspace-id <workspace-id>
--logs-workspace-key <workspace-key>
--zone-redundant true
--enable-workload-profiles true
Workload Profiles (Dedicated GPU)
Create environment with workload profiles
az containerapp env create
--name myenv-gpu
--resource-group MyRG
--location eastus
--enable-workload-profiles true
Add GPU workload profile
az containerapp env workload-profile add
--name myenv-gpu
--resource-group MyRG
--workload-profile-name gpu-profile
--workload-profile-type GPU-A100
--min-nodes 0
--max-nodes 10
Create container app with GPU profile
az containerapp create
--name myapp-dedicated-gpu
--resource-group MyRG
--environment myenv-gpu
--workload-profile-name gpu-profile
--image myregistry.azurecr.io/training-job:latest
--cpu 8
--memory 16Gi
--min-replicas 1
--max-replicas 5
GPU Scaling Rules
Custom Prometheus Scaling
az containerapp create
--name myapp-gpu-prometheus
--resource-group MyRG
--environment myenv
--image myregistry.azurecr.io/ai-model:latest
--cpu 4
--memory 8Gi
--gpu-type nvidia-a100
--gpu-count 1
--min-replicas 0
--max-replicas 10
--scale-rule-name gpu-utilization
--scale-rule-type custom
--scale-rule-custom-type prometheus
--scale-rule-metadata
serverAddress=http://prometheus.monitoring.svc.cluster.local:9090
metricName=gpu_utilization
threshold=80
query="avg(nvidia_gpu_utilization{app='myapp'})"
Queue-Based Scaling (Azure Service Bus)
az containerapp create
--name myapp-queue-processor
--resource-group MyRG
--environment myenv
--image myregistry.azurecr.io/batch-processor:latest
--cpu 4
--memory 8Gi
--gpu-type nvidia-t4
--gpu-count 1
--min-replicas 0
--max-replicas 50
--scale-rule-name queue-scaling
--scale-rule-type azure-servicebus
--scale-rule-metadata
queueName=ai-jobs
namespace=myservicebus
messageCount=5
--scale-rule-auth connection=servicebus-connection
Dapr Integration
Enable Dapr on Container App
az containerapp create
--name myapp-dapr
--resource-group MyRG
--environment myenv
--image myregistry.azurecr.io/myapp:latest
--enable-dapr
--dapr-app-id myapp
--dapr-app-port 8080
--dapr-app-protocol http
--dapr-http-max-request-size 4
--dapr-http-read-buffer-size 4
--dapr-log-level info
--dapr-enable-api-logging true
Dapr State Store (Azure Cosmos DB)
Create Dapr component for state store
apiVersion: dapr.io/v1alpha1 kind: Component metadata: name: statestore spec: type: state.azure.cosmosdb version: v1 metadata: - name: url value: "https://mycosmosdb.documents.azure.com:443/" - name: masterKey secretRef: cosmosdb-key - name: database value: "mydb" - name: collection value: "state"
Create the component
az containerapp env dapr-component set
--name myenv
--resource-group MyRG
--dapr-component-name statestore
--yaml component.yaml
Dapr Pub/Sub (Azure Service Bus)
apiVersion: dapr.io/v1alpha1 kind: Component metadata: name: pubsub spec: type: pubsub.azure.servicebus.topics version: v1 metadata: - name: connectionString secretRef: servicebus-connection - name: consumerID value: "myapp"
Service-to-Service Invocation
Python example using Dapr SDK
from dapr.clients import DaprClient
with DaprClient() as client: # Invoke another service response = client.invoke_method( app_id='other-service', method_name='process', data='{"input": "data"}' )
# Save state
client.save_state(
store_name='statestore',
key='mykey',
value='myvalue'
)
# Publish message
client.publish_event(
pubsub_name='pubsub',
topic_name='orders',
data='{"orderId": "123"}'
)
AI Model Deployment Patterns
OpenAI-Compatible Endpoint
Dockerfile for vLLM model serving
FROM vllm/vllm-openai:latest
ENV MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" ENV GPU_MEMORY_UTILIZATION=0.9 ENV MAX_MODEL_LEN=4096
CMD ["--model", "${MODEL_NAME}",
"--gpu-memory-utilization", "${GPU_MEMORY_UTILIZATION}",
"--max-model-len", "${MAX_MODEL_LEN}",
"--port", "8080"]
Deploy vLLM model
az containerapp create
--name llama-inference
--resource-group MyRG
--environment myenv
--image vllm/vllm-openai:latest
--cpu 8
--memory 32Gi
--gpu-type nvidia-a100
--gpu-count 1
--min-replicas 1
--max-replicas 5
--target-port 8080
--ingress external
--env-vars
MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
GPU_MEMORY_UTILIZATION="0.9"
HF_TOKEN=secretref:huggingface-token
Stable Diffusion Image Generation
az containerapp create
--name stable-diffusion
--resource-group MyRG
--environment myenv
--image myregistry.azurecr.io/stable-diffusion:latest
--cpu 4
--memory 16Gi
--gpu-type nvidia-a100
--gpu-count 1
--min-replicas 0
--max-replicas 10
--target-port 7860
--ingress external
--scale-rule-name http-scaling
--scale-rule-type http
--scale-rule-http-concurrency 1
Batch Processing Job
az containerapp job create
--name batch-training-job
--resource-group MyRG
--environment myenv
--trigger-type Manual
--image myregistry.azurecr.io/training:latest
--cpu 8
--memory 32Gi
--gpu-type nvidia-a100
--gpu-count 2
--parallelism 1
--replica-timeout 7200
--replica-retry-limit 3
--env-vars
DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv"
MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/"
EPOCHS="100"
Execute job
az containerapp job start
--name batch-training-job
--resource-group MyRG
Monitoring and Observability
Application Insights Integration
az containerapp create
--name myapp-monitored
--resource-group MyRG
--environment myenv
--image myregistry.azurecr.io/myapp:latest
--env-vars
APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection
Query Logs
Stream logs
az containerapp logs show
--name myapp-gpu
--resource-group MyRG
--follow
Query with Log Analytics
az monitor log-analytics query
--workspace <workspace-id>
--analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"
Metrics and Alerts
Create metric alert for GPU usage
az monitor metrics alert create
--name high-gpu-usage
--resource-group MyRG
--scopes $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv)
--condition "avg Requests > 100"
--window-size 5m
--evaluation-frequency 1m
--action <action-group-id>
Security Best Practices
Managed Identity
Create with system-assigned identity
az containerapp create
--name myapp-identity
--resource-group MyRG
--environment myenv
--system-assigned
--image myregistry.azurecr.io/myapp:latest
Get identity principal ID
IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)
Assign role to access Key Vault
az role assignment create
--assignee $IDENTITY_ID
--role "Key Vault Secrets User"
--scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault
Use user-assigned identity
az identity create --name myapp-identity --resource-group MyRG IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-identity --query id -o tsv)
az containerapp create
--name myapp-user-identity
--resource-group MyRG
--environment myenv
--user-assigned $IDENTITY_RESOURCE_ID
--image myregistry.azurecr.io/myapp:latest
Secret Management
Add secrets
az containerapp secret set
--name myapp-gpu
--resource-group MyRG
--secrets
huggingface-token="<token>"
api-key="<key>"
Reference secrets in environment variables
az containerapp update
--name myapp-gpu
--resource-group MyRG
--set-env-vars
HF_TOKEN=secretref:huggingface-token
API_KEY=secretref:api-key
Cost Optimization
Scale-to-Zero Configuration
az containerapp create
--name myapp-scale-zero
--resource-group MyRG
--environment myenv
--image myregistry.azurecr.io/myapp:latest
--min-replicas 0
--max-replicas 10
--scale-rule-name http-scaling
--scale-rule-type http
--scale-rule-http-concurrency 10
Cost savings: Pay only when requests are being processed. GPU costs are per-second when active.
Right-Sizing Resources
Start with minimal resources
--cpu 2 --memory 4Gi --gpu-count 1
Monitor and adjust based on actual usage
az monitor metrics list
--resource $(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv)
--metric "CpuPercentage,MemoryPercentage"
Use Spot/Preemptible GPUs (Future Feature)
When available, configure spot instances for non-critical workloads to save up to 80% on GPU costs.
Troubleshooting
Check Revision Status
az containerapp revision list
--name myapp-gpu
--resource-group MyRG
--output table
View Revision Details
az containerapp revision show
--name <revision-name>
--app myapp-gpu
--resource-group MyRG
Restart Container App
az containerapp update
--name myapp-gpu
--resource-group MyRG
--force-restart
GPU Not Available
If GPU is not provisioning:
-
Check region availability: Not all regions support GPU
-
Verify quota: Request quota increase if needed
-
Check workload profile: Ensure GPU workload profile is created
Best Practices
✓ Use scale-to-zero for intermittent workloads ✓ Implement health probes (liveness and readiness) ✓ Use managed identities for authentication ✓ Store secrets in Azure Key Vault ✓ Enable Dapr for microservices patterns ✓ Configure appropriate scaling rules ✓ Monitor GPU utilization and adjust resources ✓ Use Container Apps jobs for batch processing ✓ Implement retry logic for transient failures ✓ Use Application Insights for observability
References
-
Container Apps GPU Documentation
-
Dapr Integration
-
Scaling Rules
-
Build 2025 Announcements
Azure Container Apps with GPU support provides the ultimate serverless platform for AI/ML workloads!