container-apps-gpu-2025

Azure Container Apps GPU Support - 2025 Features

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "container-apps-gpu-2025" with this command: npx skills add josiahsiegel/claude-plugin-marketplace/josiahsiegel-claude-plugin-marketplace-container-apps-gpu-2025

Complete knowledge base for Azure Container Apps with GPU support, serverless capabilities, and Dapr integration (2025 GA features).

Overview

Azure Container Apps is a serverless container platform with native GPU support, Dapr integration, and scale-to-zero capabilities for cost-efficient AI/ML workloads.

Key 2025 Features (Build Announcements)

  1. Serverless GPU (GA)
  • Automatic scaling: Scale GPU workloads based on demand

  • Scale-to-zero: Pay only when GPU is actively used

  • Per-second billing: Granular cost control

  • Optimized cold start: Fast initialization for AI models

  • Reduced operational overhead: No infrastructure management

  2. Dedicated GPU (GA)
  • Consistent performance: Dedicated GPU resources

  • Simplified AI deployment: Easy model hosting

  • Long-running workloads: Ideal for training and continuous inference

  • Multiple GPU types: NVIDIA A100, T4, and more

  3. Dynamic Sessions with GPU (Early Access)
  • Sandboxed execution: Run untrusted AI-generated code

  • Hyper-V isolation: Enhanced security

  • GPU-powered Python interpreter: Handle compute-intensive AI workloads

  • Scale at runtime: Dynamic resource allocation

  4. Foundry Models Integration
  • Deploy AI models directly: During container app creation

  • Ready-to-use models: Pre-configured inference endpoints

  • Azure AI Foundry: Seamless integration

  5. Workflow with Durable Task Scheduler (Preview)
  • Long-running workflows: Reliable orchestration

  • State management: Automatic persistence

  • Event-driven: Trigger workflows from events

  6. Native Azure Functions Support
  • Functions runtime: Run Azure Functions in Container Apps

  • Consistent development: Same code, serverless execution

  • Event triggers: All Functions triggers supported

  7. Dapr Integration (GA)
  • Service discovery: Built-in DNS-based discovery

  • State management: Distributed state stores

  • Pub/sub messaging: Reliable messaging patterns

  • Service invocation: Resilient service-to-service calls

  • Observability: Integrated tracing and metrics

Creating Container Apps with GPU

Basic Container App with Serverless GPU

Create Container Apps environment

az containerapp env create \
  --name myenv \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key>

Create Container App with GPU

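# Check `az containerapp create --help` for --gpu-type and --gpu-count before relying on them.
# At the time of writing, the documented route to serverless GPUs is a Consumption GPU
# workload profile selected with --workload-profile-name.
az containerapp create \
  --name myapp-gpu \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --ingress external \
  --target-port 8080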

Production-Ready Container App with GPU

# Flag groups, in order: container and registry, resources, scaling,
# networking, secrets and environment, Dapr, identity.
# Comments cannot sit between backslash-continued lines, so the groups are listed here.
# --exposed-port is omitted because it applies only to TCP transport.
az containerapp create \
  --name myapp-gpu-prod \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --registry-server myregistry.azurecr.io \
  --registry-identity system \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 20 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10 \
  --ingress external \
  --target-port 8080 \
  --transport http2 \
  --secrets client-id="<client-id>" \
  --env-vars "AZURE_CLIENT_ID=secretref:client-id" \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --system-assigned
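
An app that scales from zero should only receive traffic once its model has loaded, which is what liveness and readiness probes are for. The following is a minimal sketch of the two endpoints using only the Python standard library. The paths `/healthz` and `/ready`, the `HealthHandler` class, and `start_health_server` are hypothetical names chosen for illustration, and the probe configuration on the Container Apps side is not shown.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Set this event once the model has finished loading.
    ready = threading.Event()

    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process is up and able to answer.
            self._reply(200, b"ok")
        elif self.path == "/ready":
            # Readiness: only accept traffic after startup work is done.
            if self.ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"loading")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep probe traffic out of the application logs.
        pass


def start_health_server(port=8080):
    """Serve the health endpoints on a background thread and return the server."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_health_server()` at process start, load the model, then call `HealthHandler.ready.set()`. In a real service these routes would usually live in the same web framework that serves inference requests.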

Container Apps Environment Configuration

Environment with Zone Redundancy

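# Zone redundancy requires an environment deployed into your own subnet.
az containerapp env create \
  --name myenv-prod \
  --resource-group MyRG \
  --location eastus \
  --logs-workspace-id <workspace-id> \
  --logs-workspace-key <workspace-key> \
  --infrastructure-subnet-resource-id <subnet-resource-id> \
  --zone-redundant true \
  --enable-workload-profiles true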

Workload Profiles (Dedicated GPU)

Create environment with workload profiles

az containerapp env create \
  --name myenv-gpu \
  --resource-group MyRG \
  --location eastus \
  --enable-workload-profiles true

Add GPU workload profile

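# Profile type names vary by region. List the valid ones with:
#   az containerapp env workload-profile list-supported --location eastus
az containerapp env workload-profile add \
  --name myenv-gpu \
  --resource-group MyRG \
  --workload-profile-name gpu-profile \
  --workload-profile-type GPU-A100 \
  --min-nodes 0 \
  --max-nodes 10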

Create container app with GPU profile

az containerapp create \
  --name myapp-dedicated-gpu \
  --resource-group MyRG \
  --environment myenv-gpu \
  --workload-profile-name gpu-profile \
  --image myregistry.azurecr.io/training-job:latest \
  --cpu 8 \
  --memory 16Gi \
  --min-replicas 1 \
  --max-replicas 5

GPU Scaling Rules

Custom Prometheus Scaling

# Custom scalers are selected through --scale-rule-type; here the scaler type is prometheus.
az containerapp create \
  --name myapp-gpu-prometheus \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/ai-model:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name gpu-utilization \
  --scale-rule-type prometheus \
  --scale-rule-metadata \
    serverAddress=http://prometheus.monitoring.svc.cluster.local:9090 \
    metricName=gpu_utilization \
    threshold=80 \
    query="avg(nvidia_gpu_utilization{app='myapp'})"

Queue-Based Scaling (Azure Service Bus)

# The secret named in --scale-rule-auth must exist on the app, so it is defined with --secrets.
az containerapp create \
  --name myapp-queue-processor \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/batch-processor:latest \
  --cpu 4 \
  --memory 8Gi \
  --gpu-type nvidia-t4 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 50 \
  --secrets servicebus-connection="<connection-string>" \
  --scale-rule-name queue-scaling \
  --scale-rule-type azure-servicebus \
  --scale-rule-metadata \
    queueName=ai-jobs \
    namespace=myservicebus \
    messageCount=5 \
  --scale-rule-auth connection=servicebus-connection
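
To reason about how the `messageCount`, `--min-replicas`, and `--max-replicas` values interact, the target replica count can be approximated as the queue length divided by the per-replica message target, clamped to the configured bounds. The sketch below is a simplified model, not the scaler's actual implementation: it ignores polling intervals, cooldown periods, and any smoothing the platform applies. The function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(queue_length, messages_per_replica, min_replicas, max_replicas):
    """Approximate the replica count a queue-length scale rule aims for."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    if queue_length <= 0:
        # An empty queue lets the app fall back to its minimum (zero here).
        return min_replicas
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Values from the command above: messageCount=5, min 0, max 50.
for backlog in (0, 12, 1000):
    print(backlog, "messages ->", desired_replicas(backlog, 5, 0, 50), "replicas")
```

With these settings a backlog of 12 messages calls for three replicas, and any backlog above 250 messages is capped at the 50-replica maximum.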

Dapr Integration

Enable Dapr on Container App

az containerapp create \
  --name myapp-dapr \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --enable-dapr \
  --dapr-app-id myapp \
  --dapr-app-port 8080 \
  --dapr-app-protocol http \
  --dapr-http-max-request-size 4 \
  --dapr-http-read-buffer-size 4 \
  --dapr-log-level info \
  --dapr-enable-api-logging true

Dapr State Store (Azure Cosmos DB)

Create Dapr component for state store

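# component.yaml, written in the Container Apps component schema rather than the
# Kubernetes-style Dapr manifest (apiVersion/kind/spec).
# The component name comes from --dapr-component-name, not from this file.
componentType: state.azure.cosmosdb
version: v1
metadata:
  - name: url
    value: "https://mycosmosdb.documents.azure.com:443/"
  - name: masterKey
    secretRef: cosmosdb-key
  - name: database
    value: "mydb"
  - name: collection
    value: "state"
secrets:
  - name: cosmosdb-key
    value: "<cosmosdb-key>"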

Create the component

az containerapp env dapr-component set \
  --name myenv \
  --resource-group MyRG \
  --dapr-component-name statestore \
  --yaml component.yaml

Dapr Pub/Sub (Azure Service Bus)

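# Container Apps component schema; register it with
# az containerapp env dapr-component set --dapr-component-name pubsub.
componentType: pubsub.azure.servicebus.topics
version: v1
metadata:
  - name: connectionString
    secretRef: servicebus-connection
  - name: consumerID
    value: "myapp"
secrets:
  - name: servicebus-connection
    value: "<connection-string>"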

Service-to-Service Invocation

Python example using Dapr SDK

from dapr.clients import DaprClient

with DaprClient() as client:
    # Invoke another service
    response = client.invoke_method(
        app_id='other-service',
        method_name='process',
        data='{"input": "data"}'
    )

    # Save state
    client.save_state(
        store_name='statestore',
        key='mykey',
        value='myvalue'
    )

    # Publish message
    client.publish_event(
        pubsub_name='pubsub',
        topic_name='orders',
        data='{"orderId": "123"}'
    )
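
The SDK calls above are wrappers around the Dapr sidecar's HTTP API, which can also be called directly when the SDK is not available. As a rough sketch, the helpers below build the sidecar URLs for the same three operations. The helper names are hypothetical; the sketch assumes the sidecar listens on `localhost` at the port given by the `DAPR_HTTP_PORT` environment variable, falling back to Dapr's default of 3500.

```python
import json
import os
from urllib.parse import quote


def dapr_base_url(port=None):
    """Base URL of the local Dapr sidecar's HTTP API."""
    # The sidecar injects DAPR_HTTP_PORT; 3500 is Dapr's default HTTP port.
    port = port or os.environ.get("DAPR_HTTP_PORT", "3500")
    return f"http://localhost:{port}/v1.0"


def invoke_url(app_id, method, port=None):
    """URL for service invocation of `method` on the app with ID `app_id`."""
    return f"{dapr_base_url(port)}/invoke/{quote(app_id, safe='')}/method/{method.lstrip('/')}"


def state_url(store_name, port=None):
    """URL for saving state to the named state store (HTTP POST)."""
    return f"{dapr_base_url(port)}/state/{quote(store_name, safe='')}"


def publish_url(pubsub_name, topic, port=None):
    """URL for publishing a message to a topic (HTTP POST)."""
    return f"{dapr_base_url(port)}/publish/{quote(pubsub_name, safe='')}/{quote(topic, safe='')}"


def state_body(key, value):
    """The state API takes a JSON array of key/value objects."""
    return json.dumps([{"key": key, "value": value}])


print(invoke_url("other-service", "process", port=3500))
print(state_url("statestore", port=3500), state_body("mykey", "myvalue"))
print(publish_url("pubsub", "orders", port=3500))
```

Any HTTP client can then send requests to these URLs from inside the container; nothing here is reachable from outside the replica.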

AI Model Deployment Patterns

OpenAI-Compatible Endpoint

Dockerfile for vLLM model serving

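FROM vllm/vllm-openai:latest

ENV MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
ENV GPU_MEMORY_UTILIZATION=0.9
ENV MAX_MODEL_LEN=4096

# Exec-form CMD does not expand ${VAR}, so the server is started through a shell.
# Assumes the base image provides the vllm.entrypoints.openai.api_server module.
ENTRYPOINT ["/bin/sh", "-c", "exec python3 -m vllm.entrypoints.openai.api_server --model \"$MODEL_NAME\" --gpu-memory-utilization \"$GPU_MEMORY_UTILIZATION\" --max-model-len \"$MAX_MODEL_LEN\" --port 8080"]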

Deploy vLLM model

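# The stock vllm/vllm-openai image does not read MODEL_NAME or GPU_MEMORY_UTILIZATION.
# Push the image built from the Dockerfile above and reference it in --image,
# or pass the same values as command-line arguments with --args.
az containerapp create \
  --name llama-inference \
  --resource-group MyRG \
  --environment myenv \
  --image vllm/vllm-openai:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 1 \
  --max-replicas 5 \
  --target-port 8080 \
  --ingress external \
  --secrets huggingface-token="<token>" \
  --env-vars \
    MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" \
    GPU_MEMORY_UTILIZATION="0.9" \
    HF_TOKEN=secretref:huggingface-token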

Stable Diffusion Image Generation

az containerapp create \
  --name stable-diffusion \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/stable-diffusion:latest \
  --cpu 4 \
  --memory 16Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 1 \
  --min-replicas 0 \
  --max-replicas 10 \
  --target-port 7860 \
  --ingress external \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 1

Batch Processing Job

az containerapp job create \
  --name batch-training-job \
  --resource-group MyRG \
  --environment myenv \
  --trigger-type Manual \
  --image myregistry.azurecr.io/training:latest \
  --cpu 8 \
  --memory 32Gi \
  --gpu-type nvidia-a100 \
  --gpu-count 2 \
  --parallelism 1 \
  --replica-timeout 7200 \
  --replica-retry-limit 3 \
  --env-vars \
    DATASET_URL="https://mystorage.blob.core.windows.net/datasets/train.csv" \
    MODEL_OUTPUT="https://mystorage.blob.core.windows.net/models/" \
    EPOCHS="100"

Execute job

az containerapp job start \
  --name batch-training-job \
  --resource-group MyRG

Monitoring and Observability

Application Insights Integration

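# The secret referenced by secretref: must exist on the app, so it is defined with --secrets.
az containerapp create \
  --name myapp-monitored \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --secrets appinsights-connection="<connection-string>" \
  --env-vars \
    APPLICATIONINSIGHTS_CONNECTION_STRING=secretref:appinsights-connection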

Query Logs

Stream logs

az containerapp logs show \
  --name myapp-gpu \
  --resource-group MyRG \
  --follow

Query with Log Analytics

az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == 'myapp-gpu' | take 100"

Metrics and Alerts

Create metric alert for request volume (substitute a GPU metric in --condition if your environment exposes one)

az monitor metrics alert create \
  --name high-request-volume \
  --resource-group MyRG \
  --scopes "$(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv)" \
  --condition "total Requests > 100" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action <action-group-id>

Security Best Practices

Managed Identity

Create with system-assigned identity

az containerapp create \
  --name myapp-identity \
  --resource-group MyRG \
  --environment myenv \
  --system-assigned \
  --image myregistry.azurecr.io/myapp:latest

Get identity principal ID

IDENTITY_ID=$(az containerapp show -g MyRG -n myapp-identity --query identity.principalId -o tsv)

Assign role to access Key Vault

az role assignment create \
  --assignee "$IDENTITY_ID" \
  --role "Key Vault Secrets User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.KeyVault/vaults/mykeyvault

Use user-assigned identity

az identity create --name myapp-identity --resource-group MyRG
IDENTITY_RESOURCE_ID=$(az identity show -g MyRG -n myapp-identity --query id -o tsv)

az containerapp create \
  --name myapp-user-identity \
  --resource-group MyRG \
  --environment myenv \
  --user-assigned "$IDENTITY_RESOURCE_ID" \
  --image myregistry.azurecr.io/myapp:latest

Secret Management

Add secrets

az containerapp secret set \
  --name myapp-gpu \
  --resource-group MyRG \
  --secrets \
    huggingface-token="<token>" \
    api-key="<key>"

Reference secrets in environment variables

az containerapp update \
  --name myapp-gpu \
  --resource-group MyRG \
  --set-env-vars \
    HF_TOKEN=secretref:huggingface-token \
    API_KEY=secretref:api-key
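
Inside the container, a secret referenced this way appears as an ordinary environment variable. A minimal sketch of reading it with a clear failure when it is missing follows; the helper name `require_env` is hypothetical.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Failing at startup is easier to diagnose than a later authentication error.
        raise RuntimeError(f"Required environment variable {name} is not set")
    return value
```

At application startup, `hf_token = require_env("HF_TOKEN")` would then surface a missing or misnamed secret reference immediately instead of at the first model download.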

Cost Optimization

Scale-to-Zero Configuration

az containerapp create \
  --name myapp-scale-zero \
  --resource-group MyRG \
  --environment myenv \
  --image myregistry.azurecr.io/myapp:latest \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name http-scaling \
  --scale-rule-type http \
  --scale-rule-http-concurrency 10

Cost savings: Pay only when requests are being processed. GPU costs are per-second when active.
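
To make the effect of per-second billing concrete, the arithmetic can be sketched as below. The rate is a placeholder, not a real Azure price, and the model is deliberately simple: it ignores cold-start time, any idle billing, and CPU and memory charges. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gpu_cost(active_seconds, rate_per_second, replicas=1):
    """Cost of GPU time billed per second, for a fixed number of replicas."""
    if active_seconds < 0 or rate_per_second < 0 or replicas < 0:
        raise ValueError("inputs must be non-negative")
    return active_seconds * rate_per_second * replicas


def savings_fraction(active_seconds, period_seconds=SECONDS_PER_DAY):
    """Share of an always-on bill avoided by scaling to zero when idle."""
    if not 0 <= active_seconds <= period_seconds:
        raise ValueError("active_seconds must lie within the period")
    return 1 - active_seconds / period_seconds


RATE = 0.001           # placeholder price per GPU-second, not a real Azure rate
ACTIVE = 2 * 60 * 60   # assume two busy hours per day

print(f"scale-to-zero: {gpu_cost(ACTIVE, RATE):.2f} per day")
print(f"always-on:     {gpu_cost(SECONDS_PER_DAY, RATE):.2f} per day")
print(f"saved:         {savings_fraction(ACTIVE):.1%}")
```

The saving depends only on the fraction of time the GPU is busy, so a workload that is active two hours a day avoids eleven twelfths of the always-on GPU bill under these assumptions.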

Right-Sizing Resources

Start with minimal resources

--cpu 2 --memory 4Gi --gpu-count 1

Monitor and adjust based on actual usage

az monitor metrics list \
  --resource "$(az containerapp show -g MyRG -n myapp-gpu --query id -o tsv)" \
  --metric "CpuPercentage,MemoryPercentage"

Use Spot/Preemptible GPUs (Future Feature)

When available, configure spot instances for non-critical workloads to save up to 80% on GPU costs.

Troubleshooting

Check Revision Status

az containerapp revision list \
  --name myapp-gpu \
  --resource-group MyRG \
  --output table

View Revision Details

az containerapp revision show \
  --name myapp-gpu \
  --resource-group MyRG \
  --revision <revision-name>

Restart Container App

az containerapp revision restart \
  --name myapp-gpu \
  --resource-group MyRG \
  --revision <revision-name>

GPU Not Available

If GPU is not provisioning:

  • Check region availability: Not all regions support GPU

  • Verify quota: Request quota increase if needed

  • Check workload profile: Ensure GPU workload profile is created

Best Practices

✓ Use scale-to-zero for intermittent workloads
✓ Implement health probes (liveness and readiness)
✓ Use managed identities for authentication
✓ Store secrets in Azure Key Vault
✓ Enable Dapr for microservices patterns
✓ Configure appropriate scaling rules
✓ Monitor GPU utilization and adjust resources
✓ Use Container Apps jobs for batch processing
✓ Implement retry logic for transient failures
✓ Use Application Insights for observability
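
For the retry item, one common approach is exponential backoff with jitter, sketched below. `TransientError` and `call_with_retry` are hypothetical names; in real code the `except` clause would catch whatever exceptions your client library raises for throttling or timeouts, which is especially relevant when a scaled-to-zero app is still cold-starting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(func, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the wait can be replaced in tests. Non-transient errors propagate immediately because only `TransientError` is caught.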

Azure Container Apps with GPU support gives AI/ML workloads a serverless platform with scale-to-zero, per-second GPU billing, and built-in Dapr integration.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
