Grafana Tempo Skill
Comprehensive guide for Grafana Tempo - the cost-effective, high-scale distributed tracing backend designed for OpenTelemetry.
What is Tempo?
Tempo is a high-scale distributed tracing backend that:
- Trace-ID lookup model: No indexing of every attribute, which keeps ingestion fast and storage costs low
- OpenTelemetry native: First-class support for the OTLP protocol
- Object storage backed: Stores traces in affordable S3, GCS, or Azure Blob Storage
- TraceQL query language: Powerful query language inspired by PromQL and LogQL
- Apache Parquet format: 5-10x less data pulled per query vs legacy formats
- Multi-tenant by default: Built-in tenant isolation via the X-Scope-OrgID header
Architecture Overview
Core Components
| Component | Purpose |
| --- | --- |
| Distributor | Entry point for trace data, routes to ingesters via consistent hash ring |
| Ingester | Buffers traces in memory, creates Parquet blocks, flushes to storage |
| Query Frontend | Query orchestration, shards blockID space, coordinates queriers |
| Querier | Locates traces in ingesters or storage using bloom filters |
| Compactor | Compresses blocks, deduplicates data, manages retention |
| Metrics Generator | Optional: derives metrics from traces |
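The Query Frontend row mentions sharding the blockID space. A minimal sketch of that idea, assuming block IDs are UUIDs treated as 128-bit integers and split into equal contiguous ranges; `shard_ranges` and `shard_for` are hypothetical helper names, and Tempo's actual sharding logic may differ in its details.

```python
import uuid

ID_SPACE = 1 << 128  # a UUID is a 128-bit integer


def shard_ranges(num_shards: int):
    """Split [0, 2**128) into contiguous, non-overlapping (start, end) ranges."""
    step = ID_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder left by integer division.
        end = ID_SPACE if i == num_shards - 1 else (i + 1) * step
        ranges.append((start, end))
    return ranges


def shard_for(block_id: uuid.UUID, num_shards: int) -> int:
    """Return the index of the shard whose range contains block_id."""
    step = ID_SPACE // num_shards
    return min(block_id.int // step, num_shards - 1)


# Illustrative usage: assign a random block ID to one of 4 shards.
block = uuid.uuid4()
print(block, "-> shard", shard_for(block, 4))
```

Each shard can then be handed to a different querier, so a trace lookup scans disjoint sets of blocks in parallel.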
Data Flow
Write Path:
Applications → Collector → Distributor → Ingester → Object Storage
                                ↓
                Consistent Hash Ring (routes by traceID)
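The write path routes spans by traceID over a consistent hash ring. The sketch below illustrates the general technique only: the `HashRing` class, the SHA-256 based `token_for` function, and the ingester names are assumptions made for this example, not Tempo's ring implementation.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Toy consistent hash ring: each ingester owns several tokens."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._tokens = []  # sorted ring positions
        self._owners = {}  # ring position -> ingester name
        for name in ingesters:
            for i in range(tokens_per_ingester):
                token = token_for(f"{name}-{i}")
                if token in self._owners:
                    continue  # skip the (unlikely) token collision
                self._owners[token] = name
                bisect.insort(self._tokens, token)

    def lookup(self, trace_id: str, replication_factor: int = 1):
        """Walk clockwise from the trace's token, collecting distinct owners."""
        start = bisect.bisect_right(self._tokens, token_for(trace_id))
        owners = []
        for offset in range(len(self._tokens)):
            token = self._tokens[(start + offset) % len(self._tokens)]
            owner = self._owners[token]
            if owner not in owners:
                owners.append(owner)
            if len(owners) == replication_factor:
                break
        return owners


# Illustrative usage: every span of one trace lands on the same ingesters.
ring = HashRing(["ingester-0", "ingester-1", "ingester-2"])
print(ring.lookup("4bf92f3577b34da6a3ce929d0e0e4736", replication_factor=2))
```

Because the lookup depends only on the traceID and the ring contents, any distributor replica makes the same routing decision, which is what lets an ingester assemble a complete trace.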
Read Path:
Query Request → Query Frontend → Queriers → Ingesters (recent data)
                      ↓              ↓
               Block Sharding   Object Storage (historical data)
                      ↓              ↓
        Parallel Querier Work   Bloom Filters + Indexes
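The read path relies on bloom filters so that queriers can skip blocks that cannot contain a given trace. A minimal sketch of that idea, assuming a toy filter built on SHA-256; the `BloomFilter` class, its sizes, and `candidate_blocks` are illustrative names and do not reflect Tempo's implementation or on-disk format.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_blocks(blocks: dict, trace_id: str):
    """Return the IDs of blocks whose filter cannot rule out trace_id."""
    return [bid for bid, bloom in blocks.items() if bloom.might_contain(trace_id)]


# Illustrative usage: only block-a has seen the trace, block-b is empty.
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
block_a, block_b = BloomFilter(), BloomFilter()
block_a.add(trace_id)
print(candidate_blocks({"block-a": block_a, "block-b": block_b}, trace_id))
```

A block that passes the filter still has to be read to confirm the match, since a bloom filter can only prove absence.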
Deployment Modes
- Monolithic Mode (-target=all)
  - All components run in a single process
  - Best for: local testing, small-scale deployments
  - Individual components cannot be scaled horizontally
  - Scale vertically by giving the process more CPU and memory
- Scalable Monolithic (-target=scalable-single-binary)
  - All components in one process, with horizontal scaling
  - Each instance runs all components
  - Good for development with scaling needs
- Microservices Mode (Distributed) - Recommended for Production
Using tempo-distributed Helm chart
distributor:
  replicas: 3
ingester:
  replicas: 3
querier:
  replicas: 2
queryFrontend:
  replicas: 2
compactor:
  replicas: 1
Helm Deployment
Add Repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
Install Distributed Tempo
helm install tempo grafana/tempo-distributed \
  --namespace monitoring \
  --values values.yaml
Production Values Example
Storage configuration
storage:
  trace:
    backend: azure # or s3, gcs
    azure:
      container_name: tempo-traces
      storage_account_name: mystorageaccount
      use_federated_token: true # Workload Identity
Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 4Gi
Ingester
ingester:
  replicas: 3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      memory: 8Gi # Spikes to 8GB periodically
  persistence:
    enabled: true
    size: 20Gi
Querier
querier:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 4Gi
Query Frontend
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      memory: 2Gi
Compactor (a single compactor key, so the retention setting does not override the resource settings)
compactor:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 6Gi
  # Block retention
  compaction:
    block_retention: 336h # 14 days
Gateway for external access
gateway:
  enabled: true
  replicas: 1
Metrics Generator (optional)
metricsGenerator:
  enabled: false
Storage Configuration
Azure Blob Storage (Recommended for Azure)
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
      storage_account_name: <storage-account-name>
      # Choose exactly one authentication option and leave the others commented out.
      # Option 1: Workload Identity (Recommended)
      use_federated_token: true
      # Option 2: User-Assigned Managed Identity
      # use_managed_identity: true
      # user_assigned_id: <identity-client-id>
      # Option 3: Account Key (Dev only)
      # storage_account_key: <account-key>
      endpoint_suffix: blob.core.windows.net
      hedge_requests_at: 400ms
      hedge_requests_up_to: 2
AWS S3
storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      # Use IAM roles or access keys
      access_key: <access-key>
      secret_key: <secret-key>
Google Cloud Storage
storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: my-tempo-bucket
      # Uses Workload Identity or service account
TraceQL Query Language
Basic Queries
Simplest query - all spans
{ }
Filter by service
{ resource.service.name = "frontend" }
Filter by operation
{ span:name = "GET /api/orders" }
Filter by status
{ span:status = error }
Filter by duration
{ span:duration > 500ms }
Multiple conditions
{ resource.service.name = "api" && span:status = error }
Structural Operators
Direct parent-child relationship
{ resource.service.name = "frontend" } > { resource.service.name = "api" }
Ancestor-descendant relationship
{ span:name = "GET /api/products" } >> { span.db.system = "postgresql" }
Sibling relationship
{ span:name = "span-a" } ~ { span:name = "span-b" }
Aggregation Functions
Count spans
{ } | count() > 10
Average duration
{ } | avg(span:duration) > 20ms
Max duration
{ span:status = error } | max(span:duration)
Metrics Functions
Rate of errors
{ span:status = error } | rate()
Count over time
{ span:name = "GET /:endpoint" } | count_over_time()
Percentile latency
{ span:name = "GET /:endpoint" } | quantile_over_time(span:duration, .99)
Group by service
{ span:status = error } | rate() by(resource.service.name)
Top 10 by error rate
{ span:status = error } | rate() by(resource.service.name) | topk(10)
Trace Structure
Intrinsic Fields (colon separator)
| Field | Description |
| --- | --- |
| span:name | Operation name |
| span:duration | Elapsed time (e.g., "10ms", "1.5s") |
| span:status | ok, error, or unset |
| span:kind | server, client, producer, consumer, internal |
| trace:duration | Total trace duration |
| trace:rootName | Root span name |
| trace:rootService | Root span service |
Attribute Scopes (period separator)
| Scope | Example | Description |
| --- | --- | --- |
| span. | span.http.method | Span-level attributes |
| resource. | resource.service.name | Resource attributes |
| event. | event.exception.message | Event attributes |
| link. | link.traceID | Link attributes |
Receiver Endpoints
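| Protocol | Port | Endpoint / Transport |
| --- | --- | --- |
| OTLP gRPC | 4317 | gRPC (no HTTP path) |
| OTLP HTTP | 4318 | /v1/traces |
| Jaeger gRPC | 14250 | gRPC (no HTTP path) |
| Jaeger Thrift HTTP | 14268 | /api/traces |
| Jaeger Thrift Compact | 6831 | UDP |
| Jaeger Thrift Binary | 6832 | UDP |
| Zipkin | 9411 | /api/v2/spans |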
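To make the OTLP HTTP row concrete, here is a rough sketch that builds (but does not send) a single-span request for port 4318 using only the standard library. It assumes the receiver accepts the OTLP JSON encoding; the `build_otlp_request` helper, the `tempo` host name, and the span values are made up for illustration. Real applications would normally use an OpenTelemetry SDK or Collector instead.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_request(base_url: str, service_name: str, span_name: str) -> urllib.request.Request:
    """Build a POST request carrying one span in OTLP JSON encoding."""
    now_ns = time.time_ns()
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(now_ns - 50_000_000),  # 50 ms ago
                    "endTimeUnixNano": str(now_ns),
                }],
            }],
        }]
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative usage; "tempo" is a placeholder host name.
req = build_otlp_request("http://tempo:4318", "checkout", "GET /cart")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

When multi-tenancy is enabled, the same request would also need the X-Scope-OrgID header.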
Multi-Tenancy
Enable multi-tenancy
multitenancy_enabled: true
All requests must include X-Scope-OrgID header
Example:
curl -H "X-Scope-OrgID: tenant-1" http://tempo:3200/api/traces/<traceID>
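The same tenant-scoped lookup can be prepared from code. This is a small sketch using only the standard library; the `tenant_request` helper, host name, and trace ID are placeholders chosen for the example.

```python
import urllib.request


def tenant_request(base_url: str, trace_id: str, tenant_id: str) -> urllib.request.Request:
    """Build a trace-by-ID request scoped to a single tenant."""
    req = urllib.request.Request(f"{base_url}/api/traces/{trace_id}")
    # urllib normalizes the stored header name to "X-scope-orgid";
    # HTTP header names are case-insensitive, so the server sees the same header.
    req.add_header("X-Scope-OrgID", tenant_id)
    return req


# Illustrative usage; nothing is sent until urlopen() is called.
req = tenant_request("http://tempo:3200", "4bf92f3577b34da6a3ce929d0e0e4736", "tenant-1")
print(req.full_url, req.header_items())
# To actually send it: urllib.request.urlopen(req)
```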
Azure Identity Configuration
Workload Identity Federation (Recommended)
- Enable Workload Identity on AKS:
az aks update \
  --name <aks-cluster> \
  --resource-group <rg> \
  --enable-oidc-issuer \
  --enable-workload-identity
- Create User-Assigned Managed Identity:
az identity create \
  --name tempo-identity \
  --resource-group <rg>
IDENTITY_CLIENT_ID=$(az identity show --name tempo-identity --resource-group <rg> --query clientId -o tsv)
- Assign Storage Permission:
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
- Create Federated Credential:
az identity federated-credential create \
  --name tempo-federated \
  --identity-name tempo-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:tempo \
  --audiences api://AzureADTokenExchange
- Configure Helm Values:
serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>
podLabels:
  azure.workload.identity/use: "true"
storage:
  trace:
    azure:
      use_federated_token: true
Troubleshooting
Common Issues
- Container Not Found (Azure)
az storage container create --name tempo-traces --account-name <storage>
- Authorization Failure (Azure)
Verify RBAC assignment
az role assignment list --scope <storage-scope>
Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>
- Ingester OOM
ingester:
  resources:
    limits:
      memory: 16Gi # Increase from 8Gi
- Query Timeout
querier:
  query_timeout: 5m
  max_concurrent_queries: 20
Diagnostic Commands
Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100
Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100
Verify readiness
kubectl exec -it <tempo-pod> -n monitoring -- wget -qO- http://localhost:3200/ready
Check ring status
kubectl port-forward svc/tempo-distributor 3200:3200 -n monitoring
# In a second terminal, while the port-forward is running:
curl http://localhost:3200/distributor/ring
API Reference
Trace Retrieval
Get trace by ID
GET /api/traces/<traceID>
Search traces (TraceQL)
GET /api/search?q={resource.service.name="api"}
Search tags
GET /api/search/tags
GET /api/search/tag/<tag>/values
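TraceQL queries contain braces, quotes, and equals signs, so the `q` parameter of /api/search should be URL-encoded when the request is built programmatically. A minimal sketch with the standard library; the `search_url` helper and the `tempo` host name are illustrative, and `limit` is included as an optional parameter.

```python
from urllib.parse import urlencode


def search_url(base_url: str, traceql: str, limit: int = 20) -> str:
    """Build a /api/search URL with a safely encoded TraceQL query."""
    params = {"q": traceql, "limit": limit}
    return f"{base_url}/api/search?{urlencode(params)}"


print(search_url("http://tempo:3200", '{resource.service.name="api"}'))
# → http://tempo:3200/api/search?q=%7Bresource.service.name%3D%22api%22%7D&limit=20
```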
Health
GET /ready
GET /metrics
Reference Documentation
For detailed configuration by topic:
- Storage Configuration: Object stores, retention, caching
- TraceQL Reference: Query syntax and examples
- Configuration Reference: Full configuration manifest
External Resources
- Official Tempo Documentation
- Tempo Helm Chart
- TraceQL Documentation
- Tempo GitHub Repository