Grafana Tempo Skill

Comprehensive guide for Grafana Tempo - the cost-effective, high-scale distributed tracing backend designed for OpenTelemetry.

What is Tempo?

Tempo is a high-scale distributed tracing backend with the following characteristics:

Trace-ID lookup model - No indexing of every attribute, keeps ingestion fast and storage costs low
OpenTelemetry native - First-class support for OTLP protocol
Object storage backed - Stores traces in affordable S3, GCS, or Azure Blob Storage
TraceQL query language - Powerful query language inspired by PromQL and LogQL
Apache Parquet format - 5-10x less data pulled per query vs legacy formats
Multi-tenant capable - Tenant isolation via the X-Scope-OrgID header once multi-tenancy is enabled

Architecture Overview

Core Components

Component | Purpose
Distributor | Entry point for trace data, routes to ingesters via consistent hash ring
Ingester | Buffers traces in memory, creates Parquet blocks, flushes to storage
Query Frontend | Query orchestration, shards blockID space, coordinates queriers
Querier | Locates traces in ingesters or storage using bloom filters
Compactor | Compresses blocks, deduplicates data, manages retention
Metrics Generator | Optional: derives metrics from traces

Data Flow

Write Path:

Applications → Collector → Distributor → Ingester → Object Storage
                              ↓
              Consistent Hash Ring (routes by traceID)
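The routing step in the write path can be illustrated with a toy consistent hash ring. This is a minimal sketch, not Tempo's ring implementation: the SHA-256 hashing, the token count, and the names `build_ring` and `owner_for` are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


def build_ring(ingesters, tokens_per_ingester=64):
    """Return a sorted list of (token, ingester) pairs."""
    ring = []
    for name in ingesters:
        for i in range(tokens_per_ingester):
            ring.append((_token(f"{name}-{i}"), name))
    ring.sort()
    return ring


def owner_for(ring, trace_id: str) -> str:
    """Pick the first ingester whose token is >= the trace ID hash, wrapping around."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, _token(trace_id)) % len(ring)
    return ring[idx][1]


ring = build_ring(["ingester-0", "ingester-1", "ingester-2"])
# Every span of the same trace hashes to the same ingester.
print(owner_for(ring, "4bf92f3577b34da6a3ce929d0e0e4736"))
```

Because the lookup depends only on the trace ID, all spans of one trace land on the same ingester, and adding or removing an ingester only moves the trace IDs that fall next to its tokens.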

Read Path:

Query Request → Query Frontend → Queriers → Ingesters (recent data)
                      ↓              ↓
                Block Sharding   Object Storage (historical data)
                      ↓              ↓
          Parallel Querier Work  Bloom Filters + Indexes
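The role of bloom filters on the read path can be sketched as follows. This is an illustrative toy, assuming one filter per block; the class `BloomFilter`, the function `candidate_blocks`, and the sizing parameters are hypothetical and do not mirror Tempo's on-disk format.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter backed by a Python int used as a bitset."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def candidate_blocks(blocks, trace_id):
    """blocks: dict of block name -> BloomFilter. Return the blocks worth opening."""
    return [name for name, bloom in blocks.items() if bloom.might_contain(trace_id)]
```

A querier following this idea only reads blocks whose filter reports a possible match, which is what keeps trace-ID lookups cheap against object storage.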

Deployment Modes

Monolithic Mode (-target=all)

All components in single process
Best for: Local testing, small-scale deployments
Cannot scale individual components independently
Scale vertically (more CPU and memory); to scale out, use the scalable single binary or microservices mode

Scalable Monolithic (-target=scalable-single-binary)

All components in one process with horizontal scaling
Each instance runs all components
Good for development with scaling needs

Microservices Mode (Distributed) - Recommended for Production

# Using tempo-distributed Helm chart
distributor:
  replicas: 3

ingester:
  replicas: 3

querier:
  replicas: 2

queryFrontend:
  replicas: 2

compactor:
  replicas: 1

Helm Deployment

Add Repository

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install Distributed Tempo

helm install tempo grafana/tempo-distributed \
  --namespace monitoring \
  --values values.yaml

Production Values Example

# Storage configuration
storage:
  trace:
    backend: azure  # or s3, gcs
    azure:
      container_name: tempo-traces
      storage_account_name: mystorageaccount
      use_federated_token: true  # Workload Identity

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      memory: 8Gi  # Usage spikes to 8Gi periodically
  persistence:
    enabled: true
    size: 20Gi

# Querier
querier:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 4Gi

# Query Frontend
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      memory: 2Gi

# Compactor (resources and block retention under a single key)
compactor:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 6Gi
  compaction:
    block_retention: 336h  # 14 days

# Gateway for external access
gateway:
  enabled: true
  replicas: 1

# Metrics Generator (optional)
metricsGenerator:
  enabled: false
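The block retention value in the example is written in hours (336h for 14 days). Assuming the setting is parsed as a Go-style duration, which has no day unit, a small helper can do the conversion. The function name `retention_duration` is hypothetical.

```python
def retention_duration(days: int) -> str:
    """Convert whole days to an hour-based duration string such as '336h'."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    return f"{days * 24}h"


print(retention_duration(14))  # 336h
```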

Storage Configuration

Azure Blob Storage (Recommended for Azure)

storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
      storage_account_name: <storage-account-name>
      # Choose one authentication option.
      # Option 1: Workload Identity (Recommended)
      use_federated_token: true
      # Option 2: User-Assigned Managed Identity
      # use_managed_identity: true
      # user_assigned_id: <identity-client-id>
      # Option 3: Account Key (Dev only)
      # storage_account_key: <account-key>
      endpoint_suffix: blob.core.windows.net
      hedge_requests_at: 400ms
      hedge_requests_up_to: 2

AWS S3

storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      # Use IAM roles or access keys
      access_key: <access-key>
      secret_key: <secret-key>

Google Cloud Storage

storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: my-tempo-bucket
      # Uses Workload Identity or service account

TraceQL Query Language

Basic Queries

Simplest query - all spans

{ }

Filter by service

{ resource.service.name = "frontend" }

Filter by operation

{ span:name = "GET /api/orders" }

Filter by status

{ span:status = error }

Filter by duration

{ span:duration > 500ms }

Multiple conditions

{ resource.service.name = "api" && span:status = error }

Structural Operators

Direct parent-child relationship

{ resource.service.name = "frontend" } > { resource.service.name = "api" }

Ancestor-descendant relationship

{ span:name = "GET /api/products" } >> { span.db.system = "postgresql" }

Sibling relationship

{ span:name = "span-a" } ~ { span:name = "span-b" }
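The three structural operators can be read as relationships in the span tree. The sketch below models them on a toy in-memory trace; the `Span` dataclass and the helper names are hypothetical and only illustrate the semantics, not how Tempo evaluates queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str


def is_child(parent: Span, child: Span) -> bool:
    """`>`: the right-hand span's direct parent is the left-hand span."""
    return child.parent_id == parent.span_id


def is_descendant(spans, ancestor: Span, node: Span) -> bool:
    """`>>`: walk up from node through its parents until ancestor or the root."""
    by_id = {s.span_id: s for s in spans}
    current = by_id.get(node.parent_id)
    while current is not None:
        if current.span_id == ancestor.span_id:
            return True
        current = by_id.get(current.parent_id)
    return False


def is_sibling(a: Span, b: Span) -> bool:
    """`~`: two different spans that share the same parent."""
    return a.span_id != b.span_id and a.parent_id is not None and a.parent_id == b.parent_id
```

For a trace shaped like root → api → db, the root and db spans satisfy `>>` but not `>`, which is why the ancestor-descendant example can match a database span several levels below the HTTP handler.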

Aggregation Functions

Count spans

{ } | count() > 10

Average duration

{ } | avg(span:duration) > 20ms

Max duration

{ span:status = error } | max(span:duration) > 1s
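Aggregations are used as filters: each one is computed over the matching spans of a trace and compared against a threshold. A rough model of that behavior, assuming span durations in milliseconds grouped by trace ID (the function `traces_matching` and the sample data are hypothetical):

```python
from statistics import mean


def traces_matching(traces, aggregate, threshold):
    """Keep trace IDs whose aggregate over matched span durations exceeds threshold.

    traces: dict of trace ID -> list of matched span durations in milliseconds.
    aggregate: a function such as len, mean, or max.
    """
    return [
        trace_id
        for trace_id, durations in traces.items()
        if durations and aggregate(durations) > threshold
    ]


traces = {"t1": [5, 10, 60], "t2": [1, 2, 3]}
print(traces_matching(traces, mean, 20))  # ['t1']
print(traces_matching(traces, max, 50))   # ['t1']
print(traces_matching(traces, len, 2))    # ['t1', 't2']
```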

Metrics Functions

Rate of errors

{ span:status = error } | rate()

Count over time

{ span:name = "GET /:endpoint" } | count_over_time()

Percentile latency

{ span:name = "GET /:endpoint" } | quantile_over_time(span:duration, .99)

Group by service

{ span:status = error } | rate() by(resource.service.name)

Top 10 by error rate

{ span:status = error } | rate() by(resource.service.name) | topk(10)
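As a rough mental model, `rate() by(...)` counts matching spans per group and divides by the window length, and `topk` keeps the largest series. The sketch below collapses everything into a single window for simplicity, which is an assumption; Tempo evaluates these functions per step interval. The function names are hypothetical.

```python
from collections import Counter


def rate_by_service(span_services, window_seconds):
    """span_services: service name of each matching span seen in one window.

    Returns spans per second for each service.
    """
    counts = Counter(span_services)
    return {service: n / window_seconds for service, n in counts.items()}


def topk(rates, k):
    """Return the k (service, rate) pairs with the highest rate."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:k]


rates = rate_by_service(["api", "api", "api", "web"], window_seconds=60)
print(topk(rates, 1))
```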

Trace Structure

Intrinsic Fields (colon separator)

Field | Description
span:name | Operation name
span:duration | Elapsed time (e.g., 10ms, 1.5s)
span:status | ok, error, or unset
span:kind | server, client, producer, consumer, internal
trace:duration | Total trace duration
trace:rootName | Root span name
trace:rootService | Root span service

Attribute Scopes (period separator)

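Scope | Example | Description
span. | span.http.method | Span-level attributes
resource. | resource.service.name | Resource attributes
event. | event.exception.message | Event attributes
link. | link.opentracing.ref_type | Link attributes (link:traceID and link:spanID are intrinsics and use the colon separator)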

Receiver Endpoints

Protocol | Port | Endpoint
OTLP gRPC | 4317 | - (gRPC service, no HTTP path)
OTLP HTTP | 4318 | /v1/traces
Jaeger gRPC | 14250 | -
Jaeger Thrift HTTP | 14268 | /api/traces
Jaeger Thrift Compact | 6831 (UDP) | -
Jaeger Thrift Binary | 6832 (UDP) | -
Zipkin | 9411 | /api/v2/spans
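To smoke-test the OTLP HTTP receiver without an SDK, a single span can be posted as OTLP/JSON. This is a minimal sketch under stated assumptions: the endpoint `http://localhost:4318/v1/traces` presumes a local port-forward to the distributor, and the helper names `otlp_span_payload` and `send` are hypothetical.

```python
import json
import secrets
import time
import urllib.request


def otlp_span_payload(service_name, span_name, duration_ms=50):
    """Build a one-span OTLP/JSON export request body."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def send(payload, endpoint="http://localhost:4318/v1/traces"):
    """POST the payload to an OTLP HTTP receiver and return the HTTP status."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send(otlp_span_payload("frontend", "GET /api/orders"))` and then looking up the generated trace ID is a quick way to confirm the write path end to end. With multi-tenancy enabled, the request would also need the X-Scope-OrgID header.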

Multi-Tenancy

Enable multi-tenancy

multitenancy_enabled: true

All requests must include X-Scope-OrgID header

Example:

curl -H "X-Scope-OrgID: tenant-1" http://tempo:3200/api/traces/<traceID>
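The same tenant-scoped lookup can be done from a script. The sketch below mirrors the curl call using only the standard library; the function names and the assumption that the response body is JSON are illustrative, and `http://tempo:3200` is the address used in the example above.

```python
import json
import urllib.request


def trace_request(base_url, trace_id, tenant):
    """Build a trace-by-ID request that carries the tenant header."""
    url = f"{base_url.rstrip('/')}/api/traces/{trace_id}"
    # urllib normalizes header capitalization; HTTP header names are case-insensitive.
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant})


def fetch_trace(base_url, trace_id, tenant):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(trace_request(base_url, trace_id, tenant), timeout=10) as response:
        return json.load(response)
```

Keeping the header in one helper makes it harder to forget on individual calls, which matters because requests without a tenant ID are rejected once multi-tenancy is on.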

Azure Identity Configuration

Workload Identity Federation (Recommended)

Enable Workload Identity on AKS:

az aks update \
  --name <aks-cluster> \
  --resource-group <rg> \
  --enable-oidc-issuer \
  --enable-workload-identity

Create User-Assigned Managed Identity:

az identity create \
  --name tempo-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name tempo-identity --resource-group <rg> --query clientId -o tsv)

Assign Storage Permission:

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

Create Federated Credential:

az identity federated-credential create \
  --name tempo-federated \
  --identity-name tempo-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:tempo \
  --audiences api://AzureADTokenExchange
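The subject in the federated credential must match the namespace and service account that the Tempo pods actually run as; a mismatch is a common cause of token exchange failures. A small sketch of that check, with hypothetical helper names:

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Build the subject string for a Kubernetes service account."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def subject_matches(configured_subject: str, namespace: str, service_account: str) -> bool:
    """Compare the credential's subject with the pod's namespace and service account."""
    return configured_subject == federated_subject(namespace, service_account)


print(federated_subject("monitoring", "tempo"))  # system:serviceaccount:monitoring:tempo
```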

Configure Helm Values:

serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"

storage:
  trace:
    azure:
      use_federated_token: true

Troubleshooting

Common Issues

Container Not Found (Azure)

az storage container create --name tempo-traces --account-name <storage>

Authorization Failure (Azure)

Verify RBAC assignment

az role assignment list --scope <storage-scope>

Assign if missing

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>

Ingester OOM

ingester:
  resources:
    limits:
      memory: 16Gi  # Increase from 8Gi

Query Timeout

querier:
  query_timeout: 5m
  max_concurrent_queries: 20

Diagnostic Commands

Check pod status

kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo

Check distributor logs

kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

Check ingester logs

kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

Verify readiness

kubectl exec -it <tempo-pod> -n monitoring -- wget -qO- http://localhost:3200/ready

Check ring status

kubectl port-forward svc/tempo-distributor 3200:3200 -n monitoring
# In a second terminal, while the port-forward is running:
curl http://localhost:3200/distributor/ring

API Reference

Trace Retrieval

Get trace by ID

GET /api/traces/<traceID>

Search traces (TraceQL)

GET /api/search?q={resource.service.name="api"}
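TraceQL contains braces, quotes, and spaces, so the query should be URL-encoded before it goes into the `q` parameter. A minimal sketch using the standard library; the function name `search_url` and the default `limit` value are illustrative choices.

```python
from urllib.parse import urlencode


def search_url(base_url, traceql, limit=20):
    """Build a /api/search URL with the TraceQL query safely encoded."""
    query = urlencode({"q": traceql, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{query}"


print(search_url("http://localhost:3200", '{resource.service.name="api"}'))
```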

Search tags

GET /api/search/tags
GET /api/search/tag/<tag>/values

Health

GET /ready
GET /metrics

Reference Documentation

For detailed configuration by topic:

Storage Configuration: Object stores, retention, caching
TraceQL Reference: Query syntax and examples
Configuration Reference: Full configuration manifest

External Resources

Official Tempo Documentation
Tempo Helm Chart
TraceQL Documentation
Tempo GitHub Repository