OpenTelemetry Implementation Guide
Overview
OpenTelemetry (OTel) is a vendor-neutral observability framework for instrumenting, generating, collecting, and exporting telemetry data (traces, metrics, and logs). This skill provides guidance for implementing OTel in Kubernetes environments.
Quick Start
Deploy OTEL Collector on Kubernetes
Add Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
Install with basic config
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring --create-namespace \
  --set mode=daemonset
Send Test Data via OTLP
gRPC endpoint: 4317, HTTP endpoint: 4318
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'
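The empty `resourceSpans` payload only shows that the endpoint accepts requests. To see a span arrive at a backend, the payload needs at least one span. The following is a minimal Python sketch using only the standard library. The function names, the `manual-test` scope name, and the 50 ms span duration are illustrative assumptions; the default endpoint is the one used in the curl command.

```python
import json
import os
import time
import urllib.request


def build_trace_payload(service_name: str, span_name: str) -> dict:
    """Build a minimal OTLP/JSON trace export request containing one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 50_000_000  # assumption: pretend the span took 50 ms
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name",
                     "value": {"stringValue": service_name}},
                ],
            },
            "scopeSpans": [{
                "scope": {"name": "manual-test"},  # hypothetical scope name
                "spans": [{
                    # OTLP/JSON encodes trace and span IDs as hex strings.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are sent as decimal strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def send_traces(payload: dict,
                endpoint: str = "http://otel-collector:4318/v1/traces") -> int:
    """POST the payload to the collector's OTLP/HTTP endpoint; return the HTTP status."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `send_traces(build_trace_payload("demo-service", "test-span"))` from a pod that can resolve the collector service should return a 2xx status if the OTLP HTTP receiver is enabled.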
Core Concepts
Signals: Three types of telemetry data:
- Traces: Distributed request flows across services
- Metrics: Numerical measurements (counters, gauges, histograms)
- Logs: Event records with structured/unstructured data
Collector Components:
- Receivers: Accept data (OTLP, Prometheus, Jaeger, Zipkin)
- Processors: Transform data (batch, memory_limiter, k8sattributes)
- Exporters: Send data (prometheusremotewrite, loki, otlp)
- Extensions: Add capabilities (health_check, pprof, zpages)
Collector Configuration
Basic Pipeline Structure
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25

  exporters:
    prometheusremotewrite:
      endpoint: "http://prometheus:9090/api/v1/write"
    loki:
      endpoint: "http://loki:3100/loki/api/v1/push"
    # The traces pipeline references otlp/tempo, so it must be declared here.
    # The address and plaintext TLS setting are assumptions; adjust to your Tempo setup.
    otlp/tempo:
      endpoint: "tempo:4317"
      tls:
        insecure: true

  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheusremotewrite]
      logs:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loki]
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]
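Every receiver, processor, and exporter named in a pipeline has to be declared in the matching top-level section, otherwise the collector refuses to start. As a rough illustration, the check can be sketched in Python on an already-parsed configuration (a plain dict). The function name is hypothetical, and the sketch ignores connectors, which may legitimately appear as both exporter and receiver.

```python
def find_undeclared_components(config: dict) -> list:
    """Return one message per pipeline component that is not declared.

    `config` is the collector configuration as a nested dict, i.e. the
    content under the Helm `config:` key after YAML parsing.
    """
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            declared = config.get(section) or {}
            for component in pipeline.get(section, []):
                if component not in declared:
                    kind = section[:-1]  # "exporters" -> "exporter"
                    problems.append(
                        f"{pipeline_name}: {kind} '{component}' is not declared"
                    )
    return problems


# Example: the traces pipeline exports to a name that was never declared.
example = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp/backend"],  # hypothetical exporter name
            }
        }
    },
}
for message in find_undeclared_components(example):
    print(message)
```

For real configurations, `otelcol validate` remains the authoritative check; a script like this is only useful as a fast pre-commit guard.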
Kubernetes Attributes Enrichment
processors:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      # Name of the environment variable that holds the node name,
      # not its expanded value.
      node_from_env_var: K8S_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.node.name
Deployment Modes
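| Mode       | Use Case              | Pros                        | Cons                                              |
|------------|-----------------------|-----------------------------|---------------------------------------------------|
| DaemonSet  | Node-level collection | Full coverage, host metrics | Higher resource usage                             |
| Deployment | Centralized gateway   | Scalable, easier management | Single point of failure when run as one replica   |
| Sidecar    | Per-pod collection    | Isolated, fine-grained      | Resource overhead per pod                         |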
Common Patterns
Development Environment
- Enable debug exporter for visibility
- Lower resource limits (250m CPU, 512Mi memory)
- Include spot instance tolerations for cost savings
Production Environment
- Implement sampling (10-50% for traces)
- Higher batch sizes (2048-4096)
- Enable autoscaling and PodDisruptionBudget
- Use TLS for all endpoints
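To make the sampling recommendation concrete, here is a small Python sketch of ratio-based sampling keyed on the trace ID. It is an illustration of the idea, not the collector's own implementation (in the collector this is typically done with a sampling processor such as `probabilistic_sampler`). The function name is hypothetical. Because the decision depends only on the trace ID, every component that applies the same ratio keeps or drops the same traces, so sampled traces stay complete.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Decide whether to keep a trace, given its 32-character hex trace ID.

    The low 64 bits of the trace ID are treated as a uniformly distributed
    number and compared against a threshold proportional to `ratio`.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    threshold = round(ratio * (1 << 64))
    return low_64_bits < threshold


# A ratio of 0.25 keeps roughly one trace in four.
keep = should_sample("4bf92f3577b34da6a3ce929d0e0e4736", 0.25)
```

A ratio of 0.1 to 0.5 corresponds to the 10-50% range suggested above; error or high-latency traces usually need a separate rule if they must always be kept.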
Detailed References
For in-depth guidance, see:
- Collector Configuration: COLLECTOR.md
- Kubernetes Deployment: KUBERNETES.md
- Troubleshooting: TROUBLESHOOTING.md
- Instrumentation: INSTRUMENTATION.md
Validation Commands
Check collector pods
kubectl get pods -n monitoring -l app.kubernetes.io/name=otel-collector
View collector logs
kubectl logs -n monitoring -l app.kubernetes.io/name=otel-collector --tail=100
Test OTLP endpoint
kubectl run test-otlp --image=curlimages/curl:latest --rm -it -- \
  curl -v http://otel-collector.monitoring:4318/v1/traces
Validate config syntax
otelcol validate --config=config.yaml
Key Helm Chart Values
mode: "daemonset"  # or "deployment"
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
useGOMEMLIMIT: true
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi
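The container memory limit interacts with the `memory_limiter` percentages from the basic pipeline. As I understand the processor, `limit_percentage` sets a hard limit as a share of total available memory, and the soft limit (where the collector starts refusing data) is the hard limit minus the spike allowance. The Python sketch below works through that arithmetic; the function name is hypothetical, and it assumes the collector sees the 1Gi container limit as its total memory.

```python
def memory_limiter_thresholds(total_mib: float,
                              limit_percentage: float,
                              spike_limit_percentage: float) -> dict:
    """Compute memory_limiter thresholds in MiB from percentage settings."""
    if not 0 < spike_limit_percentage < limit_percentage <= 100:
        raise ValueError(
            "expected 0 < spike_limit_percentage < limit_percentage <= 100"
        )
    hard_limit = total_mib * limit_percentage / 100
    spike_limit = total_mib * spike_limit_percentage / 100
    return {
        "hard_limit_mib": hard_limit,
        "spike_limit_mib": spike_limit,
        # Above the soft limit the processor starts refusing incoming data.
        "soft_limit_mib": hard_limit - spike_limit,
    }


# 1Gi limit (1024 MiB) with limit_percentage: 80, spike_limit_percentage: 25
thresholds = memory_limiter_thresholds(1024, 80, 25)
for name, value in thresholds.items():
    print(f"{name}: {value:.1f}")
```

With these values the hard limit is about 819 MiB and the soft limit about 563 MiB, so back-pressure begins well before the container reaches its 1Gi limit.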