Observability
Implement the three pillars of observability: logs, metrics, and traces.
The Three Pillars
| Pillar | Purpose | Key Question |
|---------|---------|--------------|
| Logs | Discrete events with context | What happened? |
| Metrics | Aggregated measurements | How much/many? |
| Traces | Request flow across services | Where did time go? |
Quick Pick
- Debug specific request? → Logs + Traces
- Alert on thresholds? → Metrics
- Understand system health? → All three
- Starting from zero? → Logs first, then metrics, then traces
Key Principles
- Use structured logging (JSON) with correlation IDs across all services
- Instrument the four golden signals: latency, traffic, errors, saturation
- Define SLIs/SLOs before building dashboards or alerts
- Alert on symptoms (user impact), not causes (CPU usage)
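The first principle can be sketched without any library. The example below is a minimal illustration, not Pino's API: `createLogger`, `correlationIdFrom`, the `x-correlation-id` header name, and the injected `mint` function are all hypothetical names chosen for this sketch. It shows one JSON object per log line, with the same correlation ID attached to every entry for a request.

```typescript
// Minimal structured-logging sketch. All names here are illustrative
// assumptions, not part of Pino, Winston, or any other library.
type Level = "debug" | "info" | "warn" | "error";

interface LogEntry {
  time: string;
  level: Level;
  service: string;
  correlationId: string;
  msg: string;
  [key: string]: unknown; // extra context fields
}

type Sink = (line: string) => void;

// Returns a logger bound to one service and one request's correlation ID.
function createLogger(service: string, correlationId: string, sink: Sink = console.log) {
  const write = (level: Level, msg: string, fields: Record<string, unknown> = {}): void => {
    const entry: LogEntry = {
      ...fields, // caller context first, so core fields below cannot be overwritten
      time: new Date().toISOString(),
      level,
      service,
      correlationId,
      msg,
    };
    sink(JSON.stringify(entry)); // one JSON object per line
  };
  return {
    info: (msg: string, fields?: Record<string, unknown>) => write("info", msg, fields),
    error: (msg: string, fields?: Record<string, unknown>) => write("error", msg, fields),
  };
}

// Reuse the caller's correlation ID when present, otherwise mint a new one.
// The header name "x-correlation-id" is an assumed convention.
function correlationIdFrom(
  headers: Record<string, string | undefined>,
  mint: () => string,
): string {
  return headers["x-correlation-id"] ?? mint();
}
```

In a real service, `mint` would typically be a UUID generator, and the ID would be forwarded on outbound calls so that log lines from every service handling the request can be joined on `correlationId`.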
Quick Start Checklist
- Set up structured logger (Pino recommended for Node.js)
- Add request correlation IDs (middleware)
- Instrument key metrics (RED: Rate, Errors, Duration)
- Configure distributed tracing (OpenTelemetry)
- Create dashboards for golden signals
- Set up alerts with appropriate severity levels
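To make the RED and alerting items concrete, here is a rough in-memory sketch. It assumes a hypothetical `RedRecorder` class and `classify` function; a production service would export these measurements to a metrics backend such as Prometheus rather than hold them in an array. The burn-rate thresholds (10 for a page, 1 for a ticket) are illustrative assumptions, not recommended values.

```typescript
// RED metrics sketch: Rate, Errors, Duration for one observation window.
// RedRecorder, RedSnapshot, and classify are hypothetical names.
interface RedSnapshot {
  rate: number;       // requests per second over the window
  errorRatio: number; // failed requests / total requests
  p95Ms: number;      // 95th-percentile duration in milliseconds
}

class RedRecorder {
  private durations: number[] = [];
  private errors = 0;

  // Call once per completed request.
  record(durationMs: number, ok: boolean): void {
    this.durations.push(durationMs);
    if (!ok) this.errors += 1;
  }

  snapshot(windowSeconds: number): RedSnapshot {
    const total = this.durations.length;
    if (total === 0) return { rate: 0, errorRatio: 0, p95Ms: 0 };
    const sorted = [...this.durations].sort((a, b) => a - b);
    // Nearest-rank percentile, using integer arithmetic to avoid rounding drift.
    const rank = Math.ceil((95 * total) / 100);
    return {
      rate: total / windowSeconds,
      errorRatio: this.errors / total,
      p95Ms: sorted[rank - 1] ?? 0,
    };
  }
}

type Severity = "page" | "ticket" | "none";

// Symptom-based alerting: compare the observed error ratio with the error
// budget implied by the SLO target (for example 0.99 allows 1% failures).
// The thresholds below are assumptions for illustration only.
function classify(snapshot: RedSnapshot, sloTarget: number): Severity {
  const errorBudget = 1 - sloTarget;
  const burnRate = snapshot.errorRatio / errorBudget;
  if (burnRate >= 10) return "page";  // budget burning fast: wake someone up
  if (burnRate >= 1) return "ticket"; // budget burning faster than sustainable
  return "none";
}
```

The alert decision depends only on what users experience (the failure ratio against the SLO), which is the "symptoms, not causes" principle stated earlier.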
References
| Reference | Description |
|-----------|-------------|
| logging-patterns.md | Structured logging, log levels, Pino/Winston setup |
| metrics-guide.md | Prometheus, counters/gauges/histograms, golden signals |
| tracing-basics.md | OpenTelemetry, distributed tracing, span design |
| alerting-guide.md | Alert design, SLIs/SLOs, severity levels, dashboards |