observability-guidelines

Observability guidelines for distributed systems using OpenTelemetry, tracing, metrics, and structured logging

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "observability-guidelines" with this command: npx skills add mindrally/skills/mindrally-skills-observability-guidelines

Observability Guidelines

Apply these observability principles to ensure comprehensive visibility into distributed systems and microservices.

Core Observability Principles

  • Guide the development of idiomatic, maintainable, and high-performance code with built-in observability
  • Enforce modular design and separation of concerns through Clean Architecture
  • Promote test-driven development and robust observability from the start

OpenTelemetry Integration

  • Use OpenTelemetry for distributed tracing, metrics, and structured logging
  • Start and propagate tracing spans across all service boundaries
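  • Use otel.Tracer to obtain a tracer for creating spans and otel.Meter to obtain a meter for recording metrics (these are the Go API names; other SDKs expose equivalents)
  • Export telemetry through the OpenTelemetry Collector, or directly to a backend such as Jaeger (traces) or Prometheus (metrics)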
  • Configure appropriate sampling rates for production environments

Distributed Tracing

  • Trace all incoming requests and propagate context through internal calls
  • Use middleware to instrument HTTP and gRPC endpoints automatically
  • Include trace context in all downstream service calls
  • Create child spans for significant operations within a service
  • Add relevant attributes to spans for debugging and analysis
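To make "include trace context in all downstream service calls" concrete, here is a minimal standard-library Python sketch of propagation using the W3C `traceparent` header (`version-traceid-parentid-flags`). The helper names `parse_traceparent` and `outgoing_headers` are hypothetical, and the parser only accepts the four-field layout, which is a simplification. In practice an OpenTelemetry SDK propagator would normally do this for you.

```python
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip()) if header else None
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, flags


def outgoing_headers(incoming_headers):
    """Build headers for a downstream call, continuing the trace if one exists."""
    parsed = parse_traceparent(incoming_headers.get("traceparent"))
    if parsed is None:
        # No usable context: start a new trace and mark it sampled.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    child_span_id = secrets.token_hex(8)  # fresh span ID for this hop
    return {"traceparent": f"00-{trace_id}-{child_span_id}-{flags}"}
```

The trace ID stays constant across hops while each hop gets its own span ID, which is what lets a backend stitch the calls into one trace.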

Metrics Collection

Monitor these key metrics across all services:

  • Request latency: Track p50, p90, p95, and p99 percentiles
  • Throughput: Measure requests per second by endpoint
  • Error rate: Track 4xx and 5xx responses separately
  • Resource usage: Monitor CPU, memory, disk, and network utilization
  • Custom business metrics: Track domain-specific KPIs
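As a rough illustration of the latency, throughput, and error-rate bullets, the sketch below computes them from raw `(latency_ms, http_status)` samples using only the Python standard library. The function names and the input shape are assumptions made for this example. Production systems usually derive percentiles from histogram buckets in the metrics backend instead of keeping raw samples.

```python
import math
from collections import Counter


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests, window_seconds):
    """Summarize (latency_ms, http_status) tuples observed during one window."""
    requests = list(requests)
    latencies = [latency for latency, _ in requests]
    status_classes = Counter(status // 100 for _, status in requests)
    total = len(requests)
    return {
        "p50_ms": percentile(latencies, 50),
        "p90_ms": percentile(latencies, 90),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "throughput_rps": total / window_seconds,
        # Client and server errors are reported separately.
        "rate_4xx": status_classes[4] / total,
        "rate_5xx": status_classes[5] / total,
    }
```

Keeping 4xx and 5xx apart matters because a spike in client errors often points at a caller or a bad deploy of a client, while 5xx points at the service itself.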

Structured Logging

  • Include unique request IDs and trace context in all logs for correlation
  • Use structured logging formats (JSON) for machine parseability
  • Include relevant context: timestamp, service name, trace ID, span ID
  • Log at appropriate levels: DEBUG, INFO, WARN, ERROR
  • Avoid logging sensitive information (PII, credentials)
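The logging bullets above can be sketched with Python's standard `logging` module. This is one possible shape, not a prescribed one: the `JsonFormatter` class, the `context` key passed through `extra`, and the `SENSITIVE_KEYS` list are all hypothetical names chosen for the example, and key-based redaction is only a first line of defense.

```python
import json
import logging
from datetime import datetime, timezone

# Illustrative list only; a real deployment needs its own reviewed list.
SENSITIVE_KEYS = {"password", "token", "authorization", "email"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with trace correlation fields."""

    def __init__(self, service_name):
        super().__init__()
        self.service_name = service_name

    def format(self, record):
        context = getattr(record, "context", {})
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": self.service_name,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
            # Redact values whose key name looks sensitive.
            "context": {
                key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
                for key, value in context.items()
            },
        }
        return json.dumps(entry, default=str)
```

A caller would attach the formatter to a handler and log with something like `logger.info("order created", extra={"trace_id": tid, "span_id": sid, "context": {"order_id": "o-1"}})`.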

Architecture Patterns

  • Apply Clean Architecture with handlers, services, repositories, and domain models
  • Use domain-driven design principles for clear boundaries
  • Prioritize interface-driven development with explicit dependency injection
  • Prefer composition over inheritance; favor small, purpose-specific interfaces
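One way these patterns support observability is that a small interface lets instrumentation be added by composition. The Python sketch below assumes a hypothetical order domain (`OrderRepository`, `OrderService`, and `CountingRepository` are invented for illustration): the service depends only on the interface, and a wrapper counts calls without changing the service or the storage code.

```python
from typing import Optional, Protocol


class OrderRepository(Protocol):
    """Small, purpose-specific interface the service depends on."""

    def get(self, order_id: str) -> Optional[dict]: ...


class InMemoryOrderRepository:
    def __init__(self, orders):
        self._orders = dict(orders)

    def get(self, order_id):
        return self._orders.get(order_id)


class CountingRepository:
    """Wraps any repository and records call and miss counts."""

    def __init__(self, inner: OrderRepository):
        self._inner = inner
        self.calls = 0
        self.misses = 0

    def get(self, order_id):
        self.calls += 1
        result = self._inner.get(order_id)
        if result is None:
            self.misses += 1
        return result


class OrderService:
    """Business logic; the repository is injected explicitly."""

    def __init__(self, repository: OrderRepository):
        self._repository = repository

    def order_total(self, order_id):
        order = self._repository.get(order_id)
        if order is None:
            raise KeyError(order_id)
        return sum(order["items"].values())
```

Wiring is then a matter of `OrderService(CountingRepository(InMemoryOrderRepository(...)))`, and tests can inject the plain repository instead.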

Correlation and Context

  • Propagate context through the entire request lifecycle
  • Use correlation IDs for request tracking across services
  • Include service version and deployment information in telemetry
  • Tag traces with relevant business context for filtering
  • Enable trace-to-log and log-to-trace correlation
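A minimal sketch of carrying a correlation ID through a request in Python, using `contextvars` so the value follows the request across function calls and async tasks. The `X-Correlation-ID` header name is a common convention assumed here, not something this guideline specifies, and `handle_request` and `CorrelationFilter` are hypothetical names.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID for whichever request is currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default=None)


class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def handle_request(headers, handler):
    """Reuse the caller's correlation ID, or mint one, for the life of the request."""
    cid = headers.get("X-Correlation-ID") or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        return handler()
    finally:
        # Restore the previous value so IDs never leak between requests.
        correlation_id.reset(token)
```

Code deep in the call stack can read `correlation_id.get()` without the ID being threaded through every function signature, and the filter makes it available to any log formatter.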

Alerting and Dashboards

  • Create dashboards for service health and business metrics
  • Set up alerts based on SLOs and error budgets
  • Use anomaly detection for proactive issue identification
  • Document runbooks for common alert scenarios
  • Review and tune alerts regularly to reduce noise
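For alerts based on SLOs and error budgets, the usual arithmetic is a burn rate: the observed error ratio divided by the error budget (1 minus the SLO target). The Python sketch below shows that calculation with a two-window check. The function names are hypothetical, and the 14.4 threshold is an illustrative value (the rate that would consume 2% of a 30-day budget in one hour), not a recommendation from this guideline.

```python
def error_budget(slo_target):
    """Fraction of requests allowed to fail, e.g. 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(failed, total, slo_target):
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (failed / total) / error_budget(slo_target)


def should_page(short_window_rate, long_window_rate, threshold=14.4):
    """Page only when both windows burn fast, which filters out brief blips."""
    return short_window_rate >= threshold and long_window_rate >= threshold
```

For example, 50 failures in 10,000 requests against a 99.9% SLO is an error ratio of 0.5% against a budget of 0.1%, so the budget is burning at roughly five times the sustainable rate.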

Instrumentation Best Practices

  • Instrument at service boundaries (entry/exit points)
  • Add custom spans for database operations and external calls
  • Include relevant attributes (user ID, request type, etc.)
  • Avoid over-instrumentation that creates noise
  • Use semantic conventions for consistent attribute naming

Production Considerations

  • Configure appropriate sampling rates to balance visibility and cost
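  • Use head-based sampling, where the decision is made when the trace starts and propagated downstream, so each trace is captured either completely or not at all
  • Implement tail-based sampling, where the decision is made after the trace completes, to retain traces that contain errors or unusual latency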
  • Set retention policies based on debugging needs
  • Monitor observability infrastructure health
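The two sampling strategies can be contrasted in a few lines of Python. This is a simplified sketch under stated assumptions: `head_sample` derives its decision from the low 64 bits of a 32-character hex trace ID, `tail_sample` expects spans as dicts with `error` and `duration_ms` keys, and the 1000 ms threshold is arbitrary. Real deployments would normally configure this in the SDK or the OpenTelemetry Collector.

```python
def head_sample(trace_id, ratio):
    """Decide at trace start from the trace ID alone, so every service agrees."""
    threshold = int(ratio * 2**64)
    return int(trace_id[-16:], 16) < threshold


def tail_sample(spans, latency_threshold_ms=1000.0):
    """Decide after the trace completes: keep it if any span failed or was slow."""
    return any(
        s["error"] or s["duration_ms"] >= latency_threshold_ms for s in spans
    )
```

Head sampling is cheap because dropped traces are never recorded, but it cannot know in advance which traces will fail. Tail sampling sees the whole trace before deciding, at the cost of buffering every span until the trace finishes.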
