Monitoring & Observability

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "monitoring-observability" with this command: npx skills add yonatangross/orchestkit/yonatangross-orchestkit-monitoring-observability

Patterns for infrastructure monitoring, LLM observability, quality drift detection, and silent-failure detection. Each category has its own rule files in rules/, loaded on demand.

Quick Reference

| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Infrastructure Monitoring | 3 | CRITICAL | Prometheus metrics, Grafana dashboards, alerting rules |
| LLM Observability | 3 | HIGH | Langfuse tracing, cost tracking, evaluation scoring |
| Drift Detection | 3 | HIGH | Statistical drift, quality regression, drift alerting |
| Silent Failures | 3 | HIGH | Tool skipping, quality degradation, loop/token spike alerting |

Total: 12 rules across 4 categories

Quick Start

```python
# Prometheus metrics with RED method
from prometheus_client import Counter, Histogram

# Rate and Errors: request count, labeled by method, endpoint, and status
http_requests = Counter(
    'http_requests_total', 'Total requests', ['method', 'endpoint', 'status']
)
# Duration: request latency in seconds
http_duration = Histogram(
    'http_request_duration_seconds', 'Request latency',
    buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5],
)
```

```python
# Langfuse LLM tracing
from langfuse import observe, get_client

@observe()
async def analyze_content(content: str):
    # Attach user, session, and tags to the trace created by @observe
    get_client().update_current_trace(
        user_id="user_123",
        session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    # `llm` stands for your application's LLM client
    return await llm.generate(content)
```

```python
# PSI drift detection
import numpy as np

# calculate_psi and alert are placeholder helpers supplied by your application
psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:
    alert("Significant quality drift detected!")
```
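
The snippet above calls calculate_psi without defining it. The following is a minimal sketch of one way to compute the Population Stability Index, not the skill's own implementation. It assumes equal-width bins derived from the baseline range, a default of 10 bins, and a small floor (eps) on empty bins. The bin count, the floor, and the pure-Python approach (no NumPy) are all illustrative choices.

```python
import math

def calculate_psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of numeric scores.

    Illustrative sketch: equal-width bins over the baseline range.
    """
    lo, hi = min(baseline), max(baseline)
    # Fall back to width 1.0 when every baseline value is identical
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so values outside the baseline range land in the edge bins
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the log term stays finite
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Usage: compare last week's quality scores with this week's
baseline_scores = [i / 100 for i in range(100)]
current_scores = [0.5 + i / 200 for i in range(100)]
drifted = calculate_psi(baseline_scores, current_scores) >= 0.25
```

The 0.25 cutoff is the one used in the Quick Start snippet. Because the bins come from the baseline, PSI values are only comparable between runs that share the same baseline and bin count.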

Infrastructure Monitoring

Prometheus metrics, Grafana dashboards, and alerting for application health.

| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | rules/monitoring-prometheus.md | RED method, counters, histograms, cardinality |
| Grafana Dashboards | rules/monitoring-grafana.md | Golden Signals, SLO/SLI, health checks |
| Alerting Rules | rules/monitoring-alerting.md | Severity levels, grouping, escalation, fatigue prevention |

LLM Observability

Langfuse-based tracing, cost tracking, and evaluation for LLM applications.

| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | rules/llm-langfuse-traces.md | @observe decorator, OTEL spans, agent graphs |
| Cost Tracking | rules/llm-cost-tracking.md | Token usage, spend alerts, Metrics API |
| Eval Scoring | rules/llm-eval-scoring.md | Custom scores, evaluator tracing, quality monitoring |

Drift Detection

Statistical and quality drift detection for production LLM systems.

| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | rules/drift-statistical.md | PSI, KS test, KL divergence, EWMA |
| Quality Drift | rules/drift-quality.md | Score regression, baseline comparison, canary prompts |
| Drift Alerting | rules/drift-alerting.md | Dynamic thresholds, correlation, anti-patterns |
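
EWMA is one of the statistical drift methods listed above. As a rough illustration, the sketch below flags points where an exponentially weighted moving average of quality scores leaves a control band around a baseline mean. The function name ewma_drift_points and the defaults (alpha=0.2, k=3.0, a 20-point baseline) are assumptions made for this example and are not taken from the rule files.

```python
import math
from statistics import mean, stdev

def ewma_drift_points(scores, baseline_n=20, alpha=0.2, k=3.0):
    """Indices where the EWMA of `scores` leaves the baseline control band.

    Hypothetical helper: the first `baseline_n` scores define the mean and
    standard deviation; later scores are smoothed and checked against them.
    """
    baseline = scores[:baseline_n]
    mu, sigma = mean(baseline), stdev(baseline)
    # Steady-state standard deviation of an EWMA over independent values
    limit = k * sigma * math.sqrt(alpha / (2 - alpha))
    ewma, flagged = mu, []
    for i in range(baseline_n, len(scores)):
        ewma = alpha * scores[i] + (1 - alpha) * ewma
        if abs(ewma - mu) > limit:
            flagged.append(i)
    return flagged

# Usage: stable scores around 0.80, then a drop to 0.70
scores = [0.79, 0.81] * 10 + [0.80] * 10 + [0.70] * 5
drift_indices = ewma_drift_points(scores)
```

A smaller alpha smooths more and reacts more slowly. A larger alpha reacts faster but is more sensitive to single outliers.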

Silent Failures

Detection and alerting for silent failures in LLM agents.

| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | rules/silent-tool-skipping.md | Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | rules/silent-degraded-quality.md | Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | rules/silent-alerting.md | Loop detection, token spikes, escalation workflow |
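
To make two of these patterns concrete, here is a small sketch of an expected-vs-actual tool call check and a simple loop detector. It assumes the tool call names for a run have already been extracted from the trace into a plain list; how to extract them is not covered here. The function names and the max_repeats default are hypothetical.

```python
def missing_tools(expected, actual_calls):
    """Expected tool names that never appear in the observed calls."""
    called = set(actual_calls)
    return [tool for tool in expected if tool not in called]

def looks_like_loop(actual_calls, max_repeats=3):
    """True if one tool is called more than `max_repeats` times in a row."""
    run, previous = 0, None
    for name in actual_calls:
        # Extend the current run, or start a new one when the tool changes
        run = run + 1 if name == previous else 1
        if run > max_repeats:
            return True
        previous = name
    return False

# Usage: the agent answered without ever calling "fetch"
skipped = missing_tools(["search", "fetch", "summarize"], ["search", "summarize"])
stuck = looks_like_loop(["search", "search", "search", "search"])
```

Both checks are heuristics. A run can legitimately skip a tool or repeat one, so results like these are better treated as signals for review than as hard failures.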

Key Decisions

| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hosted, built-in prompt management |
| LLM tracing API | @observe + get_client() | OTEL-native, automatic span creation |
| Drift method | PSI for production, KS for small samples | PSI is stable for large datasets, KS more sensitive |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |
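
The dynamic threshold recommendation can be sketched as follows, assuming a window of recent metric values is available as a list. The helper names dynamic_threshold and should_alert are made up for this example, and the inclusive quantile method is one reasonable choice among several.

```python
from statistics import quantiles

def dynamic_threshold(history, percentile=95):
    """Alert threshold at the given percentile of recent metric values."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile
    cut_points = quantiles(history, n=100, method="inclusive")
    return cut_points[percentile - 1]

def should_alert(current_value, history, percentile=95):
    """True when the current value exceeds the dynamic threshold."""
    return current_value > dynamic_threshold(history, percentile)

# Usage: recent latencies in milliseconds, then a new observation
recent_latencies = list(range(1, 101))
alerting = should_alert(200, recent_latencies)
```

Unlike a static cutoff, this threshold moves with the recent window. It adapts to normal load changes, but it can also drift upward during a slow degradation, so the window length deserves some care.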

Detailed Documentation

| Resource | Description |
|---|---|
| ${CLAUDE_SKILL_DIR}/references/ | Logging, metrics, tracing, Langfuse, drift analysis guides |
| ${CLAUDE_SKILL_DIR}/checklists/ | Implementation checklists for monitoring and Langfuse setup |
| ${CLAUDE_SKILL_DIR}/examples/ | Real-world monitoring dashboard and trace examples |
| ${CLAUDE_SKILL_DIR}/scripts/ | Templates: Prometheus, OpenTelemetry, health checks, Langfuse |

Related Skills

  • defense-in-depth: Layer 8 observability as part of security architecture

  • devops-deployment: Observability integration with CI/CD and Kubernetes

  • resilience-patterns: Monitoring circuit breakers and failure scenarios

  • llm-evaluation: Evaluation patterns that integrate with Langfuse scoring

  • caching: Caching strategies that reduce costs tracked by Langfuse
