observability-lgtm



Install with: npx skills add nissan/observability-lgtm


Set up a full local observability stack (Loki + Grafana + Tempo + Prometheus + Alloy) for FastAPI apps on macOS (Apple Silicon) or Linux. One command to start, one import to instrument any app. Logs → Loki, metrics → Prometheus, traces → Tempo, all unified in Grafana.

When to use

  • User is building a FastAPI web app and wants logs, metrics, and traces
  • User wants a local Grafana dashboard without setting up ELK (too heavy)
  • User wants to correlate logs ↔ traces ↔ metrics in one UI
  • User has multiple local apps and wants universal observability

When NOT to use

  • Production cloud deployments (use managed Grafana Cloud or Datadog instead)
  • Non-Python apps (the Python lib only works for FastAPI; the stack itself is language-agnostic)
  • When Docker is not available

Prerequisites

  • Docker + Docker Compose v2 installed
  • Python 3.10+ (for the instrumentation lib)
  • FastAPI app to instrument

What gets installed

Service      Port    Purpose
Grafana      3000    Dashboards (no login in dev mode)
Prometheus   9091    Metrics scraping (9091 avoids a clash with MinIO on 9090)
Loki         3300    Log storage (3300 avoids a clash with Langfuse on 3100)
Tempo gRPC   4317    OTLP trace receiver (gRPC)
Tempo HTTP   4318    OTLP trace receiver (HTTP alternative)
Alloy UI     12345   Agent status

Steps

Step 1 — Check for port conflicts

lsof -iTCP -sTCP:LISTEN -n -P 2>/dev/null | grep -E ":(3000|3300|9091|4317|4318|12345)" | awk '{print $9, $1}'

If any of the ports above are in use, update the relevant port in docker-compose.yml and the matching url: in config/grafana/provisioning/datasources/datasources.yml. Common conflicts: Langfuse on 3100, MinIO on 9090.
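
The same check can be scripted. A minimal stdlib-only Python sketch that reports which of the stack's ports already have a listener, using the port list from the table above:

```python
import socket

STACK_PORTS = {
    3000: "Grafana",
    9091: "Prometheus",
    3300: "Loki",
    4317: "Tempo gRPC",
    4318: "Tempo HTTP",
    12345: "Alloy UI",
}

def ports_in_use(ports=STACK_PORTS):
    """Return the subset of ports that already have a listener on localhost."""
    busy = {}
    for port, service in ports.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            # connect_ex returns 0 when something is already listening there
            if s.connect_ex(("127.0.0.1", port)) == 0:
                busy[port] = service
    return busy

if __name__ == "__main__":
    for port, service in ports_in_use().items():
        print(f"port {port} ({service}) is already in use")
```

Any port it prints needs the same docker-compose.yml / datasources.yml adjustment described above.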

Step 2 — Copy the stack

Copy these files from the skill directory into a projects/observability/ folder in the workspace:

  • assets/docker-compose.yml
  • assets/config/ (entire directory tree)
  • assets/lib/observability.py
  • assets/scripts/register_app.sh
mkdir -p projects/observability
cp -r SKILL_DIR/assets/* projects/observability/
mkdir -p projects/observability/logs
touch projects/observability/logs/.gitkeep
chmod +x projects/observability/scripts/register_app.sh
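
Optionally, a quick sanity check that the copy produced the layout later steps depend on; a stdlib-only sketch (paths taken from the list above):

```python
from pathlib import Path

# Files the later steps rely on, relative to the stack directory
EXPECTED = [
    "docker-compose.yml",
    "lib/observability.py",
    "scripts/register_app.sh",
    "logs/.gitkeep",
]

def missing_files(root="projects/observability", expected=EXPECTED):
    """Return the expected paths that are absent under root."""
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]

if __name__ == "__main__":
    for p in missing_files():
        print(f"missing: {p}")
```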

Step 3 — Start the stack

cd projects/observability
docker compose up -d

Wait ~15 seconds for all services to start, then verify:

curl -s -o /dev/null -w "Grafana: %{http_code}\n"    http://localhost:3000/api/health
curl -s -o /dev/null -w "Prometheus: %{http_code}\n" http://localhost:9091/-/healthy
curl -s -o /dev/null -w "Loki: %{http_code}\n"       http://localhost:3300/ready
curl -s -o /dev/null -w "Tempo: %{http_code}\n"      http://localhost:4318/ready

All should return 200. If Loki or Tempo return 503, wait 10 more seconds and retry (they have a slower startup than Grafana/Prometheus).
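
The wait-and-retry advice can be automated. A stdlib-only sketch that polls the four health endpoints from the curl commands above until they all answer 200 or the attempts run out:

```python
import time
import urllib.error
import urllib.request

# Same endpoints as the curl checks above
ENDPOINTS = {
    "Grafana": "http://localhost:3000/api/health",
    "Prometheus": "http://localhost:9091/-/healthy",
    "Loki": "http://localhost:3300/ready",
    "Tempo": "http://localhost:4318/ready",
}

def http_ok(url):
    """True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_healthy(endpoints=ENDPOINTS, attempts=6, delay=10, probe=http_ok):
    """Poll until every endpoint is up; return the names still down at the end."""
    for _ in range(attempts):
        down = [name for name, url in endpoints.items() if not probe(url)]
        if not down:
            return []
        time.sleep(delay)
    return down

if __name__ == "__main__":
    still_down = wait_healthy()
    print("all healthy" if not still_down else f"still down: {still_down}")
```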

Step 4 — Install Python deps for the app

pip install \
  "prometheus-fastapi-instrumentator>=7.0.0" \
  "opentelemetry-sdk>=1.25.0" \
  "opentelemetry-exporter-otlp-proto-grpc>=1.25.0" \
  "opentelemetry-instrumentation-fastapi>=0.46b0" \
  "python-json-logger>=2.0.7"

Step 5 — Instrument the FastAPI app

Add to the app's app.py (or main.py), just after app = FastAPI(...):

import sys
sys.path.insert(0, "path/to/projects/observability/lib")
from observability import setup_observability
logger = setup_observability(app, service_name="my-service-name")

That's it. The app now:

  • Exposes /metrics for Prometheus
  • Writes JSON logs to projects/observability/logs/my-service-name/app.log
  • Sends traces to Tempo on localhost:4317
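
Alloy tails those JSON log files line by line and ships them to Loki. The real lib builds on python-json-logger; this stdlib-only sketch illustrates the one-JSON-object-per-line shape (the exact field set setup_observability emits is an assumption here, and the trace_id value is illustrative):

```python
import io
import json
import logging

class JsonLineFormatter(logging.Formatter):
    """Stdlib-only stand-in for python-json-logger: one JSON object per record."""

    def format(self, record):
        line = {
            "asctime": self.formatTime(record),
            "levelname": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        }
        # extras passed via logger.info(..., extra={...}) land as record attributes
        if hasattr(record, "trace_id"):
            line["trace_id"] = record.trace_id
        return json.dumps(line)

def make_json_logger(name, stream):
    """Build a logger that writes one JSON line per record to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger

buf = io.StringIO()
log = make_json_logger("my-service-name", buf)
log.info("request handled", extra={"trace_id": "0af7651916cd43dd8448eb211c80319c"})
record = json.loads(buf.getvalue())
```

A trace_id key in each line is what makes the Loki derived-field link in Step 7 work.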

Step 6 — Register with Prometheus

cd projects/observability
./scripts/register_app.sh my-service-name <port>
# e.g.: ./scripts/register_app.sh image-gen-studio 7860
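
The script's exact mechanics live in scripts/register_app.sh itself; conceptually, registering an app with Prometheus file-based service discovery means dropping a JSON target file into a directory Prometheus watches. A hypothetical sketch (the file_sd directory, the host.docker.internal target form, and the service label name are assumptions about this stack's config):

```python
import json
from pathlib import Path

def write_file_sd_target(targets_dir, service, port):
    """Write a Prometheus file_sd entry pointing at a locally running app."""
    entry = [{
        # host.docker.internal lets a containerized Prometheus reach host apps
        "targets": [f"host.docker.internal:{port}"],
        "labels": {"service": service},
    }]
    path = Path(targets_dir) / f"{service}.json"
    path.write_text(json.dumps(entry, indent=2))
    return path
```

Prometheus rescans file_sd directories on its own refresh interval, which matches the hot-reload-within-30-seconds behavior described below.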

Prometheus hot-reloads the target within 30 seconds. Verify:

curl -s "http://localhost:9091/api/v1/targets" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for t in data['data']['activeTargets']:
    svc = t['labels'].get('service', '')
    print(svc, '->', t['health'])
"

Step 7 — Open Grafana

Open http://localhost:3000

The FastAPI — App Overview dashboard is pre-loaded. Select your service from the dropdown at the top. You'll see:

  • Request rate (req/s)
  • Error rate (%)
  • Latency p50/p95/p99
  • Requests by endpoint
  • HTTP status codes
  • Live log panel (Loki)

To jump from a log line to its trace: click the trace_id link in the log detail panel. It opens the full trace in Tempo automatically (datasource pre-wired).

Step 8 — Import additional dashboards (optional)

In Grafana → Dashboards → Import:

  • 16110 — FastAPI Observability (richer alternative to the built-in)
  • 13407 — Loki Logs Overview
  • 16112 — Tempo Service Graph (service dependency map)

Useful commands

# Reload Prometheus config after registering a new app:
curl -s -X POST http://localhost:9091/-/reload

# Restart a single service without losing data:
docker compose -f projects/observability/docker-compose.yml restart grafana

# Stop everything (data volumes preserved):
docker compose -f projects/observability/docker-compose.yml down

# Nuclear reset (wipes all stored data):
docker compose -f projects/observability/docker-compose.yml down -v

# Check Alloy log shipping status:
open http://localhost:12345

Manual tracing (optional)

from observability import get_tracer  # same helper module as setup_observability
tracer = get_tracer(__name__)

@app.get("/expensive-endpoint")
async def handler():
    # The child span nests under the automatic request span in Tempo
    with tracer.start_as_current_span("db-query") as span:
        span.set_attribute("db.table", "users")
        result = await db.query(...)  # db: the app's own async client
    return result

Log/trace correlation

The OTel instrumentation injects trace_id into every log record. Grafana Loki is pre-configured with a derived field that turns "trace_id":"abc123" into a clickable link to the Tempo trace.

To manually include trace context in your own log calls:

from opentelemetry import trace

def trace_ctx() -> dict:
    ctx = trace.get_current_span().get_span_context()
    return {"trace_id": format(ctx.trace_id, "032x")} if ctx.is_valid else {}

logger.info("Processing request", extra=trace_ctx())

Notes

  • Logs are written to projects/observability/logs/<service>/app.log as JSON. Alloy tails these files and ships to Loki — no code changes needed beyond setup_observability().
  • All observability is local — no data leaves the machine.
  • data_classification: LOCAL_ONLY is the default for all traces/logs.
  • The Alloy config drops DEBUG-level logs by default. Edit config/alloy/config.alloy to remove the stage.drop block if you need debug logs.
