Telemetry Skill

Access the CTO observability stack for logs, metrics, and dashboards.

When to Use

Querying pod logs via Loki
Checking metrics via Prometheus
Viewing dashboards in Grafana
Debugging agent failures
Monitoring Play workflow health

Stack Overview

Service | Port | Purpose
Prometheus | 9090 | Metrics collection and querying
Loki | 3100 | Log aggregation (like Prometheus for logs)
Grafana | 3000 | Dashboards and visualization

Port Forwards (Required for Local Access)

kubectl port-forward svc/prometheus-server -n observability 9090:80
kubectl port-forward svc/loki-gateway -n observability 3100:80
kubectl port-forward svc/grafana -n observability 3000:80
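
A quick way to confirm that the three forwards are up is to attempt a TCP connection to each local port. The sketch below assumes the local ports from the Stack Overview; the names `LOCAL_PORTS`, `port_is_open`, and `forward_status` are illustrative and not part of any CTO tooling.

```python
import socket

# Local ports taken from the Stack Overview (assumed to match your port forwards).
LOCAL_PORTS = {"prometheus": 9090, "loki": 3100, "grafana": 3000}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def forward_status(ports=None) -> dict:
    """Map each service name to whether its local port is reachable."""
    ports = LOCAL_PORTS if ports is None else ports
    return {name: port_is_open(port) for name, port in ports.items()}
```

Calling `forward_status()` returns a dictionary such as `{"prometheus": True, "loki": False, "grafana": True}`, which points at the forward that needs restarting.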

MCP Tools Available

Prometheus Tools

Tool | Purpose
prometheus_query | Instant query (current value)
prometheus_query_range | Range query (time series)
prometheus_labels | List all label names
prometheus_series | Find series matching labels

Loki Tools

Tool | Purpose
loki_query | Query logs with LogQL
loki_labels | List all label names
loki_label_values | Get values for a label

Grafana Tools

Tool | Purpose
grafana_search_dashboards | Find dashboards by name
grafana_get_dashboard | Get dashboard definition
grafana_query_prometheus | Query Prometheus via Grafana
grafana_query_loki_logs | Query Loki via Grafana
grafana_list_alert_rules | List configured alerts

Loki (Logs)

LogQL Basics

All logs from CTO namespace

{namespace="cto"}

Filter by pod name

{namespace="cto", pod=~"coderun-.*"}

Search for errors

{namespace="cto"} |= "error"

JSON parsing

{namespace="cto"} | json | level="error"

Regex filter

{namespace="cto"} |~ "tool.*mismatch"
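
The LogQL forms above all follow one shape: a label selector in braces, followed by optional line filters and parsers. As a minimal sketch (the helper `build_logql` and its parameter names are hypothetical, not part of Loki or the MCP tools), the same queries can be assembled programmatically:

```python
import json


def _quote(value: str) -> str:
    # json.dumps yields a double-quoted string with backslash escapes,
    # which is compatible with LogQL string literals for typical input.
    return json.dumps(value)


def build_logql(equals=None, matches=None, contains=None,
                line_regex=None, parse_json=False) -> str:
    """Assemble a LogQL query from label matchers and line filters."""
    parts = [f"{k}={_quote(v)}" for k, v in (equals or {}).items()]
    parts += [f"{k}=~{_quote(v)}" for k, v in (matches or {}).items()]
    query = "{" + ", ".join(parts) + "}"
    if contains is not None:
        query += f" |= {_quote(contains)}"    # substring filter
    if line_regex is not None:
        query += f" |~ {_quote(line_regex)}"  # regex filter
    if parse_json:
        query += " | json"                    # JSON parser stage
    return query
```

For example, `build_logql({"namespace": "cto"}, matches={"pod": "coderun-.*"})` produces the "Filter by pod name" query shown above.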

Common Queries for CTO

All CodeRun pod logs

{namespace="cto", app="coderun"}

Morgan intake logs

{namespace="cto", pod=~"intake-.*"}

Play workflow logs

{namespace="cto", pod=~"play-.*"}

Errors only

{namespace="cto"} |= "error" | json

Tool inventory issues (A10)

{namespace="cto"} |~ "tool.*(mismatch|missing)"

MCP initialization failures (A12)

{namespace="cto"} |~ "mcp.*failed"

Config issues (A11)

{namespace="cto"} |~ "cto-config.*(missing|invalid)"

Via MCP Tool

loki_query(query='{namespace="cto"} |= "error"', limit=100)

Via curl

curl -G "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="cto"} |= "error"' \
  --data-urlencode 'limit=100' | jq
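
The same request can be built and its response flattened in Python. This is a sketch only: it assumes the Loki port forward on localhost:3100 and the standard "streams" response shape of `query_range`, and the helper names are illustrative. It deliberately separates URL construction and response parsing from the HTTP call so each part can be checked without a running cluster.

```python
from urllib.parse import urlencode

LOKI_BASE = "http://localhost:3100"  # assumes the Loki port forward is running


def loki_query_range_url(query: str, limit: int = 100, base: str = LOKI_BASE) -> str:
    """Build the query_range URL that the curl example requests."""
    return f"{base}/loki/api/v1/query_range?{urlencode({'query': query, 'limit': limit})}"


def extract_log_lines(payload: dict) -> list:
    """Flatten a Loki 'streams' response into (pod, line) pairs."""
    pairs = []
    for stream in payload.get("data", {}).get("result", []):
        pod = stream.get("stream", {}).get("pod", "")
        # Each value is [nanosecond timestamp as string, log line].
        for _timestamp, line in stream.get("values", []):
            pairs.append((pod, line))
    return pairs
```

Fetch the URL with any HTTP client (for example `urllib.request.urlopen`), decode the JSON body, and pass it to `extract_log_lines`.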

Prometheus (Metrics)

PromQL Basics

Cumulative CPU time (a counter; wrap in rate() for current usage)

container_cpu_usage_seconds_total{namespace="cto"}

Memory usage

container_memory_usage_bytes{namespace="cto"}

Rate of requests

rate(http_requests_total{namespace="cto"}[5m])

Pod restarts

kube_pod_container_status_restarts_total{namespace="cto"}
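
Several of the metrics above (`container_cpu_usage_seconds_total`, `http_requests_total`, `kube_pod_container_status_restarts_total`) are counters, so their raw values only ever grow; `rate()` converts them into a per-second increase over the window. The following is a simplified sketch of that idea, not Prometheus's actual implementation: it handles counter resets but omits the extrapolation Prometheus applies at window boundaries. The function name `simple_rate` is hypothetical.

```python
def simple_rate(samples: list) -> float:
    """Approximate per-second increase of a counter.

    `samples` is a list of (unix_timestamp, value) pairs in time order.
    A drop in value is treated as a counter reset (restart from zero).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    prev = samples[0][1]
    for _timestamp, value in samples[1:]:
        # After a reset the counter restarted at zero, so the whole
        # current value counts as increase.
        increase += (value - prev) if value >= prev else value
        prev = value
    return increase / elapsed
```

A counter that climbs from 100 to 400 over 300 seconds gives a rate of 1.0 per second under this approximation.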

Common Queries for CTO

CodeRun pod count

count(kube_pod_info{namespace="cto", pod=~"coderun-.*"})

Memory by pod

container_memory_usage_bytes{namespace="cto", container!=""}

CPU by pod

rate(container_cpu_usage_seconds_total{namespace="cto"}[5m])

OOM killed containers

kube_pod_container_status_last_terminated_reason{namespace="cto", reason="OOMKilled"}

Pod restart count

sum(kube_pod_container_status_restarts_total{namespace="cto"}) by (pod)

Via MCP Tool

prometheus_query(query='count(kube_pod_info{namespace="cto"})')

Via curl

curl "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=kube_pod_info{namespace="cto"}' | jq
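
An instant query returns a JSON "vector": one entry per series, each with its labels and a `[timestamp, "value"]` pair. As a sketch (assuming the Prometheus port forward on localhost:9090; the helper names are illustrative), the response can be reduced to a label-to-value mapping:

```python
from urllib.parse import urlencode

PROM_BASE = "http://localhost:9090"  # assumes the Prometheus port forward is running


def prom_query_url(query: str, base: str = PROM_BASE) -> str:
    """Build the instant-query URL for the HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': query})}"


def vector_to_dict(payload: dict, label: str = "pod") -> dict:
    """Map the chosen label of each series to its sample value as a float."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Sample values arrive as strings, e.g. [1700000000.0, "3"].
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}
```

Applied to the "Pod restart count" query, `vector_to_dict` yields a dictionary of restart counts keyed by pod name.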

Grafana (Dashboards)

Access

URL: http://localhost:3000
Default credentials: admin/admin (or configured)

Common Dashboards

Dashboard | Purpose
Kubernetes / Pods | Pod resource usage
Loki / Logs | Log explorer
CTO Overview | Platform health (if configured)

Via MCP Tool

grafana_search_dashboards(query="kubernetes")
grafana_get_dashboard(uid="abc123")
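
When the MCP tools are unavailable, Grafana's HTTP API offers comparable lookups through `/api/search` and `/api/dashboards/uid/<uid>`. The sketch below only builds the URLs and reads a search result; it assumes the Grafana port forward on localhost:3000, leaves authentication to the caller, and makes no claim about how the MCP tools are implemented. The helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

GRAFANA_BASE = "http://localhost:3000"  # assumes the Grafana port forward is running


def search_url(query: str, base: str = GRAFANA_BASE) -> str:
    """URL for searching dashboards by name."""
    return f"{base}/api/search?{urlencode({'query': query, 'type': 'dash-db'})}"


def dashboard_url(uid: str, base: str = GRAFANA_BASE) -> str:
    """URL for fetching one dashboard definition by its UID."""
    return f"{base}/api/dashboards/uid/{quote(uid, safe='')}"


def titles_by_uid(search_results: list) -> dict:
    """Map dashboard UID to title from a decoded /api/search response."""
    return {
        item["uid"]: item["title"]
        for item in search_results
        if item.get("type") == "dash-db"  # skip folders
    }
```

The UID found via the search step is the value to pass to `dashboard_url` (or to `grafana_get_dashboard`).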

kubectl Alternatives

When MCP tools aren't available, use kubectl directly:

Stream Logs

All CTO pods

kubectl logs -n cto -l app.kubernetes.io/part-of=cto -f --tail=100

Specific CodeRun

kubectl logs -n cto -l app=coderun -f

With grep

kubectl logs -n cto -l app=coderun -f | grep -E "error|mismatch|failed"

Get Pod Status

kubectl get pods -n cto -o wide
kubectl describe pod -n cto <pod-name>

Events

kubectl get events -n cto --sort-by='.lastTimestamp'

Healer Integration

Healer uses Loki to watch for patterns:

// From crates/healer/src/scanner.rs
// Patterns that trigger alerts:
"tool\s+inventory\s+mismatch"      // A10
"cto-config.*(missing|invalid)"    // A11
"mcp.*failed\s+to\s+initialize"    // A12
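
To see which alert code a given log line would map to, the three patterns can be tried locally. This is an illustration in Python, not Healer's code (Healer is a Rust crate): it assumes the patterns are applied case-sensitively as unanchored searches, which the source does not confirm, and the names `HEALER_PATTERNS` and `classify` are hypothetical.

```python
import re

# Patterns copied from the Healer scanner comment; matching flags are an assumption.
HEALER_PATTERNS = {
    "A10": re.compile(r"tool\s+inventory\s+mismatch"),
    "A11": re.compile(r"cto-config.*(missing|invalid)"),
    "A12": re.compile(r"mcp.*failed\s+to\s+initialize"),
}


def classify(line: str) -> list:
    """Return the alert codes whose pattern matches the log line."""
    return [code for code, pattern in HEALER_PATTERNS.items() if pattern.search(line)]
```

Note that these patterns are stricter than the broad LogQL filters used elsewhere in this document: a line containing only "tool mismatch" matches `tool.*mismatch` in Loki but not the A10 pattern, which requires the word "inventory".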

Query Healer-Relevant Logs

All Healer detection patterns

{namespace="cto"} |~ "tool.*mismatch|cto-config.*(missing|invalid)|mcp.*failed"

Troubleshooting

No Logs Appearing

Check port forward is running: lsof -i :3100
Verify Loki pods: kubectl get pods -n observability -l app=loki
Check labels: run loki_labels() to see which labels are available

Metrics Not Found

Check port forward: lsof -i :9090
Verify Prometheus: kubectl get pods -n observability -l app=prometheus
List label names with prometheus_labels(), or check scrape targets at http://localhost:9090/targets

Grafana Not Loading

Check port forward: lsof -i :3000
Verify pod: kubectl get pods -n observability -l app.kubernetes.io/name=grafana

Reference

Loki LogQL Documentation
Prometheus PromQL Documentation
Grafana Documentation
