prometheus

Prometheus metrics and PromQL queries. Use when writing PromQL queries, creating recording or alerting rules, debugging metric scraping issues, or understanding counter/gauge/histogram behavior.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "prometheus" with this command: npx skills add kontrolplane/skills/kontrolplane-skills-prometheus

Prometheus

PromQL Gotchas

Counter Functions (Critical)

Counters only increase. Never use raw counter values—always use rate functions:

rate(http_requests_total[5m])      # Per-second average rate
irate(http_requests_total[5m])     # Instant rate (last 2 points, spiky)
increase(http_requests_total[1h])  # Total increase over range
  • rate() handles counter resets automatically
  • Use rate() for dashboards, irate() only for high-resolution spikes

Range Vector Required

Rate functions need [duration]:

rate(metric[5m])    # Correct
rate(metric)        # Error: expected range vector

Vector Matching

Binary operations require matching labels:

# This fails if label sets differ:
metric_a / metric_b

# Ignore extra labels:
metric_a / ignoring(extra_label) metric_b

# Match on specific labels only:
metric_a / on(common_label) metric_b

Histogram Quantiles

histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)
  • Must use _bucket metric with le label
  • Always wrap in rate() for counters
  • by (le) is required; add other labels as needed: by (le, endpoint)

Common Query Patterns

# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# CPU usage (node_exporter)
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]) * 100)

# Memory usage
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

# Container memory (Kubernetes)
sum by (pod) (container_memory_working_set_bytes{container!=""})

Alerting Rules

groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
            / sum(rate(http_requests_total[5m])) by (job)
            > 0.05
        for: 5m          # Must be firing for this duration
        labels:
          severity: warning
        annotations:
          summary: "Error rate {{ $value | humanizePercentage }} on {{ $labels.job }}"

for Clause

  • Prevents flapping on brief spikes
  • Alert stays "pending" until duration met
  • Missing for = immediate alerting

Recording Rules

Pre-compute expensive queries:

rules:
  - record: job:http_requests:rate5m
    expr: sum by (job) (rate(http_requests_total[5m]))

Naming convention: level:metric:operations

Staleness

  • Samples older than 5 minutes are "stale"
  • up == 0 only fires if target was recently scraped
  • Use absent(metric) to detect missing metrics entirely

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

kyverno

No summary provided by upstream source.

Repository SourceNeeds Review
General

prometheus

No summary provided by upstream source.

Repository SourceNeeds Review
General

grafana

No summary provided by upstream source.

Repository SourceNeeds Review
General

kubernetes

No summary provided by upstream source.

Repository SourceNeeds Review