Grafana Observability
Full access to Grafana instances (self-hosted or Grafana Cloud) for network infrastructure observability: dashboards, Prometheus metrics (PromQL), Loki logs (LogQL), alerting rules, incident management, OnCall schedules, annotations, and panel image rendering. 75+ tools via the official Grafana MCP server.
MCP Server
| Property | Value |
|---|---|
| Source | grafana/mcp-grafana |
| Transport | stdio (default), SSE, or streamable-http |
| Language | Go (runs via `uvx mcp-grafana`) |
| Tools | 75+ (dashboards, Prometheus, Loki, alerting, incidents, OnCall, annotations, admin) |
| Auth | Service account token (preferred) or username/password |
| Requires | Grafana 9.0+, service account with Editor role or granular RBAC |
How to Run
stdio mode (default — used by NetClaw)
uvx mcp-grafana
Read-only mode (prevents dashboard/alert modifications)
uvx mcp-grafana --disable-write
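A client that starts the server itself needs the same two commands in structured form. The sketch below shows one way to build that; `build_launch_spec` and the dictionary keys are hypothetical, and only `uvx`, `mcp-grafana`, and `--disable-write` come from this document.

```python
def build_launch_spec(read_only: bool = True) -> dict:
    """Return a command/args pair for starting the server in stdio mode."""
    args = ["mcp-grafana"]
    if read_only:
        # Flag taken from the read-only example in this section.
        args.append("--disable-write")
    # The "command"/"args" key names are an assumption about the client.
    return {"command": "uvx", "args": args}


if __name__ == "__main__":
    print(build_launch_spec())
    print(build_launch_spec(read_only=False))
```

Defaulting to read-only keeps the safer mode as the baseline, so write access has to be requested explicitly.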
Environment Variables
| Variable | Required | Example | Description |
|---|---|---|---|
| `GRAFANA_URL` | Yes | `http://grafana.example.com:3000` | Grafana instance URL |
| `GRAFANA_SERVICE_ACCOUNT_TOKEN` | Yes* | `glsa_abc123...` | Service account token (preferred auth) |
| `GRAFANA_USERNAME` | Alt | `admin` | Basic auth username (alternative to token) |
| `GRAFANA_PASSWORD` | Alt | `changeme` | Basic auth password |
| `GRAFANA_ORG_ID` | No | `1` | Organization ID for multi-org setups |

*Either a service account token or a username/password pair is required.
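The "either token or username/password" rule can be checked before the server is launched. The following is a minimal pre-flight sketch, not a description of how the server resolves credentials; `pick_auth` is a hypothetical helper, and only the variable names come from the table above.

```python
import os
from typing import Mapping, Optional


def pick_auth(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "token" or "basic"; raise ValueError if the config is incomplete."""
    env = os.environ if env is None else env
    if not env.get("GRAFANA_URL"):
        raise ValueError("GRAFANA_URL is required")
    if env.get("GRAFANA_SERVICE_ACCOUNT_TOKEN"):
        return "token"  # preferred auth, so it wins when both are set
    if env.get("GRAFANA_USERNAME") and env.get("GRAFANA_PASSWORD"):
        return "basic"
    raise ValueError(
        "set GRAFANA_SERVICE_ACCOUNT_TOKEN or GRAFANA_USERNAME and GRAFANA_PASSWORD"
    )
```

Calling `pick_auth()` with no argument reads the current process environment.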
Key Tool Categories
Dashboard Operations
| Tool | What It Does |
|---|---|
| `search_dashboards` | Find dashboards by title or metadata |
| `get_dashboard_summary` | Lightweight overview (context-efficient; use this first) |
| `get_dashboard_by_uid` | Full dashboard JSON (large; use sparingly) |
| `get_dashboard_property` | Extract specific fields via JSONPath |
| `get_dashboard_panel_queries` | Extract panel query details |
| `update_dashboard` | Create or modify dashboards |
| `patch_dashboard` | Targeted modifications without full JSON replacement |
Prometheus (PromQL)
| Tool | What It Does |
|---|---|
| `query_prometheus` | Execute instant or range PromQL queries |
| `list_prometheus_metric_names` | Discover available metrics |
| `list_prometheus_label_names` | List labels matching selectors |
| `list_prometheus_label_values` | Retrieve values for a specific label |
| `query_prometheus_histogram` | Calculate percentiles (p50, p90, p95, p99) |
| `list_prometheus_metric_metadata` | Metric type, help text, unit |
Loki (LogQL)
| Tool | What It Does |
|---|---|
| `query_loki_logs` | Execute LogQL queries against log streams |
| `list_loki_label_names` | Discover available log labels |
| `list_loki_label_values` | List values for a specific log label |
| `query_loki_stats` | Stream statistics (volume, rate) |
| `query_loki_patterns` | Detect log structure patterns |
Alerting
| Tool | What It Does |
|---|---|
| `list_alert_rules` | View all Grafana and datasource-managed alert rules |
| `get_alert_rule_by_uid` | Retrieve specific alert rule details |
| `create_alert_rule` | Create a new alert rule |
| `update_alert_rule` | Modify an existing alert rule |
| `delete_alert_rule` | Remove an alert rule |
| `list_contact_points` | View notification endpoints (email, Slack, PagerDuty, etc.) |
Incident Management
| Tool | What It Does |
|---|---|
| `list_incidents` | View Grafana Incidents with filtering |
| `get_incident` | Single incident details |
| `create_incident` | Create a new incident |
| `add_activity_to_incident` | Add a timeline entry to an incident |
OnCall
| Tool | What It Does |
|---|---|
| `list_oncall_schedules` | View on-call rotation schedules |
| `get_oncall_shift` | Shift details |
| `get_current_oncall_users` | Who is on call right now |
| `list_alert_groups` | OnCall alert groups with filtering |
Annotations & Rendering
| Tool | What It Does |
|---|---|
| `get_annotations` | Query annotations with time/tag filters |
| `create_annotation` | Add an annotation to a dashboard or panel |
| `get_panel_image` | Render a panel or dashboard as a PNG image |
| `generate_deeplink` | Create accurate Grafana URLs for sharing |
Investigation (Sift)
| Tool | What It Does |
|---|---|
| `list_sift_investigations` | List automated investigations |
| `get_sift_investigation` | Investigation details |
| `find_error_pattern_logs` | Detect elevated error patterns in logs |
| `find_slow_requests` | Identify slow requests via Tempo traces |
Workflow: Network Infrastructure Monitoring
When checking network device metrics in Grafana:
- Find dashboards: `search_dashboards` with a keyword (e.g., "network", "interface", "BGP")
- Dashboard overview: `get_dashboard_summary` for the panel list without the full JSON
- Query metrics: `query_prometheus` with PromQL for specific metrics:
  - Interface traffic: `rate(ifHCInOctets{instance="router1"}[5m]) * 8`
  - BGP peer state: `bgp_peer_state{peer="10.1.1.2"}`
  - CPU utilization: `device_cpu_utilization{device="core-rtr-01"}`
  - Interface errors: `increase(ifInErrors{device=~".*"}[1h])`
- Check alerts: `list_alert_rules` to see active alerting thresholds
- Search logs: `query_loki_logs` for syslog or SNMP trap data
- Report: Metrics summary with alert status and log correlation
- GAIT: Record all queries in the audit trail
Example: Interface Utilization Check
    search_dashboards(title="Network Interfaces")
    get_dashboard_summary(uid="abc123")
    query_prometheus(expr="rate(ifHCInOctets{device='core-rtr-01'}[5m]) * 8", time_range="1h")
    query_prometheus(expr="rate(ifHCOutOctets{device='core-rtr-01'}[5m]) * 8", time_range="1h")
    list_alert_rules(folder="Network")
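The two PromQL expressions in this example differ only in the counter name, and their `* 8` converts octets per second into bits per second. A small sketch of that arithmetic follows, assuming the link speed is known from elsewhere; `interface_bps_query`, `utilization_percent`, and the link-speed input are hypothetical and not part of any tool listed here.

```python
def interface_bps_query(device: str, direction: str = "in", window: str = "5m") -> str:
    """Build the bits-per-second PromQL expression used in the example."""
    metric = {"in": "ifHCInOctets", "out": "ifHCOutOctets"}[direction]
    # rate() yields octets per second; multiplying by 8 gives bits per second.
    return f"rate({metric}{{device='{device}'}}[{window}]) * 8"


def utilization_percent(bits_per_second: float, link_speed_bps: float) -> float:
    """Express a bits-per-second sample as a percentage of link capacity."""
    if link_speed_bps <= 0:
        raise ValueError("link_speed_bps must be positive")
    return 100.0 * bits_per_second / link_speed_bps


if __name__ == "__main__":
    print(interface_bps_query("core-rtr-01", "in"))
    print(interface_bps_query("core-rtr-01", "out"))
    # 250 Mbit/s on a 1 Gbit/s link.
    print(utilization_percent(250_000_000, 1_000_000_000))
```

Metric and label names vary between exporters, so the `device` label here simply mirrors the example rather than any fixed schema.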
Workflow: Alert Investigation
When investigating Grafana alerts:
- List alerts: `list_alert_rules` to find firing or pending rules
- Alert details: `get_alert_rule_by_uid` for thresholds, conditions, and datasource
- Query metrics: `query_prometheus` to check the metric that triggered the alert
- Search logs: `query_loki_logs` to correlate with log events around the alert time
- Check incidents: `list_incidents` to see whether this is already tracked
- Contact points: `list_contact_points` to verify notification routes
- Report: Alert analysis with root cause and metric evidence
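Correlating metrics and logs "around alert time" needs a concrete start and end. The sketch below computes such a window; the 15-minute lookback and 5-minute lookahead are arbitrary assumptions, `correlation_window` is a hypothetical helper, and this document does not specify which time parameters the query tools accept.

```python
from datetime import datetime, timedelta, timezone


def correlation_window(alert_time: datetime, before_min: int = 15, after_min: int = 5):
    """Return (start, end) as RFC 3339 UTC strings bracketing the alert time."""
    if alert_time.tzinfo is None:
        raise ValueError("alert_time must be timezone-aware")
    utc = alert_time.astimezone(timezone.utc)
    start = utc - timedelta(minutes=before_min)
    end = utc + timedelta(minutes=after_min)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


if __name__ == "__main__":
    fired = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(correlation_window(fired))
```

Using the same window for both the metric query and the log query keeps the two result sets directly comparable.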
Workflow: Incident Response
When responding to a Grafana incident:
- List incidents: `list_incidents` to find open incidents
- Incident details: `get_incident` for timeline, severity, and labels
- OnCall: `get_current_oncall_users` to see who should be notified
- Correlate metrics: `query_prometheus` to check affected service metrics
- Correlate logs: `query_loki_logs` to find error patterns around the incident time
- Investigate: `find_error_pattern_logs` for automated error pattern detection
- Update incident: `add_activity_to_incident` to add findings to the timeline
- Annotate: `create_annotation` to mark the event on relevant dashboards
Workflow: Log Analysis
When investigating network logs stored in Loki:
- Discover labels: `list_loki_label_names` to find available labels (e.g., host, severity, facility)
- Label values: `list_loki_label_values` to enumerate hosts and severity levels
- Query logs: `query_loki_logs` with LogQL:
  - By device: `{host="core-rtr-01"}`
  - By severity keyword: `{host="core-rtr-01"} |= "error"` (a line filter that matches the text "error", not a severity label)
  - Pattern match: `{job="syslog"} |~ "BGP|OSPF"`
- Patterns: `query_loki_patterns` to detect recurring log structures
- Stats: `query_loki_stats` for log volume and rate analysis
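The LogQL queries in this workflow share one shape: a stream selector followed by optional line filters. A minimal builder for that shape is sketched below; `logql_selector` is a hypothetical helper that covers only equality matchers plus the `|=` and `|~` filters used here.

```python
def logql_selector(labels: dict, contains: str = "", regex: str = "") -> str:
    """Build a LogQL stream selector with optional line filters."""
    if not labels:
        raise ValueError("at least one label matcher is required")

    def quote(value: str) -> str:
        # Escape backslashes and double quotes for a double-quoted string.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    matchers = ", ".join(f"{key}={quote(val)}" for key, val in sorted(labels.items()))
    query = "{" + matchers + "}"
    if contains:
        query += f" |= {quote(contains)}"  # substring line filter
    if regex:
        query += f" |~ {quote(regex)}"  # regular-expression line filter
    return query


if __name__ == "__main__":
    print(logql_selector({"host": "core-rtr-01"}))
    print(logql_selector({"host": "core-rtr-01"}, contains="error"))
    print(logql_selector({"job": "syslog"}, regex="BGP|OSPF"))
```

Quoting every value in one place avoids malformed queries when a hostname or search term contains a double quote.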
Integration with Other Skills
| Skill | Integration |
|---|---|
| pyats-health-check | Cross-reference pyATS health data with Grafana metrics and dashboards |
| pyats-routing | Correlate OSPF/BGP state changes with Grafana metric timelines |
| gait-session-tracking | Record all Grafana queries and findings in GAIT audit trail |
| slack-network-alerts | Grafana alerts fed through Slack + NetClaw for automated investigation |
| servicenow-change-workflow | Annotate Grafana dashboards during change windows; correlate incidents with CRs |
| te-network-monitoring | Pair ThousandEyes path data with Grafana infrastructure metrics |
| aws-cloud-monitoring | Compare Grafana dashboards with CloudWatch data for hybrid visibility |
| markmap-viz | Visualize Grafana alert rule hierarchies as mind maps |
Context Window Management
Grafana dashboards can be large JSON documents. Use these strategies:
- Always start with `get_dashboard_summary`: it returns a lightweight overview, not the full JSON
- Use `get_dashboard_property` with JSONPath for specific fields
- Avoid `get_dashboard_by_uid` unless you need the complete dashboard definition
- Use `get_dashboard_panel_queries` to extract just the query definitions
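To see why a summary is cheaper than the full definition, the sketch below reduces a dashboard document to its title and one short entry per panel. It is an illustration only and not how `get_dashboard_summary` is implemented; `summarize` and the sample dashboard are made up, relying only on the `title` and `panels` fields of Grafana dashboard JSON.

```python
import json


def summarize(dashboard: dict) -> dict:
    """Keep only the dashboard title and a short entry per panel."""
    panels = dashboard.get("panels", [])
    return {
        "title": dashboard.get("title"),
        "panel_count": len(panels),
        "panels": [{"id": p.get("id"), "title": p.get("title")} for p in panels],
    }


if __name__ == "__main__":
    # Hypothetical one-panel dashboard, far smaller than a real one.
    sample = {
        "title": "Network Interfaces",
        "panels": [
            {
                "id": 1,
                "title": "Inbound bps",
                "targets": [{"expr": "rate(ifHCInOctets[5m]) * 8"}],
                "fieldConfig": {"defaults": {"unit": "bps"}},
            }
        ],
    }
    # Compare serialized sizes of the full document and its summary.
    print(len(json.dumps(sample)), len(json.dumps(summarize(sample))))
```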
Important Rules
- Prefer read-only operations: use `search_dashboards`, `get_dashboard_summary`, `query_prometheus`, `query_loki_logs`, and `list_alert_rules` before any write operations
- Dashboard modifications require a ServiceNow CR, unless working in a lab/dev Grafana instance
- Alert rule changes require approval: creating, updating, or deleting alert rules affects production monitoring
- Token-efficient queries: use `get_dashboard_summary` over `get_dashboard_by_uid`, and use time ranges to limit Prometheus/Loki result size
- GAIT audit mandatory: record all Grafana queries, dashboard modifications, alert changes, and incident updates
- No secrets in queries: never embed credentials or sensitive data in PromQL/LogQL expressions
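The read-before-write rules can be enforced with a simple gate in front of tool calls. The sketch below is one possible shape; `WRITE_TOOLS` is assembled from the tool tables in this document and may be incomplete, `check_tool_call` is hypothetical, and treating every write tool identically is a simplification of the rules above.

```python
# Tools from this document's tables that create, modify, or delete state.
WRITE_TOOLS = {
    "update_dashboard", "patch_dashboard",
    "create_alert_rule", "update_alert_rule", "delete_alert_rule",
    "create_incident", "add_activity_to_incident",
    "create_annotation",
}


def check_tool_call(tool: str, approved: bool = False) -> str:
    """Return "allow" for reads, and for writes only when approval was given."""
    if tool not in WRITE_TOOLS:
        return "allow"
    # `approved` stands in for a ServiceNow CR or other sign-off.
    return "allow" if approved else "needs-approval"


if __name__ == "__main__":
    print(check_tool_call("query_prometheus"))
    print(check_tool_call("delete_alert_rule"))
    print(check_tool_call("delete_alert_rule", approved=True))
```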
Error Handling
- Authentication fails (401/403): Check `GRAFANA_URL` and `GRAFANA_SERVICE_ACCOUNT_TOKEN` in `~/.openclaw/.env`. Verify the service account has the Editor role or the required RBAC permissions.
- Datasource not found: Use `list_datasources` to discover available datasource UIDs and names.
- PromQL/LogQL errors: Use `list_prometheus_metric_names` or `list_loki_label_names` to discover valid metric/label names before querying.
- Dashboard not found: Use `search_dashboards` to find dashboards by title before using UID-based tools.
- Rate limiting: Grafana may rate-limit API requests; space out large query batches.
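Spacing out a batch can be as simple as pausing between calls. The sketch below assumes a fixed delay is enough; `run_spaced` is a hypothetical helper, the 0.5-second default is arbitrary, and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_spaced(
    items: Iterable[T],
    call: Callable[[T], R],
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call `call` on each item, pausing `delay_s` seconds between calls."""
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_s)  # pause before every call except the first
        results.append(call(item))
    return results


if __name__ == "__main__":
    queries = ["up", "rate(ifHCInOctets[5m]) * 8"]
    # str.upper stands in for a real query call in this demonstration.
    print(run_spaced(queries, str.upper, delay_s=0.1))
```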