vmware-aria

Use this skill whenever the user needs VMware Aria Operations data: performance metrics, alerts, capacity planning, anomaly detection, and automated reports. Directly handles: query resource metrics, list/acknowledge/cancel alerts, manage alert definitions, check capacity and time-remaining forecasts, detect anomalies, generate and manage reports. Always use this skill for "check vSphere capacity", "what Aria Operations alerts are active", "show VMware anomalies", "generate an Aria report", "rightsizing recommendations", or any Aria/vRealize Operations task. Combined with an LLM, Aria data powers natural-language reports: "give me a capacity report" → Aria collects data → LLM formats the report. Do NOT use for real-time vCenter alarms/events (use vmware-monitor), VM operations (use vmware-aiops), NSX networking (use vmware-nsx), or load balancing/AVI/AKO (use vmware-avi).

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "vmware-aria" with this command: npx skills add zw008/vmware-aria

VMware Aria Operations

Disclaimer: This is a community-maintained open-source project and is not affiliated with, endorsed by, or sponsored by VMware, Inc. or Broadcom Inc. "VMware" and "Aria" are trademarks of Broadcom. Source code is publicly auditable at github.com/zw008/VMware-Aria under the MIT license.

VMware Aria Operations (vRealize Operations) AI-assisted monitoring — 27 MCP tools for resources, alerts, alert definitions, capacity planning, anomaly detection, report automation, and platform health.

Domain-focused monitoring skill for Aria Operations 8.x / vRealize Operations 8.x. Companion skills: vmware-nsx (networking), vmware-aiops (VM lifecycle), vmware-monitor (read-only vSphere), vmware-avi (AVI/ALB/AKO), vmware-pilot (workflow orchestration), vmware-policy (audit/policy).

What This Skill Does

Category | Tools | Count
Resources | list, get details, metrics, health badge, top consumers | 5
Alerts | list, get details, acknowledge, cancel, list definitions | 5
Alert Definitions | list symptoms, create definition, enable/disable, delete | 4
Capacity | cluster overview, remaining capacity, time remaining, rightsizing | 4
Reports | list templates, generate, list, get status + download URL, delete | 5
Anomaly | list anomalies, risk badge | 2
Health | Aria platform health, collector group status | 2

Total: 27 tools (20 read-only + 7 write)

Quick Install

uv tool install vmware-aria
vmware-aria doctor

When to Use This Skill

Performance monitoring (daily proactive checks):

  • Check VM contention: CPU Ready %, Memory Balloon, Swap usage
  • Fetch time-series metrics for any resource (CPU, memory, disk, network)
  • Find top consumers by CPU/memory/disk/network
  • Detect ML-based anomalies and risk scores

Alert management:

  • List, investigate, acknowledge, or cancel active alerts
  • List or filter alert definitions (templates)
  • Create new alert definitions from symptom definitions (post-RCA)
  • Enable or disable alert definitions; delete obsolete ones

Capacity planning:

  • Cluster capacity remaining (CPU, memory, disk headroom)
  • Time-until-full prediction per cluster
  • Right-sizing: find over-provisioned or under-utilized VMs
  • Capacity overview with Aria's built-in recommendations

Report automation:

  • Generate scheduled or on-demand reports (capacity, performance, SLA)
  • Poll report status until COMPLETED; get PDF/CSV download URL
  • Delete generated reports after download

Use companion skills for:

  • VM lifecycle: create, clone, snapshot, power → vmware-aiops
  • NSX networking: segments, gateways, NAT, routing → vmware-nsx
  • vSphere inventory, real-time alarms, events → vmware-monitor
  • Storage: iSCSI, vSAN, datastores → vmware-storage
  • Load balancing, AVI/ALB, AKO, Ingress → vmware-avi

Related Skills — Skill Routing

User Intent | Recommended Skill
Aria Operations monitoring, alerts, capacity | vmware-aria ← this skill
VM lifecycle, deployment, guest ops | vmware-aiops
NSX networking: segments, gateways, NAT, routing | vmware-nsx
Read-only vSphere inventory, events, alarms | vmware-monitor
Storage: iSCSI, vSAN, datastores | vmware-storage
Multi-step workflows with approval | vmware-pilot
Load balancer, AVI, ALB, AKO, Ingress | vmware-avi (uv tool install vmware-avi)
Audit log query | vmware-policy (vmware-audit CLI)

Common Workflows

Diagnostic investigations: Before running any "why is X slow / failing / down" workflow, follow references/investigation-protocol.md. It enforces the four root-cause completeness criteria (falsifiability / sufficiency / necessity / mechanism) and the up-to-three-rounds deepening loop. Stopping at a partial conclusion is an anti-pattern — always self-check against the criteria before outputting a report.

Daily VM Health Check (Proactive Ops)

Judgment: don't chase the highest CPU consumer — chase the highest contention consumer. A VM at 90% CPU on a quiet host is healthy; a VM at 30% CPU but 15% Ready is starving. Key metrics: CPU Ready, Memory Balloon, Disk Latency.

  1. Find top CPU consumers → vmware-aria resource top --metric "cpu|usage_average" --top 20 (this is the starting set, not the answer; quote the metric key so the shell does not treat | as a pipe)
  2. Check CPU Ready on hot VMs → vmware-aria resource metrics <vm-id> --metrics cpu.ready.summation --hours 24
    • >5% = warning, >10% = problem, >20% = critical
  3. Check memory pressure → vmware-aria resource metrics <vm-id> --metrics mem.balloon.average,mem.swapped.average --hours 24
    • Balloon >0 = ESXi reclaiming memory; Swap >0 = severe, act immediately
  4. List active CRITICAL alerts → vmware-aria alert list --criticality CRITICAL (repeat with --criticality IMMEDIATE for the next tier)
  5. Check ML anomalies → vmware-aria anomaly list
  6. Cross-validate against the investigation protocol before reporting any "root cause": high consumption is rarely the root and is usually a downstream symptom
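The CPU Ready tiers in step 2 can be sketched as a small helper. This is an illustrative sketch, not code from the skill: the function names and the "ok" label are hypothetical, and the conversion assumes the raw value is a ready-time summation in milliseconds over a known sample interval (for multi-vCPU VMs the summed value may also need dividing by the vCPU count).

```python
def ready_percent(ready_ms: float, interval_s: float) -> float:
    """Convert a CPU Ready summation (milliseconds) into a percentage of the sample interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms / (interval_s * 1000.0) * 100.0


def classify_cpu_ready(percent: float) -> str:
    """Map a CPU Ready percentage onto the tiers quoted in the checklist above."""
    if percent > 20:
        return "critical"
    if percent > 10:
        return "problem"
    if percent > 5:
        return "warning"
    return "ok"  # hypothetical label for values at or below 5%


# Hypothetical sample: 1,500 ms of ready time in a 20-second interval.
sample = ready_percent(1500, 20)
print(f"{sample:.1f}% -> {classify_cpu_ready(sample)}")
```

The interval is passed in rather than hard-coded because it depends on the rollup the metric was collected at.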

Investigate High CPU Alert

  1. List active CRITICAL alerts → vmware-aria alert list --criticality CRITICAL
  2. Get alert details + symptoms → vmware-aria alert get <alert-id>
  3. Find top CPU consumers → vmware-aria resource top --metric "cpu|usage_average"
  4. Fetch 24h CPU metrics for the hot VM → vmware-aria resource metrics <vm-id> --metrics "cpu|usage_average" --hours 24
  5. Check risk badge → vmware-aria anomaly risk <vm-id>
  6. Acknowledge the alert → vmware-aria alert acknowledge <alert-id>

Capacity Planning

  1. List clusters → vmware-aria resource list --kind ClusterComputeResource
  2. Get remaining capacity → vmware-aria capacity remaining <cluster-id>
  3. Predict time until full → vmware-aria capacity time-remaining <cluster-id>
  4. Get capacity overview with recommendations → vmware-aria capacity overview <cluster-id>
  5. Find rightsizing candidates → vmware-aria capacity rightsizing

Post-Incident: Create Detection Alert (RCA Follow-up)

After resolving an incident, create an early-warning alert to prevent recurrence:

  1. Find matching symptom definition → vmware-aria alertdef symptom-definitions --name <keyword>
  2. Create alert definition referencing symptoms → vmware-aria alertdef create --name "Gold VM CPU Contention" --resource-kind VirtualMachine --symptom-ids <id1>,<id2> --criticality IMMEDIATE
  3. Verify it appears in definitions → vmware-aria alertdef list --name "Gold VM CPU"
  4. Enable it → vmware-aria alertdef enable <definition-id>

Generate Capacity Report

  1. Find report template → vmware-aria report definitions --name "Capacity"
  2. Trigger report generation → vmware-aria report generate <definition-id>
  3. Poll until completed → vmware-aria report get <report-id> (repeat until status == COMPLETED)
  4. Download via the returned download_url (PDF) or csv_url
  5. Clean up → vmware-aria report delete <report-id>
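The generate → poll → download flow above could be wrapped in a polling loop along these lines. This is a minimal sketch under stated assumptions: wait_for_report and fake_fetch are hypothetical names, the fetch callable stands in for whatever returns the report record (for example the get_report MCP tool), and only the COMPLETED status and the download_url field are taken from this document; the other status strings are placeholders.

```python
import time
from typing import Callable, Mapping


def wait_for_report(
    fetch_report: Callable[[str], Mapping[str, str]],
    report_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, str]:
    """Poll fetch_report(report_id) until its status is COMPLETED or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        report = fetch_report(report_id)
        if report.get("status") == "COMPLETED":
            return report
        if clock() >= deadline:
            raise TimeoutError(f"report {report_id} not completed after {timeout_seconds}s")
        sleep(poll_seconds)


# Stand-in for the real lookup: completes on the third poll.
_statuses = iter(["QUEUED", "RUNNING", "COMPLETED"])  # only COMPLETED comes from the docs


def fake_fetch(report_id: str) -> dict:
    return {"id": report_id, "status": next(_statuses), "download_url": "<returned-by-aria>"}


result = wait_for_report(fake_fetch, "r-123", sleep=lambda _seconds: None)
print(result["status"])
```

Injecting sleep and clock keeps the loop testable without real delays; a timeout avoids polling forever if a report never finishes.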

Multi-Target Operations

All commands accept --target <name> to operate against a specific Aria Ops instance:

vmware-aria alert list --target prod
vmware-aria resource top --target lab
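When driving the CLI from a script, the --target flag can be appended programmatically. The helper below is a hypothetical sketch (build_command is not part of the skill); it only assembles an argument list and shows the shell-safe rendering, which matters for metric keys such as cpu|usage_average that contain a pipe character.

```python
import shlex
from typing import Optional, Sequence


def build_command(args: Sequence[str], target: Optional[str] = None) -> list[str]:
    """Return an argv list for the vmware-aria CLI, appending --target when given."""
    argv = ["vmware-aria", *args]
    if target:
        argv += ["--target", target]
    return argv


argv = build_command(["resource", "top", "--metric", "cpu|usage_average"], target="lab")
# shlex.join renders the shell-safe form; note the quotes it adds around the metric key.
print(shlex.join(argv))
# Passing argv as a list to subprocess.run (no shell) sidesteps the quoting issue entirely.
```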

Usage Mode

Scenario | Recommended | Why
Local/small models (Ollama, Qwen) | CLI | ~2K tokens vs ~8K for MCP
Cloud models (Claude, GPT-4o) | Either | MCP gives structured JSON I/O
Automated pipelines | MCP | Type-safe parameters, structured output

MCP Tools (27: 20 read, 7 write)

All MCP tools accept an optional target parameter to select which Aria Operations instance to connect to.

Category | Tool | Type | Description
Resource | list_resources | Read | List VMs, hosts, clusters by resource kind
Resource | get_resource | Read | Get resource details with health, risk, efficiency badges
Resource | get_resource_metrics | Read | Fetch time-series metric stats for any resource
Resource | get_resource_health | Read | Get health badge score (0–100)
Resource | get_top_consumers | Read | Rank resources by CPU, memory, disk, or network usage
Alerts | list_alerts | Read | List active alerts with criticality and resource info
Alerts | get_alert | Read | Get alert details: symptoms and recommendations
Alerts | acknowledge_alert | Write | Mark an alert as acknowledged (does not close it)
Alerts | cancel_alert | Write | Cancel (dismiss) an active alert
Alerts | list_alert_definitions | Read | List alert templates configured in Aria Ops
Alert Defs | list_symptom_definitions | Read | List symptom definitions; use their IDs when creating alert defs
Alert Defs | create_alert_definition | Write | Create new alert definition from symptom definition IDs
Alert Defs | set_alert_definition_state | Write | Enable or disable an alert definition
Alert Defs | delete_alert_definition | Write | Delete an alert definition permanently
Capacity | get_capacity_overview | Read | Cluster capacity recommendations from Aria
Capacity | get_remaining_capacity | Read | Remaining CPU, memory, disk before hitting limits
Capacity | get_time_remaining | Read | Days until cluster capacity is exhausted
Capacity | list_rightsizing_recommendations | Read | VMs to resize: over/under-provisioned
Reports | list_report_definitions | Read | List available report definition templates
Reports | generate_report | Write | Trigger report generation (async; returns report_id)
Reports | list_reports | Read | List generated reports, optionally by definition
Reports | get_report | Read | Poll report status + get PDF/CSV download URLs
Reports | delete_report | Write | Delete a generated report
Anomaly | list_anomalies | Read | Machine-learning anomalies across monitored resources
Anomaly | get_resource_riskbadge | Read | Risk score (0–100): likelihood of future problems
Health | get_aria_health | Read | Aria platform internal services health
Health | list_collector_groups | Read | Collector agents status and connectivity

Read/write split: 20 read-only, 7 write. All write operations are audit-logged to ~/.vmware/audit.db (via vmware-policy).

CLI Quick Reference

# Resources
vmware-aria resource list [--kind VirtualMachine|HostSystem|ClusterComputeResource] [--name <filter>]
vmware-aria resource get <resource-id>
vmware-aria resource metrics <resource-id> --metrics "cpu|usage_average,mem|usage_average" --hours 4
vmware-aria resource metrics <vm-id> --metrics cpu.ready.summation,mem.balloon.average --hours 24
vmware-aria resource health <resource-id>
vmware-aria resource top --metric "cpu|usage_average" --kind VirtualMachine --top 10

# Alerts
vmware-aria alert list [--criticality CRITICAL|IMMEDIATE|WARNING|INFORMATION]
vmware-aria alert get <alert-id>
vmware-aria alert acknowledge <alert-id>
vmware-aria alert cancel <alert-id>
vmware-aria alert definitions [--name <filter>]

# Alert Definitions (create/manage alert templates)
vmware-aria alertdef symptom-definitions [--name <filter>] [--resource-kind VirtualMachine]
vmware-aria alertdef create --name <name> --description <desc> --resource-kind <kind> --symptom-ids <id1,id2> --criticality WARNING|IMMEDIATE|CRITICAL
vmware-aria alertdef list [--name <filter>]
vmware-aria alertdef enable <definition-id>
vmware-aria alertdef disable <definition-id>
vmware-aria alertdef delete <definition-id>

# Capacity
vmware-aria capacity overview <cluster-id>
vmware-aria capacity remaining <resource-id>
vmware-aria capacity time-remaining <resource-id>
vmware-aria capacity rightsizing [--resource-id <vm-id>]

# Reports (async: generate → poll get → download → delete)
vmware-aria report definitions [--name <filter>]
vmware-aria report generate <definition-id> [--resource-ids <id1,id2>]
vmware-aria report list [--definition-id <id>]
vmware-aria report get <report-id>        # poll until status == COMPLETED; shows download_url
vmware-aria report delete <report-id>

# Anomaly
vmware-aria anomaly list [--resource-id <id>]
vmware-aria anomaly risk <resource-id>

# Health
vmware-aria health status
vmware-aria health collectors

# Diagnostics
vmware-aria doctor [--skip-auth]

Key Metric Names (for resource metrics command)

Metric | API Key | What It Means
CPU Ready % | cpu.ready.summation | vCPU waiting for physical core; >5% = warning
CPU Used | cpu.used.summation | Actual CPU execution time
CPU Demand | cpu.demand.average | Total MHz requested by VM
Memory Active | mem.active.average | Actively used by guest OS (sizing)
Memory Consumed | mem.consumed.average | Footprint on host (capacity)
Memory Balloon | mem.balloon.average | >0 = ESXi reclaiming memory
Memory Swap | mem.swapped.average | >0 = severe pressure
Disk Read Latency | disk.read.average | Read I/O latency (ms)
Disk Write Latency | disk.write.average | Write I/O latency (ms)
Net Received | net.received.average | Inbound network (KB/s)
Net Transmitted | net.transmitted.average | Outbound network (KB/s)

Full CLI reference with all options and output formats: see references/cli-reference.md

Troubleshooting

"Token not found" error after setup

The token acquisition request failed. Verify:

  1. Aria Ops is reachable: vmware-aria doctor
  2. The auth_source in config matches your environment (LOCAL, LDAP, AD)
  3. The password env var follows the naming convention: VMWARE_ARIA_<TARGET>_PASSWORD
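To check whether the failure lies in the CLI configuration or in the API itself, the token request can be reproduced by hand. The sketch below only builds the request and does not send it. The endpoint path is the one named in this document; the JSON field names (username, password, authSource) are an assumption based on the Suite API as commonly documented and should be verified against your version. build_token_request and the host name are hypothetical.

```python
import json
import urllib.request


def build_token_request(
    host: str, username: str, password: str, auth_source: str = "LOCAL"
) -> urllib.request.Request:
    """Build (but do not send) the token-acquire POST for an Aria Operations host."""
    # Field names are assumed; confirm them in the Suite API reference for your release.
    body = {"username": username, "password": password, "authSource": auth_source}
    return urllib.request.Request(
        url=f"https://{host}/suite-api/api/auth/token/acquire",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_token_request("aria.example.com", "admin", "not-a-real-password")
print(req.get_method(), req.full_url)
```

Sending it with urllib.request.urlopen(req) against a real host would show the raw HTTP status, which helps separate a wrong auth_source or password from a connectivity problem.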

Resources appear missing from list_resources

The collector agent may be offline. Check list_collector_groups for any collectors in a non-RUNNING state. Restart the affected collector from the Aria Ops UI under Administration > Collector Groups.

Metrics return empty data

The resource may not have metric collection configured, or the requested metric key is incorrect. Verify metric keys against the resource's available metrics in the Aria Ops UI (Metrics tab on the resource detail page).

"Password not found" error

Variable names follow the pattern VMWARE_ARIA_<TARGET_NAME_UPPER>_PASSWORD where hyphens become underscores. Example: target prod needs VMWARE_ARIA_PROD_PASSWORD. Check your ~/.vmware-aria/.env file.
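The naming rule can be expressed in a few lines, which is handy for checking what variable a given target expects. This is an illustrative sketch, not the skill's own code; password_env_var and lookup_password are hypothetical names, and "dc-east" is a made-up target.

```python
import os
from typing import Optional


def password_env_var(target: str) -> str:
    """Return the environment variable name that holds the password for a target."""
    return f"VMWARE_ARIA_{target.upper().replace('-', '_')}_PASSWORD"


def lookup_password(target: str) -> Optional[str]:
    """Read the password from the environment; None means the variable is unset."""
    return os.environ.get(password_env_var(target))


print(password_env_var("prod"))
print(password_env_var("dc-east"))  # hyphen becomes underscore
```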

invalid peer certificate: UnknownIssuer when running uvx (corporate TLS proxy)

uvx re-resolves dependencies from PyPI on every launch. Behind a corporate TLS-intercepting proxy whose CA is not in uv's bundled cert store, the handshake fails. Use the v1.5.15+ recommended single-command form vmware-aria mcp (after uv tool install vmware-aria — no network on launch), or set UV_NATIVE_TLS=true to make uv use the system cert store.

Safety

  • Read-heavy: 20 of 27 tools are read-only
  • Audit logging: Write operations logged to ~/.vmware/audit.db (SQLite WAL, via vmware-policy) with timestamp, user, target, operation, and result
  • Token expiry handling: OpsToken refreshed automatically 60 seconds before expiry (30-minute validity window)
  • Prompt injection defense: API text values sanitized via _sanitize() — strips control characters, truncates to 500 chars
  • Credential safety: Passwords loaded only from environment variables (.env file), never from config.yaml
  • Input validation: resource_id and alert_id validated before API calls; criticality values validated against known enum

Setup

uv tool install vmware-aria
mkdir -p ~/.vmware-aria
cp config.example.yaml ~/.vmware-aria/config.yaml
# Edit config.yaml with your Aria Operations host details

# Create ~/.vmware-aria/.env if missing, restrict it to your user, then add:
# VMWARE_ARIA_PROD_PASSWORD=<your-password>
touch ~/.vmware-aria/.env
chmod 600 ~/.vmware-aria/.env

vmware-aria doctor

All tools are automatically audited via vmware-policy. Audit logs: vmware-audit log --last 20

Full setup guide with multi-target config, MCP server setup, and Docker: see references/setup-guide.md

Architecture

User (natural language)
  |
AI Agent (Claude Code / Goose / Cursor)
  | reads SKILL.md
vmware-aria CLI or MCP server (stdio transport)
  | Aria Operations Suite API (REST/JSON over HTTPS)
  | POST /suite-api/api/auth/token/acquire → OpsToken
Aria Operations Manager
  |
VMs / Hosts / Clusters / Datastores / Alerts / Capacity

The MCP server uses stdio transport (local only, no network listener). Connections to Aria Ops use HTTPS on port 443 with OpsToken authentication (30-minute token validity, auto-refreshed).
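The refresh behaviour described here (30-minute validity, renewed 60 seconds before expiry) can be illustrated with a small cache. This is a sketch under stated assumptions rather than the skill's implementation: TokenCache and fake_acquire are hypothetical, and the acquire callable stands in for the real token request.

```python
import time
from typing import Callable, Optional, Tuple

REFRESH_MARGIN_S = 60  # refresh this many seconds before expiry (from the text above)


class TokenCache:
    """Cache a token and re-acquire it shortly before it expires."""

    def __init__(
        self,
        acquire: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.time,
    ):
        self._acquire = acquire  # returns (token, expires_at_epoch_seconds)
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - REFRESH_MARGIN_S:
            self._token, self._expires_at = self._acquire()
        return self._token


# Demo with a fake clock and a fake acquire() that issues 30-minute tokens.
now = [0.0]
issued = []


def fake_acquire():
    issued.append(now[0])
    return f"token-{len(issued)}", now[0] + 30 * 60


cache = TokenCache(fake_acquire, clock=lambda: now[0])
first = cache.get()   # nothing cached yet, so a token is acquired
now[0] = 1700.0       # 100 s before expiry: still served from cache
second = cache.get()
now[0] = 1745.0       # 55 s before expiry: inside the margin, so refresh
third = cache.get()
print(first, second, third)
```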

Audit & Safety

All operations are automatically audited via vmware-policy (@vmware_tool decorator):

  • Every tool call logged to ~/.vmware/audit.db (SQLite, framework-agnostic)
  • Policy rules enforced via ~/.vmware/rules.yaml (deny rules, maintenance windows, risk levels)
  • Risk classification: each tool tagged as low/medium/high/critical
  • View recent operations: vmware-audit log --last 20
  • View denied operations: vmware-audit log --status denied

vmware-policy is automatically installed as a dependency — no manual setup needed.

License

MIT — github.com/zw008/VMware-Aria

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
