VMware VKS
Disclaimer: This is a community-maintained open-source project and is not affiliated with, endorsed by, or sponsored by VMware, Inc. or Broadcom Inc. "VMware" and "vSphere" are trademarks of Broadcom. Source code is publicly auditable at github.com/zw008/VMware-VKS under the MIT license.
AI-powered VMware vSphere Kubernetes Service (VKS) management — 20 MCP tools.
Requires vSphere 8.x+ with Workload Management enabled. Companion skills: vmware-aiops (VM lifecycle), vmware-monitor (monitoring), vmware-storage (storage), vmware-nsx (NSX networking), vmware-nsx-security (DFW/firewall), vmware-aria (metrics/alerts/capacity), vmware-avi (AVI/ALB/AKO), vmware-pilot (workflow orchestration), vmware-policy (audit/policy).
What This Skill Does
| Category | Capabilities | Count |
|---|---|---|
| Supervisor | Compatibility check, status, storage policies | 3 |
| Namespace | List, get, create with quotas, update, delete with TKC guard, VM classes | 6 |
| TKC Clusters | List, get, versions, create, scale, upgrade, delete with workload guard | 7 |
| Access | Supervisor kubeconfig, TKC kubeconfig, Harbor registry, storage usage | 4 |
Quick Install
uv tool install vmware-vks
vmware-vks doctor
When to Use This Skill
- Check if vSphere environment supports VKS
- Create, update, or delete Supervisor Namespaces with resource quotas
- Deploy, scale, upgrade, or delete TKC (TanzuKubernetesCluster) clusters
- Get kubeconfig for Supervisor or TKC clusters
- Check Harbor registry info or storage usage
Use companion skills for:
- VM lifecycle, deployment → vmware-aiops
- Inventory, health, alarms → vmware-monitor
- iSCSI, vSAN, datastore → vmware-storage
- Load balancing, AVI/ALB, AKO, Ingress → vmware-avi
Related Skills — Skill Routing
| User Intent | Recommended Skill |
|---|---|
| Read-only monitoring | vmware-monitor |
| Storage: iSCSI, vSAN | vmware-storage |
| VM lifecycle, deployment | vmware-aiops |
| vSphere Kubernetes Service (vSphere 8.x+) | vmware-vks ← this skill |
| NSX networking: segments, gateways, NAT | vmware-nsx |
| NSX security: DFW rules, security groups | vmware-nsx-security |
| Aria Ops: metrics, alerts, capacity planning | vmware-aria |
| Multi-step workflows with approval | vmware-pilot |
| Load balancer, AVI, ALB, AKO, Ingress | vmware-avi (uv tool install vmware-avi) |
| Audit log query | vmware-policy (vmware-audit CLI) |
Common Workflows
Deploy a New TKC Cluster
Pre-flight (judgment):
- Supervisor must be vSphere 8.x+ with WCP enabled: `supervisor check` returns pass/fail. If it fails, no TKC command will work; resolve at the vSphere/WCP layer first.
- K8s version: pick a TKR version that is still supported by VMware (not EOL). New clusters on EOL versions look fine until you need a CVE patch and there isn't one.
- VM class sizing: `best-effort-*` for dev, `guaranteed-*` for prod. A `best-effort` worker can be evicted under host pressure; production workloads need `guaranteed`.
- Storage policy: must already exist on the Supervisor. Run `list_supervisor_storage_policies` first; creating a TKC against a missing policy fails after control-plane boot, leaving partial state.
- Control-plane count: `1` for dev, `3` for prod (HA). You cannot upgrade from 1 to 3 without recreating the cluster, so choose correctly the first time.
- Namespace quota: a TKC consumes (control plane + workers) × (CPU, memory) from the namespace quota. If the quota is too tight, workers fail to schedule with no obvious error. A consolidated pre-flight sketch follows this list.
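A minimal pre-flight sketch that chains these gates, stopping at the first failure; the `prod` target and `dev` namespace are the placeholders used in the steps below:

```bash
#!/usr/bin/env bash
# Pre-flight gates for a new TKC; set -e aborts at the first failing check.
# "prod" (target) and "dev" (namespace) are placeholders from this guide.
set -euo pipefail
NS=dev

vmware-vks supervisor check --target prod              # gate 1: WCP enabled, VKS-compatible
vmware-vks supervisor storage-policies --target prod   # gate 2: policy must already exist
vmware-vks tkc versions -n "$NS"                       # gate 3: pick a non-EOL TKR from this list
vmware-vks storage -n "$NS"                            # gate 4: confirm quota headroom for CP + workers
```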
Steps:
1. `vmware-vks supervisor check --target prod`: must pass
2. `vmware-vks tkc versions -n <ns>`: pick a non-EOL TKR
3. (If new namespace) `vmware-vks namespace create dev --storage-policy <policy> --cpu <enough-for-cp+workers>` with `--dry-run` first, then for real with `--apply`
4. `vmware-vks tkc create dev-cluster -n dev --version <tkr> --control-plane 1 --workers 3 --vm-class best-effort-large`, again `--dry-run` first, then `--apply`
5. Wait for `phase=running` (typically 10-15 min, see the polling sketch below); do not assume success just because apply returned
6. `vmware-vks kubeconfig get dev-cluster -n dev -o ./kubeconfig`: write to a file, do not paste tokens into the agent context
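Because apply returning is not success, a polling loop is useful. The grep pattern below assumes the phase string appears in `tkc get` output; that is an assumption about the output format, so adjust it to what your environment prints:

```bash
# Poll until the cluster reports phase=running (typically 10-15 min).
# The grep pattern is an assumption about `tkc get` output; adjust as needed.
for _ in $(seq 1 30); do
  if vmware-vks tkc get dev-cluster -n dev | grep -qi 'phase.*running'; then
    echo "dev-cluster is running"
    break
  fi
  sleep 60
done
```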
Scale Workers for Load Testing
Judgment: scaling is fast but reverse-scaling is destructive — workers are deleted, in-flight pods lost. Treat scale-down like a delete.
1. `tkc get dev-cluster -n dev`: record the current worker count and any pending pods
2. Scale-up: `tkc scale dev-cluster -n dev --workers 6` is a safe, additive operation
3. Verify new workers reach `Ready` in `kubectl get nodes` before sending traffic
4. Scale-down: drain pods first via `kubectl drain` on the to-be-deleted nodes, THEN `tkc scale --workers 3` (see the sketch after this list). Skipping the drain causes pod restarts on the remaining nodes, with measurable user impact.
5. Confirm the leftover namespace quota supports the new size; quota is enforced at scheduling time, not at the scale request
Namespace Resource Management
Judgment: quota changes are atomic but consequences are not. Reducing quota below current usage doesn't evict pods — they keep running, but no new pods schedule, looking like a "namespace is broken" symptom.
1. `namespace list`: see all namespaces and their phases
2. `storage -n dev`: check current CPU/memory/storage usage; never reduce quota below current usage + 20% headroom
3. `namespace update dev --cpu <new> --memory <new> --dry-run`: preview, then run again without `--dry-run` to apply (worked example below)
4. Validate by attempting a small pod scale-up; if it pends with `Insufficient cpu`, quota is still the bottleneck
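A worked example with hypothetical numbers: if `storage -n dev` reports 16 CPUs in use, the 20% headroom rule means the new quota should stay at or above roughly 20:

```bash
vmware-vks storage -n dev                                           # current usage first
vmware-vks namespace update dev --cpu 20 --memory 49152 --dry-run   # preview
vmware-vks namespace update dev --cpu 20 --memory 49152             # apply (values hypothetical)
```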
Architecture
User (Natural Language)
↓
AI Agent (Claude Code / Goose / Cursor)
↓ reads SKILL.md
↓
vmware-vks CLI ─── or ─── vmware-vks MCP Server (stdio)
│
├─ Layer 1: pyVmomi → vCenter REST API
│ Supervisor status, storage policies, Namespace CRUD, VM classes, Harbor
│
└─ Layer 2: kubernetes client → Supervisor K8s API endpoint
TKC CR apply / get / delete (cluster.x-k8s.io/v1beta1)
Kubeconfig built from Layer 1 session token
↓
vCenter Server 8.x+ (Workload Management enabled)
↓
Supervisor Cluster → vSphere Namespaces → TanzuKubernetesCluster
Usage Mode
| Scenario | Recommended | Why |
|---|---|---|
| Local/small models (Ollama, Qwen) | CLI | ~2K tokens vs ~8K for MCP |
| Cloud models (Claude, GPT-4o) | Either | MCP gives structured JSON I/O |
| Automated pipelines | MCP | Type-safe parameters, structured output |
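For MCP mode with Claude Code, the stdio server can be registered via `claude mcp add`. The `vmware-vks mcp` launch command below is an assumption, not confirmed by this document; check references/setup-guide.md for the exact invocation:

```bash
# Register the stdio MCP server with Claude Code.
# "vmware-vks mcp" as the launch subcommand is an assumption.
claude mcp add vmware-vks -- vmware-vks mcp
```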
MCP Tools (20 — 12 read, 8 write)
All tools accept an optional `target` parameter to specify a named vCenter.
| Category | Tool | Type |
|---|---|---|
| Supervisor | check_vks_compatibility | Read |
| | get_supervisor_status | Read |
| | list_supervisor_storage_policies | Read |
| Namespace | list_namespaces | Read |
| | get_namespace | Read |
| | create_namespace | Write |
| | update_namespace | Write |
| | delete_namespace | Write |
| | list_vm_classes | Read |
| TKC | list_tkc_clusters | Read |
| | get_tkc_cluster | Read |
| | get_tkc_available_versions | Read |
| | create_tkc_cluster | Write |
| | scale_tkc_cluster | Write |
| | upgrade_tkc_cluster | Write |
| | delete_tkc_cluster | Write |
| Access | get_supervisor_kubeconfig | Read |
| | get_tkc_kubeconfig | Read |
| | get_harbor_info | Read |
| | list_namespace_storage_usage | Read |
- `create_namespace` / `create_tkc_cluster`: default to `dry_run=True` and return a YAML plan for review. Pass `dry_run=False` to apply.
- `delete_namespace`: requires `confirmed=True` and rejects the call if TKC clusters still exist (prevents orphaned clusters).
- `delete_tkc_cluster`: requires `confirmed=True` and checks for running workloads; rejects if any are found unless `force=True`.
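The CLI mirrors these guards; a sketch with placeholder names and version:

```bash
# Create previews by default; nothing is applied until --apply
vmware-vks tkc create demo -n dev --version <tkr> --workers 3
vmware-vks tkc create demo -n dev --version <tkr> --workers 3 --apply

# Delete is rejected while workloads run; --force overrides after review
vmware-vks tkc delete demo -n dev
vmware-vks tkc delete demo -n dev --force
```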
Credential handling: `get_supervisor_kubeconfig` and `get_tkc_kubeconfig` return short-lived session tokens, not long-lived credentials. Tokens are derived from the authenticated vCenter session and expire when the session ends. Kubeconfig output is intended for local kubectl use; agents should write it to a file (`-o <path>`) rather than displaying tokens in conversation context.
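The recommended pattern, using the flags from the CLI reference below: write the kubeconfig to disk and point kubectl at the file, so tokens never enter the transcript:

```bash
# Token lands in the file, not in the agent conversation
vmware-vks kubeconfig get dev-cluster -n dev -o ./kubeconfig
KUBECONFIG=./kubeconfig kubectl get nodes
```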
Full capability details and safety features: see `references/capabilities.md`
CLI Quick Reference
# Supervisor
vmware-vks check [--target <name>]
vmware-vks supervisor status <cluster-id> [--target <name>]
vmware-vks supervisor storage-policies [--target <name>]
# Namespace
vmware-vks namespace list [--target <name>]
vmware-vks namespace get <name> [--target <name>]
vmware-vks namespace create <name> --cluster <id> [--cpu <n>] [--memory <mb>] [--storage-policy <name>] [--apply]
vmware-vks namespace update <name> [--cpu <n>] [--memory <mb>] [--target <name>]
vmware-vks namespace delete <name> [--target <name>]
# TKC Clusters
vmware-vks tkc list [-n <namespace>] [--target <name>]
vmware-vks tkc create <name> -n <ns> [--version <v>] [--workers <n>] [--vm-class <name>] [--apply]
vmware-vks tkc scale <name> -n <ns> --workers <n> [--target <name>]
vmware-vks tkc upgrade <name> -n <ns> --version <v> [--target <name>]
vmware-vks tkc delete <name> -n <ns> [--force] [--target <name>]
# Kubeconfig
vmware-vks kubeconfig supervisor -n <namespace> [--target <name>]
vmware-vks kubeconfig get <cluster-name> -n <namespace> [-o <path>] [--target <name>]
# Harbor & Storage
vmware-vks harbor [--target <name>]
vmware-vks storage -n <namespace> [--target <name>]
Full CLI reference with all flags and interactive creation: see `references/cli-reference.md`
Troubleshooting
"VKS not compatible" error
Workload Management must be enabled in vCenter. Check: vCenter UI → Workload Management. Requires vSphere 8.x+ with Enterprise Plus or VCF license.
Namespace creation fails with "storage policy not found"
List available policies first: vmware-vks supervisor storage-policies. Policy names are case-sensitive.
TKC cluster stuck in "Creating" phase
Check Supervisor events in vCenter. Common causes: insufficient resources on ESXi hosts, network issues with NSX-T, or storage policy not available on target datastore.
Kubeconfig retrieval fails
Supervisor API endpoint must be reachable from the machine running vmware-vks. Check firewall rules for port 6443.
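A quick reachability test from that machine (the endpoint hostname is a placeholder):

```bash
# The Supervisor API endpoint must answer on 6443
nc -vz supervisor.example.internal 6443
```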
Scale operation has no effect
Verify the cluster is in "Running" phase before scaling. Clusters in "Creating" or "Updating" phase reject scale operations.
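Check the phase first; the grep is an illustrative way to inspect `tkc get` output:

```bash
# Scale only once the cluster reports Running
vmware-vks tkc get dev-cluster -n dev | grep -i phase
vmware-vks tkc scale dev-cluster -n dev --workers 6
```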
Delete namespace rejected unexpectedly
The namespace delete guard prevents deletion when TKC clusters exist inside. Delete all TKC clusters in the namespace first, then retry.
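The unblocking sequence, with a placeholder cluster name:

```bash
vmware-vks tkc list -n dev                # find remaining clusters
vmware-vks tkc delete old-cluster -n dev
vmware-vks namespace delete dev           # retry once the namespace is empty
```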
Prerequisites
- vSphere 8.x+ with Workload Management enabled
- Enterprise Plus or VCF license
- NSX-T (recommended) or VDS + HAProxy networking
- Supervisor Cluster configured and running
Setup
uv tool install vmware-vks
mkdir -p ~/.vmware-vks
vmware-vks init
All tools are automatically audited via vmware-policy. Audit logs: `vmware-audit log --last 20`
Full setup guide, security details, and AI platform compatibility: see `references/setup-guide.md`
Audit & Safety
All operations are automatically audited via vmware-policy (`@vmware_tool` decorator):
- Every tool call is logged to `~/.vmware/audit.db` (SQLite, framework-agnostic)
- Policy rules are enforced via `~/.vmware/rules.yaml` (deny rules, maintenance windows, risk levels)
- Risk classification: each tool is tagged low/medium/high/critical
- View recent operations: `vmware-audit log --last 20`
- View denied operations: `vmware-audit log --status denied`

vmware-policy is installed automatically as a dependency; no manual setup needed.