Kubernetes Operations
Comprehensive kubectl assistance for debugging, resource management, and cluster operations with token-efficient scripts.
BEFORE YOU START
This skill prevents 5 common errors and saves ~70% tokens.
Metric Without Skill With Skill
Pod Debugging ~1200 tokens ~400 tokens
Resource Listing ~800 tokens ~200 tokens
Cluster Health ~1500 tokens ~300 tokens
Known Issues This Skill Prevents
-
Running kubectl commands in wrong namespace/context
-
Verbose output flooding context with unnecessary data
-
Missing critical debugging steps (events, previous logs)
-
Exposing secrets in plain text output
-
Destructive operations without dry-run verification
Quick Start
Step 1: Verify Context
kubectl config current-context kubectl config get-contexts
Why this matters: Running commands in the wrong cluster can cause production incidents.
Step 2: Debug a Pod
uv run scripts/debug_pod.py <pod-name> [-n namespace]
Why this matters: The script combines describe, logs, and events into a condensed summary, saving ~800 tokens.
Step 3: Check Cluster Health
uv run scripts/cluster_health.py
Why this matters: Quick overview of node status and unhealthy pods without verbose output.
Critical Rules
Always Do
-
Always verify kubectl config current-context before operations
-
Always use -n namespace to be explicit about target
-
Always use --dry-run=client -o yaml before applying changes
-
Always check events when debugging: kubectl get events --sort-by='.lastTimestamp'
-
Always use --previous flag when pod is in CrashLoopBackOff
Never Do
-
Never run kubectl delete without --dry-run first in production
-
Never output secrets without filtering: avoid kubectl get secret -o yaml
-
Never assume default namespace - always specify -n
-
Never ignore resource limits when debugging OOMKilled pods
-
Never skip describe when logs show no errors
Common Mistakes
Wrong:
kubectl logs my-pod
Correct:
kubectl logs my-pod -n my-namespace --tail=100 --timestamps
Why: Default namespace may not be correct, unlimited logs flood context, timestamps help correlate with events.
Known Issues Prevention
Issue Root Cause Solution
CrashLoopBackOff App crash on startup Check kubectl logs --previous and describe for exit codes
ImagePullBackOff Registry auth or image tag Verify image exists and check pull secrets
Pending pods No schedulable nodes Check node resources and pod affinity/tolerations
OOMKilled Memory limit exceeded Check container limits vs actual usage with kubectl top
Connection refused Service selector mismatch Verify pod labels match service selector
Debugging Workflows
Pod Not Starting
1. Get pod status and events
kubectl describe pod <name> -n <namespace>
2. Check logs (current or previous)
kubectl logs <name> -n <namespace> --tail=100 kubectl logs <name> -n <namespace> --previous # If restarting
3. Check events for scheduling issues
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep <name>
4. Interactive debugging
kubectl exec -it <name> -n <namespace> -- /bin/sh
Service Connectivity
1. Verify service exists and has endpoints
kubectl get svc <name> -n <namespace> kubectl get endpoints <name> -n <namespace>
2. Check pod labels match service selector
kubectl get pods -n <namespace> --show-labels
3. Test from within cluster
kubectl run debug --rm -it --image=busybox -- wget -qO- http://<service>:<port>
4. Port-forward for local testing
kubectl port-forward svc/<name> 8080:80 -n <namespace>
Resource Management
Deployments
List deployments
kubectl get deployments -n <namespace>
Scale
kubectl scale deployment <name> --replicas=3 -n <namespace>
Rollout status
kubectl rollout status deployment/<name> -n <namespace>
Rollback
kubectl rollout undo deployment/<name> -n <namespace>
History
kubectl rollout history deployment/<name> -n <namespace>
ConfigMaps and Secrets
List
kubectl get configmaps -n <namespace> kubectl get secrets -n <namespace>
View ConfigMap data
kubectl get configmap <name> -n <namespace> -o jsonpath='{.data}'
View Secret keys (NOT values)
kubectl get secret <name> -n <namespace> -o jsonpath='{.data}' | jq 'keys'
Create from file
kubectl create configmap <name> --from-file=<path> -n <namespace> --dry-run=client -o yaml
Cluster Operations
Node Management
List nodes with status
kubectl get nodes -o wide
Node details
kubectl describe node <name>
Cordon (prevent scheduling)
kubectl cordon <node>
Drain (evict pods)
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
Uncordon
kubectl uncordon <node>
Resource Usage
Node resources
kubectl top nodes
Pod resources
kubectl top pods -n <namespace>
Sort by memory
kubectl top pods -n <namespace> --sort-by=memory
Bundled Resources
Scripts
Located in scripts/ :
-
debug_pod.py
-
Comprehensive pod debugging with condensed output
-
get_resources.py
-
Resource summary using jsonpath for minimal tokens
-
cluster_health.py
-
Quick cluster status overview
References
Located in references/ :
-
kubectl-cheatsheet.md
-
Condensed command reference
-
jsonpath-patterns.md
-
Common JSONPath expressions
-
debugging-flowchart.md
-
Decision tree for pod issues
Note: For deep dives on specific topics, see the reference files above.
Dependencies
Required
Package Version Purpose
kubectl 1.25+ Kubernetes CLI
jq 1.6+ JSON parsing for scripts
Optional
Package Version Purpose
k9s 0.27+ Terminal UI for Kubernetes
stern 1.25+ Multi-pod log tailing
Official Documentation
-
kubectl Quick Reference
-
JSONPath Support
-
kubectl Cheat Sheet
-
Debug Running Pods
Troubleshooting
kubectl command not found
Symptoms: command not found: kubectl
Solution:
macOS
brew install kubectl
Verify
kubectl version --client
Context not set
Symptoms: error: no context is currently set
Solution:
List available contexts
kubectl config get-contexts
Set context
kubectl config use-context <context-name>
Permission denied
Symptoms: Error from server (Forbidden)
Solution:
Check current user
kubectl auth whoami
Check permissions
kubectl auth can-i get pods -n <namespace> kubectl auth can-i --list -n <namespace>
Timeout connecting to cluster
Symptoms: Unable to connect to the server: dial tcp: i/o timeout
Solution:
Check cluster endpoint
kubectl cluster-info
Verify network connectivity
curl -k https://<cluster-api-endpoint>/healthz
Check kubeconfig
cat ~/.kube/config
Setup Checklist
Before using this skill, verify:
-
kubectl installed (kubectl version --client )
-
Kubeconfig configured (~/.kube/config exists)
-
Context set to correct cluster (kubectl config current-context )
-
Permissions verified (kubectl auth can-i get pods )
-
jq installed for JSON parsing (jq --version )