Kubernetes Operations

Comprehensive kubectl assistance for debugging, resource management, and cluster operations with token-efficient scripts.

BEFORE YOU START

This skill prevents 5 common errors and cuts token usage by roughly 70% on common tasks.

Task               | Without Skill | With Skill
Pod Debugging      | ~1200 tokens  | ~400 tokens
Resource Listing   | ~800 tokens   | ~200 tokens
Cluster Health     | ~1500 tokens  | ~300 tokens

Known Issues This Skill Prevents

Running kubectl commands in the wrong namespace or context
Verbose output flooding the context with unnecessary data
Missing critical debugging steps (events, previous logs)
Exposing secrets in plain-text output
Destructive operations without dry-run verification

Quick Start

Step 1: Verify Context

kubectl config current-context
kubectl config get-contexts

Why this matters: Running commands in the wrong cluster can cause production incidents.
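The context check can also be scripted so it cannot be skipped. This is a minimal Python sketch, assuming a hypothetical allowlist of context names (ALLOWED_CONTEXTS) and hypothetical helper names; it reads the active context with kubectl config current-context and refuses to continue if that context is not on the list.

```python
import subprocess

# Hypothetical allowlist: the contexts this session is allowed to touch.
ALLOWED_CONTEXTS = {"dev-cluster", "staging-cluster"}


def current_context() -> str:
    """Return the active context as printed by `kubectl config current-context`."""
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if no context is set
    )
    return result.stdout.strip()


def assert_safe_context(context: str, allowed: set[str] = ALLOWED_CONTEXTS) -> str:
    """Return the context unchanged, or raise if it is not on the allowlist."""
    if context not in allowed:
        raise RuntimeError(f"refusing to run against context {context!r}")
    return context
```

Typical use is a single guard line at the top of a script: assert_safe_context(current_context()).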

Step 2: Debug a Pod

uv run scripts/debug_pod.py <pod-name> [-n namespace]

Why this matters: The script combines describe, logs, and events into a condensed summary, saving ~800 tokens.
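The internals of debug_pod.py are not shown here. The sketch below only illustrates the kind of condensing it describes, assuming input from kubectl get pod <name> -n <namespace> -o json; load_pod and summarize_pod are hypothetical names. It reduces a full pod object to one header line plus one line per container, keeping the state, the waiting reason, the restart count and the previous exit code.

```python
import json
import subprocess


def load_pod(name: str, namespace: str) -> dict:
    """Fetch one pod as JSON via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def summarize_pod(pod: dict) -> list[str]:
    """Condense a pod object into one header line plus one line per container."""
    meta = pod.get("metadata", {})
    status = pod.get("status", {})
    lines = [f"{meta.get('namespace', '?')}/{meta.get('name', '?')}: {status.get('phase', 'Unknown')}"]
    for cs in status.get("containerStatuses", []):
        state = cs.get("state", {})
        # The state object has a single key: waiting, running, or terminated.
        kind = next(iter(state), "unknown")
        parts = [f"  {cs.get('name', '?')}: {kind}"]
        reason = state.get(kind, {}).get("reason")
        if reason:
            parts.append(reason)
        parts.append(f"restarts={cs.get('restartCount', 0)}")
        last = cs.get("lastState", {}).get("terminated")
        if last:
            # Exit code and reason of the previous run, e.g. 137/OOMKilled.
            parts.append(f"last_exit={last.get('exitCode')}/{last.get('reason', '?')}")
        lines.append(" ".join(parts))
    return lines
```

For a crash-looping pod this yields lines such as "shop/web-1: Running" and "  app: waiting CrashLoopBackOff restarts=5 last_exit=1/Error" instead of the full describe output.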

Step 3: Check Cluster Health

uv run scripts/cluster_health.py

Why this matters: Quick overview of node status and unhealthy pods without verbose output.
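A health overview of this kind can be built from two JSON listings. This is a minimal sketch, not the bundled cluster_health.py: it assumes the output of kubectl get nodes -o json and kubectl get pods -A -o json, and the function names are hypothetical. Pod phase alone is not enough, because a pod in CrashLoopBackOff still reports phase Running, so any container in a waiting state also counts as unhealthy.

```python
def unready_nodes(nodes: dict) -> list[str]:
    """Nodes whose Ready condition is not "True" (input: kubectl get nodes -o json)."""
    names = []
    for node in nodes.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c.get("status") for c in conditions if c.get("type") == "Ready"), "Unknown")
        if ready != "True":
            names.append(node["metadata"]["name"])
    return names


def unhealthy_pods(pods: dict) -> list[str]:
    """namespace/name of pods that look unhealthy (input: kubectl get pods -A -o json)."""
    result = []
    for pod in pods.get("items", []):
        status = pod.get("status", {})
        # A container stuck in "waiting" (CrashLoopBackOff, ImagePullBackOff, ...)
        # can exist even while the pod phase is Running.
        waiting = any("waiting" in cs.get("state", {}) for cs in status.get("containerStatuses", []))
        if status.get("phase") not in ("Running", "Succeeded") or waiting:
            meta = pod["metadata"]
            result.append(f"{meta['namespace']}/{meta['name']}")
    return result
```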

Critical Rules

Always Do

Always verify kubectl config current-context before any operation
Always pass -n <namespace> to be explicit about the target
Always run with --dry-run=client -o yaml before applying changes
Always check events when debugging: kubectl get events -n <namespace> --sort-by='.lastTimestamp'
Always use kubectl logs --previous when a pod is in CrashLoopBackOff

Never Do

Never run kubectl delete in production without first running it with --dry-run=client
Never output secrets without filtering: avoid kubectl get secret <name> -o yaml, since Secret values are only base64-encoded
Never assume the default namespace; always specify -n
Never ignore resource limits when debugging OOMKilled pods
Never skip kubectl describe when logs show no errors
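The namespace, context and dry-run rules can be enforced in a small wrapper instead of relying on memory. This is a minimal sketch built around a hypothetical build_cmd helper (not part of kubectl or, as far as this document shows, of the bundled scripts). It always pins --context and -n, and appends --dry-run=client to apply, create and delete unless the caller has already reviewed the dry-run output and passes confirmed=True.

```python
# Verbs that get an automatic client-side dry run; deliberately kept small.
DRY_RUN_VERBS = {"apply", "create", "delete"}


def build_cmd(verb: str, args: list[str], *, context: str, namespace: str,
              confirmed: bool = False) -> list[str]:
    """Build a kubectl argv with an explicit context and namespace."""
    if not context or not namespace:
        raise ValueError("context and namespace must be explicit")
    # Global flags may precede the subcommand in kubectl.
    cmd = ["kubectl", "--context", context, "-n", namespace, verb, *args]
    if verb in DRY_RUN_VERBS and not confirmed:
        cmd.append("--dry-run=client")
    return cmd
```

A delete therefore runs as a dry run first, and only a second call with confirmed=True produces the real command.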

Common Mistakes

Wrong:

kubectl logs my-pod

Correct:

kubectl logs my-pod -n my-namespace --tail=100 --timestamps

Why: the default namespace may not be the one you need, unbounded logs flood the context, and timestamps help correlate log lines with events.
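When scripts call kubectl logs, the "Correct" form can be produced by a helper so the namespace, --tail and --timestamps are never forgotten. This is a minimal sketch with a hypothetical logs_cmd function; it only builds the argument list and leaves running it (for example with subprocess.run) to the caller.

```python
def logs_cmd(pod: str, namespace: str, *, tail: int = 100,
             previous: bool = False, container: str = "") -> list[str]:
    """Build a bounded, timestamped `kubectl logs` command."""
    cmd = ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail}", "--timestamps"]
    if container:
        # Selects one container in a multi-container pod.
        cmd += ["-c", container]
    if previous:
        # Logs from the last terminated instance, e.g. during CrashLoopBackOff.
        cmd.append("--previous")
    return cmd
```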

Known Issues Prevention

Issue              | Root Cause                 | Solution
CrashLoopBackOff   | App crash on startup       | Check kubectl logs --previous and describe for exit codes
ImagePullBackOff   | Registry auth or image tag | Verify image exists and check pull secrets
Pending pods       | No schedulable nodes       | Check node resources and pod affinity/tolerations
OOMKilled          | Memory limit exceeded      | Check container limits vs actual usage with kubectl top
Connection refused | Service selector mismatch  | Verify pod labels match service selector
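The first four rows of this table can be detected from a pod's status and turned into hints. This is a minimal sketch, assuming a pod object from kubectl get pod <name> -n <namespace> -o json; NEXT_STEP and triage are hypothetical names, and the hint texts simply restate the table. The "Connection refused" row is not visible in pod status, so it is left out.

```python
# Hypothetical mapping from status reasons to the table's suggested next step.
NEXT_STEP = {
    "CrashLoopBackOff": "kubectl logs <pod> -n <namespace> --previous, then kubectl describe pod",
    "ImagePullBackOff": "verify the image tag exists and check imagePullSecrets",
    "ErrImagePull": "verify the image tag exists and check imagePullSecrets",
    "OOMKilled": "compare container memory limits with kubectl top pods",
    "Unschedulable": "check node resources and pod affinity/tolerations",
}


def triage(pod: dict) -> list[str]:
    """Return '<where>: <reason> -> <next step>' hints for known failure reasons."""
    status = pod.get("status", {})
    hints = []
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting", {}).get("reason")
        last = cs.get("lastState", {}).get("terminated", {}).get("reason")
        for reason in (waiting, last):
            if reason in NEXT_STEP:
                hints.append(f"{cs.get('name')}: {reason} -> {NEXT_STEP[reason]}")
    # A Pending pod that cannot be placed reports it on the PodScheduled condition.
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("reason") == "Unschedulable":
            hints.append(f"pod: Unschedulable -> {NEXT_STEP['Unschedulable']}")
    return hints
```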

Debugging Workflows

Pod Not Starting

1. Get pod status and events

kubectl describe pod <name> -n <namespace>

2. Check logs (current or previous)

kubectl logs <name> -n <namespace> --tail=100
kubectl logs <name> -n <namespace> --previous  # If restarting

3. Check events for scheduling issues

kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep <name>

4. Interactive debugging

kubectl exec -it <name> -n <namespace> -- /bin/sh
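Step 3 uses grep, which matches substrings (grep web-1 also matches web-10) and keeps unrelated columns. A sketch of the same step in Python, assuming the input is kubectl get events -n <namespace> -o json and using a hypothetical pod_events function, filters on the exact pod name and sorts oldest first. kubectl can also filter server-side with --field-selector involvedObject.name=<name>.

```python
def pod_events(events: dict, pod_name: str) -> list[str]:
    """Events for one pod, oldest first (input: kubectl get events -n <ns> -o json)."""
    matching = [
        e for e in events.get("items", [])
        if e.get("involvedObject", {}).get("kind") == "Pod"
        and e.get("involvedObject", {}).get("name") == pod_name
    ]

    def when(e: dict) -> str:
        # lastTimestamp can be null on newer events; fall back to eventTime.
        return e.get("lastTimestamp") or e.get("eventTime") or ""

    # RFC 3339 timestamps sort correctly as strings.
    matching.sort(key=when)
    return [f"{when(e)} {e.get('type')} {e.get('reason')}: {e.get('message')}" for e in matching]
```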

Service Connectivity

1. Verify service exists and has endpoints

kubectl get svc <name> -n <namespace>
kubectl get endpoints <name> -n <namespace>

2. Check pod labels match service selector

kubectl get pods -n <namespace> --show-labels

3. Test from within cluster

kubectl run debug --rm -it --restart=Never --image=busybox -n <namespace> -- wget -qO- http://<service>:<port>

4. Port-forward for local testing

kubectl port-forward svc/<name> 8080:80 -n <namespace>
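Step 2 can be checked mechanically: a Service's spec.selector is a set of exact key/value matches, and a pod is selected only if all of them appear in its labels. This is a minimal sketch with hypothetical function names, assuming the inputs are kubectl get svc <name> -n <namespace> -o json and kubectl get pods -n <namespace> -o json. A label match is necessary but not sufficient: endpoints list only Ready pods, so matching labels with empty endpoints point to failing readiness probes.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every key/value in a Service selector appears in the pod's labels."""
    # An empty selector means the Service's endpoints are managed manually.
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def pods_behind_service(service: dict, pods: dict) -> list[str]:
    """Names of pods the Service's selector would pick."""
    selector = service.get("spec", {}).get("selector") or {}
    return [
        p["metadata"]["name"]
        for p in pods.get("items", [])
        if selector_matches(selector, p["metadata"].get("labels") or {})
    ]
```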

Resource Management

Deployments

List deployments

kubectl get deployments -n <namespace>

Scale

kubectl scale deployment <name> --replicas=3 -n <namespace>

Rollout status

kubectl rollout status deployment/<name> -n <namespace>

Rollback

kubectl rollout undo deployment/<name> -n <namespace>

History

kubectl rollout history deployment/<name> -n <namespace>
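kubectl rollout status decides completion from the Deployment's status fields. The sketch below approximates that check from kubectl get deployment <name> -n <namespace> -o json, which is handy when a script should poll without blocking; rollout_state is a hypothetical name and the sketch ignores the progress-deadline condition that kubectl also reports.

```python
def rollout_state(dep: dict) -> str:
    """Approximate `kubectl rollout status` from a Deployment object."""
    meta = dep.get("metadata", {})
    spec = dep.get("spec", {})
    status = dep.get("status", {})
    desired = spec.get("replicas", 1)  # the API server defaults replicas to 1
    if status.get("observedGeneration", 0) < meta.get("generation", 0):
        return "waiting: new spec not yet observed by the controller"
    updated = status.get("updatedReplicas", 0)
    total = status.get("replicas", 0)
    available = status.get("availableReplicas", 0)
    if updated < desired:
        return f"waiting: {updated}/{desired} replicas updated"
    if total > updated:
        return f"waiting: {total - updated} old replica(s) pending termination"
    if available < updated:
        return f"waiting: {available}/{updated} updated replicas available"
    return "complete"
```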

ConfigMaps and Secrets

List

kubectl get configmaps -n <namespace>
kubectl get secrets -n <namespace>

View ConfigMap data

kubectl get configmap <name> -n <namespace> -o jsonpath='{.data}'

View Secret keys (NOT values)

kubectl get secret <name> -n <namespace> -o jsonpath='{.data}' | jq 'keys'

Create from file

kubectl create configmap <name> --from-file=<path> -n <namespace> --dry-run=client -o yaml
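Secret values are only base64-encoded, so when a script needs a whole Secret object rather than just its keys, it should mask the values before printing anything. This is a minimal sketch with a hypothetical redact_secret function, assuming input from kubectl get secret <name> -n <namespace> -o json. It also drops the kubectl.kubernetes.io/last-applied-configuration annotation, because kubectl apply stores the full manifest there, values included.

```python
import copy

LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def redact_secret(secret: dict) -> dict:
    """Return a copy of a Secret with every value hidden but keys kept."""
    redacted = copy.deepcopy(secret)  # never mutate the caller's object
    for field in ("data", "stringData"):
        if redacted.get(field):
            redacted[field] = {key: "<redacted>" for key in redacted[field]}
    annotations = redacted.get("metadata", {}).get("annotations") or {}
    annotations.pop(LAST_APPLIED, None)
    return redacted
```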

Cluster Operations

Node Management

List nodes with status

kubectl get nodes -o wide

Node details

kubectl describe node <name>

Cordon (prevent scheduling)

kubectl cordon <node>

Drain (evict pods)

kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

Uncordon

kubectl uncordon <node>
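The node commands above form one maintenance sequence, and a frequent slip is forgetting the final uncordon. This is a minimal sketch with hypothetical function names: it returns the sequence as argument lists and finds nodes still marked unschedulable in kubectl get nodes -o json output. kubectl drain cordons the node itself, so the separate cordon step mainly stops new pods from landing while you prepare.

```python
def maintenance_plan(node: str) -> list[list[str]]:
    """Commands for taking one node out of service and back, in order."""
    return [
        ["kubectl", "cordon", node],
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Perform the maintenance between these two steps.
        ["kubectl", "uncordon", node],
    ]


def cordoned_nodes(nodes: dict) -> list[str]:
    """Nodes still marked unschedulable (input: kubectl get nodes -o json)."""
    return [
        n["metadata"]["name"]
        for n in nodes.get("items", [])
        if n.get("spec", {}).get("unschedulable")
    ]
```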

Resource Usage

Node resources

kubectl top nodes

Pod resources

kubectl top pods -n <namespace>

Sort by memory

kubectl top pods -n <namespace> --sort-by=memory
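For scripts that need numbers rather than text, the kubectl top pods output can be parsed and sorted locally. This is a minimal sketch with hypothetical function names, assuming single-namespace output with exactly three columns (NAME, CPU(cores), MEMORY(bytes)); flags such as -A add columns and would need a different parser.

```python
# Binary suffixes used in kubectl top memory output.
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def memory_bytes(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def cpu_millicores(quantity: str) -> int:
    """Convert '250m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_top_pods(text: str) -> list[tuple[str, int, int]]:
    """Parse `kubectl top pods -n <ns>` into (name, millicores, bytes), largest memory first."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip the header and blank lines
        name, cpu, mem = fields[:3]
        rows.append((name, cpu_millicores(cpu), memory_bytes(mem)))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```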

Bundled Resources

Scripts

Located in scripts/:

debug_pod.py: comprehensive pod debugging with condensed output
get_resources.py: resource summary using jsonpath for minimal tokens
cluster_health.py: quick cluster status overview

References

Located in references/:

kubectl-cheatsheet.md: condensed command reference
jsonpath-patterns.md: common JSONPath expressions
debugging-flowchart.md: decision tree for pod issues

Note: For deep dives on specific topics, see the reference files above.

Dependencies

Required

Package | Version | Purpose
kubectl | 1.25+   | Kubernetes CLI
jq      | 1.6+    | JSON parsing for scripts

Optional

Package | Version | Purpose
k9s     | 0.27+   | Terminal UI for Kubernetes
stern   | 1.25+   | Multi-pod log tailing

Official Documentation

kubectl Quick Reference
JSONPath Support
kubectl Cheat Sheet
Debug Running Pods

Troubleshooting

kubectl command not found

Symptoms: command not found: kubectl

Solution:

macOS

brew install kubectl

Verify

kubectl version --client

Context not set

Symptoms: error: no context is currently set

Solution:

List available contexts

kubectl config get-contexts

Set context

kubectl config use-context <context-name>

Permission denied

Symptoms: Error from server (Forbidden)

Solution:

Check current user (kubectl auth whoami requires kubectl 1.27 or newer)

kubectl auth whoami

Check permissions

kubectl auth can-i get pods -n <namespace>
kubectl auth can-i --list -n <namespace>

Timeout connecting to cluster

Symptoms: Unable to connect to the server: dial tcp: i/o timeout

Solution:

Check cluster endpoint

kubectl cluster-info

Verify network connectivity

curl -k https://<cluster-api-endpoint>/healthz

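Check kubeconfig (unlike cat ~/.kube/config, kubectl config view hides embedded certificate data and tokens unless --raw is passed)

kubectl config view --minify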

Setup Checklist

Before using this skill, verify:

kubectl installed (kubectl version --client)
Kubeconfig configured (~/.kube/config exists)
Context set to the correct cluster (kubectl config current-context)
Permissions verified (kubectl auth can-i get pods)
jq installed for JSON parsing (jq --version)
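The file-system parts of this checklist can be automated. This is a minimal sketch with a hypothetical preflight function: it checks that kubectl and jq are on PATH and that a kubeconfig file exists, honouring the KUBECONFIG environment variable (which may list several files separated by the OS path separator) before falling back to ~/.kube/config. The context and permission checks still need kubectl itself.

```python
import os
import shutil
from pathlib import Path


def preflight() -> dict[str, bool]:
    """Report which local prerequisites are present."""
    env = os.environ.get("KUBECONFIG")
    if env:
        # KUBECONFIG may hold several paths joined by os.pathsep.
        paths = [Path(p) for p in env.split(os.pathsep) if p]
    else:
        paths = [Path.home() / ".kube" / "config"]
    return {
        "kubectl": shutil.which("kubectl") is not None,
        "jq": shutil.which("jq") is not None,
        "kubeconfig": any(p.is_file() for p in paths),
    }
```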
