Kubernetes Operations

Expert knowledge for Kubernetes cluster management, deployment, and troubleshooting with mastery of kubectl and cloud-native patterns.

Core Expertise

Kubernetes Operations

Workload Management: Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs
Networking: Services, Ingress, NetworkPolicies, and DNS configuration
Configuration & Storage: ConfigMaps, Secrets, PersistentVolumes, and PersistentVolumeClaims
Troubleshooting: Debugging pods, analyzing logs, and inspecting cluster events

Cluster Operations Process

Manifest First: Always prefer declarative YAML manifests for resource management
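Validate & Dry-Run: Use kubectl apply --dry-run=client -f <manifest> to validate changes locally, or --dry-run=server to have the API server validate them without persisting anything
Inspect & Verify: After applying changes, verify with kubectl get, kubectl describe, and kubectl logs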
Monitor Health: Continuously check status of nodes, pods, and services
Clean Up: Ensure old or unused resources are properly garbage collected
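The validate, apply, and verify steps above can be chained so that a failed validation never reaches the cluster. The following is a minimal sketch, not a definitive implementation; the function name apply_checked and its argument order are hypothetical.

```shell
# Hypothetical helper: dry-run a manifest, apply it, then list the resulting objects.
# Stops at the first failing step.
# Usage: apply_checked <context> <manifest.yaml>
apply_checked() {
    ctx="${1:-}"
    manifest="${2:-}"
    if [ -z "$ctx" ] || [ -z "$manifest" ]; then
        echo "usage: apply_checked <context> <manifest.yaml>" >&2
        return 2
    fi
    # Server-side dry run: the API server validates the objects without persisting them.
    kubectl --context="$ctx" apply --dry-run=server -f "$manifest" || return 1
    # Real apply, reached only if validation passed.
    kubectl --context="$ctx" apply -f "$manifest" || return 1
    # Verify: show the live objects named in the manifest.
    kubectl --context="$ctx" get -f "$manifest"
}
```

Example call: apply_checked staging-cluster deployment.yaml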

Essential Commands

Resource management

    kubectl apply -f manifest.yaml
    kubectl get pods -A
    kubectl describe pod <pod-name>
    kubectl logs -f <pod-name>
    kubectl exec -it <pod-name> -- /bin/bash

Debugging

    kubectl get events --sort-by='.lastTimestamp'
    kubectl top nodes
    kubectl top pods --containers
    kubectl port-forward <pod-name> 8080:80

Deployment management

    kubectl rollout status deployment/<name>
    kubectl rollout history deployment/<name>
    kubectl rollout undo deployment/<name>

Cluster inspection

    kubectl cluster-info
    kubectl get nodes -o wide
    kubectl api-resources
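The rollout commands listed under Deployment management can be combined into a guarded rollout: wait for completion with a timeout and roll back if it is not reached. This is a sketch under assumptions; the function name rollout_or_undo and the default timeout of 120s are hypothetical choices.

```shell
# Hypothetical helper: wait for a rollout and roll back if it does not complete in time.
# Usage: rollout_or_undo <context> <namespace> <deployment> [timeout]
rollout_or_undo() {
    ctx="${1:-}"
    ns="${2:-}"
    name="${3:-}"
    timeout="${4:-120s}"
    # rollout status exits non-zero if the rollout fails or the timeout expires.
    if kubectl --context="$ctx" -n "$ns" rollout status "deployment/$name" --timeout="$timeout"; then
        echo "rollout of $name completed"
        return 0
    fi
    echo "rollout of $name did not complete within $timeout; rolling back" >&2
    # Revert to the previous revision.
    kubectl --context="$ctx" -n "$ns" rollout undo "deployment/$name"
    return 1
}
```

An automatic undo is a design choice: in some teams a human should inspect kubectl rollout history before reverting, in which case the undo line can be replaced with a message.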

Key Debugging Patterns

Pod Debugging

Pod inspection

    kubectl describe pod <pod-name>
    kubectl get pod <pod-name> -o yaml
    kubectl logs <pod-name> --previous

Interactive debugging

    kubectl exec -it <pod-name> -- /bin/bash
    kubectl debug <pod-name> -it --image=busybox
    kubectl port-forward <pod-name> 8080:80

Networking Troubleshooting

Service debugging

    kubectl get svc -o wide
    kubectl get endpoints
    kubectl describe svc <service>

Network connectivity

kubectl run test-pod --image=busybox -it --rm -- sh

Inside the pod: use nslookup to test DNS resolution, and wget or nc to test connectivity to a Service
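For a non-interactive version of the connectivity test, the lookup can run as the pod's command so the pod exits and is removed afterwards. This is a minimal sketch; the function name check_service_dns is hypothetical, and the code assumes the default cluster domain cluster.local.

```shell
# Hypothetical helper: resolve a Service name from inside the cluster using a throwaway pod.
# Usage: check_service_dns <context> <namespace> <service>
check_service_dns() {
    ctx="${1:-}"
    ns="${2:-}"
    svc="${3:-}"
    # Assumption: the cluster uses the default DNS domain, cluster.local.
    fqdn="$svc.$ns.svc.cluster.local"
    # --rm deletes the pod when the command finishes; -i keeps it attached so --rm is allowed.
    kubectl --context="$ctx" -n "$ns" run "dns-check-$$" \
        --image=busybox --restart=Never --rm -i \
        -- nslookup "$fqdn"
}
```

If the lookup fails while kubectl get endpoints shows addresses for the Service, the problem is more likely in cluster DNS than in the Service itself.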

Common Issues

CrashLoopBackOff debugging

    kubectl logs <pod> --previous
    kubectl describe pod <pod>
    kubectl get events --field-selector involvedObject.name=<pod>

Resource constraints

    kubectl top pod <pod>
    kubectl describe pod <pod> | grep -A 5 Limits

State management

    kubectl get all -n <namespace>
    kubectl get <resource> <name> -o yaml
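For the CrashLoopBackOff and resource-constraint cases above, the last termination reason and exit code often narrow the cause before any logs are read. The sketch below is illustrative only: both function names are hypothetical, the helper looks at the first container of the pod, and the exit-code hints are common conventions (128 plus the signal number) rather than guarantees.

```shell
# Hypothetical helper: print "<reason> <exit-code>" for the last termination
# of the first container in a pod.
# Usage: last_termination <context> <namespace> <pod>
last_termination() {
    ctx="${1:-}"
    ns="${2:-}"
    pod="${3:-}"
    kubectl --context="$ctx" -n "$ns" get pod "$pod" \
        -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Hypothetical helper: map a container exit code to a likely cause.
# Usage: explain_exit_code <code>
explain_exit_code() {
    case "${1:-}" in
        0)   echo "exited cleanly; the command may finish too early for a long-running workload" ;;
        1)   echo "application error; read the previous container's logs" ;;
        126) echo "command found but not executable" ;;
        127) echo "command not found in the image" ;;
        137) echo "killed by SIGKILL (128+9); often OOMKilled, compare usage with limits" ;;
        143) echo "terminated by SIGTERM (128+15); often a pod shutdown or a failed liveness probe" ;;
        *)   echo "unrecognized exit code; read the previous container's logs" ;;
    esac
}
```

Example: run last_termination <context> <ns> <pod>, then pass the printed exit code to explain_exit_code.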

Best Practices

Context Safety (CRITICAL)

Always specify --context explicitly in every kubectl command
Never rely on the current context - it may have been changed by another process
Use the form kubectl --context=<context-name> get pods for all operations
This prevents accidental operations on the wrong cluster (e.g., applying a change meant for staging to production)

CORRECT: Explicit context

    kubectl --context=gke_myproject_us-central1_prod get pods
    kubectl --context=staging-cluster apply -f deployment.yaml

WRONG: Relying on current context

kubectl get pods # Which cluster is this targeting?
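One way to make the explicit-context rule hard to forget is a small wrapper that refuses to run without a context argument. This is a sketch, not part of kubectl; the function name kctx is hypothetical.

```shell
# Hypothetical wrapper: run kubectl only when a context is named and exists in the kubeconfig.
# Usage: kctx <context> <kubectl args...>
kctx() {
    if [ "$#" -lt 2 ]; then
        echo "usage: kctx <context> <kubectl args...>" >&2
        return 2
    fi
    ctx="$1"
    shift
    # Fail early if the context is not defined, instead of falling back to anything else.
    if ! kubectl config get-contexts "$ctx" >/dev/null 2>&1; then
        echo "unknown context: $ctx" >&2
        return 1
    fi
    kubectl --context="$ctx" "$@"
}
```

Example call: kctx staging-cluster apply -f deployment.yaml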

Resource Definitions

Use declarative YAML manifests
Implement proper labels and selectors
Define resource requests and limits
Configure health checks (liveness/readiness probes)
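The four points above can be seen together in a single Deployment manifest. The sketch below prints one to stdout so it can be piped into kubectl. The function name render_deployment, the port 8080, the /healthz path, and the request and limit values are all illustrative assumptions, not recommendations.

```shell
# Hypothetical helper: print a Deployment manifest with labels, a matching selector,
# resource requests/limits, and liveness/readiness probes.
# Usage: render_deployment <name> <image> [replicas]
render_deployment() {
    name="${1:-}"
    image="${2:-}"
    replicas="${3:-2}"
    # Assumption: the container serves HTTP on 8080 and exposes /healthz.
    cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
  labels:
    app.kubernetes.io/name: $name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app.kubernetes.io/name: $name
  template:
    metadata:
      labels:
        app.kubernetes.io/name: $name
    spec:
      containers:
        - name: $name
          image: $image
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
EOF
}
```

Example: render_deployment web <image> 3 | kubectl --context=<context-name> apply --dry-run=client -f -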

Security

Use NetworkPolicies to restrict traffic
Implement RBAC for access control
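Store sensitive data in Secrets, and restrict access to them with RBAC (Secret values are base64-encoded, not encrypted, unless encryption at rest is enabled)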
Run containers as non-root users
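Two of these practices translate directly into short manifests: a default-deny ingress NetworkPolicy and a pod that runs as a non-root user. The sketch below prints both. The function names and the UID 10001 are hypothetical, and a NetworkPolicy only takes effect if the cluster's network plugin enforces it.

```shell
# Hypothetical helper: print a NetworkPolicy that denies all ingress to pods in a namespace.
# Usage: render_default_deny <namespace>
render_default_deny() {
    ns="${1:-}"
    # An empty podSelector selects every pod in the namespace.
    cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: $ns
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF
}

# Hypothetical helper: print a Pod that must run as a non-root user.
# Usage: render_nonroot_pod <name> <image>
render_nonroot_pod() {
    name="${1:-}"
    image="${2:-}"
    # Assumption: UID 10001 is an arbitrary unprivileged user for illustration.
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
  containers:
    - name: $name
      image: $image
      securityContext:
        allowPrivilegeEscalation: false
EOF
}
```

Example: render_default_deny <namespace> | kubectl --context=<context-name> apply -f -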

Monitoring

Configure proper logging and metrics
Set up alerts for critical conditions
Use health checks and readiness probes
Monitor resource usage and quotas

Agentic Optimizations

Context: Command
Pod status (structured): kubectl get pods -n <ns> -o json | jq '.items[] | {name:.metadata.name, status:.status.phase}'
Quick overview: kubectl get pods -n <ns> -o wide
Events (compact): kubectl get events -n <ns> --sort-by='.lastTimestamp' -o json
Resource details: kubectl get <resource> -o json
Logs (bounded): kubectl logs <pod> -n <ns> --tail=50
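When only the problem pods matter, output can be reduced further with a field selector and custom columns, which avoids the jq dependency. This is a sketch under assumptions; the function name pods_not_running is hypothetical.

```shell
# Hypothetical helper: list pods whose phase is not Running, as "NAME PHASE" lines.
# Usage: pods_not_running <context> <namespace>
pods_not_running() {
    ctx="${1:-}"
    ns="${2:-}"
    # The field selector filters on the server; custom-columns keeps the output to two fields.
    kubectl --context="$ctx" -n "$ns" get pods \
        --field-selector=status.phase!=Running \
        --no-headers \
        -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
}
```

Two caveats: completed Job pods (phase Succeeded) appear in this list, and a pod in CrashLoopBackOff can still report phase Running, so an empty result does not prove that every pod is healthy.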

For detailed debugging commands, troubleshooting patterns, Helm workflows, and advanced K8s operations, see REFERENCE.md.
