Kubernetes Operations
Core Workflow
Deployment Lifecycle
1. Validate before applying
kubectl apply --dry-run=server -f <manifest> -n <namespace>
2. Apply manifests
kubectl apply -f <manifest> -n <namespace>
3. Monitor rollout (blocks until complete or timeout)
kubectl rollout status deployment/<name> -n <namespace> --timeout=300s
4. Verify pods running
kubectl get pods -n <namespace> -l app=<label> -o wide
5. Check events for issues
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
Quick Health Check
Cluster overview
kubectl cluster-info kubectl get nodes -o wide kubectl top nodes # requires metrics-server
Namespace health
kubectl get all -n <namespace> kubectl get pods -n <namespace> -o wide kubectl top pods -n <namespace>
Troubleshooting Decision Tree
Pod Not Starting
-
Check pod status: kubectl get pods -n <ns> -o wide
-
Describe for events: kubectl describe pod <pod> -n <ns>
-
Check logs: kubectl logs <pod> -n <ns> --previous (if crashed)
Common causes:
-
ImagePullBackOff : Wrong image name/tag, missing imagePullSecrets
-
CrashLoopBackOff : App crash - check logs, health probes too aggressive
-
Pending : Insufficient resources, node selector/affinity issues
-
ContainerCreating : Volume mount issues, init container stuck
Pod Running But Not Receiving Traffic
-
Check readiness: kubectl get pods -n <ns> (READY column)
-
Check endpoints: kubectl get endpoints <service> -n <ns>
-
Check service selector: kubectl describe service <svc> -n <ns>
-
Test connectivity: kubectl run debug --rm -it --image=busybox -- wget -qO- <service>:<port>
High Restart Count
Get restart details
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'
Check terminated state
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
Review liveness probe config
kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.containers[0].livenessProbe}'
Common Operations
Logs
Single pod
kubectl logs <pod> -n <ns> kubectl logs <pod> -n <ns> -c <container> # multi-container kubectl logs <pod> -n <ns> --previous # crashed container kubectl logs <pod> -n <ns> -f # follow/stream
All pods with label
kubectl logs -l app=<label> -n <ns> --all-containers
Since time
kubectl logs <pod> -n <ns> --since=1h kubectl logs <pod> -n <ns> --since-time="2024-01-01T00:00:00Z"
Exec/Debug
Interactive shell
kubectl exec -it <pod> -n <ns> -- /bin/sh kubectl exec -it <pod> -n <ns> -c <container> -- /bin/bash
Run command
kubectl exec <pod> -n <ns> -- <command>
Debug with ephemeral container (k8s 1.25+)
kubectl debug -it <pod> -n <ns> --image=busybox --target=<container>
Scaling
Manual scale
kubectl scale deployment/<name> -n <ns> --replicas=3
Autoscaling
kubectl autoscale deployment/<name> -n <ns> --min=2 --max=10 --cpu-percent=80 kubectl get hpa -n <ns>
Rollback
View history
kubectl rollout history deployment/<name> -n <ns>
Rollback to previous
kubectl rollout undo deployment/<name> -n <ns>
Rollback to specific revision
kubectl rollout undo deployment/<name> -n <ns> --to-revision=<N>
Pause/resume rollout
kubectl rollout pause deployment/<name> -n <ns> kubectl rollout resume deployment/<name> -n <ns>
Resource Management
Get resource usage
kubectl top pods -n <ns> --sort-by=memory kubectl top pods -n <ns> --sort-by=cpu
Describe resource limits
kubectl describe limitrange -n <ns> kubectl describe resourcequota -n <ns>
Get requests/limits for pods
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
Context & Namespace Management
View contexts
kubectl config get-contexts kubectl config current-context
Switch context
kubectl config use-context <context-name>
Set default namespace
kubectl config set-context --current --namespace=<ns>
Create namespace
kubectl create namespace <name>
Output Formats
Wide output with more columns
kubectl get pods -o wide
YAML/JSON export
kubectl get deployment <name> -o yaml kubectl get pod <name> -o json
Custom columns
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,IP:.status.podIP
JSONPath
kubectl get pods -o jsonpath='{.items[*].metadata.name}' kubectl get secret <name> -o jsonpath='{.data.password}' | base64 -d
Port Forwarding
Forward pod port
kubectl port-forward pod/<name> <local>:<remote> -n <ns>
Forward service port
kubectl port-forward svc/<name> <local>:<remote> -n <ns>
Forward deployment (picks a pod)
kubectl port-forward deployment/<name> <local>:<remote> -n <ns>
Labels & Selectors
Add label
kubectl label pods <pod> env=prod -n <ns>
Remove label
kubectl label pods <pod> env- -n <ns>
Select by label
kubectl get pods -l app=nginx,env=prod -n <ns> kubectl get pods -l 'env in (prod,staging)' -n <ns> kubectl delete pods -l app=test -n <ns>
Resource Cleanup
Delete by manifest
kubectl delete -f <manifest> -n <ns>
Delete by label
kubectl delete pods -l app=<label> -n <ns>
Force delete stuck pod
kubectl delete pod <pod> -n <ns> --grace-period=0 --force
Delete completed/failed pods
kubectl delete pods -n <ns> --field-selector=status.phase=Succeeded kubectl delete pods -n <ns> --field-selector=status.phase=Failed
Health Probes Reference
Probe Types
-
Liveness: Is container alive? Failure → restart
-
Readiness: Can container serve traffic? Failure → remove from endpoints
-
Startup: Has app started? Blocks liveness/readiness until success
Debugging Probes
Check probe config
kubectl get pod <pod> -n <ns> -o yaml | grep -A10 livenessProbe
Test HTTP probe manually
kubectl exec <pod> -n <ns> -- wget -qO- localhost:<port>/healthz
Check probe events
kubectl describe pod <pod> -n <ns> | grep -A5 "Liveness|Readiness"
Tips
-
Always use -n <namespace> explicitly to avoid mistakes
-
Use --dry-run=client -o yaml to generate manifests
-
Add --watch to continuously monitor: kubectl get pods -w
-
Use kubectl explain <resource>.<field> to understand spec fields
-
Annotate changes: kubectl annotate deployment/<name> kubernetes.io/change-cause="<reason>"