Kubernetes Best Practices
This skill provides guidance for writing production-ready Kubernetes manifests and managing cloud-native applications.
Resource Management
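Memory: Set requests and limits to the same value. The scheduler then reserves exactly what the container is allowed to use, which avoids memory overcommit on the node and makes the pod less likely to be killed under node memory pressure. A container that exceeds its own limit is still OOM-killed.
CPU: Set requests only and omit limits, so the container can burst into idle CPU and is not throttled. Note that a pod without CPU limits is classed as Burstable rather than Guaranteed QoS.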
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "256Mi"
        # No CPU limit
Image Versioning
Always pin a specific version; never use the :latest tag unless explicitly requested:
    # Good
    image: nginx:1.25.3
    # Bad
    image: nginx:latest
Tags are mutable and can be repointed to a different image. For true immutability, consider pinning to a specific image digest.
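A digest-pinned reference might look like the sketch below. The digest shown is a placeholder, not a real nginx digest; substitute the value reported by your registry for the image you have tested.

```yaml
# The tag is kept for human readability; container runtimes resolve the digest.
# <digest> is a placeholder for the 64-character SHA-256 hex string.
image: nginx:1.25.3@sha256:<digest>
```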
Configuration Management
Secrets: Sensitive data (passwords, tokens, certificates)
ConfigMaps: Non-sensitive configuration (feature flags, URLs, settings)
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: app-secrets
            key: database-url
      - name: LOG_LEVEL
        valueFrom:
          configMapKeyRef:
            name: app-config
            key: log-level
Best practices:
- Never hardcode secrets in manifests
- Use external secret management (Sealed Secrets, External Secrets Operator)
- Rotate secrets regularly
- Limit access with RBAC
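As a rough illustration of limiting access with RBAC, the sketch below grants read access to a single named Secret. The Role name, the `web-app` ServiceAccount, and the `production` namespace are assumptions made for the example. This only governs reads through the Kubernetes API; Secrets referenced from a pod spec via `secretKeyRef` are resolved by the kubelet and do not need such a Role.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-secrets-reader        # hypothetical name
  namespace: production
rules:
  - apiGroups: [""]               # "" is the core API group
    resources: ["secrets"]
    resourceNames: ["app-secrets"]  # restrict to this one Secret
    verbs: ["get"]                # no list/watch, which would expose other Secrets
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-secrets-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: web-app                 # hypothetical ServiceAccount
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-secrets-reader
```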
Workload Selection
Choose the appropriate workload type:
- Deployment: Stateless applications (web servers, APIs, microservices)
- StatefulSet: Stateful applications (databases, message queues)
- DaemonSet: Node-level services (log collectors, monitoring agents)
- Job/CronJob: Batch processing and scheduled tasks
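Deployments appear throughout this guide, so here is a minimal sketch of the least familiar option, a CronJob for a scheduled task. The job name, image, and schedule are hypothetical.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report            # hypothetical name
spec:
  # 02:00 every day, in the controller manager's time zone
  # unless spec.timeZone is set.
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid       # skip a run if the previous one is still active
  jobTemplate:
    spec:
      backoffLimit: 3             # retry a failed run up to 3 times
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: registry.example.com/report-job:1.4.2  # hypothetical image
```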
Security Context
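Apply a restrictive security context to every workload. Note that fsGroup is only valid in the pod-level securityContext, while capabilities, readOnlyRootFilesystem, and allowPrivilegeEscalation are only valid at container level:
    spec:
      securityContext:            # pod-level
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: app               # illustrative container name
          securityContext:        # container-level
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false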
Security checklist:
- Run as non-root user
- Drop all capabilities by default
- Use read-only root filesystem
- Disable privilege escalation
- Implement network policies
- Scan images for vulnerabilities
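For the network policy item, a common starting point is a default-deny policy per namespace, with explicit allow rules layered on top. The sketch below assumes a `production` namespace and a CNI plugin that enforces NetworkPolicy; without such a plugin the object is accepted but has no effect.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}                 # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress                      # also blocks DNS until an egress rule allows it
```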
Health Checks
Implement all three probe types:
Liveness: Restart container if unhealthy
Readiness: Remove from service endpoints if not ready
Startup: Allow slow-starting containers time to initialize
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    # Allows up to 30 x 10s = 300s for startup; liveness and readiness
    # probes do not run until the startup probe has succeeded.
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      periodSeconds: 10
      failureThreshold: 30
High Availability
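Replica counts: Run at least 2 replicas for production workloads
Pod Disruption Budgets: Maintain availability during voluntary disruptions such as node drains. Keep minAvailable below the replica count; if the two are equal, no pod can ever be evicted and drains will block.
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: app-pdb
    spec:
      minAvailable: 2             # requires at least 3 replicas to permit evictions
      selector:
        matchLabels:
          app: web-app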
Additional HA considerations:
- Use anti-affinity rules for pod distribution across nodes
- Configure graceful shutdown periods
- Implement horizontal pod autoscaling
- Set appropriate resource requests for scheduling
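The considerations above could be expressed roughly as follows. This is a sketch, not a tuned configuration: the `web-app` name, the 60-second grace period, the replica bounds, and the 70% CPU target are all illustrative assumptions. CPU utilization in the autoscaler is measured relative to the container's CPU request, which is one reason accurate requests matter.

```yaml
# Fragment of a Deployment spec. spec.replicas is omitted on purpose
# so that the autoscaler below owns the replica count.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # time allowed between SIGTERM and SIGKILL
      affinity:
        podAntiAffinity:
          # "preferred" spreads pods across nodes when possible
          # without blocking scheduling on small clusters.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-app
                topologyKey: kubernetes.io/hostname
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70          # percent of the CPU request
```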
Namespace Organization
Use namespaces for environment isolation and apply resource quotas:
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: prod-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: 200Gi
        persistentvolumeclaims: "10"
Benefits: Logical separation, resource limits, RBAC boundaries, cost tracking
Labels and Annotations
Use consistent, recommended labels:
    metadata:
      labels:
        app.kubernetes.io/name: myapp
        app.kubernetes.io/instance: myapp-prod
        app.kubernetes.io/version: "1.0.0"
        app.kubernetes.io/component: backend
        app.kubernetes.io/part-of: ecommerce
        app.kubernetes.io/managed-by: helm
Service Types
- ClusterIP: Internal cluster communication (default)
- NodePort: External access via node ports (dev/test)
- LoadBalancer: Cloud provider load balancer (production)
- ExternalName: DNS CNAME record (external services)
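For reference, a minimal ClusterIP Service might look like the sketch below. The `web-app` name and selector and the port numbers are assumptions chosen to match the other examples in this guide.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  type: ClusterIP                 # the default; stated here for clarity
  selector:
    app: web-app                  # must match the pod labels
  ports:
    - name: http
      port: 80                    # port exposed inside the cluster
      targetPort: 8080            # container port that receives the traffic
```

Changing `type` to `LoadBalancer` keeps the same selector and ports while asking the cloud provider for an external load balancer.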
Storage
Choose appropriate storage class and access mode:
Access Modes:
- ReadWriteOnce (RWO): Single node read-write
- ReadOnlyMany (ROX): Multiple nodes read-only
- ReadWriteMany (RWX): Multiple nodes read-write
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
Validation and Testing
Always validate before applying to production:
- Client-side validation: kubectl apply --dry-run=client -f manifest.yaml
- Server-side validation: kubectl apply --dry-run=server -f manifest.yaml
- Test in staging: Deploy to non-production environment first
- Monitor metrics: Watch resource usage and application health
- Gradual rollout: Use rolling updates with health checks
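A conservative rolling update for a Deployment could be configured along these lines. The specific values are illustrative assumptions; the rollout relies on the readiness probe to decide when a new pod counts as available.

```yaml
# Fragment of a Deployment spec
spec:
  minReadySeconds: 10             # a new pod must stay ready this long to count as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                 # add at most one extra pod during the rollout
      maxUnavailable: 0           # never drop below the desired replica count
```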
Application Checklist
When creating or reviewing Kubernetes manifests:
- Resource requests and limits configured
- Specific image version pinned (not :latest)
- Secrets and ConfigMaps used for configuration
- Security context implemented (non-root, dropped capabilities)
- Health checks configured (liveness, readiness, startup)
- Pod Disruption Budget defined for HA workloads
- Consistent labels applied
- Appropriate workload type selected
- Namespace and resource quotas configured
- Validated with dry-run before applying
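To show how several checklist items fit together in one manifest, here is a sketch of a Deployment that combines them. The image, names, namespace, and probe paths are hypothetical and reuse values from the earlier sections; treat it as a starting template rather than a definitive configuration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web-app
    app.kubernetes.io/name: web-app
    app.kubernetes.io/version: "1.0.0"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        app.kubernetes.io/name: web-app
        app.kubernetes.io/version: "1.0.0"
    spec:
      securityContext:                    # pod-level settings
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: web-app
          image: registry.example.com/web-app:1.0.0   # hypothetical, pinned version
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: log-level
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "256Mi"             # no CPU limit, per Resource Management
          securityContext:                # container-level settings
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
          startupProbe:
            httpGet:
              path: /startup
              port: 8080
            periodSeconds: 10
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
```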