Kubernetes Best Practices

This skill provides guidance for writing production-ready Kubernetes manifests and managing cloud-native applications.

Resource Management

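Memory: Set requests and limits to the same value. The scheduler then reserves exactly what the container may use, so memory is not overcommitted and the container is less likely to be OOM-killed or evicted under node memory pressure. A container that exceeds its own limit is still OOM-killed. Note that the Guaranteed QoS class also requires CPU requests and limits to match; with the CPU guidance below, the pod is Burstable.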

CPU: Set requests only and omit limits, so containers can burst into idle CPU and are not throttled.

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    # No CPU limit

Image Versioning

Always pin a specific version. Never use the :latest tag unless explicitly requested:

# Good
image: nginx:1.25.3

# Bad
image: nginx:latest

For immutability, consider pinning to specific digests.
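A tag such as 1.25.3 can still be re-pushed by the registry owner, whereas a digest identifies one exact image. The fragment below is a minimal sketch of digest pinning; the container name is hypothetical and `<digest>` is a placeholder that must be replaced with the value your registry reports for the image you tested.

```yaml
# Sketch only: "web" is a hypothetical container name and <digest> is a
# placeholder, not a real value. Look up the digest for the tested image
# (for example nginx:1.25.3) in your registry and paste it in.
containers:
  - name: web
    image: nginx@sha256:<digest>
```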

Configuration Management

Secrets: Sensitive data (passwords, tokens, certificates)
ConfigMaps: Non-sensitive configuration (feature flags, URLs, settings)

env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: database-url
  - name: LOG_LEVEL
    valueFrom:
      configMapKeyRef:
        name: app-config
        key: log-level

Best practices:

Never hardcode secrets in manifests
Use external secret management (Sealed Secrets, External Secrets Operator)
Rotate secrets regularly
Limit access with RBAC
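As one way to limit access with RBAC, the sketch below grants read access to a single named Secret and nothing else. The Role name, the `web-app` ServiceAccount, and the `production` namespace are assumptions chosen for illustration. Pods that consume the Secret through `secretKeyRef` do not need this grant; it only matters for identities that read the Secret through the API.

```yaml
# Sketch: allow one service account to "get" only the app-secrets Secret.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-secrets-reader      # hypothetical name
  namespace: production         # assumed namespace
rules:
  - apiGroups: [""]             # core API group
    resources: ["secrets"]
    resourceNames: ["app-secrets"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-secrets-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: web-app               # hypothetical service account
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-secrets-reader
```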

Workload Selection

Choose the appropriate workload type:

Deployment: Stateless applications (web servers, APIs, microservices)
StatefulSet: Stateful applications (databases, message queues)
DaemonSet: Node-level services (log collectors, monitoring agents)
Job/CronJob: Batch processing and scheduled tasks
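To illustrate the scheduled-task case, here is a minimal CronJob sketch. The name, schedule, image, and command are placeholders rather than a recommended setup.

```yaml
# Sketch: run a batch task every night at 02:00 (cluster time zone).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report          # hypothetical name
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid     # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 2           # retry a failed run at most twice
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: busybox:1.36.1   # placeholder image, pinned to a version
              command: ["sh", "-c", "echo generating report"]
```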

Security Context

Always implement security best practices:

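# Pod-level and container-level settings live in separate securityContext
# blocks: fsGroup is pod-only, while capabilities, readOnlyRootFilesystem,
# and allowPrivilegeEscalation are container-only.
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
    - name: app   # example container name
      securityContext:
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false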

Security checklist:

Run as non-root user
Drop all capabilities by default
Use read-only root filesystem
Disable privilege escalation
Implement network policies
Scan images for vulnerabilities
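For the network policy item, a common pattern is to deny all ingress in a namespace and then allow specific traffic. The sketch below assumes a `production` namespace, pods labelled `app: web-app` listening on port 8080, and a hypothetical `app: frontend` client. Policies are only enforced if the cluster's network plugin supports NetworkPolicy.

```yaml
# Sketch 1: deny all ingress to every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production         # assumed namespace
spec:
  podSelector: {}               # empty selector matches all pods
  policyTypes:
    - Ingress
---
# Sketch 2: allow only frontend pods to reach web-app on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web-app
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend     # hypothetical client label
      ports:
        - protocol: TCP
          port: 8080
```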

Health Checks

Implement all three probe types:

Liveness: Restart container if unhealthy
Readiness: Remove from service endpoints if not ready
Startup: Allow slow-starting containers time to initialize

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

startupProbe:
  httpGet:
    path: /startup
    port: 8080
  periodSeconds: 10
  failureThreshold: 30

High Availability

Replica counts: Run at least 2 replicas for production workloads

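Pod Disruption Budgets: Maintain availability during voluntary disruptions such as node drains. Keep minAvailable below the replica count, otherwise evictions are blocked; the example that follows needs at least 3 replicas for a drain to proceed.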

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

Additional HA considerations:

Use anti-affinity rules for pod distribution across nodes
Configure graceful shutdown periods
Implement horizontal pod autoscaling
Set appropriate resource requests for scheduling
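The considerations above could be combined roughly as follows. This is a sketch, not a tuned configuration: the `web-app` names, the 60-second grace period, the replica bounds, and the 70% CPU target are all assumptions. CPU utilization for the autoscaler is measured against the container's CPU request, so the request must be set.

```yaml
# Sketch: Deployment with soft anti-affinity and a graceful shutdown window.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # replicas is left unset so the HorizontalPodAutoscaler owns the count
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      terminationGracePeriodSeconds: 60   # assumed shutdown budget
      affinity:
        podAntiAffinity:
          # "preferred" spreads pods across nodes without blocking scheduling
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-app
                topologyKey: kubernetes.io/hostname
      containers:
        - name: web-app
          image: nginx:1.25.3
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              memory: "256Mi"
---
# Sketch: scale between 3 and 10 replicas on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```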

Namespace Organization

Use namespaces for environment isolation and apply resource quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    persistentvolumeclaims: "10"

Benefits: Logical separation, resource limits, RBAC boundaries, cost tracking
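Once a quota covers `requests.cpu` and `requests.memory`, pods that omit those requests can be rejected at creation. A LimitRange can supply defaults for such pods. The sketch below assumes the same `production` namespace; the object name and default values are illustrative only.

```yaml
# Sketch: default requests (and a default memory limit) for containers
# that do not declare their own.
apiVersion: v1
kind: LimitRange
metadata:
  name: prod-defaults           # hypothetical name
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:           # applied when a container sets no requests
        cpu: "250m"
        memory: "256Mi"
      default:                  # applied when a container sets no limits
        memory: "256Mi"
```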

Labels and Annotations

Use consistent, recommended labels:

metadata:
  labels:
    app.kubernetes.io/name: myapp
    app.kubernetes.io/instance: myapp-prod
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/component: backend
    app.kubernetes.io/part-of: ecommerce
    app.kubernetes.io/managed-by: helm

Service Types

ClusterIP: Internal cluster communication (default)
NodePort: External access via node ports (dev/test)
LoadBalancer: Cloud provider load balancer (production)
ExternalName: DNS CNAME record (external services)
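As an illustration of the default type, here is a minimal ClusterIP Service sketch. The `web-app` name and label and the port numbers are assumptions; changing `type` to `LoadBalancer` would request a cloud load balancer instead.

```yaml
# Sketch: expose pods labelled app=web-app inside the cluster on port 80.
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  type: ClusterIP               # the default; shown for clarity
  selector:
    app: web-app
  ports:
    - name: http
      port: 80                  # port clients connect to
      targetPort: 8080          # port the container listens on
```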

Storage

Choose appropriate storage class and access mode:

Access Modes:

ReadWriteOnce (RWO): Single node read-write
ReadOnlyMany (ROX): Multiple nodes read-only
ReadWriteMany (RWX): Multiple nodes read-write

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
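A claim only becomes useful once a pod mounts it. The pod spec fragment below sketches how the `app-data` claim might be consumed; the container name, image, and mount path are hypothetical.

```yaml
# Sketch: pod spec fragment mounting the app-data claim.
spec:
  containers:
    - name: app                 # hypothetical container
      image: nginx:1.25.3
      volumeMounts:
        - name: data
          mountPath: /var/lib/app   # hypothetical path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data     # must match the PVC name
```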

Validation and Testing

Always validate before applying to production:

Client-side validation: kubectl apply --dry-run=client -f manifest.yaml
Server-side validation: kubectl apply --dry-run=server -f manifest.yaml
Test in staging: Deploy to non-production environment first
Monitor metrics: Watch resource usage and application health
Gradual rollout: Use rolling updates with health checks
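The gradual rollout step can be expressed in the Deployment spec itself. The fragment below is a sketch with assumed values: it replaces one pod at a time and never drops below the desired replica count, relying on readiness probes to decide when a new pod counts as available.

```yaml
# Sketch: Deployment spec fragment for a conservative rolling update.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0         # keep full capacity during the rollout
      maxSurge: 1               # add at most one extra pod at a time
  minReadySeconds: 10           # assumed: pod must stay ready 10s to count
```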

Application Checklist

When creating or reviewing Kubernetes manifests:

Resource requests and limits configured
Specific image version pinned (not :latest)
Secrets and ConfigMaps used for configuration
Security context implemented (non-root, dropped capabilities)
Health checks configured (liveness, readiness, startup)
Pod Disruption Budget defined for HA workloads
Consistent labels applied
Appropriate workload type selected
Namespace and resource quotas configured
Validated with dry-run before applying
