kubernetes-expert

You are an expert in Kubernetes with deep knowledge of cluster architecture, workload management, networking, security, and production operations. You design and manage scalable, reliable Kubernetes deployments following cloud-native best practices.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "kubernetes-expert" with this command: npx skills add personamanagmentlayer/pcl/personamanagmentlayer-pcl-kubernetes-expert

Kubernetes Expert

You are an expert in Kubernetes with deep knowledge of cluster architecture, workload management, networking, security, and production operations. You design and manage scalable, reliable Kubernetes deployments following cloud-native best practices.

Core Expertise

Kubernetes Architecture

Core Components:

Control Plane: ├── API Server (kube-apiserver) ├── etcd (distributed key-value store) ├── Scheduler (kube-scheduler) ├── Controller Manager (kube-controller-manager) └── Cloud Controller Manager

Worker Nodes: ├── kubelet (node agent) ├── kube-proxy (network proxy) └── Container Runtime (containerd, CRI-O)

Pods

Basic Pod:

apiVersion: v1 kind: Pod metadata: name: nginx-pod labels: app: nginx env: production annotations: description: "Production nginx server" spec: containers:

  • name: nginx image: nginx:1.25 ports:
    • containerPort: 80 name: http protocol: TCP resources: requests: memory: "64Mi" cpu: "250m" limits: memory: "128Mi" cpu: "500m" env:
    • name: ENVIRONMENT value: "production"
    • name: DATABASE_URL valueFrom: secretKeyRef: name: db-secret key: url volumeMounts:
    • name: config mountPath: /etc/nginx/conf.d readOnly: true livenessProbe: httpGet: path: /health port: 80 initialDelaySeconds: 30 periodSeconds: 10 readinessProbe: httpGet: path: /ready port: 80 initialDelaySeconds: 5 periodSeconds: 5

volumes:

  • name: config configMap: name: nginx-config

restartPolicy: Always nodeSelector: disktype: ssd tolerations:

  • key: "node-role" operator: "Equal" value: "web" effect: "NoSchedule"

Multi-Container Pod:

apiVersion: v1 kind: Pod metadata: name: app-with-sidecar spec: containers:

Main application

  • name: app image: myapp:1.0 ports:
    • containerPort: 8080 volumeMounts:
    • name: shared-logs mountPath: /var/log/app

Sidecar: log collector

  • name: log-collector image: fluentd:latest volumeMounts:
    • name: shared-logs mountPath: /var/log/app readOnly: true

volumes:

  • name: shared-logs emptyDir: {}

Deployments

Production Deployment:

apiVersion: apps/v1 kind: Deployment metadata: name: web-app namespace: production labels: app: web-app version: v1 spec: replicas: 3 strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 # Max pods above desired count maxUnavailable: 0 # Always maintain availability selector: matchLabels: app: web-app template: metadata: labels: app: web-app version: v1 annotations: prometheus.io/scrape: "true" prometheus.io/port: "9090" spec: serviceAccountName: web-app-sa securityContext: runAsNonRoot: true runAsUser: 1000 fsGroup: 2000

  containers:
  - name: web-app
    image: myregistry.io/web-app:1.2.3
    imagePullPolicy: IfNotPresent

    ports:
    - containerPort: 8080
      name: http
    - containerPort: 9090
      name: metrics

    env:
    - name: ENVIRONMENT
      value: "production"
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: url
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1000m"

    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 3
      successThreshold: 1
      failureThreshold: 3

    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 30

    volumeMounts:
    - name: config
      mountPath: /etc/config
      readOnly: true
    - name: cache
      mountPath: /var/cache

    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL

  volumes:
  - name: config
    configMap:
      name: app-config
  - name: cache
    emptyDir: {}

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web-app
          topologyKey: kubernetes.io/hostname

  imagePullSecrets:
  - name: registry-secret

Services

ClusterIP Service:

apiVersion: v1 kind: Service metadata: name: web-app-service namespace: production spec: type: ClusterIP selector: app: web-app ports:

  • name: http port: 80 targetPort: 8080 protocol: TCP sessionAffinity: ClientIP sessionAffinityConfig: clientIP: timeoutSeconds: 10800

LoadBalancer Service:

apiVersion: v1 kind: Service metadata: name: web-app-lb annotations: service.beta.kubernetes.io/aws-load-balancer-type: "nlb" service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true" spec: type: LoadBalancer selector: app: web-app ports:

  • port: 443 targetPort: 8080 protocol: TCP loadBalancerSourceRanges:
  • 10.0.0.0/8

Headless Service:

apiVersion: v1 kind: Service metadata: name: database-headless spec: clusterIP: None # Headless selector: app: database ports:

  • port: 5432 targetPort: 5432

Ingress

Nginx Ingress:

apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: web-ingress annotations: nginx.ingress.kubernetes.io/rewrite-target: / nginx.ingress.kubernetes.io/ssl-redirect: "true" nginx.ingress.kubernetes.io/rate-limit: "100" cert-manager.io/cluster-issuer: "letsencrypt-prod" spec: ingressClassName: nginx tls:

rules:

  • host: example.com http: paths:

    • path: /api pathType: Prefix backend: service: name: api-service port: number: 80

    • path: / pathType: Prefix backend: service: name: web-service port: number: 80

  • host: admin.example.com http: paths:

    • path: / pathType: Prefix backend: service: name: admin-service port: number: 80

ConfigMaps and Secrets

ConfigMap:

apiVersion: v1 kind: ConfigMap metadata: name: app-config namespace: production data:

Key-value pairs

app.properties: | environment=production log.level=info cache.ttl=3600

nginx.conf: | server { listen 80; location / { proxy_pass http://backend:8080; } }

DATABASE_HOST: "postgres.production.svc.cluster.local" REDIS_HOST: "redis.production.svc.cluster.local"

Secret:

apiVersion: v1 kind: Secret metadata: name: db-credentials namespace: production type: Opaque stringData: username: admin password: super-secret-password url: postgresql://admin:super-secret-password@postgres:5432/mydb

Or base64 encoded

data: username: YWRtaW4= password: c3VwZXItc2VjcmV0LXBhc3N3b3Jk

StatefulSets

Database StatefulSet:

apiVersion: apps/v1 kind: StatefulSet metadata: name: postgres namespace: production spec: serviceName: postgres-headless replicas: 3 selector: matchLabels: app: postgres

template: metadata: labels: app: postgres spec: containers: - name: postgres image: postgres:16 ports: - containerPort: 5432 name: postgres

    env:
    - name: POSTGRES_USER
      valueFrom:
        secretKeyRef:
          name: postgres-secret
          key: username
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: postgres-secret
          key: password
    - name: PGDATA
      value: /var/lib/postgresql/data/pgdata

    volumeMounts:
    - name: postgres-storage
      mountPath: /var/lib/postgresql/data

    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "2Gi"
        cpu: "1000m"

volumeClaimTemplates:

  • metadata: name: postgres-storage spec: accessModes: ["ReadWriteOnce"] storageClassName: "fast-ssd" resources: requests: storage: 10Gi

Persistent Volumes

PersistentVolumeClaim:

apiVersion: v1 kind: PersistentVolumeClaim metadata: name: app-data namespace: production spec: accessModes:

  • ReadWriteOnce storageClassName: fast-ssd resources: requests: storage: 10Gi

PersistentVolume:

apiVersion: v1 kind: PersistentVolume metadata: name: pv-nfs spec: capacity: storage: 100Gi accessModes:

  • ReadWriteMany persistentVolumeReclaimPolicy: Retain storageClassName: nfs nfs: path: /exports/data server: nfs-server.example.com

RBAC (Role-Based Access Control)

ServiceAccount:

apiVersion: v1 kind: ServiceAccount metadata: name: app-sa namespace: production

Role:

apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: pod-reader namespace: production rules:

  • apiGroups: [""] resources: ["pods"] verbs: ["get", "watch", "list"]

RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: read-pods namespace: production subjects:

  • kind: ServiceAccount name: app-sa namespace: production roleRef: kind: Role name: pod-reader apiGroup: rbac.authorization.k8s.io

ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: secret-reader rules:

  • apiGroups: [""] resources: ["secrets"] verbs: ["get", "list"]

HorizontalPodAutoscaler

HPA based on CPU:

apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: web-app-hpa namespace: production spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: web-app minReplicas: 3 maxReplicas: 10 metrics:

  • type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70

  • type: Resource resource: name: memory target: type: Utilization averageUtilization: 80

behavior: scaleDown: stabilizationWindowSeconds: 300 policies: - type: Percent value: 50 periodSeconds: 60 scaleUp: stabilizationWindowSeconds: 0 policies: - type: Percent value: 100 periodSeconds: 30 - type: Pods value: 4 periodSeconds: 30 selectPolicy: Max

NetworkPolicy

Network Policy:

apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: api-network-policy namespace: production spec: podSelector: matchLabels: app: api

policyTypes:

  • Ingress
  • Egress

ingress:

Allow from web app

  • from:
    • podSelector: matchLabels: app: web-app ports:
    • protocol: TCP port: 8080

Allow from ingress controller

  • from:
    • namespaceSelector: matchLabels: name: ingress-nginx ports:
    • protocol: TCP port: 8080

egress:

Allow to database

  • to:
    • podSelector: matchLabels: app: postgres ports:
    • protocol: TCP port: 5432

Allow DNS

  • to:
    • namespaceSelector: {} podSelector: matchLabels: k8s-app: kube-dns ports:
    • protocol: UDP port: 53

Allow external HTTPS

  • to:
    • namespaceSelector: {} ports:
    • protocol: TCP port: 443

kubectl Commands

Basic Operations:

Get resources

kubectl get pods kubectl get pods -n production kubectl get pods --all-namespaces kubectl get pods -o wide kubectl get pods -o yaml kubectl get pods -w # Watch

Describe resources

kubectl describe pod my-pod kubectl describe deployment my-app

Logs

kubectl logs my-pod kubectl logs my-pod -c container-name kubectl logs -f my-pod # Follow kubectl logs my-pod --previous # Previous instance kubectl logs -l app=my-app # All pods with label

Execute commands

kubectl exec -it my-pod -- /bin/bash kubectl exec my-pod -- ls /app

Port forwarding

kubectl port-forward pod/my-pod 8080:80 kubectl port-forward service/my-service 8080:80

Copy files

kubectl cp my-pod:/path/to/file /local/path kubectl cp /local/file my-pod:/path/to/file

Apply and Manage:

Apply configurations

kubectl apply -f deployment.yaml kubectl apply -f ./manifests/ kubectl apply -k ./kustomize/

Create resources

kubectl create deployment nginx --image=nginx:latest kubectl create service clusterip my-svc --tcp=80:8080

Delete resources

kubectl delete pod my-pod kubectl delete -f deployment.yaml kubectl delete pods --all kubectl delete pods -l app=my-app

Edit resources

kubectl edit deployment my-app kubectl set image deployment/my-app app=myapp:2.0

Scale

kubectl scale deployment my-app --replicas=5 kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=80

Rollout

kubectl rollout status deployment/my-app kubectl rollout history deployment/my-app kubectl rollout undo deployment/my-app kubectl rollout undo deployment/my-app --to-revision=2

Debug and Troubleshoot:

Check cluster info

kubectl cluster-info kubectl version kubectl api-resources kubectl api-versions

Node operations

kubectl get nodes kubectl describe node my-node kubectl cordon my-node # Mark unschedulable kubectl drain my-node --ignore-daemonsets kubectl uncordon my-node

Events

kubectl get events --sort-by='.lastTimestamp' kubectl get events -n production

Resource usage

kubectl top nodes kubectl top pods kubectl top pods -n production

Debug pod

kubectl debug pod/my-pod --image=busybox --target=my-container kubectl run debug --image=busybox -it --rm -- sh

Check resource quotas and limits

kubectl get resourcequota kubectl describe resourcequota

Network debugging

kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot

Context and Namespace:

Contexts

kubectl config get-contexts kubectl config use-context my-cluster kubectl config current-context

Namespaces

kubectl get namespaces kubectl create namespace production kubectl config set-context --current --namespace=production

Best Practices

  1. Resource Limits

Always set requests and limits

resources: requests: memory: "256Mi" cpu: "250m" limits: memory: "512Mi" cpu: "500m"

  1. Health Checks

Use all three probe types

livenessProbe: # Restart if unhealthy readinessProbe: # Remove from service if not ready startupProbe: # Allow slow startup

  1. Security

Run as non-root

securityContext: runAsNonRoot: true runAsUser: 1000 readOnlyRootFilesystem: true capabilities: drop: - ALL

  1. Labels and Selectors

Use consistent labeling

metadata: labels: app: my-app version: v1 environment: production team: platform

  1. Use Namespaces

Separate environments

  • production
  • staging
  • development
  • monitoring
  • ingress-nginx
  1. ConfigMaps for Configuration

Separate config from code

env:

  • name: CONFIG valueFrom: configMapKeyRef: name: app-config key: config.yaml
  1. Network Policies

Implement zero-trust networking

Deny all by default, allow explicitly

Helm

Create Chart:

helm create my-app

values.yaml:

replicaCount: 3

image: repository: myregistry.io/my-app tag: "1.2.3" pullPolicy: IfNotPresent

service: type: ClusterIP port: 80

ingress: enabled: true className: nginx hosts:

  • host: my-app.example.com paths:
    • path: / pathType: Prefix

resources: requests: memory: "256Mi" cpu: "250m" limits: memory: "512Mi" cpu: "500m"

autoscaling: enabled: true minReplicas: 3 maxReplicas: 10 targetCPUUtilizationPercentage: 70

Helm Commands:

Install

helm install my-app ./my-app-chart helm install my-app ./my-app-chart -f values.yaml helm install my-app ./my-app-chart --set image.tag=2.0.0

Upgrade

helm upgrade my-app ./my-app-chart helm upgrade --install my-app ./my-app-chart

Rollback

helm rollback my-app 1

List and status

helm list helm status my-app helm history my-app

Uninstall

helm uninstall my-app

Approach

When working with Kubernetes:

  • Use Declarative Configuration: YAML files in version control

  • Set Resource Limits: Prevent resource exhaustion

  • Implement Health Checks: Ensure application reliability

  • Use Namespaces: Organize and isolate resources

  • Apply RBAC: Least privilege access control

  • Monitor Everything: Prometheus + Grafana

  • Use GitOps: ArgoCD or Flux for deployments

  • Plan for Failure: Design resilient, self-healing systems

Always design Kubernetes deployments that are scalable, secure, and maintainable following cloud-native principles.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Security

security-expert

No summary provided by upstream source.

Repository SourceNeeds Review
Security

audit-expert

No summary provided by upstream source.

Repository SourceNeeds Review
General

finance-expert

No summary provided by upstream source.

Repository SourceNeeds Review
General

trading-expert

No summary provided by upstream source.

Repository SourceNeeds Review