k8s-hpa-cost-tuning

Tune Kubernetes HPA scale-up/down behavior, topology spread, and resource requests to reduce idle cluster capacity. Use when users need to audit cluster costs on a schedule, analyze post-incident scaling behavior, investigate why replicas or nodes do not scale down, or reduce over-reservation and wasted compute resources.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "k8s-hpa-cost-tuning" with this command: npx skills add microlinkhq/skills/microlinkhq-skills-k8s-hpa-cost-tuning

Kubernetes HPA Cost & Scale-Down Tuning

Mode selection (mandatory)

Declare a mode before executing this skill. All reasoning, thresholds, and recommendations depend on this choice.

mode = audit | incident

If no mode is provided, refuse to run and request clarification.

When to use

mode = audit — Periodic cost-savings audit

Run on a schedule (weekly or bi-weekly) to:

  • Detect over-reservation early
  • Validate that scale-down and node consolidation still work
  • Identify safe opportunities to reduce cluster cost

This mode assumes no active incident and prioritizes stability-preserving recommendations.

mode = incident — Post-incident scaling analysis

Run after a production incident or anomaly, attaching:

  • Production logs
  • HPA events
  • Scaling timelines

This mode focuses on:

  • Explaining why scaling behaved the way it did
  • Distinguishing traffic-driven vs configuration-driven incidents
  • Preventing recurrence without overcorrecting

This skill assumes Datadog for observability and standard Kubernetes HPA + Cluster Autoscaler.

Core mental model

Kubernetes scaling is a three-layer system:

  1. HPA decides how many pods (based on usage / requests)
  2. Scheduler decides where pods go (based on requests + constraints)
  3. Cluster Autoscaler decides how many nodes exist (only when nodes can empty)

Cost optimization only works if all three layers can move downward.

Key takeaway: HPA decides quantity, scheduler decides placement, autoscaler decides cost. Scale-up can be aggressive; scale-down must be possible. If replicas drop but nodes do not, the scheduler is the bottleneck.
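The takeaway above can be sketched as a trivial diagnostic check. This is a hypothetical helper (the function name and return labels are not from the upstream scripts), assuming you have replica and node counts from before and after a load drop:

```javascript
// Hypothetical sketch of the takeaway above: replicas dropping while the
// node count stays flat points at the scheduler (topology spread,
// anti-affinity) as the bottleneck, not the HPA or the autoscaler.
function diagnoseScaleDown({ replicasBefore, replicasAfter, nodesBefore, nodesAfter }) {
  if (replicasAfter >= replicasBefore) return "no-replica-scale-down"; // HPA layer never moved
  if (nodesAfter < nodesBefore) return "healthy";                      // all three layers moved down
  return "scheduler-bottleneck";                                       // pods shrank, nodes did not
}
```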

Key Datadog metrics

The utility scripts query three metric families:

  • CPU used % — real utilization (kubernetes.cpu.usage.total / node.cpu_allocatable)
  • CPU requested % — reserved on paper (kubernetes.cpu.requests / node.cpu_allocatable)
  • Memory used vs requests — HPA-relevant ratio

CPU requested % must go down after scale-down for cost savings to be real. If memory usage stays above target, memory drives scale-up even when CPU is idle.
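As a sketch of how these metric families combine, the following uses the metric names listed above and computes the requested-minus-used waste figure. The `kube_cluster_name` tag key and the query shape are assumptions about the cluster's Datadog tagging, not the scripts' exact queries:

```javascript
// Sketch only: build a "CPU requested % of allocatable" query from the
// metric names above. The kube_cluster_name tag key is an assumption.
function cpuRequestedPctQuery(cluster) {
  const scope = `kube_cluster_name:${cluster}`;
  return `100 * sum:kubernetes.cpu.requests{${scope}} / sum:node.cpu_allocatable{${scope}}`;
}

// Waste %: CPU reserved on paper minus CPU actually used.
function wastePct(requestedPct, usedPct) {
  return Math.max(0, requestedPct - usedPct);
}
```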

Scale-down as a first-class cost control

When scale-down is slow or blocked:

  • Replicas plateau
  • Pods remain evenly spread
  • Nodes never empty
  • Cluster Autoscaler cannot remove nodes

Result: permanent over-reservation.

Recommended HPA scale-down policy

scaleDown:
  stabilizationWindowSeconds: 60
  selectPolicy: Max
  policies:
    - type: Percent
      value: 50
      periodSeconds: 30

Effects: fast reaction once load drops, predictable replica collapse, low flapping risk.
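For context, this block lives under `spec.behavior` in an `autoscaling/v2` HorizontalPodAutoscaler. A minimal sketch, with placeholder names and a placeholder CPU target:

```yaml
# Sketch: where the recommended scaleDown policy sits in a full HPA.
# The name (your-app), replica bounds, and CPU target are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: your-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
      selectPolicy: Max
      policies:
      - type: Percent
        value: 50
        periodSeconds: 30
```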

Topology spread: critical cost lever

Topology spread must never prevent pod consolidation during scale-down.

Strict constraints block scheduler flexibility and freeze cluster size.

Anti-pattern (breaks cost optimization)

topologySpreadConstraints:
- topologyKey: kubernetes.io/hostname
  maxSkew: 1
  whenUnsatisfiable: DoNotSchedule

Pods cannot collapse onto fewer nodes. Nodes never drain. Reserved CPU/memory never decreases.

Recommended default (cost-safe)

topologySpreadConstraints:
- topologyKey: kubernetes.io/hostname
  maxSkew: 2
  whenUnsatisfiable: ScheduleAnyway

This keeps a strong preference for spreading while still allowing bin-packing during scale-down, which in turn lets the Cluster Autoscaler remove empty nodes.

Strict isolation (AZ-level only)

When hard guarantees are required:

topologySpreadConstraints:
- topologyKey: topology.kubernetes.io/zone
  maxSkew: 1
  whenUnsatisfiable: DoNotSchedule

Do not combine this with strict hostname-level spread.
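If both zone-level isolation and some node-level spreading are needed, one cost-safe combination is to keep only the zone constraint strict. A sketch:

```yaml
# Strict at the AZ level, advisory at the node level. Only the zone
# constraint can block scheduling, so scale-down bin-packing still works.
topologySpreadConstraints:
- topologyKey: topology.kubernetes.io/zone
  maxSkew: 1
  whenUnsatisfiable: DoNotSchedule
- topologyKey: kubernetes.io/hostname
  maxSkew: 2
  whenUnsatisfiable: ScheduleAnyway
```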

Anti-affinity as a soft alternative

To avoid hot nodes without blocking scale-down:

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app: your-app

Anti-affinity is advisory and cost-safe.

Resource requests tuning

  • Over-requesting CPU = slower scale-down
  • Over-requesting memory = unexpected scale-ups

Practical defaults:

  • targetCPUUtilizationPercentage: 70
  • targetMemoryUtilizationPercentage: 75–80

Adjust one knob at a time.
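In an `autoscaling/v2` HPA these defaults map to resource metric targets. A sketch using the values above (75 chosen from the 75-80 range):

```yaml
# The practical defaults above, expressed as autoscaling/v2 resource metrics.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 75
```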

Validation loop

Run weekly (or after changes):

  1. Check HPA current/target values
  2. Compare CPU used % vs CPU requested %
  3. Observe replica collapse after load drops
  4. Verify nodes drain and disappear
  5. Re-check latency, errors, OOMs

Quick validation commands

kubectl -n <namespace> get hpa <deployment>        # current vs target metrics, replica count
kubectl -n <namespace> describe hpa <deployment>   # events and recent scaling decisions
kubectl -n <namespace> top pod --containers        # real usage per container
kubectl top node                                   # node-level CPU/memory usage
kubectl -n <namespace> get pods -o wide --no-headers | sort -k7   # group pods by node

Utility scripts

Both scripts require Datadog credentials:

export DD_API_KEY=...
export DD_APP_KEY=...
export DD_SITE=datadoghq.com   # optional, defaults to datadoghq.com

audit-metrics.mjs — Cost-savings discovery

Scan a cluster over a wide window (default 24 h) to find over-reservation and waste.

# Cluster-wide audit
node scripts/audit-metrics.mjs --cluster <cluster>

# With deployment deep-dive
node scripts/audit-metrics.mjs \
  --cluster <cluster> \
  --namespace <namespace> \
  --deployment <deployment>

Reports:

  • Cluster: CPU/memory used %, requested %, and waste % (requested minus used)
  • Deployment (when provided): CPU/memory usage vs requests, HPA replica range
  • Savings opportunities: actionable recommendations based on thresholds

incident-metrics.mjs — Post-incident analysis

Collect metrics for a narrow incident window and get a tuning recommendation.

node scripts/incident-metrics.mjs \
  --cluster <cluster> \
  --namespace <namespace> \
  --deployment <deployment> \
  --from <ISO8601> \
  --to <ISO8601>

Reports:

  • Cluster: CPU used % and requested % of allocatable
  • Deployment: CPU/memory usage vs requests, unavailable %
  • HPA: current / desired / max replicas
  • Capacity planning: required allocatable cores for 80 % and 70 % reservation ceilings
  • Tuning order: step-by-step recommendation (one knob at a time)
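The capacity-planning arithmetic behind that report can be sketched as follows. The function name is hypothetical and the script's exact formula may differ; this just shows the reservation-ceiling relationship:

```javascript
// Allocatable cores needed so that current CPU requests stay under a
// reservation ceiling, e.g. the 80% and 70% ceilings reported above.
function requiredAllocatableCores(requestedCores, ceilingPct) {
  return requestedCores / (ceilingPct / 100);
}
// 56 requested cores under an 80% ceiling -> 70 allocatable cores
```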

Interpretation notes

  • Keep limits.memory unchanged unless OOMKills or near-limit memory usage is confirmed
  • Use --out <path> to save full JSON for deeper analysis or diffing across runs
  • Run --help on either script for all options (relative windows, custom HPA name, pretty JSON)

