What is a HorizontalPodAutoscaler?

An HPA automatically adjusts the replica count of a workload based on observed metrics such as CPU or memory utilization. When load rises above the target it adds pods; when load drops it removes them, within the min and max replica bounds you set.

Why does my HPA show unknown for CPU?

Utilization is calculated as a percentage of each pod's resource requests. If the target pods have no resources.requests.cpu set, the HPA cannot compute a percentage and reports unknown. Always declare CPU and memory requests on the workload.

What is the difference between autoscaling v1 and v2?

autoscaling/v1 only supports a single CPU target. autoscaling/v2 is the current stable API and supports multiple metrics — CPU, memory, custom, and external — plus scaling behavior policies. This builder emits v2.
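For comparison, an autoscaling/v1 manifest might look like the sketch below. The names web-hpa and web, the replica bounds, and the 70% target are placeholders chosen for illustration. The single targetCPUUtilizationPercentage field is the only scaling signal v1 offers; v2 replaces it with a metrics list.

```yaml
# Illustrative autoscaling/v1 HPA; all names and numbers are placeholders.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  # v1 supports exactly one signal: average CPU utilization.
  targetCPUUtilizationPercentage: 70
```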

What does the stabilization window do?

The scale-down stabilization window makes the HPA wait before removing pods, smoothing out brief dips in load so it does not thrash up and down. A common value is 300 seconds; scale-up is usually faster to handle spikes.

Do I need the metrics server installed?

Yes. CPU and memory utilization come from the Kubernetes metrics-server (or an equivalent like Prometheus Adapter for custom metrics). Without it the HPA has no data and will not scale.

What is the Kubernetes HorizontalPodAutoscaler Builder?

It is a tool that builds a Kubernetes HorizontalPodAutoscaler manifest (autoscaling/v2) with a target Deployment or StatefulSet, min/max replicas, CPU and memory utilization targets, and a scale-down stabilization window. It runs free in your browser on Gera Tools, with nothing uploaded.

Kubernetes HorizontalPodAutoscaler Builder

Name: Kubernetes HorizontalPodAutoscaler Builder
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Generate a HorizontalPodAutoscaler in seconds

An HPA keeps your application responsive under load and cost-efficient when idle by scaling pod replicas based on real metrics. This builder produces an autoscaling/v2 manifest targeting a Deployment or StatefulSet, with CPU and memory utilization targets and a scale-down stabilization window.
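A manifest of the kind described above might look like the following sketch. It is not verbatim builder output: the names web-hpa and web, the replica bounds, and the utilization percentages are assumptions chosen for illustration.

```yaml
# Illustrative autoscaling/v2 HPA; all names and numbers are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment       # or StatefulSet
    name: web              # hypothetical workload name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Average CPU usage as a percentage of the pods' CPU requests.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Average memory usage as a percentage of the pods' memory requests.
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      # Wait 5 minutes of consistently lower load before removing pods.
      stabilizationWindowSeconds: 300
```

Applying it with kubectl apply -f and then running kubectl get hpa should show the current and target values once metrics are available.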

How it works

The HPA controller periodically reads metrics for the pods owned by your target workload and compares them against your targets. For a Utilization target, it computes the ratio of current usage to the pod’s declared resource request and averages across all pods. If the average exceeds the target percentage it scales up; if it falls below, it scales down, always staying within minReplicas and maxReplicas. Deviations inside a small tolerance (10% by default) are ignored, so tiny fluctuations do not trigger a change.

The desired replica count is roughly ceil(currentReplicas × currentMetric / targetMetric). For example, 4 replicas averaging 80% CPU against a 60% target give ceil(4 × 80 / 60) = ceil(5.33) = 6 replicas. Because utilization is relative to requests, your target pods must declare resources.requests for the metric you scale on, or the controller reports unknown and cannot scale on that metric.

Scale-up vs scale-down behavior

Kubernetes applies different default policies to scale-up and scale-down to prevent thrashing:

Direction	Default behavior	Why
Scale up	Fast, reacts immediately	Traffic spikes need a fast response
Scale down	Slow, 5-minute stabilization window	Prevents removing pods during a brief lull

The builder exposes the scale-down stabilization window (stabilizationWindowSeconds) because it is the setting teams tune most often. The Kubernetes default is 300 seconds (5 minutes); lower it if load on your workload genuinely drops off quickly and you want to save cost sooner.
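As a rough illustration of tuning these settings, the fragment below sits under spec in an autoscaling/v2 HPA. The 120-second window and the one-pod-per-minute policy are example values, not recommendations.

```yaml
# Fragment of an autoscaling/v2 HPA spec; values are illustrative only.
behavior:
  scaleUp:
    # 0 means the controller acts on the newest recommendation right away.
    stabilizationWindowSeconds: 0
  scaleDown:
    # Shorter than the 300-second default, so idle pods are removed sooner.
    stabilizationWindowSeconds: 120
    policies:
      # Remove at most one pod per 60-second period.
      - type: Pods
        value: 1
        periodSeconds: 60
```

A shorter window saves cost sooner but raises the risk of removing pods just before load returns, so the rate-limiting policy is one way to soften that trade-off.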

Common pitfalls

HPA shows “unknown” metrics

The HPA controller cannot compute utilization without resource requests declared on the target pods. Add resources.requests.cpu and resources.requests.memory to your Deployment’s container spec:

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
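For context, here is a sketch of where that block sits in a complete Deployment. The name web, the app: web label, and the nginx image are placeholders. If a pod runs several containers, each one should declare requests for the metric you scale on, otherwise utilization for that pod may be undefined.

```yaml
# Illustrative Deployment; name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          resources:
            # Requests are the denominator for HPA utilization math.
            requests:
              cpu: "250m"
              memory: "256Mi"
```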

HPA shows correct target but never scales

Check that the metrics-server (or Prometheus Adapter) is installed and healthy:

kubectl top pods

If this command fails, resource metrics are unavailable cluster-wide and no HPA that targets CPU or memory utilization will scale.

Scaling thrashes up and down

Lower your CPU or memory target utilization so there is more headroom before scale-down fires, or increase the stabilization window. Alternatively, add a second metric (for example, request rate via a custom metric) so the HPA has a more stable signal.
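A second metric could be added roughly as sketched below. The metric name http_requests_per_second and the target of 100 are hypothetical, and this assumes a custom metrics adapter (such as Prometheus Adapter) is installed and exposes a per-pod metric under that name.

```yaml
# Fragment of an autoscaling/v2 HPA spec; metric name and values are hypothetical.
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Per-pod custom metric, averaged across the target's pods.
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical; must exist in your adapter
      target:
        type: AverageValue
        averageValue: "100"
```

With several metrics listed, the HPA computes a desired replica count for each and uses the largest.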

Tips and notes

Always set resource requests on the target Deployment — they are the denominator for utilization math.
Use a scale-down stabilization window (e.g. 300s) to prevent flapping, and keep scale-up responsive for traffic spikes.
Combine CPU and memory targets when a workload is bound by both; the HPA scales to satisfy whichever demands more replicas.
The metrics-server must be installed in the cluster for CPU/memory targets to work.
StatefulSets can be autoscaled with an HPA, but be cautious if your application has per-pod state (databases, sharded services) — scaling down may lose data if the application does not handle it gracefully.
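Pointing an HPA at a StatefulSet only changes the scaleTargetRef, as in the fragment below. The name cache is a placeholder.

```yaml
# Fragment of an autoscaling/v2 HPA spec targeting a StatefulSet.
scaleTargetRef:
  apiVersion: apps/v1
  kind: StatefulSet
  name: cache   # hypothetical StatefulSet name
```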