Generate a HorizontalPodAutoscaler in seconds
An HPA keeps your application responsive under load and cost-efficient when idle by scaling pod replicas based on real metrics. This builder produces an autoscaling/v2 manifest targeting a Deployment or StatefulSet, with CPU and memory utilization targets and a scale-down stabilization window.
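To make the shape of that manifest concrete, here is a minimal sketch that assembles an autoscaling/v2 HorizontalPodAutoscaler as a plain Python dict and prints it as JSON. The helper `build_hpa` and the names `web` are hypothetical, and the exact output of this builder may differ. The field names (`scaleTargetRef`, `metrics`, `averageUtilization`, `behavior.scaleDown.stabilizationWindowSeconds`) follow the autoscaling/v2 API.

```python
import json


def build_hpa(name: str, target_kind: str, target_name: str,
              min_replicas: int, max_replicas: int,
              cpu_percent: int, memory_percent: int,
              scale_down_window_s: int = 300) -> dict:
    """Return an autoscaling/v2 HorizontalPodAutoscaler as a plain dict.

    Hypothetical helper for illustration; only the field names come from
    the autoscaling/v2 API.
    """
    if target_kind not in ("Deployment", "StatefulSet"):
        raise ValueError("target_kind must be Deployment or StatefulSet")
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")

    def utilization_metric(resource: str, percent: int) -> dict:
        # A Resource metric whose target is a percentage of the pods' requests.
        return {
            "type": "Resource",
            "resource": {
                "name": resource,
                "target": {"type": "Utilization", "averageUtilization": percent},
            },
        }

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                utilization_metric("cpu", cpu_percent),
                utilization_metric("memory", memory_percent),
            ],
            # Slow down scale-down decisions to avoid flapping.
            "behavior": {
                "scaleDown": {"stabilizationWindowSeconds": scale_down_window_s},
            },
        },
    }


if __name__ == "__main__":
    hpa = build_hpa("web", "Deployment", "web", 2, 10, 70, 80)
    print(json.dumps(hpa, indent=2))
```

kubectl accepts JSON manifests as well as YAML, so the printed output could be saved to a file and applied with `kubectl apply -f`.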
How it works
The HPA controller periodically (every 15 seconds by default) reads metrics for the pods owned by your target workload and compares them against your targets. For a Utilization target, it expresses each pod's current usage as a percentage of that pod's declared resource request, then averages across all pods. If the average is above the target by more than a small tolerance (10% by default) it scales up; if it is below the target by more than that tolerance, it scales down. The replica count always stays within minReplicas and maxReplicas.
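The utilization comparison can be sketched as follows. This is a simplified model, not the controller's code: `PodSample`, `average_utilization`, and `scaling_direction` are hypothetical names, and the 0.1 tolerance mirrors the default described above. Because replicas of one pod template normally share the same request, the mean of per-pod percentages here equals total usage divided by total requests.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PodSample:
    usage_millicores: float              # current CPU usage reported for the pod
    request_millicores: Optional[float]  # summed container requests; None if any is unset


def average_utilization(pods: list[PodSample]) -> Optional[float]:
    """Mean of per-pod usage/request as a percentage, or None if undefined."""
    if not pods or any(p.request_millicores is None or p.request_millicores <= 0
                       for p in pods):
        # Without requests there is no denominator, so utilization is unknown.
        return None
    per_pod = [100.0 * p.usage_millicores / p.request_millicores for p in pods]
    return sum(per_pod) / len(per_pod)


def scaling_direction(current_pct: float, target_pct: float,
                      tolerance: float = 0.1) -> str:
    """'up', 'down', or 'hold' depending on how far usage is from the target."""
    ratio = current_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        return "hold"  # close enough to the target: leave replicas alone
    return "up" if ratio > 1.0 else "down"
```

For example, two pods using 150m and 250m against 200m requests average 100%, which is well above a 70% target, so `scaling_direction` reports `"up"`.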
The desired replica count is roughly ceil(currentReplicas × currentMetric / targetMetric). Because utilization is measured relative to requests, every container in the target pods must declare resources.requests for the metric you scale on. Otherwise utilization is undefined, kubectl get hpa shows <unknown> for that target, and the controller takes no action for that metric.
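The replica formula, with the tolerance check and the min/max clamp, fits in a few lines. `desired_replicas` is a hypothetical helper; the real controller additionally handles missing metrics and not-yet-ready pods, which this sketch ignores. As a worked example, 4 replicas at 90% CPU against a 60% target gives ceil(4 × 1.5) = 6.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """ceil(current * current_metric / target_metric), clamped to [min, max]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        proposed = current_replicas  # within tolerance: no change
    else:
        proposed = math.ceil(current_replicas * ratio)
    # Never go outside the configured replica bounds.
    return max(min_replicas, min(max_replicas, proposed))
```

With `min_replicas=2, max_replicas=20`, 10 replicas at 200% against a 50% target would propose 40 but are capped at 20.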
Tips and notes
- Always set resource requests on every container of the target Deployment or StatefulSet; they are the denominator for utilization math.
- Use a scale-down stabilization window (300s is also the Kubernetes default) to prevent flapping, and keep scale-up responsive for traffic spikes (its window defaults to 0s).
- Combine CPU and memory targets when a workload is bound by both; the HPA scales to satisfy whichever demands more replicas.
- CPU and memory targets rely on the resource metrics API (metrics.k8s.io), which is normally served by metrics-server; it must be installed in the cluster for these targets to work.
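The multi-metric and stabilization tips can be modeled together. This sketch assumes the documented behavior: with several metrics the largest proposal wins, and during scale-down the controller uses the highest recommendation seen within the stabilization window. `StabilizedScaler` is a hypothetical class, and it omits the rate-limiting `policies` that `behavior.scaleUp` and `behavior.scaleDown` can also carry.

```python
from collections import deque


class StabilizedScaler:
    """Hypothetical model of multi-metric selection plus stabilization windows."""

    def __init__(self, scale_up_window_s: float = 0.0,
                 scale_down_window_s: float = 300.0):
        self.up_window = scale_up_window_s
        self.down_window = scale_down_window_s
        self.history = deque()  # (timestamp_s, recommendation), oldest first

    def decide(self, now: float, current_replicas: int,
               per_metric_proposals: list) -> int:
        # Several metrics: take the largest proposal so every target is met.
        desired = max(per_metric_proposals)
        self.history.append((now, desired))
        # Forget recommendations older than either window.
        horizon = max(self.up_window, self.down_window)
        while self.history[0][0] < now - horizon:
            self.history.popleft()
        # Scale up no further than the lowest recent proposal in the up window;
        # scale down no further than the highest recent proposal in the down window.
        up_limit = min(r for t, r in self.history if t >= now - self.up_window)
        down_limit = max(r for t, r in self.history if t >= now - self.down_window)
        return min(max(current_replicas, up_limit), down_limit)
```

In this model a drop in load at t=60s keeps the workload at its earlier size until the 300s window has passed, while a spike is acted on immediately because the scale-up window is 0.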