Vertical Pod Autoscaler (VPA)#
Horizontal scaling adds more pod replicas. Vertical scaling gives each pod more (or fewer) resources. VPA automates the vertical side by watching actual CPU and memory usage over time and adjusting resource requests to match reality. Without it, teams guess at resource requests during initial deployment and rarely revisit them, leading to either waste (over-provisioned) or instability (under-provisioned).
What VPA Does#
VPA monitors historical and current resource usage for pods in a target Deployment (or StatefulSet, DaemonSet, etc.) and produces recommendations for CPU and memory requests. Depending on the configured mode, it either reports these recommendations passively or actively applies them by evicting and recreating pods with updated requests.
VPA does not set resource limits independently. It adjusts requests, and if a container defines limits, the admission controller can scale them proportionally so the original limit-to-request ratio is preserved.
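As a rough illustration of that proportional behavior (the specific numbers here are made up, not taken from the VPA documentation):
# Container spec before VPA: limit is 2x the request (a 1:2 ratio)
resources:
  requests:
    cpu: 100m
  limits:
    cpu: 200m
# If VPA recommends a 250m request, the admission controller can set the
# limit to 500m when the pod is recreated, keeping the original 1:2 ratio.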
VPA Components#
VPA runs as three components in the cluster:
| Component | Role |
|---|---|
| Recommender | Reads metrics from the metrics API (metrics-server or Prometheus), analyzes usage patterns, and produces CPU/memory recommendations. |
| Updater | Watches for pods whose requests differ from VPA recommendations. When the difference exceeds a threshold, it evicts those pods so they are recreated with new values. |
| Admission Controller | Intercepts pod creation and mutates the resource requests to match VPA recommendations before the pod starts. |
In Off mode, only the Recommender is active. The Updater and Admission Controller do nothing.
Installation#
VPA is not a built-in Kubernetes component. Install it from the upstream autoscaler repository or via Helm.
# Option 1: From the upstream repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Option 2: Via Helm (Fairwinds chart)
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa --namespace vpa --create-namespace
# Verify components are running
kubectl get pods -n vpa
VPA requires metrics-server (or a Prometheus adapter exposing resource metrics) to function. Without a metrics source, the Recommender has no data to analyze.
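Before creating any VPA objects, it is worth confirming that a metrics source is actually serving data. A quick check, assuming metrics-server is installed in kube-system (adjust the namespace to match your cluster):
# Confirm metrics-server is deployed and ready
kubectl get deployment metrics-server -n kube-system
# Confirm the Metrics API is returning pod usage data
kubectl top pods -A | head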
VPA Modes#
The updateMode in the VPA spec controls how aggressively VPA applies its recommendations:
Off Mode (Recommendation Only)#
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"
VPA analyzes usage and produces recommendations but does not change anything. This is the safest starting point. Run VPA in Off mode for a week or two, review the recommendations, and decide if they make sense before enabling automatic updates.
Initial Mode#
updatePolicy:
  updateMode: "Initial"
VPA sets resource requests only when a pod is first created (or recreated for other reasons). It does not evict existing pods to apply changes. This is useful for workloads that should not be disrupted but where you want new pods to start with better values.
Auto Mode#
updatePolicy:
  updateMode: "Auto"
VPA actively evicts pods to apply updated recommendations. When the Recommender produces new values that differ significantly from current requests, the Updater evicts pods one at a time. The Admission Controller then sets the new requests when the replacement pods are created by the Deployment controller.
Resource Policy: Setting Bounds#
The resourcePolicy lets you constrain VPA recommendations to prevent extreme values. Without bounds, VPA might set CPU requests to 1m for a pod that genuinely needs 100m during startup, or set memory to 16Gi for a pod with a temporary spike.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "4"
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
    - containerName: sidecar
      mode: "Off"  # Do not touch this container
Key resourcePolicy options:
- minAllowed / maxAllowed: Floor and ceiling for recommendations.
- controlledResources: Which resources VPA manages. Set to ["memory"] to have VPA handle only memory.
- mode: "Off": Exclude a specific container from VPA management.
Reading VPA Recommendations#
kubectl describe vpa web-api-vpa -n production
The output includes a Recommendation section:
Recommendation:
  Container Recommendations:
    Container Name:  api
    Lower Bound:
      Cpu:     100m
      Memory:  200Mi
    Target:
      Cpu:     250m
      Memory:  384Mi
    Uncapped Target:
      Cpu:     250m
      Memory:  384Mi
    Upper Bound:
      Cpu:     800m
      Memory:  1Gi
| Field | Meaning |
|---|---|
| Target | The recommended request values. This is what VPA would set. |
| Lower Bound | The minimum request that VPA considers safe. Going below this risks instability. |
| Upper Bound | The maximum request VPA has seen needed. Useful for setting limits. |
| Uncapped Target | What VPA would recommend without minAllowed/maxAllowed constraints. If this differs from Target, your bounds are capping the recommendation. |
Use the Target values as your baseline. If Target and Uncapped Target differ, review whether your bounds are too restrictive.
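For scripting or quick checks, the same recommendation is exposed on the VPA object's status. A sketch using jsonpath, with field paths following the autoscaling.k8s.io/v1 API used above:
# Print the target recommendation for each container
kubectl get vpa web-api-vpa -n production \
  -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}{": "}{.target}{"\n"}{end}'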
VPA and HPA Conflicts#
VPA and HPA can conflict when both operate on CPU. HPA scales the number of replicas based on CPU utilization (current usage divided by request). VPA changes the CPU request. If VPA raises the CPU request, the utilization percentage drops, and HPA scales down replicas. If VPA lowers the CPU request, utilization spikes, and HPA scales up aggressively. They end up fighting.
Solutions to the conflict:
Option 1: Split resources. Use VPA for memory only, HPA for CPU.
# VPA controls memory only
resourcePolicy:
  containerPolicies:
  - containerName: api
    controlledResources: ["memory"]
    minAllowed:
      memory: 128Mi
    maxAllowed:
      memory: 2Gi
# HPA scales on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Option 2: VPA in Off mode. Run VPA solely for recommendations. Apply changes manually during maintenance windows. HPA runs normally.
Option 3: HPA on custom metrics. If HPA scales on a custom metric (requests per second, queue depth), it does not conflict with VPA adjusting CPU requests.
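A sketch of such an HPA, scaling the web-api Deployment on a hypothetical http_requests_per_second metric. This assumes a custom metrics adapter (such as prometheus-adapter) exposes the metric; the HPA name, metric name, and thresholds are illustrative:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa              # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"              # aim for ~100 req/s per pod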
VPA and Resource Quotas#
If a namespace has a ResourceQuota, VPA might try to set requests that exceed the quota, and pod creation will then be rejected by the quota admission controller. Use maxAllowed in the VPA resource policy to stay within quota limits.
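For example, if the production namespace enforces a quota like the following (the quota name and memory figure are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota            # hypothetical name
  namespace: production
spec:
  hard:
    requests.cpu: "10"           # matches the "max 10 CPU total" comment below
    requests.memory: 20Gi        # illustrative value
then a per-container maxAllowed cap keeps any single pod's recommendation well inside that budget: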
resourcePolicy:
  containerPolicies:
  - containerName: api
    maxAllowed:
      cpu: "2"       # Namespace quota allows max 10 CPU total
      memory: 2Gi    # Keep individual pods under control
Goldilocks: VPA Dashboard#
Goldilocks is an open-source tool by Fairwinds that creates a VPA in Off mode for every Deployment in labeled namespaces and provides a web dashboard showing recommendations.
# Install Goldilocks
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace
# Enable a namespace for Goldilocks
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
# Port-forward the dashboard
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
Goldilocks shows a per-deployment view of current requests vs VPA recommendations with color-coded indicators for over-provisioned and under-provisioned resources. It is a practical way to get visibility across dozens of services without manually creating VPA objects.
When to Use VPA vs Manual Tuning#
VPA is most valuable in these scenarios:
- Large clusters with many services where no one has time to right-size each one.
- Workloads with usage patterns that change over time (seasonal traffic, evolving features).
- New services where you genuinely do not know the right resource values yet.
Manual tuning is better for:
- Critical services (databases, payment processing) where you understand the resource profile deeply and want explicit control.
- Services with extremely spiky usage patterns that VPA’s historical averaging might not handle well.
Common Gotchas#
VPA evicts pods to resize them. In Auto mode, VPA causes pod restarts. Each restart is a brief disruption for that pod instance. If your service has only 1 replica with no PodDisruptionBudget, VPA can cause downtime. Always run at least 2 replicas and define PDBs for services managed by VPA in Auto mode.
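A minimal PodDisruptionBudget for the web-api Deployment from the earlier examples might look like this (assuming its pods carry an app: web-api label and at least 2 replicas are running):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb
  namespace: production
spec:
  minAvailable: 1                # keep at least one pod up while VPA evicts another
  selector:
    matchLabels:
      app: web-api               # assumed pod label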
Recommendations need time to stabilize. VPA needs at least 24-48 hours of metric data to produce reliable recommendations. The first few hours of recommendations may be volatile. Do not enable Auto mode on day one.
VPA does not resize in place. Unlike some container runtimes that support live resizing, Kubernetes VPA must evict and recreate the pod to change its requests. In-place resource resize (KEP-1287) is an alpha feature as of Kubernetes 1.32 and is not yet integrated with VPA.
Metrics source is required. Without metrics-server or a Prometheus adapter, VPA has no data and produces no recommendations. Verify metrics availability with kubectl top pods before deploying VPA.
Init containers and sidecar containers. VPA can manage init container resources, but its recommendations may be inaccurate for containers that run briefly. Use mode: "Off" in the resource policy for containers that VPA should not touch, including short-lived init containers and injected sidecars whose resource profiles are managed by their operators.