Resource Requests and Limits#

Requests and limits control how Kubernetes schedules pods and enforces resource usage. Getting them wrong leads to pods that get evicted, get throttled to a crawl, or starve other workloads on the same node.

Requests vs Limits#

Requests are what the scheduler uses for placement. When you request 500m CPU and 256Mi memory, the scheduler only places the pod on a node whose allocatable capacity can fit those requests on top of what is already requested there. The request is a scheduling guarantee: the sum of requests on a node never exceeds its allocatable capacity, and CPU requests also determine each container's relative share of CPU when the node is under contention.

Limits are the ceiling. If your container tries to use more memory than its limit, it gets OOMKilled. If it tries to use more CPU than its limit, it gets throttled (not killed).

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

Units:

  • CPU: 1 = 1 vCPU/core. 250m = 0.25 cores. 100m is a common minimum for lightweight services.
  • Memory: Mi (mebibytes) and Gi (gibibytes). Do not use M and G (decimal) unless you mean it: 128Mi is 128 × 2^20 = 134,217,728 bytes (about 134 MB), while 128M is exactly 128,000,000 bytes.
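
To see how much allocatable capacity a node actually offers the scheduler, you can inspect the node directly (the node name here is a placeholder):

# Allocatable = node capacity minus system, kubelet, and eviction reservations
kubectl describe node node-1 | grep -A 6 "Allocatable:"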

What Happens Without Them#

If you set no requests and no limits, the pod is BestEffort QoS. It can use whatever is available on the node, but it is the first to be evicted when the node runs low on resources. In a production cluster, this is a recipe for random evictions.

If you set requests but no limits, the container is guaranteed its requested resources but can burst above them. This is often the right choice for CPU – let it burst when the node has spare cycles. For memory, this is riskier because the container can grow unbounded until the node OOM killer intervenes.
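
A minimal sketch of that requests-only pattern (values are illustrative):

# Guaranteed scheduling and CPU share, unbounded bursting -- fine for CPU, risky for memory
resources:
  requests:
    cpu: 250m
    memory: 256Mi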

QoS Classes#

Kubernetes assigns a Quality of Service class to every pod based on its resource configuration:

| QoS Class  | Condition                                                                     | Eviction Priority   |
|------------|-------------------------------------------------------------------------------|---------------------|
| Guaranteed | Every container has requests = limits for both CPU and memory                  | Last to be evicted  |
| Burstable  | At least one container has a request or limit set, but they are not all equal  | Middle              |
| BestEffort | No requests or limits on any container                                         | First to be evicted |

# Guaranteed: requests == limits
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi

# Burstable: requests < limits (most common)
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# BestEffort: nothing set (do not do this in production)
# resources: {}

For critical workloads (databases, payment services), use Guaranteed. For general web services, Burstable is usually fine.
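
You can confirm which class was assigned to a running pod (the pod name matches the earlier examples):

kubectl get pod web-api-6d4f8b7c9-x2k4m -o jsonpath='{.status.qosClass}'
# Prints Guaranteed, Burstable, or BestEffort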

CPU Throttling#

When a container hits its CPU limit, the kernel throttles it using CFS (Completely Fair Scheduler) bandwidth control. The container does not get killed – it just runs slower. This manifests as increased request latency with no obvious cause in application logs.

Detect throttling by checking the container’s cgroup metrics:

# Exec into the pod and check throttling stats
kubectl exec web-api-6d4f8b7c9-x2k4m -- cat /sys/fs/cgroup/cpu.stat
# Look for: nr_throttled and throttled_usec
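# (On cgroup v1 nodes the file is typically /sys/fs/cgroup/cpu/cpu.stat and the field is throttled_time)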

A growing nr_throttled count means your CPU limit is too low. Recommendation: For most workloads, do not set a CPU limit at all. Set a CPU request (which guarantees scheduling) and let the container burst when the node has capacity. CPU limits cause more problems than they solve unless you need strict multi-tenant isolation.

# Recommended for most services
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    # No CPU limit -- let it burst
    memory: 512Mi

OOMKilled Debugging#

When a container exceeds its memory limit, the kernel OOM killer terminates it. The pod shows OOMKilled in its status.

# Check for OOMKilled
kubectl get pod web-api-6d4f8b7c9-x2k4m -o jsonpath='{.status.containerStatuses[0].lastState}'

# See the exit code (137 = OOMKilled / SIGKILL)
kubectl describe pod web-api-6d4f8b7c9-x2k4m | grep -A5 "Last State"

# Check current memory usage
kubectl top pod web-api-6d4f8b7c9-x2k4m

Common causes of OOMKilled:

  1. Memory limit too low – the application legitimately needs more memory. Increase the limit.
  2. Memory leak – the application grows over time. The fix is in the application, not the limit.
  3. JVM/runtime overhead – your app uses 200Mi but the JVM overhead pushes total container memory past the limit. Account for runtime overhead in your limit. For Java: -XX:MaxRAMPercentage=75.0 keeps the heap at 75% of the container limit.
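
For Java workloads, one way to wire that flag in is through the container spec; this is a sketch with an illustrative image name and sizes:

containers:
- name: web-api
  image: registry.example.com/web-api:1.0    # illustrative image
  env:
  - name: JAVA_TOOL_OPTIONS                  # read automatically by the JVM at startup
    value: "-XX:MaxRAMPercentage=75.0"       # heap capped at ~75% of the container memory limit
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 512Mi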

Node-level OOM is different from container OOM. When the node itself runs low on memory, the kubelet evicts BestEffort pods and Burstable pods that exceed their requests first; Guaranteed pods (and Burstable pods staying within their requests) are evicted last. This is why QoS class matters.
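
To check whether a node is under memory pressure and has started evicting, look at its conditions and at recent eviction events (the node name is a placeholder):

kubectl describe node node-1 | grep -A 8 "Conditions:"
# MemoryPressure=True means the kubelet is reclaiming resources
kubectl get events -A --field-selector reason=Evicted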

LimitRanges and ResourceQuotas#

LimitRange sets default and maximum resource values per container in a namespace. Use it to prevent anyone from deploying pods without resource requests:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 250m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "2"
      memory: 2Gi
    min:
      cpu: 50m
      memory: 64Mi
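
Once the LimitRange is in place, you can verify both the object and the defaults it injects at admission (the pod name is illustrative):

kubectl describe limitrange default-resources -n production
# Confirm the injected defaults on a pod that omitted resources:
kubectl get pod web-api-6d4f8b7c9-x2k4m -n production -o jsonpath='{.spec.containers[0].resources}'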

ResourceQuota caps the total resources consumed by all pods in a namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"

When a ResourceQuota is active, every pod must specify requests and limits for each resource the quota tracks. If you enable a memory quota but a pod does not set a memory request, the pod is rejected at admission. This is why a LimitRange with defaults should always accompany a ResourceQuota.
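
Current consumption against the quota is visible on the object itself:

kubectl describe resourcequota compute-quota -n production
# Shows Used vs Hard for each tracked resource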

Vertical Pod Autoscaler#

The Vertical Pod Autoscaler (VPA) monitors actual resource usage and recommends (or automatically applies) better request/limit values. Run it in recommendation mode first:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"  # "Off" = recommend only, "Auto" = apply changes

Check recommendations with:

kubectl describe vpa web-api-vpa
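
The raw recommendation is also available in the object's status; the field path below follows the VPA CRD and may vary slightly between versions:

kubectl get vpa web-api-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'
# Each entry includes target, lowerBound, upperBound, and uncappedTarget values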

VPA and HPA (Horizontal Pod Autoscaler) can conflict if both try to scale based on CPU. If you use both, have HPA scale on custom metrics and VPA handle resource sizing.
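
A sketch of that split, with HPA scaling replicas on a per-pod custom metric (the metric name is illustrative and assumes a metrics adapter is installed) while VPA owns resource sizing:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # illustrative custom metric
      target:
        type: AverageValue
        averageValue: "100"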

Practical Recommendations#

  • Always set memory limits. An unconstrained container can take down the entire node.
  • Consider omitting CPU limits to avoid throttling. Use CPU requests to guarantee scheduling.
  • Start with generous limits and tighten based on VPA recommendations or kubectl top pod data over a week.
  • Set LimitRange defaults on every namespace so no pod runs without resources defined.
  • For production databases: Guaranteed QoS with requests equal to limits.