## Resource Requests and Limits
Requests and limits control how Kubernetes schedules pods and enforces resource usage. Getting them wrong leads to pods that get evicted, throttled to a crawl, or that starve other workloads on the same node.
### Requests vs Limits
Requests are what the scheduler uses for placement. When you request 500m CPU and 256Mi memory, Kubernetes only places the pod on a node with at least that much unallocated capacity. The request acts as a guarantee in the sense that the scheduler never overcommits requested capacity, and the CPU request maps to the container's cgroup weight, so the container gets at least that share of CPU under contention.
Limits are the ceiling. If your container tries to use more memory than its limit, it gets OOMKilled. If it tries to use more CPU than its limit, it gets throttled (not killed).
```yaml
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

Units:

- CPU: `1` = 1 vCPU/core, `250m` = 0.25 cores. `100m` is a common minimum for lightweight services.
- Memory: `Mi` (mebibytes) and `Gi` (gibibytes). Do not use `M` and `G` (decimal) unless you mean it – `128Mi` is about 134 MB, `128M` is exactly 128 MB.
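To see what the scheduler is actually working with, compare a node's allocatable capacity against the requests already committed to it (the node name below is a placeholder):

```bash
# Allocatable capacity the scheduler schedules against (node name is a placeholder)
kubectl describe node worker-1 | grep -A 6 "Allocatable:"

# Requests and limits already committed on that node
kubectl describe node worker-1 | grep -A 10 "Allocated resources:"
```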
### What Happens Without Them
If you set no requests and no limits, the pod is BestEffort QoS. It can use whatever is available on the node, but it is the first to be evicted when the node runs low on resources. In a production cluster, this is a recipe for random evictions.
If you set requests but no limits, the container is guaranteed its requested resources but can burst above them. This is often the right choice for CPU – let it burst when the node has spare cycles. For memory, this is riskier because the container can grow unbounded until the node OOM killer intervenes.
### QoS Classes
Kubernetes assigns a Quality of Service class to every pod based on its resource configuration:
| QoS Class | Condition | Eviction Priority |
|---|---|---|
| Guaranteed | Every container has requests = limits for both CPU and memory | Last to be evicted |
| Burstable | At least one container has a request or limit set, but they are not all equal | Middle |
| BestEffort | No requests or limits on any container | First to be evicted |
```yaml
# Guaranteed: requests == limits
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi

# Burstable: requests < limits (most common)
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# BestEffort: nothing set (do not do this in production)
# resources: {}
```

For critical workloads (databases, payment services), use Guaranteed. For general web services, Burstable is usually fine.
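To confirm which class Kubernetes actually assigned, read it straight from the pod status (the pod name here is just an example):

```bash
# Print the QoS class recorded in the pod's status
kubectl get pod web-api-6d4f8b7c9-x2k4m -o jsonpath='{.status.qosClass}'
```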
### CPU Throttling
When a container hits its CPU limit, the kernel throttles it using CFS (Completely Fair Scheduler) bandwidth control. The container does not get killed – it just runs slower. This manifests as increased request latency with no obvious cause in application logs.
Detect throttling by checking the container’s cgroup metrics:
```bash
# Exec into the pod and check throttling stats
kubectl exec web-api-6d4f8b7c9-x2k4m -- cat /sys/fs/cgroup/cpu.stat

# Look for: nr_throttled and throttled_usec
```

A growing `nr_throttled` count means your CPU limit is too low.

Recommendation: for most workloads, do not set a CPU limit at all. Set a CPU request (which guarantees scheduling) and let the container burst when the node has capacity. CPU limits cause more problems than they solve unless you need strict multi-tenant isolation.
```yaml
# Recommended for most services
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    # No CPU limit -- let it burst
    memory: 512Mi
```

### OOMKilled Debugging
When a container exceeds its memory limit, the kernel OOM killer terminates it. The pod shows OOMKilled in its status.
```bash
# Check for OOMKilled
kubectl get pod web-api-6d4f8b7c9-x2k4m -o jsonpath='{.status.containerStatuses[0].lastState}'

# See the exit code (137 = OOMKilled / SIGKILL)
kubectl describe pod web-api-6d4f8b7c9-x2k4m | grep -A5 "Last State"

# Check current memory usage
kubectl top pod web-api-6d4f8b7c9-x2k4m
```

Common causes of OOMKilled:
- Memory limit too low – the application legitimately needs more memory. Increase the limit.
- Memory leak – the application grows over time. The fix is in the application, not the limit.
- JVM/runtime overhead – your app uses 200Mi of heap, but JVM overhead pushes total container memory past the limit. Account for runtime overhead in your limit. For Java, `-XX:MaxRAMPercentage=75.0` keeps the heap at 75% of the container limit (see the sketch after this list).
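A sketch of the JVM case (the image name and values are illustrative): a container with a 512Mi memory limit and `JAVA_TOOL_OPTIONS` capping the heap at 75% of it, leaving headroom for metaspace, threads, and the code cache.

```yaml
# Illustrative Java container: heap capped at 75% of the 512Mi limit
containers:
  - name: web-api
    image: example.com/web-api:1.0   # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=75.0"
    resources:
      requests:
        memory: 512Mi
      limits:
        memory: 512Mi   # heap gets ~384Mi; the rest is JVM overhead
```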
Node-level OOM is different from container OOM. If the node itself runs low on memory, the kubelet evicts pods in roughly QoS order: BestEffort first, then Burstable pods using more than they requested, and Guaranteed pods last. This is why QoS class matters.
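To tell node pressure apart from a container-level OOM, check the node's MemoryPressure condition and look for kubelet evictions (the node name is a placeholder):

```bash
# MemoryPressure condition on the node (node name is a placeholder)
kubectl describe node worker-1 | grep -i "MemoryPressure"

# Pods evicted by the kubelet, cluster-wide
kubectl get events -A --field-selector reason=Evicted
```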
### LimitRanges and ResourceQuotas
LimitRange sets default and maximum resource values per container in a namespace. Use it to prevent anyone from deploying pods without resource requests:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: 250m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: "2"
        memory: 2Gi
      min:
        cpu: 50m
        memory: 64Mi
```

ResourceQuota caps the total resources consumed by all pods in a namespace:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```

When a ResourceQuota is active, every pod must specify requests and limits for the resources being quota’d. If you enable a memory quota but a pod does not have a memory request, the pod will be rejected. This is why a LimitRange with defaults should always accompany a ResourceQuota.
### Vertical Pod Autoscaler
The Vertical Pod Autoscaler (VPA) monitors actual resource usage and recommends (or automatically applies) better request/limit values. Run it in recommendation mode first:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"   # "Off" = recommend only, "Auto" = apply changes
```

Check recommendations with:
```bash
kubectl describe vpa web-api-vpa
```

VPA and HPA (Horizontal Pod Autoscaler) can conflict if both try to scale based on CPU. If you use both, have HPA scale on custom metrics and VPA handle resource sizing.
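A sketch of that split, assuming a custom metrics adapter already exposes a per-pod `http_requests_per_second` metric (the metric name and target value are illustrative): the HPA scales replica count on traffic while the VPA above handles request sizing.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumes a custom metrics adapter exposes this
        target:
          type: AverageValue
          averageValue: "100"
```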
### Practical Recommendations
- Always set memory limits. An unconstrained container can take down the entire node.
- Consider omitting CPU limits to avoid throttling. Use CPU requests to guarantee scheduling.
- Start with generous limits and tighten based on VPA recommendations or `kubectl top pod` data over a week.
- Set LimitRange defaults on every namespace so no pod runs without resources defined.
- For production databases: Guaranteed QoS with requests equal to limits.