Pod Topology Spread Constraints#
Pod anti-affinity gives you binary control: either a pod avoids another pod’s topology domain or it does not. But it does not give you even distribution. If you have 6 replicas and 3 zones, anti-affinity cannot express “put exactly 2 in each zone.” Topology spread constraints solve this by letting you specify the maximum allowed imbalance between any two topology domains.
How Topology Spread Works#
A topology spread constraint defines:
- Which topology domains to spread across (via topologyKey)
- How much imbalance is acceptable (via maxSkew)
- What to do when the constraint cannot be met (via whenUnsatisfiable)
- Which pods count toward the distribution (via labelSelector)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-api
  template:
    metadata:
      labels:
        app: web-api
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web-api
      containers:
      - name: api
        image: web-api:latest
```

With 6 replicas, 3 zones, and maxSkew: 1, the scheduler distributes pods as evenly as possible. The result is 2 pods per zone. If one zone already has 2 pods and another has 1, the scheduler places the next pod in the zone with fewer pods to keep the skew within 1.
The maxSkew Parameter#
maxSkew is the maximum allowed difference in pod count between any two topology domains. It is always a positive integer.
- maxSkew: 1 – the strictest possible. Domains can differ by at most 1 pod. With 6 pods across 3 zones, you get 2-2-2. With 7 pods, you get 3-2-2, with the extra pod in any one zone.
- maxSkew: 2 – more relaxed. One zone can have up to 2 more pods than another. With 6 pods across 3 zones, you could get 3-2-1.
In most production scenarios, maxSkew: 1 is the right choice for zone-level spreading. Use higher values only when you need scheduling flexibility and can tolerate uneven distribution.
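For illustration, a relaxed zone constraint might look like the following. This is a minimal sketch that reuses the app: web-api labels from the earlier example:

```yaml
topologySpreadConstraints:
- maxSkew: 2                          # any zone may have up to 2 more pods than the emptiest zone
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web-api
```

With 6 replicas this permits uneven distributions such as 3-2-1 in addition to the even 2-2-2.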
whenUnsatisfiable#
This controls what happens when the constraint cannot be met:
| Value | Behavior |
|---|---|
| DoNotSchedule | Hard constraint. The pod stays Pending if placing it would violate maxSkew. |
| ScheduleAnyway | Soft constraint. The scheduler minimizes skew as a scoring factor but still schedules the pod even if skew is exceeded. |
```yaml
topologySpreadConstraints:
# Hard: zone spread is critical for HA
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web-api
# Soft: node spread is nice to have
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: web-api
```

Use DoNotSchedule for failure-domain spreading that is truly critical (zones, regions). Use ScheduleAnyway for best-effort spreading (across nodes within a zone) where you prefer even distribution but cannot afford Pending pods.
minDomains#
minDomains ensures pods spread across at least a minimum number of topology domains. This prevents all pods from piling into a single zone when the cluster currently has nodes in fewer zones than you intend to use.
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  minDomains: 3
  labelSelector:
    matchLabels:
      app: web-api
```

Without minDomains, if your cluster only has nodes in one zone, the scheduler happily places all pods there (the skew is 0 because there is only one domain). With minDomains: 3, the scheduler treats the missing zones as domains with 0 pods. The first pod can land in the single available zone (skew 1 - 0 = 1), but the next pod would push the skew to 2, violating maxSkew: 1, so it stays Pending until nodes exist in enough zones.
minDomains requires whenUnsatisfiable: DoNotSchedule and the MinDomainsInPodTopologySpread feature gate (stable since Kubernetes 1.30).
Multiple Constraints#
You can combine multiple topology spread constraints to achieve multi-level spreading. The scheduler must satisfy all constraints simultaneously.
```yaml
topologySpreadConstraints:
# Level 1: spread across zones
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web-api
# Level 2: spread across nodes within each zone
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: web-api
```

This gives you two-dimensional spreading: even across zones (hard requirement) and even across nodes within each zone (soft preference). For 6 replicas across 3 zones with 2 nodes per zone, the ideal result is 1 pod per node.
Interaction with Node Affinity#
Topology spread constraints only consider nodes that the pod is eligible to run on. If you combine spread constraints with node affinity, the scheduler first filters to matching nodes and then evaluates spread within that subset.
```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values:
            - compute
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web-api
```

Here, the pod only runs on node-type=compute nodes, and the spread is calculated only across those nodes. If compute nodes exist in only 2 of 3 zones, the scheduler spreads across 2 zones, not 3.
Interaction with Pod Affinity#
Topology spread constraints and pod affinity/anti-affinity work independently. The scheduler must satisfy both. This can create conflicts:
- Topology spread says “distribute evenly across zones”
- Pod affinity says “run near Redis pods, which are only in zone-a”
In this case, the pod might stay Pending because it cannot be both evenly spread and co-located with Redis. If this happens, use ScheduleAnyway for the spread constraint or preferred for the affinity.
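One way to relax the combination is sketched below, assuming the Redis pods carry an app: redis label (that label is an assumption, not something defined earlier). The affinity becomes a preference and the spread becomes soft, so neither can leave the pod Pending:

```yaml
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone
          labelSelector:
            matchLabels:
              app: redis               # assumed label on the Redis pods
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway  # soft: prefer even spread, never block
    labelSelector:
      matchLabels:
        app: web-api
```

Keeping one of the two as a hard requirement is also reasonable; relax whichever constraint you can afford to break.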
Cluster-Level Defaults#
You can set default topology spread constraints for all pods at the cluster level by configuring the kube-scheduler. This is useful when you want every workload to spread across zones without requiring every team to add constraints to their specs.
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      defaultingType: List
```

Cluster defaults are overridden by any topologySpreadConstraints defined in the pod spec.
Pod Anti-Affinity vs Topology Spread Constraints#
Both features control pod distribution, but they work differently:
| Aspect | Pod Anti-Affinity | Topology Spread Constraints |
|---|---|---|
| Control | Binary: avoid or do not avoid | Numeric: maxSkew defines allowed imbalance |
| Even distribution | Cannot enforce even spread | Designed for even spread |
| Performance | Expensive to evaluate at scale | More efficient at scale |
| Flexibility | required or preferred | DoNotSchedule or ScheduleAnyway, plus maxSkew tuning |
| Simplicity | Simpler to understand | More parameters to configure |
| Best for | “No two replicas on the same node” | “Spread 6 replicas evenly across 3 zones” |
Use pod anti-affinity when you need simple binary exclusion (no two pods on the same node). Use topology spread constraints when you need controlled, even distribution across multiple domains.
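To make the contrast concrete, here is a sketch of both approaches side by side, reusing the app: web-api labels from the earlier examples:

```yaml
# Pod anti-affinity: binary exclusion, no two web-api pods on the same node
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app: web-api

# Topology spread: numeric control, zones may differ by at most 1 pod
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web-api
```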
Common Gotchas#
Label selector must match the pods being scheduled. The labelSelector in a topology spread constraint should match the labels of the pods in the same Deployment. If it does not match, the constraint has no effect because no existing pods count toward the distribution. The scheduler sees zero pods in every domain and considers skew to be 0.
```bash
# Verify labels match
kubectl get pods -l app=web-api --show-labels
```

maxSkew: 1 with fewer replicas than domains. If you have 3 zones and only 2 replicas with maxSkew: 1 and DoNotSchedule, one zone will always be empty. This is fine: the skew is 1 - 0 = 1, which satisfies the constraint. Even with minDomains: 3 and only 2 replicas, a 1-1-0 distribution has a skew of 1 and still satisfies maxSkew: 1. The combination only blocks scheduling when placing a pod would give a single domain more than maxSkew pods while empty domains still count toward the minimum.
Topology domains with no eligible nodes are ignored. If zone-c exists but has no nodes matching your nodeSelector, it is not counted as a topology domain. Pods will not be Pending waiting for a domain that has no eligible nodes.
Rollout interactions. During a rolling update, old and new pods both count toward the spread calculation, because the labelSelector matches both the old and the new ReplicaSet. Surge pods are scheduled against a distribution that still includes pods about to be terminated, so the spread can end up uneven once the old pods are gone, and with DoNotSchedule a surge pod can sit Pending when the only zones that would keep the skew within maxSkew have no capacity left for it. Set appropriate maxSurge and maxUnavailable in your Deployment strategy so the rollout does not stall.
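As a sketch, a rollout strategy that pairs reasonably with a strict zone constraint might look like this; the exact values depend on your replica count and capacity:

```yaml
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # only one extra pod competes for a spread slot at a time
      maxUnavailable: 1   # let one old pod terminate so its slot frees up quickly
```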