StatefulSets and Persistent Storage#

Deployments treat pods as interchangeable. StatefulSets do not – each pod gets a stable hostname, a persistent volume, and an ordered startup sequence. This is what you need for databases, message queues, and any workload where identity matters.

StatefulSet vs Deployment#

Feature                   | Deployment                      | StatefulSet
--------------------------|---------------------------------|----------------------------------------
Pod names                 | Random suffix (web-api-6d4f8)   | Ordinal index (postgres-0, postgres-1)
Startup order             | All at once                     | Sequential (0, then 1, then 2)
Stable network identity   | No                              | Yes, via headless Service
Persistent storage        | Shared or none                  | Per-pod via volumeClaimTemplates
Scaling down              | Removes random pods             | Removes highest ordinal first

Use StatefulSets when your application needs any of: stable hostnames, ordered deployment/scaling, or per-pod persistent storage. Common examples: PostgreSQL, MySQL, Redis Sentinel, Kafka, ZooKeeper, Elasticsearch.

Stable Network Identity#

A StatefulSet requires a headless Service (one with clusterIP: None). Each pod gets a DNS record in the form <pod-name>.<service-name>.<namespace>.svc.cluster.local.

apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pg-secret
              key: password
        # Point PGDATA at a subdirectory so initdb sees an empty directory even
        # when the volume root contains a lost+found folder (common on block storage)
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi

The pods will be named postgres-0, postgres-1, postgres-2. Their DNS names are:

  • postgres-0.postgres.default.svc.cluster.local
  • postgres-1.postgres.default.svc.cluster.local
  • postgres-2.postgres.default.svc.cluster.local

Applications can address specific replicas by name. This is how PostgreSQL replication knows where the primary is, and how Kafka brokers find each other.
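
You can check these records from inside the cluster; a quick test using a throwaway busybox pod (the name dns-test is arbitrary):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup postgres-0.postgres.default.svc.cluster.local
# Resolves to the pod IP of postgres-0. Querying the bare service name
# (postgres.default.svc.cluster.local) returns the IPs of all ready pods.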

volumeClaimTemplates#

The volumeClaimTemplates block creates a separate PersistentVolumeClaim for each pod. When postgres-0 starts, Kubernetes creates a PVC named data-postgres-0. When postgres-1 starts, it gets data-postgres-1.

kubectl get pvc
# NAME              STATUS   VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS
# data-postgres-0   Bound    pv-abc123     10Gi       RWO            standard
# data-postgres-1   Bound    pv-def456     10Gi       RWO            standard
# data-postgres-2   Bound    pv-ghi789     10Gi       RWO            standard

If a pod is deleted and rescheduled, it reattaches to the same PVC. The data survives pod restarts. This is the core guarantee of StatefulSets.
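
You can verify the reattachment directly (pod and claim names from the example above):

kubectl delete pod postgres-1
# The StatefulSet controller recreates postgres-1 under the same name
kubectl get pod postgres-1 \
  -o jsonpath='{.spec.volumes[?(@.name=="data")].persistentVolumeClaim.claimName}'
# data-postgres-1 -- the same claim as before the restart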

PersistentVolumes, PVCs, and StorageClasses#

The storage stack has three layers:

StorageClass defines how storage is provisioned. Most cloud providers include default StorageClasses.

kubectl get storageclass
# NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE
# standard (default)   k8s.io/minikube-hostpath   Delete          Immediate
# gp3                  ebs.csi.aws.com            Delete          WaitForFirstConsumer

PersistentVolumeClaim (PVC) is a request for storage. It references a StorageClass and specifies the size and access mode.
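
Outside of volumeClaimTemplates you create the claim yourself; a minimal sketch, assuming the standard StorageClass from the listing above (the name scratch-data is illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scratch-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard
  resources:
    requests:
      storage: 5Gi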

PersistentVolume (PV) is the actual storage. With dynamic provisioning (the default on most clusters), PVs are created automatically when a PVC is created. You rarely need to create PVs manually.

VOLUMEBINDINGMODE matters. WaitForFirstConsumer delays PV creation until a pod that uses the PVC is scheduled. This ensures the PV is created in the same availability zone as the pod. Immediate creates the PV right away, which can cause zone mismatches in multi-AZ clusters.
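
The binding mode is set on the StorageClass itself. A sketch of a topology-aware class, assuming the AWS EBS CSI driver from the listing above (adjust the provisioner and parameters for your cluster):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete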

Ordered Pod Management#

By default, StatefulSets use OrderedReady pod management:

  • Scale up: Pods are created in order. postgres-1 does not start until postgres-0 is Running and Ready.
  • Scale down: Pods are removed in reverse order. postgres-2 is terminated before postgres-1.
  • Updates: Pods are updated in reverse ordinal order by default (highest first).

If your application does not need strict ordering (for example, a cache cluster where all nodes are peers), use parallel management:

spec:
  podManagementPolicy: Parallel

This creates and deletes pods in parallel during scaling, which is faster but gives no ordering guarantee. It only affects scaling operations; rolling updates still proceed one pod at a time.
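
To see the difference, watch the pods while scaling the example StatefulSet:

kubectl scale statefulset postgres --replicas=5
kubectl get pods -l app=postgres -w
# With OrderedReady, postgres-4 is created only after postgres-3 is Running
# and Ready; with Parallel, both new pods are created at the same time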

Resizing PVCs#

You can expand PVCs if the StorageClass sets allowVolumeExpansion: true (most cloud provisioners do). You cannot shrink them.

# Check if expansion is allowed
kubectl get storageclass standard -o jsonpath='{.allowVolumeExpansion}'
# true

To resize:

kubectl patch pvc data-postgres-0 -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
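
Each replica has its own claim, so repeat the patch for every ordinal; a short loop for the three replicas above:

for i in 0 1 2; do
  kubectl patch pvc data-postgres-$i \
    -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
done
# Note: volumeClaimTemplates is immutable, so replicas added later still get
# the original 10Gi unless the StatefulSet is recreated with the new size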

Some storage providers require the pod to be restarted for the filesystem to expand. Check the PVC status:

kubectl get pvc data-postgres-0 -o jsonpath='{.status.conditions}'
# If you see "FileSystemResizePending", delete the pod to trigger the resize
kubectl delete pod postgres-0
# StatefulSet will recreate it, and the filesystem will be expanded on mount

The PVC Deletion Gotcha#

Deleting a StatefulSet does not delete its PVCs. This is intentional – it protects your data. But it catches people in two ways:

1. Data persists after you think you cleaned up. You delete the StatefulSet, redeploy it, and the old data is still there because the PVCs were rebound to the new pods.

# Delete the StatefulSet but PVCs remain
kubectl delete statefulset postgres
kubectl get pvc
# data-postgres-0, data-postgres-1, data-postgres-2 are all still there

# To fully clean up, delete PVCs explicitly
kubectl delete pvc data-postgres-0 data-postgres-1 data-postgres-2

2. Stale PVCs block fresh starts. You delete a StatefulSet to start fresh, but the old PVCs with old data get reattached. If you changed database credentials or schema, the old data causes errors. Delete the PVCs before redeploying.

The persistentVolumeClaimRetentionPolicy field (beta since Kubernetes 1.27, stable in 1.32) lets you control this behavior:

spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete    # Delete PVCs when StatefulSet is deleted
    whenScaled: Retain     # Keep PVCs when scaling down

Practical Patterns#

Database with init script:

initContainers:
- name: init-db
  image: postgres:16
  command: ['sh', '-c', 'cp /config/init.sql /docker-entrypoint-initdb.d/']
  volumeMounts:
  - name: config
    mountPath: /config
  - name: initdb
    mountPath: /docker-entrypoint-initdb.d
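
For this snippet to work, the pod spec also needs the two volumes defined, and the main container must mount the shared directory as well; a sketch, assuming the SQL file lives in a ConfigMap named pg-init-scripts (the name is illustrative):

containers:
- name: postgres
  # ...image, env, and the data mount as in the StatefulSet above...
  volumeMounts:
  - name: initdb
    mountPath: /docker-entrypoint-initdb.d
volumes:
- name: config
  configMap:
    name: pg-init-scripts   # hypothetical ConfigMap holding init.sql
- name: initdb
  emptyDir: {}              # shared between the init container and postgres

Note that the postgres image only runs scripts in /docker-entrypoint-initdb.d on the first start against an empty data directory.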

Backup before upgrade: Before upgrading a StatefulSet that runs a database, snapshot the PVCs or run a logical backup. StatefulSet updates are not as easily rolled back as Deployments because the data may have been migrated.

# Create a VolumeSnapshot (requires CSI snapshot support)
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-backup-before-upgrade
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0
EOF
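
Check that the snapshot is ready before you start the upgrade:

kubectl get volumesnapshot postgres-backup-before-upgrade \
  -o jsonpath='{.status.readyToUse}'
# true -- the snapshot can later be restored by creating a new PVC that
# references it as a dataSource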

When NOT to use StatefulSets: If your app stores state in an external database and just needs persistent cache or scratch space, a Deployment with a PVC is simpler. StatefulSets add operational complexity – use them only when you need stable network identity or per-pod storage.
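
A minimal sketch of that simpler setup, assuming a single replica and a standalone PVC like the scratch-data example earlier (all names are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache
spec:
  replicas: 1                  # a ReadWriteOnce volume can only attach to one node
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      containers:
      - name: cache
        image: redis:7         # illustrative; any app that needs scratch space
        volumeMounts:
        - name: scratch
          mountPath: /data
      volumes:
      - name: scratch
        persistentVolumeClaim:
          claimName: scratch-data   # the PVC from the earlier example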