# StatefulSets and Persistent Storage
Deployments treat pods as interchangeable. StatefulSets do not – each pod gets a stable hostname, a persistent volume, and an ordered startup sequence. This is what you need for databases, message queues, and any workload where identity matters.
## StatefulSet vs Deployment
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random suffix (web-api-6d4f8) | Ordinal index (postgres-0, postgres-1) |
| Startup order | All at once | Sequential (0, then 1, then 2) |
| Stable network identity | No | Yes, via headless Service |
| Persistent storage | Shared or none | Per-pod via volumeClaimTemplates |
| Scaling down | Removes random pods | Removes highest ordinal first |
Use StatefulSets when your application needs any of: stable hostnames, ordered deployment/scaling, or per-pod persistent storage. Common examples: PostgreSQL, MySQL, Redis Sentinel, Kafka, ZooKeeper, Elasticsearch.
## Stable Network Identity
A StatefulSet requires a headless Service (one with `clusterIP: None`). Each pod gets a DNS record of the form `<pod-name>.<service-name>.<namespace>.svc.cluster.local`.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pg-secret
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi
```

The pods will be named postgres-0, postgres-1, postgres-2. Their DNS names are:
```
postgres-0.postgres.default.svc.cluster.local
postgres-1.postgres.default.svc.cluster.local
postgres-2.postgres.default.svc.cluster.local
```
Applications can address specific replicas by name. This is how PostgreSQL replication knows where the primary is, and how Kafka brokers find each other.
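As a sketch, replication config can reference the primary through its stable name. The hypothetical ConfigMap below (the ConfigMap name and the `replicator` user are illustrative assumptions, not part of the manifest above) points a standby at postgres-0:

```yaml
# Illustrative only: a standby's primary_conninfo pointing at the primary
# pod's stable DNS name (ConfigMap name and user are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-standby-config
data:
  primary_conninfo: "host=postgres-0.postgres.default.svc.cluster.local port=5432 user=replicator"
```

Because the DNS name is tied to the ordinal, this configuration survives pod rescheduling without change.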
## volumeClaimTemplates
The `volumeClaimTemplates` block creates a separate PersistentVolumeClaim for each pod. When postgres-0 starts, Kubernetes creates a PVC named `data-postgres-0`. When postgres-1 starts, it gets `data-postgres-1`.
```shell
kubectl get pvc
# NAME              STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS
# data-postgres-0   Bound    pv-abc123   10Gi       RWO            standard
# data-postgres-1   Bound    pv-def456   10Gi       RWO            standard
# data-postgres-2   Bound    pv-ghi789   10Gi       RWO            standard
```

If a pod is deleted and rescheduled, it reattaches to the same PVC. The data survives pod restarts. This is the core guarantee of StatefulSets.
## PersistentVolumes, PVCs, and StorageClasses
The storage stack has three layers:
**StorageClass** defines how storage is provisioned. Most cloud providers include default StorageClasses.

```shell
kubectl get storageclass
# NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE
# standard (default)   k8s.io/minikube-hostpath   Delete          Immediate
# gp3                  ebs.csi.aws.com            Delete          WaitForFirstConsumer
```

**PersistentVolumeClaim (PVC)** is a request for storage. It references a StorageClass and specifies the size and access mode.
**PersistentVolume (PV)** is the actual storage. With dynamic provisioning (the default on most clusters), PVs are created automatically when a PVC is created. You rarely need to create PVs manually.
**Volume binding mode matters.** `WaitForFirstConsumer` delays PV creation until a pod that uses the PVC is scheduled, which ensures the PV is created in the same availability zone as the pod. `Immediate` creates the PV right away, which can cause zone mismatches in multi-AZ clusters.
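A zone-aware class for the AWS EBS CSI driver might be defined like this sketch (the class name is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-topology                        # illustrative name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer     # bind after pod scheduling, zone-aware
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  type: gp3
```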
## Ordered Pod Management
By default, StatefulSets use `OrderedReady` pod management:

- **Scale up:** Pods are created in order. `postgres-1` does not start until `postgres-0` is Running and Ready.
- **Scale down:** Pods are removed in reverse order. `postgres-2` is terminated before `postgres-1`.
- **Updates:** Pods are updated in reverse ordinal order by default (highest first).
If your application does not need strict ordering (for example, a cache cluster where all nodes are peers), use parallel pod management:

```yaml
spec:
  podManagementPolicy: Parallel
```

This starts all pods simultaneously, which is faster but does not guarantee startup order.
## Resizing PVCs
You can expand PVCs if the StorageClass allows it (most do). You cannot shrink them.
```shell
# Check if expansion is allowed
kubectl get storageclass standard -o jsonpath='{.allowVolumeExpansion}'
# true
```

To resize:

```shell
kubectl patch pvc data-postgres-0 -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```

Some storage providers require the pod to be restarted for the filesystem to expand. Check the PVC status:
```shell
kubectl get pvc data-postgres-0 -o jsonpath='{.status.conditions}'
# If you see "FileSystemResizePending", delete the pod to trigger the resize
kubectl delete pod postgres-0
# The StatefulSet recreates it, and the filesystem is expanded on mount
```

## The PVC Deletion Gotcha
Deleting a StatefulSet does not delete its PVCs. This is intentional – it protects your data. But it catches people in two ways:
1. **Data persists after you think you cleaned up.** You delete the StatefulSet, redeploy it, and the old data is still there because the PVCs were rebound to the new pods.
```shell
# Delete the StatefulSet but PVCs remain
kubectl delete statefulset postgres
kubectl get pvc
# data-postgres-0, data-postgres-1, data-postgres-2 are all still there

# To fully clean up, delete PVCs explicitly
kubectl delete pvc data-postgres-0 data-postgres-1 data-postgres-2
```

2. **Stale PVCs block fresh starts.** You delete a StatefulSet to start fresh, but the old PVCs with old data get reattached. If you changed database credentials or schema, the old data causes errors. Delete the PVCs before redeploying.
The `persistentVolumeClaimRetentionPolicy` field (stable in Kubernetes 1.27+) lets you control this:
```yaml
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete PVCs when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs when scaling down
```

## Practical Patterns
**Database with init script:**
```yaml
initContainers:
- name: init-db
  image: postgres:16
  command: ['sh', '-c', 'cp /config/init.sql /docker-entrypoint-initdb.d/']
  volumeMounts:
  - name: config
    mountPath: /config
  - name: initdb
    mountPath: /docker-entrypoint-initdb.d
# Both volumes must be declared in the pod spec; the main container also
# mounts "initdb" at /docker-entrypoint-initdb.d so postgres runs the script
volumes:
- name: config
  configMap:
    name: pg-init-config   # assumed ConfigMap holding init.sql
- name: initdb
  emptyDir: {}
```

**Backup before upgrade:** Before upgrading a StatefulSet that runs a database, snapshot the PVCs or run a logical backup. StatefulSet updates are not as easily rolled back as Deployments because the data may have been migrated.
```shell
# Create a VolumeSnapshot (requires CSI snapshot support)
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-backup-before-upgrade
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0
EOF
```

**When NOT to use StatefulSets:** If your app stores state in an external database and just needs persistent cache or scratch space, a Deployment with a PVC is simpler. StatefulSets add operational complexity – use them only when you need stable network identity or per-pod storage.