etcd Maintenance for Self-Managed Clusters#

etcd is the backing store for all Kubernetes cluster state. Every object – pods, services, secrets, configmaps – lives in etcd. If etcd is unhealthy, your cluster is unhealthy. If etcd data is lost, your cluster is gone. Managed Kubernetes services (EKS, GKE, AKS) handle etcd for you, but self-managed clusters require you to operate it directly.

All etcdctl commands below require TLS flags. Set these as environment variables to avoid repeating them:

export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

Health Checks#

Endpoint Health#

etcdctl endpoint health --cluster
# https://10.0.1.10:2379 is healthy: successfully committed proposal: took = 2.345ms
# https://10.0.1.11:2379 is healthy: successfully committed proposal: took = 3.012ms
# https://10.0.1.12:2379 is healthy: successfully committed proposal: took = 2.789ms

The --cluster flag checks all members, not just the local endpoint. A “took” value consistently above 100ms indicates performance problems.
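
Because etcdctl exits non-zero when any endpoint is unhealthy, the same command can drive an unattended alert. A minimal sketch (the syslog tag here is an arbitrary choice for this example):

# Log to syslog if any member reports unhealthy
if ! etcdctl endpoint health --cluster; then
  echo "etcd cluster health check failed on $(hostname)" | logger -t etcd-health -p daemon.err
fi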

Endpoint Status#

etcdctl endpoint status --cluster --write-table
# +------------------------+------------------+---------+---------+-----------+...
# |        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER |
# +------------------------+------------------+---------+---------+-----------+...
# | https://10.0.1.10:2379 | 8e9e05c52164694d |  3.5.15 |  45 MB  |   true    |
# | https://10.0.1.11:2379 | 4f3e8a2b1c7d9e0f |  3.5.15 |  45 MB  |   false   |
# | https://10.0.1.12:2379 | 2a1b3c4d5e6f7a8b |  3.5.15 |  44 MB  |   false   |
# +------------------------+------------------+---------+---------+-----------+...

Watch for: DB sizes that differ significantly across members (indicates a member fell behind), frequent leader changes (network instability), or DB size approaching the quota (default 2 GB).
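
When a script needs these fields rather than a human-readable table, the JSON output is easier to parse reliably. A sketch, assuming jq is installed:

# Print endpoint, DB size in bytes, and whether that member is the leader
etcdctl endpoint status --cluster --write-out=json | jq -r \
  '.[] | "\(.Endpoint) dbSize=\(.Status.dbSize) leader=\(.Status.header.member_id == .Status.leader)"'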

Snapshot Backup#

etcd snapshots are the primary backup mechanism. Take snapshots regularly and store them off-cluster. A snapshot can be taken from any healthy member. (From etcd 3.5 onward, the offline snapshot status and snapshot restore subcommands are also provided by the separate etcdutl tool and are deprecated in etcdctl; the etcdctl forms below still work but print a warning.)

# Take a snapshot (uses the local endpoint; any healthy member works)
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db

# Verify the snapshot
etcdctl snapshot status /backup/etcd-20260222-020000.db --write-table
# +----------+----------+------------+------------+
# |   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
# +----------+----------+------------+------------+
# | 6d15a8c2 |  4528901 |       1284 |    45 MB   |
# +----------+----------+------------+------------+
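
snapshot status exits non-zero if the file cannot be read as a valid database, so verification is easy to script. A sketch that checks the most recent backup:

# Verify the newest snapshot in /backup
latest=$(ls -t /backup/etcd-*.db | head -1)
etcdctl snapshot status "$latest" --write-table || echo "snapshot verification failed: $latest" >&2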

Automate backups with a cron job or a Kubernetes CronJob running on a control plane node. A crontab entry cannot span multiple lines, so wrap the command in a script and call that from cron:

#!/usr/bin/env bash
# /usr/local/bin/etcd-backup.sh
set -euo pipefail
export ETCDCTL_API=3
etcdctl snapshot save "/backup/etcd-$(date +%Y%m%d-%H%M%S).db" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
find /backup -name "etcd-*.db" -mtime +7 -delete

# /etc/cron.d/etcd-backup
0 */6 * * * root /usr/local/bin/etcd-backup.sh

This takes a snapshot every 6 hours and removes backups older than 7 days.

Snapshot Restore#

Restoring from snapshot is a destructive operation. It creates a new cluster with a new cluster ID. All existing members must restore from the same snapshot.

Single-Node Restore#

# Stop the API server and etcd by moving their static pod manifests aside.
# Leave kubelet running -- it notices the manifests are gone and stops the pods.
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
mv /etc/kubernetes/manifests/etcd.yaml /tmp/

# Move the existing data aside (snapshot restore refuses an existing data dir)
mv /var/lib/etcd /var/lib/etcd.bak

# Restore from snapshot
etcdctl snapshot restore /backup/etcd-20260222-020000.db \
  --data-dir=/var/lib/etcd \
  --name=etcd-0 \
  --initial-cluster=etcd-0=https://10.0.1.10:2380 \
  --initial-advertise-peer-urls=https://10.0.1.10:2380

# Restore the manifests; kubelet recreates etcd, then the API server
mv /tmp/etcd.yaml /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
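
Once kubelet has recreated the pods, confirm the restore took effect (assumes kubectl is configured on this node):

# etcd should report healthy and the API server should answer again
etcdctl endpoint health
kubectl get nodes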

Multi-Node Restore#

Each member must restore the same snapshot with its own identity (on each node, stop etcd and move aside /var/lib/etcd first, as in the single-node procedure):

# On node 1
etcdctl snapshot restore /backup/etcd-20260222-020000.db \
  --data-dir=/var/lib/etcd \
  --name=etcd-0 \
  --initial-cluster=etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls=https://10.0.1.10:2380

# On node 2
etcdctl snapshot restore /backup/etcd-20260222-020000.db \
  --data-dir=/var/lib/etcd \
  --name=etcd-1 \
  --initial-cluster=etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls=https://10.0.1.11:2380

# On node 3 -- same pattern with etcd-2 and its peer URL

After restoring on all nodes, start etcd on each and verify cluster health.
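
A quick post-restore verification, run from any control plane node:

# All members should be listed, agree on a single leader, and report healthy
etcdctl member list --write-table
etcdctl endpoint status --cluster --write-table
etcdctl endpoint health --cluster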

Compaction and Defragmentation#

etcd keeps a history of all key revisions. Over time, this grows. Compaction removes superseded revisions. Defragmentation reclaims disk space after compaction.

Kubernetes enables automatic compaction (every 5 minutes by default via --etcd-compaction-interval on kube-apiserver). But defragmentation is manual.

# Check current DB size and in-use size
etcdctl endpoint status --cluster --write-table

# Defragment each member one at a time (not all at once)
etcdctl defrag --endpoints=https://10.0.1.10:2379
etcdctl defrag --endpoints=https://10.0.1.11:2379
etcdctl defrag --endpoints=https://10.0.1.12:2379

Defrag one member at a time because it blocks that member during the operation. On a 3-member cluster, the other two maintain quorum. Typical defrag time is seconds to a few minutes depending on DB size.
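
The one-at-a-time rule is easy to script; a sketch that refuses to continue if a member does not come back healthy (IPs are the example addresses from above):

for ep in https://10.0.1.10:2379 https://10.0.1.11:2379 https://10.0.1.12:2379; do
  etcdctl defrag --endpoints="$ep"
  # stop immediately rather than defrag the next member of a degraded cluster
  etcdctl endpoint health --endpoints="$ep" || break
done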

Run defragmentation when the on-disk DB size is significantly larger than the logically in-use data (compare dbSize with dbSizeInUse in the JSON endpoint status).
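
Both numbers are exposed directly in the JSON status (dbSizeInUse requires etcd 3.4+); a sketch assuming jq:

# A large gap between dbSize and dbSizeInUse means defrag will reclaim space
etcdctl endpoint status --write-out=json | jq -r \
  '.[0].Status | "dbSize=\(.dbSize) dbSizeInUse=\(.dbSizeInUse)"'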

Alarm Management#

etcd raises alarms when thresholds are breached. The most common is the NOSPACE alarm, triggered when the DB reaches its storage quota.

# Check for active alarms
etcdctl alarm list
# memberID:12345678 alarm:NOSPACE

# When NOSPACE triggers:
# 1. Compact to remove old revisions (read the current revision from JSON status)
rev=$(etcdctl endpoint status --write-out=json | grep -o '"revision":[0-9]*' | grep -o '[0-9]*')
etcdctl compact "$rev"

# 2. Defragment to reclaim space
etcdctl defrag --endpoints=https://10.0.1.10:2379

# 3. Disarm the alarm
etcdctl alarm disarm

If the DB genuinely needs more space, increase the quota (default is 2 GB, max recommended is 8 GB):

# In the etcd static pod manifest or etcd config
--quota-backend-bytes=8589934592    # 8 GB

Member Management#

Remove a Failed Member#

# List members
etcdctl member list --write-table

# Remove by member ID
etcdctl member remove 4f3e8a2b1c7d9e0f

Add a New Member#

# Add the new member (run from an existing member)
etcdctl member add etcd-3 --peer-urls=https://10.0.1.13:2380

# On the new node, start etcd with --initial-cluster-state=existing
# and the full --initial-cluster list including all current members
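
As a concrete sketch of those flags (reusing the example IPs; TLS flags are omitted for brevity, and the name and data dir are assumptions to adapt):

# On the new node: state "existing", and the cluster list includes the new member
etcd --name etcd-3 \
  --data-dir /var/lib/etcd \
  --initial-cluster-state existing \
  --initial-cluster etcd-0=https://10.0.1.10:2380,etcd-1=https://10.0.1.11:2380,etcd-2=https://10.0.1.12:2380,etcd-3=https://10.0.1.13:2380 \
  --initial-advertise-peer-urls https://10.0.1.13:2380 \
  --listen-peer-urls https://10.0.1.13:2380 \
  --listen-client-urls https://10.0.1.13:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.1.13:2379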

Never add and remove members simultaneously. Change one member at a time and verify cluster health between operations. A 3-member cluster can only tolerate 1 failure – if you remove a member first, you have a 2-member cluster with zero fault tolerance.

Monitoring#

Key metrics to watch (exposed on the /metrics endpoint):

Metric | Healthy Threshold | Problem Indicator
--- | --- | ---
etcd_disk_wal_fsync_duration_seconds | p99 < 10ms | Slow disk
etcd_disk_backend_commit_duration_seconds | p99 < 25ms | Slow disk
etcd_server_leader_changes_seen_total | Stable (not incrementing) | Network instability
etcd_mvcc_db_total_size_in_bytes | Below 80% of quota | Approaching NOSPACE
etcd_network_peer_round_trip_time_seconds | p99 < 50ms | Network latency

# Quick check via curl (from a control plane node)
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  https://127.0.0.1:2379/metrics | grep -E "etcd_mvcc_db_total_size|etcd_server_leader_changes"

For Prometheus-based monitoring, scrape the etcd metrics endpoint and alert on fsync latency, leader changes, and DB size approaching quota.