# Redis on Kubernetes: Deployment Patterns, Operators, and Production Configuration
Running Redis on Kubernetes requires more thought than deploying a stateless application. Redis is stateful, memory-sensitive, and its clustering model makes assumptions about network identity that conflict with Kubernetes defaults. This guide covers the deployment options from simplest to most complex, the configuration details that matter in production, and the mistakes that cause outages.
## Deployment Options
There are three main approaches to deploying Redis on Kubernetes, each with different tradeoffs.
Bitnami Helm chart is the most common path. It supports single-instance, leader-follower, and Sentinel topologies out of the box. It handles StatefulSets, PVCs, Services, ConfigMaps, and optional Sentinel deployment. For most teams, this is the right starting point.
Redis Operators (Spotahome’s redis-operator or OpsTree’s redis-operator) provide custom resources (RedisCluster, RedisSentinel) that manage the full lifecycle. Operators handle scaling, failover, and upgrades through Kubernetes-native CRDs. They add operational automation but also add a dependency – if the operator has bugs, your Redis suffers.
Redis Cluster via Helm deploys a sharded Redis Cluster topology. The Bitnami chart supports this as architecture: cluster. This is the most complex option and requires understanding Redis Cluster’s hash slot model.
## Single Instance: Development and Caching
A single Redis instance is appropriate for development environments and non-critical caching where data loss on pod restart is acceptable.
Deploy with a Deployment (not StatefulSet) if persistence is unnecessary, or a StatefulSet with a PVC if you want data to survive pod restarts:
```yaml
# values.yaml for Bitnami Redis chart
architecture: standalone
auth:
  enabled: true
  password: "your-password-here"
master:
  persistence:
    enabled: true
    size: 5Gi
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 256Mi
```

Install with:

```bash
helm install redis oci://registry-1.docker.io/bitnamicharts/redis -f values.yaml
```
For throwaway caching, set persistence.enabled: false. The pod gets a Service for internal cluster access. This topology has no high availability – if the pod dies, Redis is unavailable until Kubernetes reschedules it.
## Leader-Follower with Sentinel
This is the recommended production topology for most workloads. Sentinel monitors the leader, detects failure, and automatically promotes a follower.
```yaml
# values.yaml
architecture: replication
auth:
  enabled: true
  password: "your-password-here"
replica:
  replicaCount: 2
  persistence:
    enabled: true
    size: 10Gi
    storageClass: "ssd"
  resources:
    requests:
      memory: 1Gi
      cpu: 250m
    limits:
      memory: 1Gi
sentinel:
  enabled: true
  quorum: 2
  resources:
    requests:
      memory: 64Mi
      cpu: 50m
    limits:
      memory: 128Mi
master:
  persistence:
    enabled: true
    size: 10Gi
    storageClass: "ssd"
  resources:
    requests:
      memory: 1Gi
      cpu: 250m
    limits:
      memory: 1Gi
```

This creates a StatefulSet with one leader and two replicas, plus three Sentinel pods. Sentinel handles failover automatically. The chart creates two Services: one for the leader (read-write) and one for replicas (read-only).
Critical detail: Sentinel requires at least 3 instances for quorum. Running fewer means Sentinel cannot reach consensus on leader failure, and failover will not trigger. The quorum: 2 setting means 2 of 3 Sentinels must agree.
## Redis Cluster on Kubernetes
Redis Cluster shards data across multiple nodes using 16384 hash slots. This is for workloads that exceed the memory capacity of a single node or need write scaling.
```yaml
# values.yaml
architecture: cluster
cluster:
  nodes: 6
  replicas: 1
auth:
  enabled: true
  password: "your-password-here"
persistence:
  enabled: true
  size: 10Gi
  storageClass: "ssd"
resources:
  requests:
    memory: 2Gi
    cpu: 500m
  limits:
    memory: 2Gi
```

Redis Cluster on Kubernetes has a fundamental tension: Redis Cluster nodes need to know each other's IP addresses, but Kubernetes pod IPs are ephemeral. The Bitnami chart handles this by using the pod's stable hostname (from the StatefulSet) and a headless Service.
Clients connecting to Redis Cluster need to be cluster-aware. The client connects to any node, receives a MOVED redirect if the key lives on a different node, and the client library handles this transparently. Most Redis client libraries (jedis, lettuce, redis-py, ioredis) support cluster mode natively.
Multi-key operations (MGET, MSET, transactions with MULTI) only work if all keys hash to the same slot. Use hash tags to colocate related keys: {user:42}:profile and {user:42}:sessions will always be on the same node.
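A minimal sketch with redis-py's cluster client, assuming a cluster Service at redis-cluster.default.svc.cluster.local and the password from the values above:

```python
# Sketch: hash tags keep related keys in the same slot, so MGET works in cluster mode.
# The hostname and password below are assumptions based on the values.yaml above.
from redis.cluster import RedisCluster  # redis-py >= 4.1

rc = RedisCluster(
    host="redis-cluster.default.svc.cluster.local",
    port=6379,
    password="your-password-here",
)

# Both keys contain the hash tag {user:42}, so the slot is computed only over
# "user:42" and they land on the same node; MGET succeeds without CROSSSLOT errors.
rc.set("{user:42}:profile", "alice")
rc.set("{user:42}:sessions", "3")
print(rc.mget("{user:42}:profile", "{user:42}:sessions"))
```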
## Resource Tuning
Redis is memory-bound. CPU is rarely the bottleneck unless you are running Lua scripts or very high throughput.
Set memory requests equal to limits. This is the most important resource configuration for Redis on Kubernetes. If requests are lower than limits, the Kubernetes scheduler may place Redis on a node without enough memory, leading to OOMKilled pods. Equal requests and limits guarantee the memory is reserved.
```yaml
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 2Gi
```

Set maxmemory to 75% of the container memory limit. Redis needs headroom for overhead: the operating system, connection buffers, child process forking for RDB/AOF, and fragmentation. For a container with a 2Gi limit, set maxmemory 1536mb.
Configure this via the Bitnami chart:
```yaml
master:
  configuration: |
    maxmemory 1536mb
    maxmemory-policy allkeys-lru
```

CPU is secondary. Redis's single-threaded command processing rarely needs more than one core. Set CPU requests to 250m-500m and limits to 1 core for most workloads. If you enable I/O threads (Redis 6+), allocate more CPU accordingly.
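If you do enable I/O threads, a sketch of the chart configuration might look like this; the thread count and CPU figures are illustrative assumptions, not recommendations:

```yaml
# Sketch: enabling Redis 6+ I/O threads via the Bitnami chart's configuration block.
# Pair this with a larger CPU request/limit than the defaults used earlier.
master:
  configuration: |
    io-threads 4
    io-threads-do-reads yes
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "2"
```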
## Persistence on Kubernetes
Persistence in Kubernetes means PersistentVolumeClaims (PVCs). For Redis, persistence protects against data loss when a pod restarts.
Use SSD-backed storage classes. Redis persistence involves disk I/O during RDB snapshots and AOF writes. Spinning disks introduce latency that can stall Redis operations. Use a storageClass that provisions SSDs (for example, gp3 on AWS EKS, pd-ssd on GKE, managed-premium on AKS).
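For example, a sketch of an SSD-backed StorageClass on AWS EKS; the name ssd matches the storageClass referenced in the values above, and the provisioner and parameters differ per cloud:

```yaml
# Sketch: gp3-backed StorageClass for EKS with the EBS CSI driver installed.
# On GKE or AKS, swap the provisioner and parameters for pd-ssd / managed-premium.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```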
Size PVCs for RDB saves. During an RDB snapshot, Redis forks and the child writes the entire dataset to disk. The PVC needs enough space for the RDB file (approximately the size of the in-memory dataset) plus the AOF file. A safe starting point: PVC size should be at least 2x your expected dataset size.
Backup RDB snapshots to object storage. Use a CronJob or sidecar container that copies the RDB file from the PVC to S3, GCS, or Azure Blob Storage on a schedule. This provides disaster recovery beyond what PVCs offer.
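A sketch of such a CronJob, assuming the data PVC is named redis-data-redis-master-0 (the Bitnami default for a release named redis), the bucket my-redis-backups exists, and AWS credentials come from a hypothetical redis-backup-aws Secret; with a ReadWriteOnce volume the backup pod must also be scheduled on the same node as the Redis pod:

```yaml
# Sketch: copy the RDB file to S3 every six hours. Bucket, Secret, and PVC names
# are illustrative; adjust them to your release and environment, and pin the image.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-rdb-backup
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: amazon/aws-cli
              command: ["sh", "-c"]
              args:
                - aws s3 cp /data/dump.rdb s3://my-redis-backups/dump-$(date +%Y%m%d-%H%M).rdb
              envFrom:
                - secretRef:
                    name: redis-backup-aws   # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_DEFAULT_REGION
              volumeMounts:
                - name: redis-data
                  mountPath: /data
                  readOnly: true
          volumes:
            - name: redis-data
              persistentVolumeClaim:
                claimName: redis-data-redis-master-0
```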
## Client Configuration
How your application connects to Redis on Kubernetes matters as much as how Redis is deployed.
Use Sentinel-aware client libraries. When Sentinel is enabled, clients must connect to the Sentinel service, not directly to a Redis pod. The Sentinel service tells the client which pod is the current leader. If failover occurs, the client automatically discovers the new leader.
In most languages, this looks like:
```python
# Python with redis-py
from redis.sentinel import Sentinel

sentinel = Sentinel([('redis-sentinel.default.svc.cluster.local', 26379)],
                    socket_timeout=0.5)
master = sentinel.master_for('mymaster', password='your-password')
master.set('key', 'value')
```

Connection pooling. Create a connection pool in your application rather than opening a new connection per request. Most Redis client libraries provide built-in pooling. Set pool size to match your concurrency needs – 10-20 connections per pod is a typical starting point.
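For a non-Sentinel deployment, a sketch with redis-py; the Service name and pool size are assumptions, and when using Sentinel, master_for already returns a client backed by a Sentinel-aware pool:

```python
# Sketch: create one pool at startup and share it across requests.
# Hostname, password, and max_connections are illustrative values.
import redis

pool = redis.ConnectionPool(
    host="redis-master.default.svc.cluster.local",
    port=6379,
    password="your-password-here",
    max_connections=20,  # roughly match per-pod concurrency
)

r = redis.Redis(connection_pool=pool)
r.set("key", "value")
```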
Timeouts and retries. Set connection timeouts (1-2 seconds), command timeouts (1-5 seconds depending on expected latency), and retry logic with exponential backoff. Redis commands are fast, so a timeout usually indicates a network issue or server overload, not a slow query.
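A sketch of these settings with redis-py 4.x or later; the hostname and numbers are illustrative starting points:

```python
# Sketch: connection/command timeouts plus retries with exponential backoff.
import redis
from redis.backoff import ExponentialBackoff
from redis.retry import Retry

r = redis.Redis(
    host="redis-master.default.svc.cluster.local",
    port=6379,
    password="your-password-here",
    socket_connect_timeout=2,  # seconds to establish the TCP connection
    socket_timeout=2,          # seconds to wait for a command reply
    retry=Retry(ExponentialBackoff(), retries=3),
    retry_on_error=[redis.exceptions.ConnectionError, redis.exceptions.TimeoutError],
)
```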
## Monitoring with Prometheus
The Prometheus Redis exporter (oliver006/redis_exporter) exposes Redis metrics in Prometheus format. The Bitnami chart includes it as an optional sidecar:
```yaml
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
```

Key metrics to monitor and alert on:
- redis_memory_used_bytes vs redis_memory_max_bytes: Alert when used memory exceeds 80% of maxmemory (see the alert rule sketch after this list)
- redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total): Cache hit ratio. Below roughly 90%, the cache is likely ineffective or too small
- redis_connected_clients: Sudden spikes indicate connection leaks in application code
- redis_evicted_keys_total: Non-zero means Redis is at capacity and evicting data. Increase memory or review TTL policies
- redis_replication_lag: Time or offset difference between leader and follower. High lag means followers serve stale data
- redis_commands_processed_total: Overall throughput. A sudden drop may indicate issues
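As an example, here is a sketch of the memory alert as a PrometheusRule, assuming the Prometheus Operator is installed and scraping the exporter through the ServiceMonitor enabled above:

```yaml
# Sketch: alert when Redis memory usage stays above 80% of maxmemory for 5 minutes.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-alerts
spec:
  groups:
    - name: redis
      rules:
        - alert: RedisMemoryHigh
          expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Redis is using more than 80% of maxmemory"
```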
Create Grafana dashboards for these metrics. The community Redis dashboard (Grafana ID 763) is a solid starting point.
## Common Gotchas
Redis Cluster needs direct pod connectivity. In Redis Cluster mode, clients receive MOVED redirects that contain pod IP addresses. If clients cannot reach individual pod IPs (for example, they are in a different network namespace or behind a load-balanced Service), Cluster mode will not work. Use a headless Service so clients resolve pod IPs directly, and ensure network policies allow direct pod-to-pod and client-to-pod traffic.
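For example, a sketch of a NetworkPolicy that permits this traffic; the pod labels are assumptions and depend on how your chart labels the Redis and application pods:

```yaml
# Sketch: allow client-to-pod and pod-to-pod traffic for Redis Cluster.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-redis-cluster-traffic
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: redis-cluster   # assumed label on the Redis pods
  ingress:
    # Application pods reach the client port and follow MOVED redirects to pod IPs.
    - from:
        - podSelector:
            matchLabels:
              app: myapp                       # assumed label on client pods
      ports:
        - protocol: TCP
          port: 6379
    # Redis nodes talk to each other on the client port and the cluster bus port.
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: redis-cluster
      ports:
        - protocol: TCP
          port: 6379
        - protocol: TCP
          port: 16379
```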
Sentinel requires at least 3 instances. Running a single Sentinel instance means no quorum is possible – failover will never trigger automatically. Running 2 means a single Sentinel failure prevents quorum. Always run 3 (or 5 for extra tolerance). The Bitnami chart defaults to 3.
PVC sizing for RDB saves. When Redis saves an RDB snapshot, the forked child process writes the entire dataset to disk. If your Redis instance holds 8GB of data, the RDB file is roughly 8GB, and the PVC needs space for both the RDB file and the AOF. A 10Gi PVC for an 8GB dataset is too tight – size it to at least 2x memory. Additionally, the fork temporarily increases memory usage on the host, so monitor both disk and memory during save operations.
Kubernetes liveness probes and Redis. Default liveness probe configurations can cause false positives during RDB saves or AOF rewrites, when Redis may be briefly unresponsive. Set initialDelaySeconds and timeoutSeconds generously (at least 5 seconds) to avoid Kubernetes killing a healthy Redis pod during a save operation.
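A sketch of more forgiving probe settings via the Bitnami chart values; the same keys exist under replica, and the numbers are illustrative starting points rather than recommendations:

```yaml
# Sketch: relax liveness/readiness probes so RDB saves and AOF rewrites
# do not trigger restarts of an otherwise healthy pod.
master:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 6
    failureThreshold: 5
  readinessProbe:
    enabled: true
    initialDelaySeconds: 20
    periodSeconds: 10
    timeoutSeconds: 6
```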