Temporal High Availability#
A single-replica Temporal deployment works for development, but any pod going down takes the workflow engine offline. This guide configures a multi-replica cluster with proper resource allocation, Elasticsearch visibility, and health monitoring.
For the single-replica setup this builds on, see Running Temporal Server on Minikube.
Why HA Matters#
| Component | What Breaks When It Goes Down |
|---|---|
| Frontend | No client can start, signal, query, or cancel workflows. Workers cannot poll. |
| History | Running workflows stall. No state transitions. Timers do not fire. |
| Matching | Tasks queue up but never dispatch. Workflows appear frozen. |
| Worker | Internal system workflows stop (archival, replication). Application workflows unaffected. |
With multiple replicas, losing a pod triggers a brief rebalance (seconds), not an outage.
HA Architecture#
Each service runs as a separate Deployment. Frontend, history, and matching run 3 or more replicas; worker runs Temporal's internal system workflows and needs only 2. Frontend is stateless and load-balances trivially. History partitions workflow state into shards (default 512); when a pod dies, its shards rebalance to the survivors. Matching partitions task queues across its pods in a similar way.
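To get a feel for what a shard rebalance means, the sketch below estimates how many shards each history pod owns before and after one pod is lost. It assumes a perfectly even split, which is a simplification: Temporal assigns shards through its membership ring, so real per-pod counts will vary somewhat. The function name `shardsPerPod` is illustrative only.

```go
package main

import "fmt"

// shardsPerPod returns the largest number of shards a single history pod
// would own if the shards were split as evenly as possible (ceiling division).
// Assumption: an even split; real ownership is decided by the membership ring.
func shardsPerPod(totalShards, pods int) int {
	if pods <= 0 {
		return 0
	}
	return (totalShards + pods - 1) / pods
}

func main() {
	const totalShards = 512 // numHistoryShards used in this guide

	// 3 pods is the healthy state; 2 pods is the state after losing one.
	for _, pods := range []int{3, 2} {
		fmt.Printf("%d history pods: up to %d shards each\n",
			pods, shardsPerPod(totalShards, pods))
	}
}
```

With three pods each one carries roughly a third of the shards; after a failure the two survivors carry half each, which is why history pods need CPU and memory headroom above their steady-state usage.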
HA Helm Values#
    # values-temporal-ha.yaml
    server:
      config:
        persistence:
          default:
            driver: sql
            sql:
              driver: postgres12
              host: temporal-ha-postgresql
              port: 5432
              database: temporal
              user: postgres
              password: temporal
              maxConns: 40
          visibility:
            driver: sql
            sql:
              driver: postgres12
              host: temporal-ha-postgresql
              port: 5432
              database: temporal_visibility
              user: postgres
              password: temporal
              maxConns: 20
        numHistoryShards: 512
      frontend:
        replicaCount: 3
        resources:
          requests: { cpu: 500m, memory: 512Mi }
          limits: { cpu: "1", memory: 1Gi }
      history:
        replicaCount: 3
        resources:
          requests: { cpu: 500m, memory: 1Gi }
          limits: { cpu: "2", memory: 2Gi }
      matching:
        replicaCount: 3
        resources:
          requests: { cpu: 250m, memory: 256Mi }
          limits: { cpu: "1", memory: 512Mi }
      worker:
        replicaCount: 2
        resources:
          requests: { cpu: 250m, memory: 256Mi }
          limits: { cpu: 500m, memory: 512Mi }
    cassandra: { enabled: false }
    mysql: { enabled: false }
    postgresql: { enabled: false }
    elasticsearch: { enabled: false }
    schema: { setup: { enabled: true }, update: { enabled: true } }
    web:
      replicaCount: 2
      service: { type: ClusterIP, port: 8080 }

    helm upgrade --install temporal temporal/temporal \
      --namespace temporal -f values-temporal-ha.yaml --timeout 600s

PostgreSQL for HA#
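With 11 service replicas at maxConns: 40, the default-store pools alone can open up to 440 connections (11 × 40). While visibility also runs on PostgreSQL, each replica's visibility pool (maxConns: 20) can add up to 220 more, for a worst case of 660. PostgreSQL defaults to max_connections = 100, so raise the limit. The value below covers the default store with headroom and fits the full load once visibility moves to Elasticsearch later in this guide; raise it further if you keep SQL visibility: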
    primary:
      extendedConfiguration: |
        max_connections = 600
        shared_buffers = 512MB
        effective_cache_size = 1536MB
      resources:
        requests: { cpu: "1", memory: 2Gi }
        limits: { cpu: "2", memory: 4Gi }
      persistence:
        size: 20Gi

For high-throughput clusters, deploy PgBouncer between Temporal and PostgreSQL to pool connections. At minimum, configure automated pg_dump backups – Temporal’s PostgreSQL is the system of record for all running workflows.
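The connection arithmetic at the start of this section is easy to re-run when replica counts or pool sizes change. The following is a minimal sketch, assuming every replica of every service can saturate each SQL pool it is configured with; the `pool` type and `worstCaseConnections` function are illustrative names, not part of Temporal.

```go
package main

import "fmt"

// pool describes one SQL connection pool configured for each Temporal process.
type pool struct {
	name     string
	maxConns int
}

// worstCaseConnections returns replicas * sum(maxConns): the number of
// PostgreSQL connections open if every pool in every replica is saturated.
func worstCaseConnections(replicas int, pools []pool) int {
	perReplica := 0
	for _, p := range pools {
		perReplica += p.maxConns
	}
	return replicas * perReplica
}

func main() {
	replicas := 3 + 3 + 3 + 2 // frontend + history + matching + worker
	pools := []pool{{"default", 40}, {"visibility", 20}}

	fmt.Println(worstCaseConnections(replicas, pools[:1])) // default store only: 440
	fmt.Println(worstCaseConnections(replicas, pools))     // default + SQL visibility: 660
}
```

Compare the result against max_connections, and leave room for migrations, backups, and ad hoc psql sessions on top of it.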
Elasticsearch Visibility#
SQL-based visibility works for small deployments but struggles with complex queries. Elasticsearch provides indexed custom search attributes and fast filtering.
Enable it by updating the Temporal values:
    server:
      config:
        persistence:
          visibility:
            driver: elasticsearch
            elasticsearch:
              version: v7
              url: { scheme: http, host: "temporal-elasticsearch:9200" }
              indices: { visibility: temporal_visibility_v1 }

Register custom search attributes to make workflows queryable by business fields:
    temporal operator search-attribute create \
      --namespace default --name CustomerId --type Keyword
    temporal operator search-attribute create \
      --namespace default --name OrderAmount --type Double

Set them from workflow code:
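    import "go.temporal.io/sdk/workflow"

    func OrderWorkflow(ctx workflow.Context, order Order) error {
        err := workflow.UpsertSearchAttributes(ctx, map[string]interface{}{
            "CustomerId":  order.CustomerID,
            "OrderAmount": order.Amount,
        })
        if err != nil {
            // Non-fatal: log the failure instead of silently discarding it.
            workflow.GetLogger(ctx).Warn("upsert search attributes failed", "error", err)
        }
        // ... workflow logic
        return nil
    }

Query with the CLI: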
    temporal workflow list \
      --query 'CustomerId = "cust-123" AND OrderAmount > 100.0'

Health Monitoring#
Temporal exposes Prometheus metrics on port 9090. The critical ones:
| Metric | Meaning |
|---|---|
| temporal_persistence_latency | Database response time. Spikes indicate PostgreSQL issues. |
| schedule_to_start_latency | Time from task creation to worker pickup. High means workers cannot keep up. |
| persistence_errors | Database errors. Any sustained increase needs investigation. |
| history_size | Workflow event count. Histories above 50K events impact performance. |
Alert on these conditions:
    groups:
      - name: temporal
        rules:
          - alert: TemporalPersistenceLatencyHigh
            expr: histogram_quantile(0.99, rate(temporal_persistence_latency_bucket[5m])) > 1
            for: 5m
            annotations:
              summary: "Temporal persistence p99 above 1 second"
          - alert: TemporalScheduleToStartHigh
            expr: histogram_quantile(0.99, rate(schedule_to_start_latency_bucket[5m])) > 30
            for: 5m
            annotations:
              summary: "Tasks waiting 30s+ for worker pickup"

Scaling Guidelines#
Scale frontend when gRPC latency rises (stateless, simple to add). Scale history when workflow task latency grows or shard rebalancing is slow. Scale matching when schedule_to_start_latency is high but workers are idle.
The numHistoryShards is set at cluster creation and cannot be changed without data migration. Choose carefully: 512 for most production workloads, 1024 for high-throughput (>10K concurrent workflows per namespace), 128 for development.
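Because the shard count is fixed once the cluster exists, it can help to write the rule of thumb down next to the deployment tooling. The sketch below simply encodes the three thresholds from the paragraph above; `suggestShards` and its parameters are hypothetical names, and the cutoffs are this guide's guidance rather than limits enforced by Temporal.

```go
package main

import "fmt"

// suggestShards encodes this guide's rule of thumb for numHistoryShards.
// The production flag and the workflow count are inputs you estimate up front.
func suggestShards(production bool, concurrentWorkflowsPerNamespace int) int {
	switch {
	case !production:
		return 128 // development clusters
	case concurrentWorkflowsPerNamespace > 10_000:
		return 1024 // high-throughput production
	default:
		return 512 // most production workloads
	}
}

func main() {
	fmt.Println(suggestShards(false, 100))   // 128
	fmt.Println(suggestShards(true, 2_000))  // 512
	fmt.Println(suggestShards(true, 50_000)) // 1024
}
```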
Comparison: Standard vs HA#
| Dimension | Standard (Dev) | HA (Production) |
|---|---|---|
| Service replicas | 1 each | 2-3 each |
| CPU total | ~1.5 cores | ~6 cores |
| Memory total | ~2 GB | ~10 GB |
| Visibility | SQL-based | Elasticsearch |
| Pod disruption tolerance | None | 1 pod per service |
| Recovery time | Minutes (pod restart) | Seconds (shard rebalance) |
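Replica counts alone do not stop a node drain from evicting several pods of the same service at once. A PodDisruptionBudget can enforce the "1 pod per service" tolerance for voluntary disruptions such as drains and upgrades; it does not protect against node crashes. The fragment below is a sketch for the history service only. The resource name is hypothetical and the label selector is an assumption about how the chart labels its pods, so confirm it with `kubectl get pods -n temporal --show-labels` before applying.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: temporal-history-pdb        # hypothetical name
  namespace: temporal
spec:
  maxUnavailable: 1                 # at most one history pod evicted at a time
  selector:
    matchLabels:
      # Assumed labels; verify against the pods the Helm release actually creates.
      app.kubernetes.io/instance: temporal
      app.kubernetes.io/component: history
```

Repeating the same pattern for frontend and matching keeps each service at two or more running pods during planned maintenance.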
Next Steps#
- Namespaces and Task Queues – organize workflows with proper isolation
- Temporal Multi-Cluster on Minikube – multi-cluster setups spanning profiles