Database High Availability Patterns#
Every database HA decision starts with two numbers: RPO (Recovery Point Objective – how much data you can afford to lose) and RTO (Recovery Time Objective – how long the database can be unavailable). These numbers dictate the pattern, and each pattern carries specific operational tradeoffs.
Core Concepts#
RPO = 0 means zero data loss. Every committed transaction must survive a failure. This requires synchronous replication, which adds latency to every write.
RPO > 0 means you accept losing some recent transactions. Asynchronous replication is sufficient, and writes are faster because the primary does not wait for replicas.
RTO determines how automated your failover must be. Manual failover (operator runs a command) gives RTO in minutes. Automated failover (a controller detects failure and promotes a replica) gives RTO in seconds.
Pattern 1 – Primary-Replica (Active-Passive)#
One primary accepts all writes. One or more replicas receive a copy of the data through streaming replication and can serve read-only queries.
When to Use#
- Most applications. This is the default pattern.
- Read-heavy workloads where you can route reads to replicas.
- RPO near zero is achievable with synchronous replication.
- RTO of seconds to minutes depending on automated vs manual failover.
How It Works#
The primary streams WAL records (PostgreSQL) or binary log events (MySQL) to replicas. In asynchronous mode, the primary commits without waiting for replica confirmation. In synchronous mode, the primary waits for at least one replica to acknowledge before confirming the commit to the client.
Tradeoffs#
- Simple to operate and reason about.
- Write throughput limited to a single node.
- Synchronous replication adds 1-5ms per write (same datacenter) or 30-100ms (cross-region).
- Failover requires promoting a replica and redirecting clients.
Pattern 2 – Multi-Master (Active-Active)#
Multiple nodes accept writes simultaneously. Each node replicates changes to the others. Systems include CockroachDB, Galera Cluster, and MySQL Group Replication in multi-primary mode.
When to Use#
- Multi-region deployments where users need low-latency writes from any region.
- Applications that can tolerate or resolve write conflicts.
Tradeoffs#
- Write conflicts add complexity. Application must handle retry logic.
- Cross-node coordination adds write latency.
- Schema changes (DDL) require special handling – all nodes must agree.
- Higher operational complexity in exchange for write availability in every region.
Pattern 3 – Patroni for PostgreSQL#
Patroni is a cluster manager that automates PostgreSQL failover using a distributed consensus store (etcd, ZooKeeper, or Consul) as the source of truth for leadership.
Architecture#
+-----------+
| etcd |
| cluster |
+-----+-----+
|
+---------------+---------------+
| | |
+----+----+ +-----+----+ +-----+----+
| Patroni | | Patroni | | Patroni |
| + PG | | + PG | | + PG |
| PRIMARY | | REPLICA | | REPLICA |
+---------+ +----------+ +----------+HAProxy or PgBouncer sits in front, routing writes to the current primary and reads to replicas. Patroni updates the etcd leader key when a failover occurs, and the proxy follows.
Setup Essentials#
Each PostgreSQL node runs a Patroni sidecar configured with:
scope: my-cluster
name: node1
restapi:
listen: 0.0.0.0:8008
etcd:
hosts: etcd1:2379,etcd2:2379,etcd3:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
synchronous_mode: false
postgresql:
parameters:
max_connections: 200
shared_buffers: 4GB
wal_level: replica
max_wal_senders: 5
initdb:
- encoding: UTF8
- data-checksums
postgresql:
listen: 0.0.0.0:5432
data_dir: /var/lib/postgresql/data
authentication:
superuser:
username: postgres
password: postgres-pass
replication:
username: replicator
password: repl-passKey parameters:
- maximum_lag_on_failover: Replicas lagging more than this many bytes are ineligible for promotion. Prevents promoting a stale replica.
- synchronous_mode: When true, at least one replica is synchronous. The primary will not commit until the synchronous replica confirms. This gives RPO = 0 but adds write latency.
- ttl and loop_wait: Control how quickly Patroni detects a failed primary. A primary that cannot renew its leader key within
ttlseconds is considered dead.
Operational Commands#
# Check cluster status
patronictl -c /etc/patroni.yml list
# Manual switchover (planned maintenance)
patronictl -c /etc/patroni.yml switchover --master node1 --candidate node2
# Restart PostgreSQL on a specific node
patronictl -c /etc/patroni.yml restart my-cluster node1
# Reinitialize a failed member
patronictl -c /etc/patroni.yml reinit my-cluster node3When to Use Patroni#
- PostgreSQL in production where automated failover is required.
- RTO target under 30 seconds.
- RPO = 0 is needed (enable synchronous_mode).
- You already run or can run etcd (or Consul/ZooKeeper).
Pattern 4 – MySQL Group Replication#
Group Replication is MySQL’s built-in consensus-based replication. It provides automatic failover and optional multi-primary writes without external tools.
Single-Primary Mode#
One member accepts writes. If the primary fails, the group elects a new primary automatically. This is the recommended mode for most deployments.
-- On each member
SET GLOBAL group_replication_group_name = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee";
SET GLOBAL group_replication_local_address = "node1:33061";
SET GLOBAL group_replication_group_seeds = "node1:33061,node2:33061,node3:33061";
SET GLOBAL group_replication_single_primary_mode = ON;
-- Bootstrap the first member
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;
-- Join subsequent members
START GROUP_REPLICATION;Multi-Primary Mode#
All members accept writes. Conflict detection rolls back the later conflicting transaction. Enable by setting group_replication_single_primary_mode = OFF. Use only when the application can handle transaction rollbacks from conflicts.
MySQL Router#
MySQL Router provides transparent client routing:
mysqlrouter --bootstrap root@node1:3306 --user=mysqlrouter
mysqlrouter &Clients connect to the router on port 6446 (read-write, routed to primary) or 6447 (read-only, routed to secondaries). The router automatically detects primary changes.
When to Use Group Replication#
- MySQL deployments needing automated failover without external tools.
- Single-primary mode for straightforward HA with sub-30-second failover.
- Multi-primary mode only when multi-region write locality is required and the application handles conflicts.
Synchronous vs Asynchronous Replication#
This is the most consequential decision in HA design.
Asynchronous: The primary commits immediately and streams changes in the background. RPO is greater than zero (replication lag determines data at risk). No write latency penalty. If the primary crashes, uncommitted-to-replica transactions are lost. Best for most applications.
Synchronous: The primary waits for at least one replica to confirm before acknowledging the commit. RPO is zero. Adds network round-trip latency (1-5ms same-datacenter, 30-100ms cross-region). If all synchronous replicas are unreachable, writes stall. Best for financial systems and workloads where data loss is unacceptable.
Semi-Synchronous (MySQL)#
MySQL offers a middle ground where the primary waits for the replica to receive the event (stored in relay log) but not necessarily apply it. This reduces the risk window compared to fully asynchronous while adding less latency than fully synchronous.
-- On primary
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 5000; -- ms, then falls back to async
-- On replica
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;If the replica does not acknowledge within the timeout, MySQL falls back to asynchronous replication. This prevents write stalls but opens a data loss window.
Decision Matrix#
| Requirement | Pattern | RPO | RTO | Complexity |
|---|---|---|---|---|
| Standard web app, single region | Primary-replica, async | Seconds | Minutes (manual) or seconds (automated) | Low |
| Zero data loss required | Primary-replica, sync (Patroni sync_mode or MySQL semi-sync) | Zero | Seconds (automated) | Medium |
| Multi-region write locality | Multi-master (CockroachDB, Galera, GR multi-primary) | Zero (consensus) | Near-zero | High |
| PostgreSQL with automated failover | Patroni + etcd + HAProxy | Configurable | Under 30 seconds | Medium |
| MySQL with automated failover, no external deps | Group Replication single-primary + MySQL Router | Near-zero (semi-sync) | Under 30 seconds | Medium |
| Maximum simplicity, manual failover acceptable | Primary-replica, async, manual promotion | Minutes | Minutes | Low |
Choosing Based on RPO/RTO#
Start with RPO. If the business requires RPO = 0, synchronous replication is mandatory. If RPO of a few seconds is acceptable, asynchronous replication simplifies operations significantly.
Then choose RTO. If the business requires RTO under 60 seconds, you need automated failover – Patroni, Group Replication, or a distributed database like CockroachDB. If RTO of a few minutes is acceptable, manual failover with runbooks is simpler.
Then consider write patterns. If writes come from multiple regions, multi-master avoids routing all writes through a single region. If writes come from one region, primary-replica is simpler and faster.
The most common mistake is over-engineering. Patroni (PostgreSQL) or Group Replication single-primary (MySQL) covers the vast majority of production workloads. Multi-master is only justified when write locality across regions is a hard requirement.