Node Liveness Issues#

Every node must renew its liveness record every 4.5 seconds. Failure to renew marks the node suspect, then dead, triggering re-replication of its ranges.

cockroach node status --insecure --host=localhost:26257
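
The same liveness view is available over SQL from any node you can still reach. A minimal sketch, assuming the is_live, ranges, and leases columns of crdb_internal.gossip_nodes (present in recent versions):

SELECT node_id, address, is_live, ranges, leases
FROM crdb_internal.gossip_nodes;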

Look at is_live. If a node shows false, check in order:

Process crashed. Check cockroach-data/logs/ for fatal or panic entries. OOM kills are the most common cause – check dmesg | grep -i oom on the host.

Network partition. The node runs but cannot reach peers. If cockroach node status succeeds locally but fails from other nodes, the problem is network-level (firewalls, security groups, DNS).

Disk stall. CockroachDB logs disk stall detected when storage I/O hangs long enough to block liveness renewal. Check the health and capacity of the affected node's stores:

SELECT node_id, store_id, available, used, capacity
FROM crdb_internal.kv_store_status;

Under-Replicated Ranges#

Ranges with fewer replicas than the replication factor. Normal briefly during failures, but persistent under-replication needs attention.

SELECT range_id, start_key, end_key, replicas
FROM crdb_internal.ranges_no_leases
WHERE array_length(replicas, 1) < 3;

Common causes: too few nodes for the replication factor, an incompletely drained decommission, or zone constraints that cannot be satisfied. If a node was lost permanently, re-replication starts after server.time_until_store_dead (default 5 minutes). If under-replication persists, check for stale locality constraints:

SHOW ZONE CONFIGURATIONS;
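
If a constraint references a region or locality that no longer exists in the cluster, the allocator can never satisfy it. A hedged sketch that simply clears the constraint (myapp is an assumed database name; adjust the target and the constraint list to your setup):

-- Remove an unsatisfiable constraint so replicas can be placed on any node
ALTER DATABASE myapp CONFIGURE ZONE USING constraints = '[]';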

Slow Queries: EXPLAIN ANALYZE#

CockroachDB distributes query execution across nodes. EXPLAIN ANALYZE shows the distributed execution plan with per-node timing.

EXPLAIN ANALYZE SELECT o.id, o.total, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2026-01-01' AND o.total > 100
ORDER BY o.total DESC LIMIT 50;

Key things to look for:

  • Full table scans. spans: FULL SCAN means the query reads an entire table or index rather than a constrained span. Add an appropriate index (see the sketch after this list).
  • High intermediate row counts. A join reading millions of rows to return 50 means filters are applied too late.
  • Network overhead. Disproportionate KV rows read on one node suggests data locality issues.
  • Contention time. Non-zero values mean the query waited on locks from other transactions.
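
If the plan shows spans: FULL SCAN on orders for the example query above, an index on the filter columns usually fixes it. A sketch under the assumption that the orders schema matches the query; the index name and STORING list are illustrative:

-- Constrains the scan by created_at and keeps total and customer_id
-- in the index so the base table is not touched
CREATE INDEX IF NOT EXISTS idx_orders_created_total
ON orders (created_at, total)
STORING (customer_id);

Whether total belongs in the index key or only in STORING depends on which filter is more selective for your data.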

Find the slowest queries cluster-wide:

SELECT fingerprint_id, metadata ->> 'query' AS query,
       statistics -> 'statistics' -> 'latencyInfo' ->> 'p99' AS p99_latency,
       statistics -> 'statistics' ->> 'cnt' AS exec_count
FROM crdb_internal.statement_statistics
ORDER BY (statistics -> 'statistics' -> 'latencyInfo' ->> 'p99')::FLOAT DESC
LIMIT 20;
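
Recent versions (v22.2+) also attach index recommendations to these statistics. A hedged variant, assuming the index_recommendations column is available in crdb_internal.statement_statistics:

SELECT metadata ->> 'query' AS query, index_recommendations
FROM crdb_internal.statement_statistics
WHERE array_length(index_recommendations, 1) > 0
LIMIT 20;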

Transaction Retry Errors (SQLSTATE 40001)#

Serializable isolation means conflicting transactions produce 40001 errors that must be retried. This is expected behavior, not a bug. Your application must implement retry loops:

import psycopg2
import time

def run_transaction(conn, callback, max_retries=5):
    for attempt in range(max_retries):
        try:
            with conn.cursor() as cur:
                callback(cur)
                conn.commit()
                return
        except psycopg2.errors.SerializationFailure:
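            # SQLSTATE 40001 surfaces as SerializationFailure in psycopg2; roll back and retry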
            conn.rollback()
            sleep_time = (2 ** attempt) * 0.01  # exponential backoff
            time.sleep(sleep_time)
    raise Exception("Transaction failed after max retries")

def transfer_funds(cur):
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")

conn = psycopg2.connect("postgresql://appuser@localhost:26257/myapp?sslmode=disable")
conn.set_session(autocommit=False)
run_transaction(conn, transfer_funds)

If you see high retry rates, investigate contention:

-- Find statements with the most contention
SELECT fingerprint_id,
       metadata ->> 'query' AS query,
       statistics -> 'execution_statistics' -> 'contentionTime' ->> 'mean' AS avg_contention
FROM crdb_internal.statement_statistics
WHERE (statistics -> 'execution_statistics' -> 'contentionTime' ->> 'mean')::FLOAT > 0
ORDER BY (statistics -> 'execution_statistics' -> 'contentionTime' ->> 'mean')::FLOAT DESC
LIMIT 10;

Reduce contention by keeping transactions small, avoiding long-running transactions, and not funneling concurrent updates through hot rows.
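
One common mitigation is to lock the rows up front with SELECT ... FOR UPDATE, so conflicting transactions wait on the lock instead of aborting with 40001. A minimal sketch against the accounts table from the retry example:

BEGIN;
-- Lock both rows first so a concurrent transfer queues instead of forcing a retry
SELECT id, balance FROM accounts WHERE id IN (1, 2) FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;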

Hot Ranges#

A hot range receives disproportionate traffic, creating a bottleneck on the node holding its leaseholder. The DB Console “Hot Ranges” page shows the top ranges by QPS.

From SQL:

SELECT range_id, start_key, end_key, queries_per_second, lease_holder
FROM crdb_internal.ranges
ORDER BY queries_per_second DESC
LIMIT 10;

Common causes and fixes:

  • Sequential primary keys. SERIAL or auto-incrementing IDs cause all inserts to hit the last range. Fix: use UUID primary keys with gen_random_uuid(), or use HASH-sharded indexes.
  • Single-row hot spot. A counter or status row updated by every request. Fix: shard the counter across multiple rows and sum on read (sketched below).

For write-heavy tables that index on a sequential column such as a timestamp, a hash-sharded index spreads those writes across ranges:

CREATE TABLE events (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  created_at TIMESTAMPTZ DEFAULT now(),
  data JSONB,
  INDEX idx_events_created (created_at) USING HASH
);
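
For the single-row hot spot above, a sharded counter spreads writes across several rows that are summed on read. A sketch with illustrative table and column names:

CREATE TABLE page_view_counts (
  page_id INT,
  shard INT,            -- 0-15, picked at random per write
  views INT DEFAULT 0,
  PRIMARY KEY (page_id, shard)
);

-- Each write lands on a random shard instead of one hot row
INSERT INTO page_view_counts (page_id, shard, views)
VALUES (42, floor(random() * 16)::INT, 1)
ON CONFLICT (page_id, shard) DO UPDATE SET views = page_view_counts.views + 1;

-- Reads sum the shards
SELECT sum(views) FROM page_view_counts WHERE page_id = 42;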

Clock Skew Issues#

CockroachDB requires synchronized clocks across nodes. The default maximum clock offset is 500ms. If a node’s clock drifts beyond this, it self-terminates to protect consistency.

Symptoms: a node crashes with clock synchronization error in logs, or transactions fail with timestamp-related errors.

-- Check the clock offset in the DB Console, or read the clock-offset metric via SQL
SELECT node_id,
       (metrics ->> 'clock-offset.meannanos')::FLOAT / 1e6 AS offset_ms
FROM crdb_internal.kv_node_status;

Fix: Run NTP or chrony on every node. On Kubernetes, node clocks come from the host – ensure the host has NTP configured. On cloud VMs, the cloud provider typically handles this, but verify with timedatectl status.

Storage Capacity Problems#

CockroachDB uses the Pebble LSM-tree storage engine. If a store fills up completely, the node shuts down rather than risk corrupting data, and the allocator stops rebalancing replicas onto stores above roughly 95% capacity.

-- Check per-node storage
SELECT node_id, store_id,
       (used::FLOAT / capacity::FLOAT * 100)::INT AS pct_used,
       pg_size_pretty(available) AS available
FROM crdb_internal.kv_store_status;

CockroachDB pre-allocates an emergency ballast file at auxiliary/EMERGENCY_BALLAST (by default 1% of the store's capacity, capped at 1 GiB). When the disk fills up, delete this file to free enough space for the node to restart and recover:

rm cockroach-data/auxiliary/EMERGENCY_BALLAST

Then add storage, decommission the node, or drop unused tables/indexes. After cleanup, the node recreates the ballast on restart. Set alerts on the capacity.available metric, and remember that space freed by deletes is reclaimed only after the zone's gc.ttlseconds window expires and compaction runs, not immediately.
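
If you need space back from a large delete sooner, the zone's GC TTL can be lowered temporarily so old MVCC versions become garbage-collectable earlier (shown here against the events table from the hot-ranges example as a stand-in). Note that a short TTL also limits how far back AS OF SYSTEM TIME queries and incremental backups can reach:

-- Temporarily shorten how long deleted row versions are retained
ALTER TABLE events CONFIGURE ZONE USING gc.ttlseconds = 600;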