CockroachDB Setup and Architecture

Architecture: What CockroachDB Actually Does Under the Hood

CockroachDB is a distributed SQL database that stores data across multiple nodes while presenting a single logical database to clients. Understanding three concepts is essential before deploying it.

Ranges. CockroachDB stores all data as key-value pairs, sorted by key, and splits this sorted keyspace into contiguous chunks called ranges, each with a default maximum size of 512 MiB. Every SQL table, index, and system table maps to one or more ranges. When a range grows beyond that size, it splits automatically.
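To make range splitting concrete, here is a toy Python model of a sorted keyspace. It is a sketch, not CockroachDB's implementation: the `Keyspace` class, the 100-byte `MAX_RANGE_BYTES` threshold (standing in for 512 MiB), and splitting at the median key are all illustrative assumptions.

```python
import bisect

# Toy threshold standing in for the 512 MiB default (assumption, not a real setting).
MAX_RANGE_BYTES = 100


class Keyspace:
    """Hypothetical model: a sorted keyspace cut into contiguous ranges."""

    def __init__(self, max_range_bytes=MAX_RANGE_BYTES):
        self.max_range_bytes = max_range_bytes
        self.start_keys = [""]  # range i owns keys >= start_keys[i]
        self.ranges = [{}]      # one dict of key -> value per range

    def _range_index(self, key):
        # The owning range is the last one whose start key is <= key.
        return bisect.bisect_right(self.start_keys, key) - 1

    def get(self, key):
        return self.ranges[self._range_index(key)].get(key)

    def put(self, key, value):
        idx = self._range_index(key)
        self.ranges[idx][key] = value
        size = sum(len(k) + len(v) for k, v in self.ranges[idx].items())
        # Split once the range exceeds the threshold (needs at least 2 keys).
        if size > self.max_range_bytes and len(self.ranges[idx]) > 1:
            self._split(idx)

    def _split(self, idx):
        # Split at the median key so both halves stay contiguous key spans.
        keys = sorted(self.ranges[idx])
        split_key = keys[len(keys) // 2]
        old = self.ranges[idx]
        self.ranges[idx] = {k: v for k, v in old.items() if k < split_key}
        self.ranges.insert(idx + 1, {k: v for k, v in old.items() if k >= split_key})
        self.start_keys.insert(idx + 1, split_key)


ks = Keyspace()
for i in range(20):
    ks.put(f"user/{i:04d}", "x" * 10)
print(ks.start_keys)  # start key of each range after the automatic splits
```

Because each range covers one contiguous span, finding the range for a key is a binary search over the start keys.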

Replicas and Raft consensus. Each range is replicated (default: 3 replicas) across different nodes. These replicas form a Raft consensus group. Writes must be acknowledged by a majority of replicas before being committed. This means a 3-node cluster tolerates 1 node failure without data loss or downtime.
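The majority rule is simple arithmetic. This short Python sketch (the function names are illustrative) computes the quorum size of a Raft group and how many replica failures it survives:

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of a Raft group with `replicas` members."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still available."""
    return replicas - quorum_size(replicas)


for n in (3, 4, 5, 7):
    print(f"{n} replicas: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 replicas: quorum 2, tolerates 1 failure(s)
# 4 replicas: quorum 3, tolerates 1 failure(s)
# 5 replicas: quorum 3, tolerates 2 failure(s)
# 7 replicas: quorum 4, tolerates 3 failure(s)
```

Four replicas tolerate no more failures than three, which is why odd replication factors are the usual choice.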

Leaseholders. For each range, one replica holds the lease. The leaseholder serves reads and coordinates writes for that range. Clients can connect to any node; that node (the gateway) forwards each request to the leaseholder of the range that holds the data. CockroachDB automatically moves leases to balance load, but you can influence their placement with zone configurations for latency-sensitive workloads.
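The simplified Python model below shows how the node a client connects to (the gateway) finds the leaseholder for a key. It is a sketch under stated assumptions: `RangeDescriptor`, the three-range layout, and the lease assignments are hypothetical, and it ignores follower reads, lease transfers, and range caching.

```python
import bisect
from dataclasses import dataclass


@dataclass
class RangeDescriptor:
    start_key: str
    replicas: tuple   # node IDs holding a replica of this range
    leaseholder: int  # node ID currently holding the lease


# Hypothetical layout: three ranges, each replicated on nodes 1-3,
# with one lease on each node.
RANGES = [
    RangeDescriptor("", (1, 2, 3), leaseholder=1),
    RangeDescriptor("m", (1, 2, 3), leaseholder=2),
    RangeDescriptor("t", (1, 2, 3), leaseholder=3),
]


def route(gateway_node: int, key: str) -> str:
    """Describe how a gateway node handles a request for `key`."""
    starts = [r.start_key for r in RANGES]
    rng = RANGES[bisect.bisect_right(starts, key) - 1]
    if rng.leaseholder == gateway_node:
        return f"node {gateway_node} is the leaseholder for {key!r}; served locally"
    return f"node {gateway_node} forwards {key!r} to leaseholder node {rng.leaseholder}"


print(route(1, "apple"))  # node 1 is the leaseholder for 'apple'; served locally
print(route(1, "zebra"))  # node 1 forwards 'zebra' to leaseholder node 3
```

The extra hop in the second case is what zone-configured lease preferences try to avoid for latency-sensitive data.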

Single-Node Setup for Development

For local development and testing, a single-node insecure cluster is the fastest path:

# Start a single-node cluster (assumes the cockroach binary is installed and on your PATH)
cockroach start-single-node --insecure --store=cockroach-data --listen-addr=localhost:26257 --http-addr=localhost:8080 --background

# Connect with the built-in SQL client
cockroach sql --insecure --host=localhost:26257

# Or connect with psql (CockroachDB is wire-compatible with PostgreSQL)
psql "postgresql://root@localhost:26257/defaultdb?sslmode=disable"

The --http-addr flag exposes the DB Console, a built-in web UI for monitoring at http://localhost:8080.

Multi-Node Cluster with Docker

For a local multi-node cluster that actually tests distributed behavior:

# Create a shared network
docker network create crdb-net

# Start three nodes (only crdb-1 publishes the SQL and DB Console ports to the host)
docker run -d --name crdb-1 --hostname crdb-1 --net crdb-net \
  -p 26257:26257 -p 8080:8080 \
  cockroachdb/cockroach:v24.3.2 start --insecure --join=crdb-1,crdb-2,crdb-3 \
  --advertise-addr=crdb-1 --listen-addr=0.0.0.0:26257 --http-addr=0.0.0.0:8080

docker run -d --name crdb-2 --hostname crdb-2 --net crdb-net \
  cockroachdb/cockroach:v24.3.2 start --insecure --join=crdb-1,crdb-2,crdb-3 \
  --advertise-addr=crdb-2

docker run -d --name crdb-3 --hostname crdb-3 --net crdb-net \
  cockroachdb/cockroach:v24.3.2 start --insecure --join=crdb-1,crdb-2,crdb-3 \
  --advertise-addr=crdb-3

# Initialize the cluster (required once, on any node)
docker exec crdb-1 cockroach init --insecure

The --join flag is critical. Give every node the same list of join addresses so the nodes can discover each other. Without cockroach init, the nodes wait indefinitely; they do not form a cluster on their own.
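Because the --join list has to match across nodes, generating the commands avoids typos when the node count changes. A minimal Python sketch; `docker_run_commands` and its `join_count` parameter are hypothetical helpers that reproduce the flags used above (without port publishing):

```python
def docker_run_commands(n, join_count=3, image="cockroachdb/cockroach:v24.3.2",
                        network="crdb-net"):
    """Return one `docker run` command per node.

    Every node receives the identical --join list (the first `join_count`
    node names), so each node can reach the others for discovery.
    """
    names = [f"crdb-{i}" for i in range(1, n + 1)]
    join = ",".join(names[:join_count])
    return [
        f"docker run -d --name {name} --hostname {name} --net {network} "
        f"{image} start --insecure --join={join} --advertise-addr={name}"
        for name in names
    ]


for cmd in docker_run_commands(3):
    print(cmd)
print("docker exec crdb-1 cockroach init --insecure")  # still required once
```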

Kubernetes Deployment with the CockroachDB Operator

For production and production-like environments, the CockroachDB Kubernetes Operator manages the StatefulSet, persistent volumes, and cluster lifecycle:

# Install the operator CRDs and controller
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.15.0/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.15.0/install/operator.yaml

# Wait for the operator deployment to become available
kubectl wait --for=condition=Available deployment --all -n cockroach-operator-system --timeout=120s

Then save the following CrdbCluster resource as crdb-cluster.yaml:

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: crdb
  namespace: crdb
spec:
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
  tlsEnabled: true
  image:
    name: cockroachdb/cockroach:v24.3.2
  nodes: 3

# Create the namespace referenced in the manifest, then apply it
kubectl create namespace crdb
kubectl apply -f crdb-cluster.yaml

The operator handles cockroach init automatically. Each node gets a persistent volume, and the operator manages TLS certificates when tlsEnabled: true.

Connect to the cluster from within Kubernetes:

kubectl exec -it crdb-0 -n crdb -- cockroach sql --certs-dir=/cockroach/cockroach-certs

Creating Databases and Users

-- Create a database
CREATE DATABASE myapp;

-- Create a user with a password
CREATE USER appuser WITH PASSWORD 'strongpassword';

-- Grant privileges
GRANT ALL ON DATABASE myapp TO appuser;

-- For finer-grained access to tables that already exist in myapp
USE myapp;
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE * TO appuser;

CockroachDB uses the root user by default with full privileges. In production, create dedicated users immediately and do not use root for application connections.
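When the application connects as the dedicated user, special characters in the password must be percent-encoded inside a connection URL. A minimal Python sketch using only the standard library; `app_dsn`, the host name, and the CA certificate path are placeholders, and `sslmode=verify-full` assumes a secure cluster whose CA certificate is available to the client:

```python
from urllib.parse import quote, urlencode


def app_dsn(user, password, host, database, ca_cert, port=26257):
    """Build a PostgreSQL-style connection URL for a non-root application user."""
    # Percent-encode each component so '@', '/' or ':' in a password
    # cannot be mistaken for URL delimiters.
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # verify-full checks the server certificate against the cluster CA.
    params = urlencode({"sslmode": "verify-full", "sslrootcert": ca_cert}, safe="/")
    return f"postgresql://{creds}@{host}:{port}/{quote(database, safe='')}?{params}"


print(app_dsn("appuser", "p@ss/word", "db.example.internal", "myapp", "/certs/ca.crt"))
# postgresql://appuser:p%40ss%2Fword@db.example.internal:26257/myapp?sslmode=verify-full&sslrootcert=/certs/ca.crt
```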

Key Differences from PostgreSQL

CockroachDB is PostgreSQL wire-compatible, so most drivers and ORMs work. But the behavior differs in important ways:

No full pg_catalog support. Some PostgreSQL system tables are stubs or missing. Tools that introspect pg_catalog heavily (certain migration frameworks, pgAdmin features) may fail or return incomplete results.

Serializable isolation by default. CockroachDB runs transactions at SERIALIZABLE unless READ COMMITTED is explicitly enabled (v24.1+). Applications that relied on PostgreSQL's default READ COMMITTED isolation will see transaction retry errors (SQLSTATE 40001) under contention and must handle them with retry loops.
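A retry loop can be written once and reused for every transaction. The following is a minimal Python sketch, not an official CockroachDB client API: `run_transaction` is a hypothetical helper, and it assumes a DB-API connection (commit()/rollback()) whose exceptions expose the SQLSTATE as `.pgcode` (psycopg2) or `.sqlstate` (psycopg 3).

```python
import random
import time

SERIALIZATION_FAILURE = "40001"  # SQLSTATE for a retryable serialization failure


def run_transaction(conn, txn_body, max_retries=5, base_delay=0.05):
    """Run txn_body(conn) and commit, retrying the whole transaction on 40001.

    Hypothetical helper. Assumes commit()/rollback() on `conn` and driver
    errors exposing the SQLSTATE as .pgcode (psycopg2) or .sqlstate (psycopg 3).
    """
    for attempt in range(max_retries + 1):
        try:
            result = txn_body(conn)
            conn.commit()
            return result
        except Exception as exc:
            # Abort the failed attempt so the next one starts a fresh transaction.
            conn.rollback()
            code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
            if code != SERIALIZATION_FAILURE or attempt == max_retries:
                raise  # not retryable, or out of attempts
            # Exponential backoff with full jitter before retrying.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

Because the whole transaction body re-runs on each retry, keep side effects outside the database (sending email, calling other services) out of `txn_body`.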

Limited PL/pgSQL support for stored procedures and triggers. CockroachDB added partial PL/pgSQL support in v23.2, but it is not feature-complete. Avoid relying on complex procedural logic.

Every table has a primary key. CockroachDB uses the primary key to distribute data across ranges. If you omit one, it adds a hidden rowid column populated by unique_rowid(). For performance, always define an explicit primary key. Prefer UUIDs (gen_random_uuid()) over sequential integers to avoid write hotspots.
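To see why sequential keys create a hotspot, this toy Python model routes 1,000 new rows to ranges by key. The split points are hypothetical, and UUIDs are modeled as their lowercase hex strings, whose sort order matches the byte order of the stored values:

```python
import bisect
import uuid
from collections import Counter

# Hypothetical split points: range i owns keys >= SPLITS[i].
UUID_SPLITS = ["", "4", "8", "c"]              # four ranges over the UUID keyspace
INT_SPLITS = [0, 250_000, 500_000, 1_000_000]  # four ranges over integer IDs


def owning_range(splits, key):
    return bisect.bisect_right(splits, key) - 1


def ranges_hit(splits, keys):
    """Count how many of `keys` land in each range."""
    return Counter(owning_range(splits, k) for k in keys)


# 1,000 new rows with sequential IDs continuing past the current maximum.
sequential = ranges_hit(INT_SPLITS, range(1_000_001, 1_001_001))
# 1,000 new rows with random version-4 UUIDs, like gen_random_uuid() produces.
random_ids = ranges_hit(UUID_SPLITS, (str(uuid.uuid4()) for _ in range(1000)))

print("sequential:", dict(sequential))  # sequential: {3: 1000}
print("uuid:", dict(random_ids))
```

Every sequential ID is larger than all existing ones, so all 1,000 inserts land in the last range and its leaseholder absorbs the entire write load; random UUIDs spread across all four ranges.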

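SERIAL does not create a sequence by default. SERIAL maps to unique_rowid(), not a PostgreSQL-style sequence (CREATE SEQUENCE is still available if you need one). This avoids the coordination cost of a sequence, but the resulting IDs are only roughly time-ordered: they are unique, yet not strictly increasing across nodes, and heavy insert traffic can still concentrate on the range holding the newest IDs. If you need ordering, use a TIMESTAMPTZ column alongside the ID.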
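The sketch below illustrates why IDs shaped like unique_rowid() values can disagree with insert order across nodes. It is loosely modeled on unique_rowid() (a timestamp in the high bits, the node ID in the low bits); the 15-bit node field, the 10-microsecond unit, `rowid_like`, and the clock offset are assumptions for illustration, not CockroachDB's exact layout.

```python
NODE_ID_BITS = 15  # assumption: low bits reserved for the node ID


def rowid_like(timestamp_10us, node_id):
    """Hypothetical unique_rowid()-style ID: timestamp high bits, node ID low bits.

    Unique per (timestamp, node) pair and roughly time-ordered, but only as
    well ordered as the clocks of the nodes that generate it.
    """
    if not 0 <= node_id < (1 << NODE_ID_BITS):
        raise ValueError("node_id does not fit in the node ID bits")
    return (timestamp_10us << NODE_ID_BITS) | node_id


# Node 2's clock reads 30 us ahead; node 1's clock is accurate (hypothetical).
# At true time 1_000_000 (in 10 us units), node 2 inserts a row...
first_insert = rowid_like(1_000_003, node_id=2)
# ...and 10 us later node 1 inserts another.
second_insert = rowid_like(1_000_001, node_id=1)

print(first_insert > second_insert)  # True: ID order disagrees with insert order
```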