Database Testing Strategies#

Database tests are the tests most teams get wrong. They either skip them entirely (testing with mocks, then discovering schema mismatches in production), or they build a fragile suite sharing a single database where tests interfere with each other. The right approach depends on what you are testing and what tradeoffs you can accept.

Fixtures vs Factories#

Fixtures#

Fixtures are static SQL files loaded before a test suite runs:

-- fixtures/users.sql
INSERT INTO users (id, email, role) VALUES
  (1, 'admin@test.com', 'admin'),
  (2, 'user@test.com', 'member');

-- fixtures/orders.sql
INSERT INTO orders (id, user_id, total, status) VALUES
  (100, 2, 49.99, 'completed'),
  (101, 2, 129.00, 'pending');
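
These files might be applied by a session-scoped pytest fixture, for example. A minimal sketch, assuming a throwaway local database and a psql binary on the PATH (the connection string is an assumption, not a prescribed setup):

import glob
import subprocess

import pytest

TEST_DB = "postgresql://postgres:test@localhost/testdb"  # assumed throwaway database

@pytest.fixture(scope="session", autouse=True)
def load_fixtures():
    # Apply every fixture file once, in name order, before any test runs
    for path in sorted(glob.glob("fixtures/*.sql")):
        subprocess.run(["psql", TEST_DB, "-v", "ON_ERROR_STOP=1", "-f", path], check=True)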

Use fixtures when: the domain model is stable, tests need complex interconnected data, and you want fast loading. Fixtures break when: schema changes require updating every fixture file – a new required column means editing every INSERT across all files.

Factories#

Factories generate test data on demand with sensible defaults:

import factory
from myapp.models import User, Order
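
# Note: with an ORM these would subclass a persistence-aware factory
# (e.g. factory.alchemy.SQLAlchemyModelFactory) so created objects are
# actually saved and visible to queries like Order.get() below.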

class UserFactory(factory.Factory):
    class Meta:
        model = User
    email = factory.Sequence(lambda n: f"user{n}@test.com")
    role = "member"

class OrderFactory(factory.Factory):
    class Meta:
        model = Order
    user = factory.SubFactory(UserFactory)
    total = factory.Faker("pydecimal", left_digits=3, right_digits=2, positive=True)
    status = "pending"

# Each test builds exactly what it needs
def test_order_completion():
    order = OrderFactory(status="pending", total=49.99)
    complete_order(order.id)
    assert Order.get(order.id).status == "completed"

Use factories when: schema changes frequently and each test should be self-documenting. Factories break when: you need complex graphs of related data – building an order that requires a user, product, inventory, warehouse, and shipping zone becomes a chain of factory calls.
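
To make the pain concrete, a hypothetical sketch (none of these extra factories exist above): overriding one deep attribute forces you to spell out every level of the graph.

# Hypothetical factories for the rest of the object graph
order = OrderFactory(
    product=ProductFactory(
        inventory=InventoryFactory(
            warehouse=WarehouseFactory(
                shipping_zone=ShippingZoneFactory(region="EU"),
            ),
        ),
    ),
)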

The pragmatic approach: use both. Factories for test-specific data, fixtures for reference data that rarely changes (countries, currencies, permission definitions).

Database Containers with Testcontainers#

Testcontainers spins up real database instances in Docker for each test run. No shared test databases, no version mismatches between local and CI. With JUnit 5:

@Testcontainers
class OrderRepositoryTest {
    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16")
        .withDatabaseName("testdb")
        .withInitScript("schema.sql");

    @Test
    void shouldCreateOrder() {
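        // dataSource is assumed to be built from postgres.getJdbcUrl(),
        // postgres.getUsername() and postgres.getPassword()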
        OrderRepository repo = new OrderRepository(dataSource);
        Order order = repo.create(new Order(userId, 49.99));
        assertThat(order.getId()).isNotNull();
    }
}

The same pattern in Go, using the testcontainers-go postgres module:

func TestOrderRepository(t *testing.T) {
    ctx := context.Background()
    container, err := postgres.Run(ctx, "postgres:16",
        postgres.WithDatabase("testdb"),
    )
    require.NoError(t, err)
    defer container.Terminate(ctx)

    connStr, err := container.ConnectionString(ctx, "sslmode=disable")
    require.NoError(t, err)
    db, err := sql.Open("pgx", connStr)
    require.NoError(t, err)
    runMigrations(db)
    // test against real PostgreSQL
}

Use Testcontainers for: integration tests against real database engines, testing database-specific features (jsonb queries, full-text search), CI pipelines needing reproducible state. Skip it for: unit tests where in-memory databases or mocks suffice, and where the 2-5 second container startup hurts fast feedback.

Container Lifecycle#

One container per suite with transaction rollback between tests balances isolation and speed:

@pytest.fixture(autouse=True)
def db_transaction(db_connection):
    transaction = db_connection.begin()
    yield db_connection
    transaction.rollback()  # every test starts clean
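
A sketch of where db_connection might come from, assuming the testcontainers Python package and SQLAlchemy (run_migrations is a hypothetical helper): one container and engine per suite, one connection per test.

import pytest
import sqlalchemy
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def db_engine():
    # One real PostgreSQL instance and engine for the whole suite
    with PostgresContainer("postgres:16") as pg:
        engine = sqlalchemy.create_engine(pg.get_connection_url())
        run_migrations(engine)  # hypothetical helper that applies the schema once
        yield engine
        engine.dispose()

@pytest.fixture
def db_connection(db_engine):
    with db_engine.connect() as conn:
        yield conn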

Migration Testing#

Schema migrations are deployments that run DDL against production data. Test them like deployments.

Forward and Rollback Testing#

#!/bin/bash
# Run in CI on every PR with migrations
docker run -d --name migration-test -p 5432:5432 \
  -e POSTGRES_DB=testdb -e POSTGRES_PASSWORD=postgres postgres:16
export PGUSER=postgres PGPASSWORD=postgres
until pg_isready -h localhost -d testdb; do sleep 1; done

# Apply all migrations up to previous release
flyway -url=jdbc:postgresql://localhost:5432/testdb \
       -user=postgres -password=postgres \
       -target=previous_release migrate

# Load representative test data
psql -h localhost -d testdb -f test-data/representative-dataset.sql

# Apply new migrations -- this is what we are testing
flyway -url=jdbc:postgresql://localhost:5432/testdb \
       -user=postgres -password=postgres migrate

# Validate schema matches expected state
pg_dump -h localhost -d testdb --schema-only > actual_schema.sql
diff expected_schema.sql actual_schema.sql

Every migration should have a rollback script. Test both directions:

-- V5__add_order_status.sql (forward)
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';

-- V5__add_order_status_rollback.sql (reverse)
ALTER TABLE orders DROP COLUMN status;
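
One way to exercise both directions in CI, sketched with psql against a throwaway database (paths and connection string are assumptions): forward, rollback, forward again, so a failure in either direction breaks the build.

import subprocess

TEST_DB = "postgresql://postgres:test@localhost/testdb"  # assumed throwaway database

def apply(script):
    # psql -f applies a SQL file and fails the run on the first error
    subprocess.run(["psql", TEST_DB, "-v", "ON_ERROR_STOP=1", "-f", script], check=True)

apply("migrations/V5__add_order_status.sql")           # forward
apply("migrations/V5__add_order_status_rollback.sql")  # rollback
apply("migrations/V5__add_order_status.sql")           # forward must still apply cleanly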

Data Migration Testing#

When migrations transform data, test with production-scale volumes. A migration taking 200ms on 1000 rows might take 45 minutes on 10 million rows, locking the table the entire time. Run data migrations against a masked copy of production data to discover this before the maintenance window.
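
A sketch of that rehearsal (table columns, row count, and the backfill itself are illustrative assumptions): generate_series seeds volume cheaply, and the timing shows whether the backfill fits the maintenance window.

import subprocess
import time

TEST_DB = "postgresql://postgres:test@localhost/testdb"  # assumed throwaway or masked copy

def run_sql(sql):
    subprocess.run(["psql", TEST_DB, "-v", "ON_ERROR_STOP=1", "-c", sql], check=True)

# Seed roughly production-scale volume (adjust to real row counts)
run_sql("""INSERT INTO orders (user_id, total, status)
           SELECT 2, (i % 50000)::numeric / 100, 'completed'
           FROM generate_series(1, 10000000) AS i;""")

# The data migration under test: add a column and backfill every row
run_sql("ALTER TABLE orders ADD COLUMN total_cents BIGINT;")
start = time.monotonic()
run_sql("UPDATE orders SET total_cents = (total * 100)::bigint;")
print(f"backfill took {time.monotonic() - start:.1f}s")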

Performance Regression Testing#

Database performance degrades gradually. A query that took 5ms last month takes 50ms now because the table grew and nobody noticed the missing index.

Baseline and Regression Detection#

# Capture EXPLAIN ANALYZE for critical queries in CI
# (-qtA makes psql emit the bare JSON plan with no headers or alignment)
psql -h localhost -d testdb -qtA <<'EOF' > baseline/order_lookup.json
EXPLAIN (ANALYZE, FORMAT JSON)
SELECT o.*, u.email FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days';
EOF

Compare each new capture against the stored baseline and fail the build when a critical query slows past a threshold:

import json

def check_regression(baseline_file, current_file, threshold=2.0):
    baseline = json.load(open(baseline_file))
    current = json.load(open(current_file))
    baseline_time = baseline[0]["Execution Time"]
    current_time = current[0]["Execution Time"]
    if current_time > baseline_time * threshold:
        print(f"REGRESSION: {baseline_time:.1f}ms -> {current_time:.1f}ms")
        return False
    return True

Run with a dataset large enough to reveal performance differences. A 1000-row test database will not expose a missing index on a low-cardinality column.

Data Masking for Test Environments#

Test environments need realistic data. Production data contains PII. Data masking bridges the gap.

-- Mask emails (rotate through a fixed set of realistic domains)
UPDATE users SET email = 'user_' || id || '@' ||
  CASE (id % 3)
    WHEN 0 THEN 'gmail.com'
    WHEN 1 THEN 'company.com'
    WHEN 2 THEN 'outlook.com'
  END;

-- Mask names
UPDATE users SET first_name = 'First' || id, last_name = 'Last' || id;

-- Mask phone numbers (preserve format)
UPDATE users SET phone = '+1555' || LPAD((id % 10000000)::TEXT, 7, '0');

Mask in dependency order (parent tables first) and verify referential integrity after masking. Automate the pipeline: nightly dump of production, mask in an isolated database, validate, restore to staging.
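
A sketch of the validate step (the connection string is an assumption; the patterns match the masking statements above): confirm masking broke no foreign keys and left no unmasked email behind.

import subprocess

MASKED_DB = "postgresql://postgres:test@localhost/staging_masked"  # assumed masked copy

def count(sql):
    # -qtA returns the bare count with no headers or alignment
    out = subprocess.run(["psql", MASKED_DB, "-qtA", "-c", sql],
                         check=True, capture_output=True, text=True)
    return int(out.stdout.strip())

# Referential integrity: every order still points at an existing user
assert count("""SELECT count(*) FROM orders o
                LEFT JOIN users u ON u.id = o.user_id
                WHERE u.id IS NULL""") == 0

# PII check: every email matches the masked pattern, nothing slipped through
assert count("SELECT count(*) FROM users WHERE email !~ '^user_[0-9]+@'") == 0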

Decision Matrix#

| Scenario | Approach | Why |
| --- | --- | --- |
| Unit testing repository logic | Factories + Testcontainers | Isolated, each test defines its own state |
| Testing DB-specific features | Testcontainers | Real engine, reproducible |
| Testing migrations | Forward + rollback against representative data | Catches data bugs before production |
| Performance regression detection | Baseline capture in CI with production-scale data | Prevents gradual degradation |
| Staging environment data | Masked production dump | Realistic without PII exposure |
| Reference/seed data | Fixtures | Stable data loaded once |

The common thread: test against real databases, with realistic data, in isolated environments. Mocks catch logic errors but miss the schema mismatches, query performance issues, and migration failures that cause production incidents.