Database Schema Migrations in CI/CD#

Schema migrations are the riskiest step in most deployment pipelines. Application code can be rolled back in seconds by deploying the previous container image. A database migration that drops a column, changes a data type, or restructures a table cannot be undone by pressing a button. Yet many teams run migrations manually, or tack them onto deployment scripts without testing, rollback plans, or zero-downtime considerations.

Migration Tools#

Flyway#

Flyway uses numbered SQL files. Migrations are named with a version prefix and run in order:

sql/
  V1__create_users.sql
  V2__add_email_index.sql
  V3__create_orders.sql

Flyway tracks applied migrations in a flyway_schema_history table. It refuses to run if a previously applied migration’s checksum has changed, preventing accidental modification of already-applied migrations.

flyway -url=jdbc:postgresql://localhost:5432/myapp \
       -user=deploy \
       -password=$DB_PASSWORD \
       migrate
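
Flyway's bookkeeping can be inspected directly; a sketch of a query against the standard flyway_schema_history columns:

-- What has been applied, in order
SELECT installed_rank, version, description, checksum, installed_on, success
FROM flyway_schema_history
ORDER BY installed_rank;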

Strengths: Simple. SQL-only. No abstraction layer – you write the exact DDL you want. Checksum validation prevents drift. Community edition is free.

Weaknesses: No automatic rollback generation. Undo migrations (e.g. U1__drop_users.sql reversing V1__create_users.sql) require the paid Teams edition. Java dependency.
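
With the Teams edition, an undo script shares the version number of the migration it reverses; a sketch of an undo for V3 above:

-- U3__drop_orders.sql (undoes V3__create_orders.sql)
DROP TABLE IF EXISTS orders;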

Liquibase#

Liquibase uses changelog files in XML, YAML, JSON, or SQL. It tracks migrations in a databasechangelog table:

# changelog.yaml
databaseChangeLog:
  - changeSet:
      id: 1
      author: deploy
      changes:
        - createTable:
            tableName: users
            columns:
              - column:
                  name: id
                  type: bigint
                  autoIncrement: true
                  constraints:
                    primaryKey: true
              - column:
                  name: email
                  type: varchar(255)
                  constraints:
                    nullable: false
                    unique: true
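
Applying the changelog looks much like the Flyway invocation; a sketch using the Liquibase CLI (connection details are placeholders):

liquibase update \
  --changelog-file=changelog.yaml \
  --url=jdbc:postgresql://localhost:5432/myapp \
  --username=deploy \
  --password=$DB_PASSWORD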

Strengths: Database-agnostic changelog format. Automatic rollback generation for many operations (create table can auto-generate drop table). Preconditions let you guard migrations (onFail: MARK_RAN to skip already-applied changes). Contexts and labels allow selective migration execution per environment.
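
For example, a changeSet guarded by a precondition and scoped to a context might look like the following sketch (the index and its name are illustrative, not part of the changelog above):

  - changeSet:
      id: 2
      author: deploy
      context: production
      preConditions:
        - onFail: MARK_RAN
        - not:
            - indexExists:
                indexName: idx_users_email
      changes:
        - createIndex:
            indexName: idx_users_email
            tableName: users
            columns:
              - column:
                  name: email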

Weaknesses: The abstraction layer hides the actual SQL being executed. Complex migrations end up using raw SQL anyway. XML changelogs are verbose.

golang-migrate#

A lightweight, Go-based migration tool. Migrations are pairs of up/down SQL files:

migrations/
  000001_create_users.up.sql
  000001_create_users.down.sql
  000002_add_email_index.up.sql
  000002_add_email_index.down.sql
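
Each pair contains the forward DDL and its inverse; a sketch of the second pair (the index name is illustrative):

-- 000002_add_email_index.up.sql
CREATE INDEX idx_users_email ON users (email);

-- 000002_add_email_index.down.sql
DROP INDEX IF EXISTS idx_users_email;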

migrate -path ./migrations \
        -database "postgresql://deploy:$DB_PASSWORD@localhost:5432/myapp?sslmode=require" \
        up

Strengths: Single binary, no JVM dependency. Supports PostgreSQL, MySQL, SQLite, CockroachDB, and many others. Simple versioning scheme. Clean separation of up and down migrations.

Weaknesses: No checksum validation – you can silently modify an applied migration. No built-in diffing or changelog features.

Rails Migrations#

Rails migrations use Ruby DSL with automatic up/down generation:

class AddOrdersTable < ActiveRecord::Migration[7.1]
  def change
    create_table :orders do |t|
      t.references :user, null: false, foreign_key: true
      t.decimal :total, precision: 10, scale: 2
      t.string :status, default: 'pending'
      t.timestamps
    end
    add_index :orders, [:user_id, :status]
  end
end

Run in CI with rails db:migrate. The change method auto-generates rollback. For irreversible operations, use explicit up and down methods.
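
For example, a data backfill usually cannot be reconstructed on rollback, so the down method can refuse explicitly; a sketch reusing the orders table from above (the class name is illustrative):

class BackfillOrderStatus < ActiveRecord::Migration[7.1]
  def up
    execute "UPDATE orders SET status = 'pending' WHERE status IS NULL"
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end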

Running Migrations in CI/CD Pipelines#

Migrations should run as a distinct pipeline step, after tests pass and before application deployment. Never embed migration execution inside the application startup – if the migration fails, the application enters a crash loop.

GitHub Actions Example#

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test

  migrate:
    needs: test
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Run migrations
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
        run: |
          migrate -path ./migrations \
            -database "$DATABASE_URL" \
            up

  deploy:
    needs: migrate
    runs-on: ubuntu-latest
    environment: production
    steps:
      - run: kubectl set image deployment/api api=$IMAGE_TAG

The environment: production setting scopes secrets.DATABASE_URL to that environment's secrets and, if required reviewers are configured on the environment, gates the migrate job behind manual approval before it runs against production.
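
The migrate job also assumes the migrate binary is available on the runner. One way to provide it is to install it at the start of the job with the Go toolchain (a sketch; pin a specific version in practice):

      - uses: actions/setup-go@v5
      - name: Install golang-migrate
        run: |
          go install -tags 'postgres' github.com/golang-migrate/migrate/v4/cmd/migrate@latest
          echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH"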

Kubernetes Job Pattern#

For pipelines deploying to Kubernetes, run migrations as a Kubernetes Job or a Helm pre-upgrade hook:

# migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-{{ .Release.Revision }}
  annotations:
    helm.sh/hook: pre-upgrade
    helm.sh/hook-weight: "-1"
    helm.sh/hook-delete-policy: hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myapp:{{ .Values.image.tag }}
          command: ["migrate", "-path", "/migrations", "-database", "$(DATABASE_URL)", "up"]
          envFrom:
            - secretRef:
                name: db-credentials

backoffLimit: 0 prevents retry on failure. A failed migration should stop the deployment, not retry and potentially apply partial changes again. hook-delete-policy: hook-succeeded cleans up completed jobs automatically.
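
If the Job is applied directly from the pipeline instead of as a Helm hook, gate the deploy step on its completion; a sketch assuming the Job is named db-migrate rather than templated per release:

kubectl apply -f migration-job.yaml
# If the migration fails, the wait times out with a non-zero exit and the pipeline stops
kubectl wait --for=condition=complete --timeout=300s job/db-migrate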

Rollback Strategies#

The safest rollback strategy is to never roll back. Instead, fix forward by applying a new migration that corrects the issue. Rolling back migrations is dangerous because:

  • The application code deployed between the migration and the rollback may have written data using the new schema.
  • Down migrations are rarely tested. They are written once and never executed until a crisis.
  • Rollback of data migrations (backfills, transforms) can cause data loss.

Down Migrations#

If you must support rollback, maintain paired up/down migrations and test them:

# Apply
migrate -path ./migrations -database $DB_URL up

# Roll back the last migration
migrate -path ./migrations -database $DB_URL down 1

Test down migrations in CI by running up then down then up again against a test database. This catches down migrations that fail or leave the schema in an inconsistent state.

Zero-Downtime Migrations: Expand and Contract#

Most schema changes can be made without downtime using the expand/contract pattern. The idea: never make a breaking change in a single step. Instead, expand the schema to support both old and new code, deploy the new code, then contract the schema to remove the old structure.

Example: Renaming a Column#

You cannot rename a column in one step without breaking running application instances. Instead:

Migration 1 (Expand): Add the new column and backfill data.

-- V4__expand_add_full_name.sql
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name;
-- Trigger to keep both columns in sync during transition
CREATE OR REPLACE FUNCTION sync_user_name() RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- OLD is not available on INSERT; fill in whichever column was left empty
    IF NEW.full_name IS NULL THEN
      NEW.full_name := NEW.name;
    ELSIF NEW.name IS NULL THEN
      NEW.name := NEW.full_name;
    END IF;
  ELSIF NEW.name IS DISTINCT FROM OLD.name THEN
    NEW.full_name := NEW.name;
  ELSIF NEW.full_name IS DISTINCT FROM OLD.full_name THEN
    NEW.name := NEW.full_name;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_user_name BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_name();

Deploy: Update application code to read from full_name and write to both full_name and name.

Migration 2 (Contract): Once all application instances use the new column, drop the old column and trigger.

-- V5__contract_drop_name.sql
DROP TRIGGER trg_sync_user_name ON users;
DROP FUNCTION sync_user_name();
ALTER TABLE users DROP COLUMN name;

This pattern adds a deployment cycle but guarantees zero downtime. The expand migration and the contract migration ship in separate releases.

Adding a NOT NULL Column#

Adding a NOT NULL column without a default to an existing table fails if the table already has rows. The expand/contract approach:

-- Step 1: Add as nullable
ALTER TABLE orders ADD COLUMN tracking_number VARCHAR(100);

-- Step 2: Backfill (may need batching for large tables)
UPDATE orders SET tracking_number = 'PENDING' WHERE tracking_number IS NULL;

-- Step 3: Add constraint (after application sets value for new rows)
ALTER TABLE orders ALTER COLUMN tracking_number SET NOT NULL;
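
Step 2's backfill touches every row in a single statement, holding row locks for the whole transaction. A sketch of batching it by primary key (the batch size and the id column are assumptions):

-- Run repeatedly until it reports 0 rows updated
UPDATE orders
SET tracking_number = 'PENDING'
WHERE id IN (
  SELECT id FROM orders
  WHERE tracking_number IS NULL
  LIMIT 10000
);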

On PostgreSQL, ALTER TABLE ... SET NOT NULL acquires an ACCESS EXCLUSIVE lock and scans the entire table. For large tables, use a CHECK constraint instead, which can be added as NOT VALID and validated separately:

ALTER TABLE orders ADD CONSTRAINT orders_tracking_not_null
  CHECK (tracking_number IS NOT NULL) NOT VALID;
-- Validate in background without blocking writes
ALTER TABLE orders VALIDATE CONSTRAINT orders_tracking_not_null;
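
On PostgreSQL 12 and later, a validated CHECK constraint of this form also lets SET NOT NULL skip the full-table scan, so the column can then be converted to a real NOT NULL and the redundant constraint dropped:

ALTER TABLE orders ALTER COLUMN tracking_number SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_tracking_not_null;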

Migration Testing in CI#

Test migrations against a real database in CI, not just syntax checking:

services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_DB: test
      POSTGRES_PASSWORD: test
    ports:
      - 5432:5432

steps:
  - name: Test migrations forward
    run: migrate -path ./migrations -database "postgresql://postgres:test@localhost:5432/test?sslmode=disable" up

  - name: Test migrations backward
    run: migrate -path ./migrations -database "postgresql://postgres:test@localhost:5432/test?sslmode=disable" down -all

  - name: Test migrations forward again
    run: migrate -path ./migrations -database "postgresql://postgres:test@localhost:5432/test?sslmode=disable" up

  - name: Validate schema matches expected
    run: |
      PGPASSWORD=test pg_dump -s -h localhost -U postgres test > actual_schema.sql
      diff expected_schema.sql actual_schema.sql

The up-down-up cycle verifies that down migrations work and that re-applying up migrations produces the same schema. The schema diff catches drift between migrations and the expected schema.

For large tables, test migration performance by loading representative data volumes into the CI database. A migration that runs in 50ms on an empty table may lock a production table with 10 million rows for minutes.
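
A sketch of seeding synthetic volume before timing migrations (the users columns follow the rename example above; adjust to your schema):

-- Insert 10 million synthetic rows so lock and scan times are representative
INSERT INTO users (email, name)
SELECT 'user' || g || '@example.com', 'User ' || g
FROM generate_series(1, 10000000) AS g;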

Common Mistakes#

  1. Running migrations inside application startup. If the migration fails, the application crashes and restarts, potentially retrying a partially applied migration. Run migrations as a separate, pre-deployment step.
  2. No down migrations. Even if you follow a fix-forward strategy, having down migrations gives you an emergency escape hatch. Write them. Test them.
  3. Large table ALTERs without considering locks. On PostgreSQL, most ALTER TABLE operations acquire ACCESS EXCLUSIVE locks. Changing a column's type typically rewrites the table, SET NOT NULL scans it, and before PostgreSQL 11 adding a column with a default rewrote it too – all while blocking reads and writes for the duration. Use the expand/contract pattern.
  4. Not testing with production-scale data. A migration that runs in CI on an empty database may time out in production. Profile migrations against representative data volumes before applying to production.