Docker Compose Validation Stacks#

Docker Compose validates multi-service architectures without Kubernetes overhead. It answers the question: do these services actually work together? Containers start, connect, and communicate – or they fail, giving you fast feedback before you push to a cluster.

This article provides complete Compose stacks for four common validation scenarios. Each includes the full docker-compose.yml, health check scripts, and teardown procedures. The pattern for using them is always the same: clone the template, customize for your services, bring it up, validate, capture results, bring it down.

Stack 1: Web Application + PostgreSQL + Redis#

The most common stack. A web application that uses PostgreSQL for persistence and Redis for caching or session storage. Validates database connectivity, cache behavior, and the application’s startup sequence.

# docker-compose-web-stack.yml
version: "3.8"

services:
  app:
    image: ${APP_IMAGE:-my-app:latest}
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: "postgresql://appuser:${POSTGRES_PASSWORD:-apppass}@postgres:5432/appdb"
      REDIS_URL: "redis://redis:6379/0"
      APP_ENV: "test"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    read_only: true
    tmpfs:
      - /tmp
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8080/health"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s
    networks:
      - frontend
      - backend

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-apppass}
      POSTGRES_DB: appdb
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    read_only: true
    tmpfs:
      - /tmp
      - /var/run/postgresql
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
      interval: 5s
      timeout: 3s
      retries: 10
    networks:
      - backend

  redis:
    image: redis:7-alpine
    read_only: true
    tmpfs:
      - /tmp
      - /data
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
    networks:
      - backend

networks:
  frontend:
  backend:
    internal: true

volumes:
  postgres_data:

Security patterns applied: read-only root filesystems with tmpfs for writable directories, all capabilities dropped (PostgreSQL gets back only the minimum it needs), no-new-privileges, resource limits on every container, an internal-only backend network (PostgreSQL and Redis have no internet access), init scripts mounted read-only, and passwords sourced from environment variables. See Securing Docker Validation Templates for the full security reference.
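
To confirm the hardening actually lands on the running containers rather than only in the YAML, a quick docker inspect pass is enough. The sketch below assumes the Compose v2 default container naming of <project>-<service>-1; adjust the names if you set container_name explicitly.

#!/bin/bash
# inspect-hardening.sh -- spot-check security settings on a running stack (sketch)
set -euo pipefail

PROJECT_NAME="${1:?Usage: $0 <compose-project-name>}"

for svc in app postgres redis; do
  CONTAINER="${PROJECT_NAME}-${svc}-1"   # Compose v2 default naming
  echo "--- ${CONTAINER} ---"
  docker inspect --format \
    'read-only rootfs: {{.HostConfig.ReadonlyRootfs}}
capabilities dropped: {{.HostConfig.CapDrop}}
capabilities added: {{.HostConfig.CapAdd}}
security options: {{.HostConfig.SecurityOpt}}
memory limit (bytes): {{.HostConfig.Memory}}' "${CONTAINER}"
done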

Health check script:

#!/bin/bash
# validate-web-stack.sh
set -euo pipefail

COMPOSE_FILE="docker-compose-web-stack.yml"
PROJECT_NAME="web-validation-$$"

cleanup() {
  echo "--- Tearing down ---"
  docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" down -v --remove-orphans 2>/dev/null || true
}
trap cleanup EXIT

echo "=== Starting stack ==="
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" up -d --build --wait

echo "=== Checking service health ==="

# Verify PostgreSQL
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" exec -T postgres \
  psql -U appuser -d appdb -c "SELECT 1 AS health_check;"
echo "PostgreSQL: healthy"

# Verify Redis
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" exec -T redis \
  redis-cli ping
echo "Redis: healthy"

# Verify application
for i in $(seq 1 30); do
  if curl -sf http://localhost:8080/health > /dev/null 2>&1; then
    echo "Application: healthy"
    break
  fi
  if [ "$i" -eq 30 ]; then
    echo "ERROR: Application health check failed after 30 attempts"
    docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" logs app
    exit 1
  fi
  sleep 2
done

# Test database connectivity through the app's health endpoint
# (no -f here, so a non-200 status is captured rather than aborting under set -e)
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
if [ "${HTTP_STATUS}" -ne 200 ]; then
  echo "ERROR: App returned HTTP ${HTTP_STATUS}"
  exit 1
fi

echo "=== All checks passed ==="

What this validates: Application starts, connects to PostgreSQL, connects to Redis, and responds to health checks. The depends_on with condition: service_healthy ensures the startup order is correct.

What this misses: Production-like connection pooling, SSL/TLS between services, persistent volume behavior across restarts, and performance under load.
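
One of those gaps, volume persistence across restarts, is cheap to cover as an optional extra step. A minimal sketch, meant to run while the stack from validate-web-stack.sh is still up (same COMPOSE_FILE and PROJECT_NAME variables); the validation_marker table is purely illustrative:

# Write a marker row, recreate the containers without removing volumes, and check the row survived
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" exec -T postgres \
  psql -U appuser -d appdb -c "CREATE TABLE IF NOT EXISTS validation_marker (id serial PRIMARY KEY); INSERT INTO validation_marker DEFAULT VALUES;"

docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" down   # no -v: the postgres_data volume is kept
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" up -d --wait

ROWS=$(docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" exec -T postgres \
  psql -U appuser -d appdb -t -A -c "SELECT count(*) FROM validation_marker;")
if [ "${ROWS}" -ge 1 ]; then
  echo "Volume persistence: marker row survived the restart"
else
  echo "ERROR: postgres_data volume did not persist across down/up"
  exit 1
fi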

Stack 2: Microservices (3 Services + Message Queue + Database)#

A realistic microservices topology: an API gateway fronts two backend services over HTTP, a queue-driven worker consumes events from RabbitMQ, and a shared PostgreSQL instance hosts a separate database for each service that needs persistence. Validates inter-service communication, queue connectivity, and service independence.

# docker-compose-microservices.yml
version: "3.8"

services:
  api-gateway:
    image: ${API_GATEWAY_IMAGE:-api-gateway:latest}
    build:
      context: ./services/api-gateway
    ports:
      - "8080:8080"
    environment:
      ORDER_SERVICE_URL: "http://order-service:8081"
      USER_SERVICE_URL: "http://user-service:8082"
    depends_on:
      order-service:
        condition: service_healthy
      user-service:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8080/health"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s

  order-service:
    image: ${ORDER_SERVICE_IMAGE:-order-service:latest}
    build:
      context: ./services/order-service
    ports:
      - "8081:8081"
    environment:
      DATABASE_URL: "postgresql://orders:orderspass@postgres:5432/ordersdb"
      RABBITMQ_URL: "amqp://guest:guest@rabbitmq:5672/"
      QUEUE_NAME: "order-events"
    depends_on:
      postgres:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8081/health"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s

  user-service:
    image: ${USER_SERVICE_IMAGE:-user-service:latest}
    build:
      context: ./services/user-service
    ports:
      - "8082:8082"
    environment:
      DATABASE_URL: "postgresql://users:userspass@postgres:5432/usersdb"
      RABBITMQ_URL: "amqp://guest:guest@rabbitmq:5672/"
      QUEUE_NAME: "user-events"
    depends_on:
      postgres:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8082/health"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s

  notification-worker:
    image: ${NOTIFICATION_IMAGE:-notification-worker:latest}
    build:
      context: ./services/notification-worker
    environment:
      RABBITMQ_URL: "amqp://guest:guest@rabbitmq:5672/"
      LISTEN_QUEUES: "order-events,user-events"
    depends_on:
      rabbitmq:
        condition: service_healthy
    # Workers may not expose HTTP health checks -- this assumes the worker touches /tmp/worker-healthy once its consumers are connected
    healthcheck:
      test: ["CMD-SHELL", "test -f /tmp/worker-healthy || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 5
      start_period: 15s

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: adminpass
    volumes:
      - ./init-databases.sql:/docker-entrypoint-initdb.d/01-init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin"]
      interval: 5s
      timeout: 3s
      retries: 10

  rabbitmq:
    image: rabbitmq:3.13-management-alpine
    ports:
      - "5672:5672"
      - "15672:15672"
    environment:
      RABBITMQ_DEFAULT_USER: guest
      RABBITMQ_DEFAULT_PASS: guest
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 10s
      timeout: 5s
      retries: 10
      start_period: 20s

The database initialization script creates separate databases for each service:

-- init-databases.sql
CREATE USER orders WITH PASSWORD 'orderspass';
CREATE DATABASE ordersdb OWNER orders;

CREATE USER users WITH PASSWORD 'userspass';
CREATE DATABASE usersdb OWNER users;

Health check script:

#!/bin/bash
# validate-microservices.sh
set -euo pipefail

COMPOSE_FILE="docker-compose-microservices.yml"
PROJECT_NAME="micro-validation-$$"

cleanup() {
  docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" down -v --remove-orphans 2>/dev/null || true
}
trap cleanup EXIT

echo "=== Starting microservices stack ==="
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" up -d --build --wait --wait-timeout 120

echo "=== Verifying infrastructure services ==="
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" exec -T postgres \
  psql -U admin -c "SELECT datname FROM pg_database WHERE datname IN ('ordersdb', 'usersdb');"

# Verify RabbitMQ management API
curl -sf -u guest:guest http://localhost:15672/api/overview > /dev/null
echo "RabbitMQ: healthy"

echo "=== Verifying application services ==="
for svc in "api-gateway:8080" "order-service:8081" "user-service:8082"; do
  NAME="${svc%%:*}"
  PORT="${svc##*:}"
  if curl -sf "http://localhost:${PORT}/health" > /dev/null; then
    echo "${NAME}: healthy"
  else
    echo "ERROR: ${NAME} not responding"
    docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" logs "${NAME}"
    exit 1
  fi
done

echo "=== Testing inter-service communication ==="
# Create an order through the gateway, which should call order-service
RESPONSE=$(curl -sf -X POST http://localhost:8080/api/orders \
  -H "Content-Type: application/json" \
  -d '{"user_id": "test-user", "items": [{"sku": "TEST-001", "qty": 1}]}')
echo "Order creation response: ${RESPONSE}"

echo "=== Checking message queue ==="
# Verify that the order-events queue exists and report its message/consumer counts
QUEUE_INFO=$(curl -sf -u guest:guest http://localhost:15672/api/queues/%2F/order-events)
echo "${QUEUE_INFO}" | python3 -c "import sys,json; q=json.load(sys.stdin); print(f'order-events queue: {q.get(\"messages\",0)} messages, {q.get(\"consumers\",0)} consumers')" 2>/dev/null || echo "Queue exists (details not parsed)"

echo "=== ALL MICROSERVICE CHECKS PASSED ==="

What this validates: Service startup order, database per service isolation, message queue connectivity, inter-service HTTP communication through the gateway, and worker consumer connectivity.

What this misses: Service mesh behavior, circuit breakers under failure conditions, service discovery beyond Docker DNS, and behavior under partial failures (one service down while others continue).
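
The partial-failure gap can at least be probed crudely with the same stack: stop one service and confirm the rest keep answering. A sketch, assuming the gateway's /health does not hard-depend on user-service; the /api/users/test-user path is hypothetical and only stands in for a request that needs the stopped service:

# Run while the microservices stack is still up (same COMPOSE_FILE and PROJECT_NAME variables)
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" stop user-service

# The gateway itself should stay healthy with a downstream service gone
GATEWAY_STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health || true)
echo "Gateway /health with user-service down: HTTP ${GATEWAY_STATUS}"

# A request that needs user-service should fail fast with a 4xx/5xx, not hang
USER_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:8080/api/users/test-user || true)
echo "Gateway user lookup with user-service down: HTTP ${USER_STATUS}"

# Bring it back and confirm recovery
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" start user-service
sleep 5
curl -sf http://localhost:8080/health > /dev/null && echo "Gateway recovered" || echo "WARNING: gateway did not recover within 5s"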

Stack 3: Full Observability (App + Prometheus + Grafana + Loki)#

Validates that an application exposes metrics correctly, that Prometheus scrapes them, that Grafana can query Prometheus, and that logs flow through Loki. Use this when validating observability instrumentation changes.

# docker-compose-observability.yml
version: "3.8"

services:
  app:
    image: ${APP_IMAGE:-my-app:latest}
    build:
      context: .
    ports:
      - "8080:8080"
    environment:
      METRICS_ENABLED: "true"
      LOG_FORMAT: "json"
    logging:
      driver: loki
      options:
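        # The Loki logging driver runs in the host's Docker daemon, so this URL is resolved from the host and uses the published port, not the Compose service name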
        loki-url: "http://localhost:3100/loki/api/v1/push"
        loki-batch-size: "100"
        loki-retries: "3"
        loki-timeout: "5s"
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8080/health"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s

  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=1h"
      - "--web.enable-lifecycle"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:9090/-/healthy"]
      interval: 5s
      timeout: 3s
      retries: 10

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      prometheus:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:3000/api/health"]
      interval: 5s
      timeout: 3s
      retries: 10

  loki:
    image: grafana/loki:2.9.5
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3100/ready"]
      interval: 10s
      timeout: 5s
      retries: 10
      start_period: 15s

Prometheus configuration:

# prometheus.yml
global:
  scrape_interval: 5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: "app"
    static_configs:
      - targets: ["app:8080"]
    metrics_path: "/metrics"
    scrape_interval: 5s
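
Before blaming the scrape configuration, it is worth confirming that the application actually serves Prometheus text exposition on /metrics. A minimal sketch, assuming the app's port 8080 is published on the host as in the compose file above:

# Sanity-check the metrics endpoint directly
METRICS_OUTPUT=$(curl -sf http://localhost:8080/metrics || true)
if echo "${METRICS_OUTPUT}" | grep -q "^# HELP"; then
  echo "/metrics serves Prometheus exposition format"
else
  echo "WARNING: /metrics did not return recognizable Prometheus output"
fi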

Grafana datasource provisioning:

# grafana/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100

Health check script:

#!/bin/bash
# validate-observability.sh
set -euo pipefail

COMPOSE_FILE="docker-compose-observability.yml"
PROJECT_NAME="obs-validation-$$"

cleanup() {
  docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" down -v --remove-orphans 2>/dev/null || true
}
trap cleanup EXIT

# Loki Docker logging driver must be installed on the host
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions 2>/dev/null || true

echo "=== Starting observability stack ==="
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" up -d --build --wait --wait-timeout 120

echo "=== Generating traffic for metrics ==="
for i in $(seq 1 20); do
  curl -sf http://localhost:8080/health > /dev/null 2>&1 || true
  curl -sf http://localhost:8080/api/test > /dev/null 2>&1 || true
done
echo "Traffic generated. Waiting for scrape interval..."
sleep 10

echo "=== Checking Prometheus targets ==="
TARGETS=$(curl -sf http://localhost:9090/api/v1/targets)
UP_COUNT=$(echo "${TARGETS}" | python3 -c "import sys,json; t=json.load(sys.stdin); print(sum(1 for a in t['data']['activeTargets'] if a['health']=='up'))" 2>/dev/null || echo "0")
echo "Active healthy targets: ${UP_COUNT}"
if [ "${UP_COUNT}" -lt 1 ]; then
  echo "ERROR: No healthy Prometheus targets"
  echo "${TARGETS}" | python3 -m json.tool 2>/dev/null || echo "${TARGETS}"
  exit 1
fi

echo "=== Querying application metrics ==="
# Use -G with --data-urlencode so curl's URL globbing does not mangle the PromQL braces
METRICS=$(curl -sfG "http://localhost:9090/api/v1/query" --data-urlencode "query=up{job=\"app\"}")
echo "Metric 'up' for app: ${METRICS}"

# Check for application-specific metrics
APP_METRICS=$(curl -sf "http://localhost:9090/api/v1/label/__name__/values" | python3 -c "import sys,json; names=json.load(sys.stdin)['data']; [print(n) for n in names if not n.startswith('go_') and not n.startswith('process_') and not n.startswith('promhttp_')]" 2>/dev/null || true)
if [ -n "${APP_METRICS}" ]; then
  echo "Application-specific metrics found:"
  echo "${APP_METRICS}"
else
  echo "WARNING: No application-specific metrics found (only Go runtime and process metrics)"
fi

echo "=== Checking Grafana datasources ==="
DATASOURCES=$(curl -sf http://localhost:3000/api/datasources)
echo "Configured datasources: ${DATASOURCES}"

echo "=== Checking Loki ==="
LOKI_READY=$(curl -sf http://localhost:3100/ready)
echo "Loki status: ${LOKI_READY}"

# Query Loki for logs from the app (-G sends the urlencoded params as a GET query string;
# the Loki Docker driver labels containers with compose_service)
LOKI_LOGS=$(curl -sfG "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode "query={compose_service=\"app\"}" \
  --data-urlencode "start=$(date -u -v-5M +%s 2>/dev/null || date -u -d '5 minutes ago' +%s)000000000" \
  --data-urlencode "end=$(date -u +%s)000000000" \
  --data-urlencode "limit=5" 2>/dev/null || echo "{}")
echo "Loki log query result: ${LOKI_LOGS:0:200}"

echo "=== ALL OBSERVABILITY CHECKS PASSED ==="
echo "Prometheus: scraping app metrics"
echo "Grafana: datasources configured (Prometheus + Loki)"
echo "Loki: receiving logs from app container"

What this validates: Metrics endpoint exposure, Prometheus scrape configuration, Grafana datasource provisioning, Loki log ingestion, and the complete observability pipeline from application to dashboard.

What this misses: Alert routing (Alertmanager), dashboard correctness (just checks that datasources exist), long-term storage retention policies, and high-cardinality metric behavior.

Stack 4: Database Migration Testing#

Validates database migrations across multiple PostgreSQL versions. Runs the migration against each version to ensure compatibility. Critical for upgrades and for catching version-specific SQL syntax issues.

# docker-compose-migration-test.yml
version: "3.8"

services:
  postgres14:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: migtest
      POSTGRES_PASSWORD: migtest
      POSTGRES_DB: migtest
    ports:
      - "5414:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U migtest"]
      interval: 5s
      timeout: 3s
      retries: 10

  postgres15:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: migtest
      POSTGRES_PASSWORD: migtest
      POSTGRES_DB: migtest
    ports:
      - "5415:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U migtest"]
      interval: 5s
      timeout: 3s
      retries: 10

  postgres16:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: migtest
      POSTGRES_PASSWORD: migtest
      POSTGRES_DB: migtest
    ports:
      - "5416:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U migtest"]
      interval: 5s
      timeout: 3s
      retries: 10

  postgres17:
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: migtest
      POSTGRES_PASSWORD: migtest
      POSTGRES_DB: migtest
    ports:
      - "5417:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U migtest"]
      interval: 5s
      timeout: 3s
      retries: 10

Migration validation script:

#!/bin/bash
# validate-migrations.sh
# Usage: ./validate-migrations.sh <migrations-dir>
set -euo pipefail

MIGRATIONS_DIR="${1:?Usage: $0 <migrations-dir>}"
COMPOSE_FILE="docker-compose-migration-test.yml"
PROJECT_NAME="mig-validation-$$"
DB_USER="migtest"
DB_PASS="migtest"
DB_NAME="migtest"

VERSIONS=("14:5414" "15:5415" "16:5416" "17:5417")
FAILED=()

cleanup() {
  docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" down -v --remove-orphans 2>/dev/null || true
}
trap cleanup EXIT

echo "=== Starting all PostgreSQL versions ==="
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" up -d --wait --wait-timeout 60

echo "=== Running migrations against each version ==="
for version_port in "${VERSIONS[@]}"; do
  VERSION="${version_port%%:*}"
  PORT="${version_port##*:}"

  echo ""
  echo "--- PostgreSQL ${VERSION} (port ${PORT}) ---"

  # Apply migration files in lexicographic order (glob expansion is already sorted; avoids parsing ls output)
  for migration in "${MIGRATIONS_DIR}"/*.sql; do
    [ -e "${migration}" ] || continue
    BASENAME=$(basename "${migration}")
    echo "  Applying: ${BASENAME}"
    if PGPASSWORD="${DB_PASS}" psql -h localhost -p "${PORT}" -U "${DB_USER}" -d "${DB_NAME}" \
       -f "${migration}" -v ON_ERROR_STOP=1 2>&1; then
      echo "  OK: ${BASENAME}"
    else
      echo "  FAILED: ${BASENAME} on PostgreSQL ${VERSION}"
      FAILED+=("pg${VERSION}:${BASENAME}")
    fi
  done

  # Verify the final schema
  echo "  Verifying schema..."
  TABLES=$(PGPASSWORD="${DB_PASS}" psql -h localhost -p "${PORT}" -U "${DB_USER}" -d "${DB_NAME}" \
    -t -A -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY tablename;")
  echo "  Tables: $(echo "${TABLES}" | paste -sd, -)"
done

echo ""
echo "=== MIGRATION RESULTS ==="
if [ ${#FAILED[@]} -eq 0 ]; then
  echo "All migrations passed on all PostgreSQL versions."
else
  echo "FAILURES:"
  for f in "${FAILED[@]}"; do
    echo "  - ${f}"
  done
  exit 1
fi

What this validates: Migration SQL compatibility across PostgreSQL major versions, schema creation order, constraint compatibility, and function/trigger syntax differences between versions.

What this misses: Performance characteristics of migrations on large datasets, locking behavior under concurrent load, and logical replication compatibility.

The Agent Workflow#

When an agent uses these stacks, it should follow this sequence:

  1. Select the template. Based on what is being validated, choose the closest stack. If validating a web app change, use Stack 1. If validating a migration, use Stack 4.

  2. Customize the template. Replace image names, environment variables, port mappings, and volume mounts to match the actual services. Do not modify the health check patterns unless the service has a different health endpoint.

  3. Bring it up. Run docker compose up -d --build --wait. The --wait flag blocks until every health check passes or the wait times out.

  4. Run validation checks. Execute the health check script or run custom verification commands. Capture all output.

  5. Capture results. Before tearing down, save logs if any checks failed: docker compose logs > validation-results.log.

  6. Tear down. Always tear down, even on success. Use docker compose down -v --remove-orphans. The -v flag removes volumes so the next run starts clean. The --remove-orphans flag catches containers from modified compose files.

  7. Report. State what was validated, what passed, and what the stack cannot test. Reference the fidelity limitations documented with each stack above.

The trap-based cleanup in every script ensures teardown happens even when validation fails. This is not optional. Orphaned containers consume resources and can cause port conflicts on subsequent runs.
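
Tying steps 3 through 6 together, a minimal generic wrapper might look like the following sketch; the compose file, the smoke-check URL, and the results path are placeholders to adapt per stack:

#!/bin/bash
# validate-generic.sh -- skeleton for the workflow above (sketch; adapt per stack)
set -euo pipefail

COMPOSE_FILE="${1:?Usage: $0 <compose-file>}"   # steps 1-2: template already selected and customized
PROJECT_NAME="validation-$$"
RESULTS_LOG="validation-results.log"

cleanup() {
  # step 5: capture logs before teardown (always, so failures stay debuggable)
  docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" logs > "${RESULTS_LOG}" 2>&1 || true
  # step 6: always tear down, volumes included
  docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" down -v --remove-orphans 2>/dev/null || true
}
trap cleanup EXIT

# step 3: bring it up and block until every health check passes
docker compose -f "${COMPOSE_FILE}" -p "${PROJECT_NAME}" up -d --build --wait --wait-timeout 120

# step 4: stack-specific checks go here (health endpoints, queries, queue inspection, ...)
curl -sf http://localhost:8080/health > /dev/null && echo "Smoke check passed"

# step 7: report
echo "Validated with ${COMPOSE_FILE}; logs captured to ${RESULTS_LOG}"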