Securing Docker-Based Validation Templates

Validation templates define the environment agents use to test infrastructure changes. If a template runs containers as root, mounts the Docker socket, or skips resource limits, every agent that copies it inherits those risks. This reference covers the security patterns every docker-compose validation template must follow.

1. Non-Root Execution

Containers run as root by default. A vulnerability in a root-running process gives an attacker full control inside the container and a much larger attack surface for container escapes.

In your Dockerfile, create a dedicated user and switch to it:

FROM node:22-alpine

RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app
COPY --chown=appuser:appgroup . .

USER appuser

CMD ["node", "server.js"]

For third-party images where you do not control the Dockerfile, set the user in docker-compose:

services:
  postgres:
    image: postgres:16-alpine
    user: "999:999"
    volumes:
      - pgdata:/var/lib/postgresql/data

The alpine variant of the official PostgreSQL image creates a postgres user with UID/GID 70 (the Debian-based variant uses 999). Setting user: "70:70" makes this explicit and prevents the entrypoint from running initial setup as root.
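
If you are not sure which UID an image uses, check it directly. An illustrative check against the two PostgreSQL variants:

docker run --rm postgres:16-alpine id postgres    # uid=70(postgres) gid=70(postgres)
docker run --rm postgres:16 id postgres           # uid=999(postgres) gid=999(postgres)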

File permission issues are the main obstacle. If a volume was previously created by a root-running container, the non-root user cannot write to it. Fix this by either deleting and recreating the volume or running a one-time init container that fixes ownership:

services:
  fix-permissions:
    image: busybox
    command: chown -R 70:70 /data
    volumes:
      - pgdata:/data
    profiles:
      - init

Run it once with docker compose --profile init up fix-permissions, then start your stack normally.
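
As a full sequence, using the service and profile names from the example above:

docker compose --profile init up fix-permissions
docker compose up -d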

2. Read-Only Root Filesystems

A read-only root filesystem prevents attackers from writing binaries, scripts, or configuration changes inside the container. If a process is compromised, it cannot modify system files.

Set read_only: true on the service and provide tmpfs mounts for directories that legitimately need writes:

services:
  app:
    image: myapp:1.4.2
    read_only: true
    tmpfs:
      - /tmp:size=64m
      - /var/run:size=1m
    security_opt:
      - no-new-privileges:true

For PostgreSQL, the data directory is on a named volume (which is writable regardless of read_only), but the process also needs to write to /var/run/postgresql for its socket and to /tmp for temporary files:

services:
  postgres:
    image: postgres:16-alpine
    read_only: true
    tmpfs:
      - /tmp:size=256m
      - /var/run/postgresql:size=1m
    volumes:
      - pgdata:/var/lib/postgresql/data

Named volumes are not affected by read_only: true – they are separate mounts. This is what makes the pattern practical: data directories remain writable while the rest of the filesystem is locked down.
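
A quick way to see the split in practice, assuming the postgres service above is running:

docker compose exec postgres touch /etc/probe                       # fails: Read-only file system
docker compose exec postgres touch /var/lib/postgresql/data/probe   # succeeds: named volume is writable
docker compose exec postgres rm /var/lib/postgresql/data/probe      # clean up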

3. Capability Management

Linux capabilities split root's power into discrete units. Docker grants containers a default set of 14 capabilities. Most containers need far fewer.

The correct approach is to drop everything and add back only what is required:

services:
  app:
    image: myapp:1.4.2
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

Here is what common services actually need:

  • NET_BIND_SERVICE – bind to ports below 1024. Needed by Nginx or Apache listening on 80/443.
  • CHOWN – change file ownership. Needed by containers whose init scripts change ownership at startup.
  • SETUID / SETGID – switch user or group IDs. Needed by containers that start as root and then drop to a non-root user.
  • DAC_OVERRIDE – bypass file read/write permission checks. Usually not needed – fix file permissions instead.
  • SYS_PTRACE – trace processes. Debugging containers only, never in templates.
  • NET_RAW – use raw sockets. Network diagnostic tools and ping.

Most application containers need zero additional capabilities after dropping all. A Go or Node.js HTTP server listening on port 8080 needs nothing. Only add capabilities when a container fails to start without them, and document why.
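
Two quick ways to check what a container actually ends up with (illustrative commands; the service name "app" and the presence of grep inside the image are assumptions):

# What the compose file asked for (configuration, not runtime state):
docker inspect --format '{{.HostConfig.CapDrop}} {{.HostConfig.CapAdd}}' "$(docker compose ps -q app)"
# The effective capability mask of PID 1 inside the container;
# with cap_drop: ALL and no cap_add it is all zeros:
docker compose exec app grep CapEff /proc/1/status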

4. Resource Limits

Every container in a validation template must have memory and CPU limits. Without them, a runaway process (a memory leak, an infinite loop, a fork bomb) consumes all host resources and kills every other container. This is especially dangerous when agents are orchestrating multiple validation stacks.

services:
  app:
    image: myapp:1.4.2
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 256M
          cpus: "0.25"

Guidelines for setting limits in validation templates:

  • PostgreSQL: 512M-1G memory limit, 1.0 CPU. Validation workloads are small.
  • Redis: 128M-256M memory limit, 0.5 CPU. Set maxmemory inside Redis to match.
  • Application containers: 256M-512M unless profiling says otherwise.
  • Build containers: May need more memory temporarily. Set higher limits but always set them.

The deploy.resources syntax comes from the version 3 Compose file format and is honored for local (non-Swarm) runs by Compose V2 (docker compose). If these settings are being ignored, confirm you are not running the legacy docker-compose v1 binary, which only applies deploy limits when invoked with --compatibility.
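
To confirm the limits are actually being applied (generic checks, assuming the stack is running):

docker stats --no-stream     # the MEM USAGE / LIMIT column should show 512MiB, not the host total
docker compose version       # should report Compose v2.x rather than docker-compose 1.x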

5. Image Security

Use specific version tags. Never use :latest in a template. A template that worked yesterday can break or become vulnerable today because :latest changed.

# Wrong
image: postgres:latest

# Correct
image: postgres:16.4-alpine

# Best -- digest pinning for critical images
image: postgres@sha256:2c4d952e251...a3b8f7e

Digest pinning guarantees you get the exact image bytes. Use it for any image in a security-sensitive template. Get the digest with:

docker inspect --format='{{index .RepoDigests 0}}' postgres:16.4-alpine

Before adding any image to a template, scan it:

trivy image postgres:16.4-alpine
trivy image --severity HIGH,CRITICAL myapp:1.4.2

Prefer official images from Docker Hub or verified publishers. If you must use a third-party image, pin the digest and scan it. If the scan shows unfixed critical vulnerabilities, document them as accepted risks or find an alternative image.
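
To enforce this in CI rather than by convention, trivy can fail the pipeline on fixable findings (standard trivy flags):

# Non-zero exit code if any HIGH/CRITICAL vulnerability with an available fix is found
trivy image --exit-code 1 --severity HIGH,CRITICAL --ignore-unfixed myapp:1.4.2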

6. Network Isolation

Do not use network_mode: host. It bypasses all Docker network isolation and exposes every listening port on the host.

Create explicit networks. Use an internal network for service-to-service communication and a separate network for services that need external access:

services:
  app:
    image: myapp:1.4.2
    networks:
      - frontend
      - backend
    ports:
      - "8080:8080"

  postgres:
    image: postgres:16.4-alpine
    networks:
      - backend

  redis:
    image: redis:7.4-alpine
    networks:
      - backend

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true

The internal: true flag on the backend network means containers on that network cannot reach the internet. PostgreSQL and Redis have no reason to make outbound connections. The app sits on both networks: it can talk to the database and Redis on the backend, and receive external requests on the frontend.

Only expose ports that external clients need. PostgreSQL should not have ports: mapped in production-like templates. For local development where you want to connect a GUI client, use a compose override file.
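
A minimal override sketch for that case; docker compose merges docker-compose.override.yml automatically when it is present, so the hardened base file stays untouched:

# docker-compose.override.yml -- local development only, keep out of the template
services:
  postgres:
    ports:
      - "127.0.0.1:5432:5432"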

7. No Docker Socket Mounting

Mounting /var/run/docker.sock into a container gives that container full control over the Docker daemon. It can create privileged containers, mount the host filesystem, and effectively gain root on the host. Never mount the Docker socket in validation templates.

# Never do this
volumes:
  - /var/run/docker.sock:/var/run/docker.sock

Exception: kind and k3d. These tools create Kubernetes clusters inside Docker and legitimately need socket access to manage containers. This is an accepted risk with these mitigations:

  • Run kind/k3d containers in a dedicated, isolated network.
  • Do not run other workloads on the same Docker daemon during kind/k3d testing if possible.
  • Document the socket mount as an accepted risk in the template.

If you need Docker-in-Docker for building images inside a container, use alternatives:

  • Sysbox – A container runtime that lets a container run its own Docker daemon without privileged mode and without mounting the socket.
  • Rootless Docker – Run the Docker daemon as a non-root user inside the container, with no access to the host socket.
  • Kaniko / Buildah – Build container images without any Docker daemon at all (see the sketch below).
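
A minimal Kaniko sketch, assuming a Dockerfile in the current directory. In CI this normally runs as a job or pod; the point is that the build needs no Docker daemon or socket inside the build container:

docker run --rm \
  -v "$(pwd)":/workspace \
  gcr.io/kaniko-project/executor:latest \
  --dockerfile=/workspace/Dockerfile \
  --context=dir:///workspace \
  --no-push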

8. Secrets Management

Never hardcode passwords in docker-compose files. Templates are committed to version control. Hardcoded secrets end up in git history permanently.

Use an .env file that is excluded from version control:

# docker-compose.yml
services:
  postgres:
    image: postgres:16.4-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
# .env (listed in .gitignore)
POSTGRES_PASSWORD=local_dev_password_here

For validation templates, generate random passwords at startup so the template works without any manual configuration:

#!/bin/bash
# init-env.sh -- run before docker compose up
ENV_FILE=".env"

if [ ! -f "$ENV_FILE" ]; then
  echo "POSTGRES_PASSWORD=$(openssl rand -base64 24)" > "$ENV_FILE"
  echo "REDIS_PASSWORD=$(openssl rand -base64 24)" >> "$ENV_FILE"
  echo "APP_SECRET_KEY=$(openssl rand -base64 32)" >> "$ENV_FILE"
  echo "Generated $ENV_FILE with random passwords"
fi

Include this script in every template and document in the README that users should run it first, or call it from a Makefile target.
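
One possible Makefile wiring (target names are illustrative; recipe lines must be indented with tabs):

.PHONY: up down
up:
	./init-env.sh
	docker compose up -d
down:
	docker compose down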

For Docker Swarm mode, use Docker secrets:

services:
  postgres:
    image: postgres:16.4-alpine
    secrets:
      - db_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt

9. Health Checks

Every service in a validation template must have a health check. Without health checks, depends_on only waits for the container to start, not for the service inside to be ready. Agents that depend on service readiness will encounter race conditions.

# PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
  interval: 5s
  timeout: 3s
  retries: 5
  start_period: 10s

# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 5

# HTTP service
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 15s

# Kubernetes API (for kind/k3d)
healthcheck:
  test: ["CMD", "kubectl", "cluster-info"]
  interval: 10s
  timeout: 5s
  retries: 12
  start_period: 30s

Set start_period for services that take time to initialize. PostgreSQL needs a few seconds for WAL recovery on startup. Kubernetes API servers in kind need 30 seconds or more. Without start_period, the health check starts counting retries immediately and may declare the service unhealthy before it has had time to boot.

Use depends_on with condition: service_healthy to enforce startup ordering:

services:
  app:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

10. Complete Secure Template

This is the reference template that combines every pattern above. Use it as the starting point for any new validation template.

# docker-compose.yml -- hardened validation template
# Run init-env.sh before first use to generate .env

services:
  app:
    image: myapp:1.4.2
    user: "1000:1000"
    read_only: true
    tmpfs:
      - /tmp:size=64m
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 256M
          cpus: "0.25"
    networks:
      - frontend
      - backend
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379/0
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s

  postgres:
    image: postgres:16.4-alpine
    user: "999:999"
    read_only: true
    tmpfs:
      - /tmp:size=256m
      - /var/run/postgresql:size=1m
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "1.0"
        reservations:
          memory: 256M
          cpus: "0.25"
    networks:
      - backend
    environment:
      POSTGRES_DB: ${POSTGRES_DB:-appdb}
      POSTGRES_USER: ${POSTGRES_USER:-appuser}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 10s

  redis:
    image: redis:7.4-alpine
    user: "999:999"
    read_only: true
    tmpfs:
      - /tmp:size=32m
      - /var/run:size=1m
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    command: >
      redis-server
      --requirepass ${REDIS_PASSWORD}
      --maxmemory 128mb
      --maxmemory-policy allkeys-lru
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
        reservations:
          memory: 64M
          cpus: "0.1"
    networks:
      - backend
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true

volumes:
  pgdata:

Pair this with the init-env.sh script from section 8 and a .gitignore that excludes .env. The template should work with a single command sequence:

./init-env.sh
docker compose up -d
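
To confirm everything came up as intended (a quick check, not part of the template itself):

docker compose ps        # every service should report a healthy status
docker compose config    # renders the merged configuration with .env values substituted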

Every container runs as non-root, has a read-only filesystem, drops all capabilities, sets resource limits, uses health checks, communicates over isolated networks, and loads secrets from environment variables. This is the baseline. Templates for specific use cases should start here and add only what they need, documenting any security exceptions and the reason for them.