# Securing Docker-Based Validation Templates
Validation templates define the environment agents use to test infrastructure changes. If a template runs containers as root, mounts the Docker socket, or skips resource limits, every agent that copies it inherits those risks. This reference covers the security patterns every docker-compose validation template must follow.
## 1. Non-Root Execution
Containers run as root by default. A vulnerability in a root-running process gives an attacker full control inside the container and a much larger attack surface for container escapes.
In your Dockerfile, create a dedicated user and switch to it:
```dockerfile
FROM node:22-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY --chown=appuser:appgroup . .
USER appuser
CMD ["node", "server.js"]
```

For third-party images where you do not control the Dockerfile, set the user in docker-compose:
```yaml
services:
  postgres:
    image: postgres:16-alpine
    user: "999:999"
    volumes:
      - pgdata:/var/lib/postgresql/data
```

The official PostgreSQL image already creates a postgres user with UID/GID 999. Setting `user: "999:999"` makes this explicit and prevents the entrypoint from running initial setup as root.
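If you are not sure which UID/GID a third-party image assigns to its service user, check before hardcoding `user:`. A quick way to do that for the image used above:

```bash
# Print the UID/GID of the postgres user baked into the image.
docker run --rm postgres:16-alpine id postgres
```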
File permission issues are the main obstacle. If a volume was previously created by a root-running container, the non-root user cannot write to it. Fix this by either deleting and recreating the volume or running a one-time init container that fixes ownership:
```yaml
services:
  fix-permissions:
    image: busybox
    command: chown -R 999:999 /data
    volumes:
      - pgdata:/data
    profiles:
      - init
```

Run it once with `docker compose --profile init up fix-permissions`, then start your stack normally.
## 2. Read-Only Root Filesystems
A read-only root filesystem prevents attackers from writing binaries, scripts, or configuration changes inside the container. If a process is compromised, it cannot modify system files.
Set read_only: true on the service and provide tmpfs mounts for directories that legitimately need writes:
```yaml
services:
  app:
    image: myapp:1.4.2
    read_only: true
    tmpfs:
      - /tmp:size=64m
      - /var/run:size=1m
    security_opt:
      - no-new-privileges:true
```

For PostgreSQL, the data directory is on a named volume (which is writable regardless of `read_only`), but the process also needs to write to `/var/run/postgresql` for its socket and `/tmp` for sorting:
```yaml
services:
  postgres:
    image: postgres:16-alpine
    read_only: true
    tmpfs:
      - /tmp:size=256m
      - /var/run/postgresql:size=1m
    volumes:
      - pgdata:/var/lib/postgresql/data
```

Named volumes are not affected by `read_only: true` – they are separate mounts. This is what makes the pattern practical: data directories remain writable while the rest of the filesystem is locked down.
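A quick way to confirm the lockdown took effect, assuming the `app` service from the snippet above is running and its image ships a BusyBox or coreutils `touch`: writes outside tmpfs and named volumes should be rejected, writes inside them should succeed.

```bash
# Should fail with a read-only filesystem error.
docker compose exec app touch /usr/local/bin/probe
# Should succeed: /tmp is a tmpfs mount and stays writable.
docker compose exec app touch /tmp/probe
```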
## 3. Capability Management
Linux capabilities split root power into discrete units. Docker grants a default set of about 14 capabilities. Most containers need far fewer.
The correct approach is to drop everything and add back only what is required:
```yaml
services:
  app:
    image: myapp:1.4.2
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
```

Here is what common services actually need:
| Capability | What It Allows | Services That Need It |
|---|---|---|
| `NET_BIND_SERVICE` | Bind to ports below 1024 | Nginx/Apache on port 80/443 |
| `CHOWN` | Change file ownership | Containers that run init scripts changing ownership at startup |
| `SETUID` / `SETGID` | Switch user/group IDs | Containers that start as root then drop to a non-root user |
| `DAC_OVERRIDE` | Bypass file read/write permission checks | Usually not needed – fix file permissions instead |
| `SYS_PTRACE` | Trace processes | Debugging containers only, never in templates |
| `NET_RAW` | Use raw sockets | Network diagnostic tools, ping |
Most application containers need zero additional capabilities after dropping all. A Go or Node.js HTTP server listening on port 8080 needs nothing. Only add capabilities when a container fails to start without them, and document why.
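To verify what a container actually ended up with, read the capability mask of its main process and decode it on the host. A minimal sketch; the container name is illustrative, and `capsh` ships with the libcap utilities:

```bash
# Hex bitmask of the effective capabilities of PID 1 inside the container.
docker exec myapp grep CapEff /proc/1/status
# Decode a mask on the host; 0x400 is CAP_NET_BIND_SERVICE, for example.
capsh --decode=0000000000000400
```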
## 4. Resource Limits
Every container in a validation template must have memory and CPU limits. Without them, a runaway process (a memory leak, an infinite loop, a fork bomb) consumes all host resources and kills every other container. This is especially dangerous when agents are orchestrating multiple validation stacks.
```yaml
services:
  app:
    image: myapp:1.4.2
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 256M
          cpus: "0.25"
```

Guidelines for setting limits in validation templates:
- PostgreSQL: 512M-1G memory limit, 1.0 CPU. Validation workloads are small.
- Redis: 128M-256M memory limit, 0.5 CPU. Set `maxmemory` inside Redis to match.
- Application containers: 256M-512M unless profiling says otherwise.
- Build containers: May need more memory temporarily. Set higher limits, but always set them.
The `deploy.resources` syntax comes from the v3 Compose file format and is honored by Docker Compose V2 (the `docker compose` plugin) even outside Swarm mode. If these settings are being ignored, confirm you are not running the legacy docker-compose v1 binary.
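To confirm the limits are actually enforced on a running stack, compare `docker stats` output with the template; the memory column reports usage against the configured limit:

```bash
# One-shot snapshot: container name, memory usage vs. limit, CPU share.
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}"
```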
## 5. Image Security
Use specific version tags. Never use :latest in a template. A template that worked yesterday can break or become vulnerable today because :latest changed.
```yaml
# Wrong
image: postgres:latest

# Correct
image: postgres:16.4-alpine

# Best -- digest pinning for critical images
image: postgres@sha256:2c4d952e251...a3b8f7e
```

Digest pinning guarantees you get the exact image bytes. Use it for any image in a security-sensitive template. Get the digest with:
```bash
docker inspect --format='{{index .RepoDigests 0}}' postgres:16.4-alpine
```

Before adding any image to a template, scan it:
```bash
trivy image postgres:16.4-alpine
trivy image --severity HIGH,CRITICAL myapp:1.4.2
```

Prefer official images from Docker Hub or verified publishers. If you must use a third-party image, pin the digest and scan it. If the scan shows unfixed critical vulnerabilities, document them as accepted risks or find an alternative image.
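To scan everything a template references in one pass, feed the image list from Compose into Trivy. A sketch, assuming Compose V2 and Trivy are installed:

```bash
# List every image referenced by the compose file and fail the check
# if any of them has a HIGH or CRITICAL finding.
docker compose config --images | xargs -n1 trivy image --exit-code 1 --severity HIGH,CRITICAL
```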
## 6. Network Isolation
Do not use network_mode: host. It bypasses all Docker network isolation and exposes every listening port on the host.
Create explicit networks. Use an internal network for service-to-service communication and a separate network for services that need external access:
```yaml
services:
  app:
    image: myapp:1.4.2
    networks:
      - frontend
      - backend
    ports:
      - "8080:8080"

  postgres:
    image: postgres:16.4-alpine
    networks:
      - backend

  redis:
    image: redis:7.4-alpine
    networks:
      - backend

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true
```

The `internal: true` flag on the backend network means containers on that network cannot reach the internet. PostgreSQL and Redis have no reason to make outbound connections. The app sits on both networks: it can talk to the database and Redis on the backend, and receive external requests on the frontend.
Only expose ports that external clients need. PostgreSQL should not have ports: mapped in production-like templates. For local development where you want to connect a GUI client, use a compose override file.
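As a sketch of that override pattern: Compose automatically merges a `docker-compose.override.yml` placed next to the main file, so a local-only port mapping can live there (bound to loopback) without touching the hardened template.

```bash
# Hypothetical local-only override: publish PostgreSQL on loopback for a GUI client.
cat > docker-compose.override.yml <<'EOF'
services:
  postgres:
    ports:
      - "127.0.0.1:5432:5432"
EOF
docker compose up -d   # the override file is picked up automatically
```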
## 7. No Docker Socket Mounting
Mounting /var/run/docker.sock into a container gives that container full control over the Docker daemon. It can create privileged containers, mount the host filesystem, and effectively gain root on the host. Never mount the Docker socket in validation templates.
```yaml
# Never do this
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
```

Exception: kind and k3d. These tools create Kubernetes clusters inside Docker and legitimately need socket access to manage containers. This is an accepted risk with these mitigations:
- Run kind/k3d containers in a dedicated, isolated network.
- Do not run other workloads on the same Docker daemon during kind/k3d testing if possible.
- Document the socket mount as an accepted risk in the template.
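Outside of those documented exceptions, an accidental socket mount is easy to catch mechanically. A minimal pre-merge check might look like this:

```bash
# Fail if any compose file in the repository mounts the Docker socket.
if grep -rn "docker.sock" --include="*.yml" --include="*.yaml" .; then
  echo "Docker socket mount found -- review before merging" >&2
  exit 1
fi
```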
If you need Docker-in-Docker for building images inside a container, use alternatives:
- Sysbox – A container runtime that enables rootless Docker-in-Docker without mounting the socket.
- Rootless Docker – Run the Docker daemon inside the container without host socket access.
- Kaniko / Buildah – Build container images without any Docker daemon at all.
## 8. Secrets Management
Never hardcode passwords in docker-compose files. Templates are committed to version control. Hardcoded secrets end up in git history permanently.
Use an .env file that is excluded from version control:
```yaml
# docker-compose.yml
services:
  postgres:
    image: postgres:16.4-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
```

```bash
# .env (listed in .gitignore)
POSTGRES_PASSWORD=local_dev_password_here
```

For validation templates, generate random passwords at startup so the template works without any manual configuration:
```bash
#!/bin/bash
# init-env.sh -- run before docker compose up
ENV_FILE=".env"
if [ ! -f "$ENV_FILE" ]; then
  echo "POSTGRES_PASSWORD=$(openssl rand -base64 24)" > "$ENV_FILE"
  echo "REDIS_PASSWORD=$(openssl rand -base64 24)" >> "$ENV_FILE"
  echo "APP_SECRET_KEY=$(openssl rand -base64 32)" >> "$ENV_FILE"
  echo "Generated $ENV_FILE with random passwords"
fi
```

Include this script in every template and document in the README that users should run it first, or call it from a Makefile target.
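A small wrapper script (hypothetical, but built only from the pieces above) makes misuse harder: it generates secrets if they are missing, refuses to run if .env is not ignored by git, and then starts the stack.

```bash
#!/bin/bash
# up.sh -- hypothetical wrapper around init-env.sh and docker compose
set -euo pipefail
[ -f .env ] || ./init-env.sh
git check-ignore -q .env || { echo ".env is not covered by .gitignore" >&2; exit 1; }
docker compose up -d
```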
For Docker Swarm mode, use Docker secrets:
```yaml
services:
  postgres:
    image: postgres:16.4-alpine
    secrets:
      - db_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt
```

## 9. Health Checks
Every service in a validation template must have a health check. Without health checks, depends_on only waits for the container to start, not for the service inside to be ready. Agents that depend on service readiness will encounter race conditions.
```yaml
# PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
  interval: 5s
  timeout: 3s
  retries: 5
  start_period: 10s

# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 5

# HTTP service
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 15s

# Kubernetes API (for kind/k3d)
healthcheck:
  test: ["CMD", "kubectl", "cluster-info"]
  interval: 10s
  timeout: 5s
  retries: 12
  start_period: 30s
```

Set `start_period` for services that take time to initialize. PostgreSQL needs a few seconds for WAL recovery on startup. Kubernetes API servers in kind need 30 seconds or more. Without `start_period`, the health check starts counting retries immediately and may declare the service unhealthy before it has had time to boot.
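For scripts and agents that need to block until everything is ready, Compose V2 can wait on the health checks directly, and the health state of an individual container can be inspected afterwards (the container name below is illustrative):

```bash
# Start the stack and block until every service with a healthcheck is healthy;
# exits non-zero if something never gets there.
docker compose up -d --wait
# Check the current health state of a single container.
docker inspect --format '{{.State.Health.Status}}' myproject-postgres-1
```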
Use depends_on with condition: service_healthy to enforce startup ordering:
```yaml
services:
  app:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
```

## 10. Complete Secure Template
This is the reference template that combines every pattern above. Use it as the starting point for any new validation template.
```yaml
# docker-compose.yml -- hardened validation template
# Run init-env.sh before first use to generate .env
services:
  app:
    image: myapp:1.4.2
    user: "1000:1000"
    read_only: true
    tmpfs:
      - /tmp:size=64m
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 256M
          cpus: "0.25"
    networks:
      - frontend
      - backend
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379/0
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s

  postgres:
    image: postgres:16.4-alpine
    user: "999:999"
    read_only: true
    tmpfs:
      - /tmp:size=256m
      - /var/run/postgresql:size=1m
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "1.0"
        reservations:
          memory: 256M
          cpus: "0.25"
    networks:
      - backend
    environment:
      POSTGRES_DB: ${POSTGRES_DB:-appdb}
      POSTGRES_USER: ${POSTGRES_USER:-appuser}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 10s

  redis:
    image: redis:7.4-alpine
    user: "999:999"
    read_only: true
    tmpfs:
      - /tmp:size=32m
      - /var/run:size=1m
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    command: >
      redis-server
      --requirepass ${REDIS_PASSWORD}
      --maxmemory 128mb
      --maxmemory-policy allkeys-lru
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
        reservations:
          memory: 64M
          cpus: "0.1"
    networks:
      - backend
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true

volumes:
  pgdata:
```

Pair this with the `init-env.sh` script from section 8 and a `.gitignore` that excludes `.env`. The template should work with a single command sequence:
```bash
./init-env.sh
docker compose up -d
```

Every container runs as non-root, has a read-only filesystem, drops all capabilities, sets resource limits, uses health checks, communicates over isolated networks, and loads secrets from environment variables. This is the baseline. Templates for specific use cases should start here and add only what they need, documenting any security exceptions and the reason for them.
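Before committing changes to a template, validate the merged configuration; `docker compose config` resolves variables and fails on syntax errors without starting anything:

```bash
# Only validates; prints nothing on success and exits non-zero on errors.
docker compose config --quiet && echo "compose file is valid"
```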