Agent Sandboxing#

An AI agent that can execute code, run shell commands, or call APIs needs a sandbox. Without one, a single bad tool call – whether from a bug, a hallucination, or a prompt injection attack – can read secrets, modify production data, or pivot to other systems. This article is a decision framework for choosing the right sandboxing strategy based on your trust level, threat model, and performance requirements.

The Trust and Blast Radius Matrix#

Before selecting a sandbox technology, answer two questions:

How much do you trust the agent’s inputs? An agent processing vetted internal requests has different risk than one processing arbitrary user input or reading untrusted web content (which could contain prompt injection).

What is the blast radius of a failure? An agent that reads logs has low blast radius. An agent that deploys to production has high blast radius. The sandbox must be proportional to the potential damage.

| Trust Level | Low Blast Radius | High Blast Radius |
| --- | --- | --- |
| High trust (internal, vetted) | Process-level restrictions | Container with capability dropping |
| Medium trust (authenticated users) | Container with network controls | Container + read-only filesystem + strict syscall filtering |
| Low trust (untrusted input) | gVisor container | Firecracker microVM |

This matrix drives the rest of the decisions. Start by placing your agent in the right cell.
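
The lookup can be mechanical. A sketch in Python (the cell labels mirror the matrix above; the strategy strings are shorthand for the strategies described in the rest of this article):

```python
# (trust, blast_radius) -> sandboxing strategy, mirroring the matrix above.
SANDBOX_MATRIX = {
    ("high", "low"): "process-level restrictions",
    ("high", "high"): "container with capability dropping",
    ("medium", "low"): "container with network controls",
    ("medium", "high"): "container + read-only fs + strict syscall filtering",
    ("low", "low"): "gVisor container",
    ("low", "high"): "Firecracker microVM",
}

def choose_strategy(trust: str, blast_radius: str) -> str:
    """Map a (trust, blast radius) cell to a sandboxing strategy."""
    return SANDBOX_MATRIX[(trust, blast_radius)]
```

A `KeyError` here is a feature: an agent that does not fit a cell has not been classified yet, and should not run.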

Strategy 1: Process-Level Restrictions#

The lightest sandbox. Run the agent’s tool execution in a subprocess with OS-level restrictions. No containers, no VMs.

When to use: High-trust agents with low blast radius. Internal developer tools, code analysis that only reads files, documentation generators.

Implementation:

import os
import resource
import subprocess

def execute_restricted(command: str, timeout: int = 30) -> str:
    """Run a command with process-level restrictions."""
    result = subprocess.run(
        ["sh", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
        env={
            "PATH": "/usr/bin:/bin",  # Restricted PATH
            "HOME": "/tmp/sandbox",
        },
        cwd="/tmp/sandbox",
        # Run as unprivileged user if possible
        user="nobody" if os.getuid() == 0 else None,
    )
    return result.stdout

Set resource limits for the child process:

def set_resource_limits():
    """Called in subprocess via preexec_fn."""
    resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024, 512 * 1024 * 1024))  # 512MB memory
    resource.setrlimit(resource.RLIMIT_CPU, (30, 30))        # 30 seconds CPU
    resource.setrlimit(resource.RLIMIT_FSIZE, (10 * 1024 * 1024, 10 * 1024 * 1024))  # 10MB file writes
    resource.setrlimit(resource.RLIMIT_NPROC, (50, 50))      # Cap processes for this user (limits fork bombs)
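
Wiring the two together: pass the limit function as preexec_fn so the limits are applied in the forked child before exec, leaving the parent untouched. A sketch (the helper names are illustrative; note that preexec_fn is unsafe in multi-threaded parents, so call it only from a single-threaded supervisor):

```python
import resource
import subprocess

def _limit_child() -> None:
    # Runs in the forked child before exec; limits apply only to the
    # sandboxed command, never to the parent process.
    resource.setrlimit(resource.RLIMIT_CPU, (30, 30))                   # 30 seconds CPU time
    resource.setrlimit(resource.RLIMIT_FSIZE, (10 * 1024 * 1024,) * 2)  # 10MB max file size

def run_limited(command: str, timeout: int = 30) -> subprocess.CompletedProcess:
    """Run a shell command with per-child resource limits applied."""
    return subprocess.run(
        ["sh", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
        preexec_fn=_limit_child,  # applied between fork() and exec()
    )
```

A write past the file-size limit delivers SIGXFSZ to the child and kills it, which is exactly the failure mode you want: the limit is enforced by the kernel, not by cooperative checking.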

Limitations: No filesystem isolation. No network isolation. The process can still read any file the user can read. Provides defense against runaway resource consumption, not against malicious behavior.

Strategy 2: Container-Based Isolation#

The workhorse approach. Run tool execution inside a Docker or OCI container with security restrictions layered on top.

When to use: Medium-trust agents, or any agent that executes arbitrary code. This is the default choice for most production agent systems.

Basic container sandbox:

import subprocess

def execute_in_container(
    command: str,
    image: str = "sandbox:latest",
    timeout: int = 30,
    network: bool = False,
    writable_paths: list[str] | None = None,
) -> str:
    docker_args = [
        "docker", "run", "--rm",
        "--memory=512m",
        "--cpus=1",
        "--pids-limit=100",
        "--security-opt=no-new-privileges",
    ]

    # Network isolation
    if not network:
        docker_args.append("--network=none")

    # Read-only filesystem by default
    docker_args.append("--read-only")
    docker_args.extend(["--tmpfs", "/tmp:size=64m,noexec"])

    # Mount specific writable paths if needed
    for path in (writable_paths or []):
        docker_args.extend(["-v", f"{path}:{path}:rw"])

    docker_args.extend([image, "sh", "-c", command])

    result = subprocess.run(
        docker_args,
        capture_output=True,
        text=True,
        timeout=timeout + 5,  # Give Docker a few seconds to start/stop
    )
    return result.stdout

Layering Container Security#

A basic container is not enough. Layer these restrictions based on your trust level.

Capability dropping. Linux capabilities grant specific root-like powers. Drop all of them unless explicitly needed.

# Add DAC_OVERRIDE back only if the tool must read files owned by other users
docker run --rm \
  --cap-drop=ALL \
  --cap-add=DAC_OVERRIDE \
  sandbox:latest sh -c "$COMMAND"

The default Docker container retains capabilities like NET_RAW, SYS_CHROOT, and MKNOD. Dropping all and adding back only what is needed follows least privilege.

Seccomp profiles. Restrict which system calls the container can make. Docker’s default seccomp profile blocks about 44 syscalls. You can tighten it further.

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
  "syscalls": [
    {
      "names": ["read", "write", "open", "close", "stat", "fstat",
                "mmap", "mprotect", "munmap", "brk", "exit_group",
                "execve", "access", "getpid", "getuid"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

An allowlist this small is illustrative, not production-ready: a real shell workload needs additional syscalls (openat, clone, wait4, and others), so derive profiles by tracing what the tool actually calls rather than guessing.

docker run --rm --security-opt seccomp=agent-seccomp.json sandbox:latest ...

AppArmor or SELinux profiles. Add mandatory access control on top of the container’s own isolation. An AppArmor profile can restrict file access to specific paths even within the container.

Pre-Built vs On-Demand Containers#

Pre-built images are faster to start. Build an image with the tools the agent needs (python, kubectl, git) and reuse it across invocations. Startup time is under a second for a cached image.

On-demand images are more secure. Build a fresh image per task with only the specific tools needed. A code analysis task gets python and linters. A deployment task gets kubectl and helm. No unnecessary tools means no unnecessary attack surface.

The practical middle ground: maintain a small set of purpose-built images (5-10) and select the appropriate one based on the task type.
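
That middle ground is a plain lookup from task type to image. A sketch with hypothetical image names:

```python
# Purpose-built sandbox images per task type. Image names are illustrative;
# each image contains only the tools that task category needs.
TASK_IMAGES = {
    "code_analysis": "sandbox-python:latest",  # python + linters
    "deployment": "sandbox-deploy:latest",     # kubectl + helm
    "docs": "sandbox-docs:latest",             # doc generators
}

def select_image(task_type: str) -> str:
    """Pick the purpose-built image for a task, failing closed on unknowns."""
    try:
        return TASK_IMAGES[task_type]
    except KeyError:
        # Fail closed: an unclassified task gets no image, not a generic one.
        raise ValueError(f"no sandbox image for task type {task_type!r}")
```

Failing closed on an unknown task type matters: a generic fallback image with every tool installed quietly defeats the point of purpose-built images.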

Strategy 3: gVisor (Application Kernel)#

gVisor interposes an application-level kernel between the container and the host kernel. System calls from the container are handled by gVisor’s Sentry component, not the real kernel. This adds a security boundary without the overhead of a full VM.

When to use: Low-trust agents running untrusted code. When container isolation is not sufficient because a kernel exploit could escape the container, but VM overhead is too high.

Configuration with Docker:

# Install gVisor runtime
# Then configure Docker to use it
docker run --rm \
  --runtime=runsc \
  --network=none \
  --memory=512m \
  --read-only \
  sandbox:latest sh -c "$COMMAND"

Performance impact: gVisor adds overhead to system calls (roughly 2-10x slower for syscall-heavy workloads). For compute-bound tasks (running tests, data processing) the impact is minimal. For I/O-heavy tasks (reading many files, network operations) the impact is noticeable.

When gVisor is the right choice: You need container-like deployment simplicity but stronger isolation than namespaces provide. Common for multi-tenant agent platforms where different users’ agents run on shared infrastructure.
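
Because gVisor plugs in as an OCI runtime, upgrading a container sandbox to gVisor is a one-flag change on the same invocation. A sketch (assumes runsc is already installed and registered as a Docker runtime):

```python
def docker_sandbox_args(use_gvisor: bool = False) -> list[str]:
    """Build docker run arguments; gVisor only changes the runtime flag."""
    # The hardening flags are identical either way.
    args = ["docker", "run", "--rm", "--network=none", "--memory=512m", "--read-only"]
    if use_gvisor:
        # Route the container's syscalls through gVisor's Sentry instead of
        # the host kernel ('runsc' must be registered in Docker's daemon.json).
        args.insert(3, "--runtime=runsc")
    return args
```

This makes the isolation level a per-task decision: the same execution path serves medium-trust tasks with a plain container and low-trust tasks with gVisor.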

Strategy 4: Firecracker MicroVMs#

Firecracker runs lightweight virtual machines with a minimal virtual machine monitor (VMM). Each microVM boots in under 125ms, uses as little as 5MB of memory overhead, and provides full hardware-level isolation via KVM.

When to use: Lowest-trust scenarios with highest blast radius. Agents processing untrusted input that execute code, agents running in multi-tenant environments where one user’s agent must not affect another’s, or any scenario where even a kernel exploit inside the sandbox must not compromise the host.

Architecture:

Host Kernel (KVM)
  |
  +-- Firecracker VMM
       |
       +-- MicroVM 1: Agent A's tool execution
       |     - Own kernel
       |     - Own filesystem (read-only rootfs + overlay)
       |     - No network (or restricted)
       |
       +-- MicroVM 2: Agent B's tool execution
             - Completely isolated from VM 1

Trade-offs: Firecracker requires KVM support on the host (no nested virtualization in most cloud VMs without special configuration). Boot time is fast for a VM (~125ms) but slow compared to Docker container start (~50ms). Memory overhead per VM is 5-30MB depending on the guest kernel.

When to avoid Firecracker: If your agent’s tool calls are frequent and short-lived (sub-second), the VM boot overhead dominates. Use containers or gVisor instead. Firecracker is best for longer-running tasks where the VM amortizes its startup cost.

Network Restrictions#

Network access is the highest-risk capability. An agent with network access can exfiltrate data, access internal services, or call external APIs the operator did not intend.

Level 0: No network. --network=none in Docker. The tool can only work with local data. This is the default for code analysis, file processing, and computation tasks.

Level 1: DNS only. Allow DNS lookups (port 53) but no other outbound connections. Useful for tools that need to validate hostnames without making requests.

Level 2: Allow-listed endpoints. Network access restricted to specific IPs or domains. Implement with iptables rules inside the container or with a network proxy.

# Allow only the internal API and block everything else
iptables -A OUTPUT -d 10.0.1.50 -p tcp --dport 443 -j ACCEPT
iptables -A OUTPUT -d 10.0.1.50 -p tcp --dport 80 -j ACCEPT
iptables -A OUTPUT -j DROP

Level 3: Full network through a proxy. All traffic routes through a logging proxy that records every request. The agent has network access, but every connection is auditable.

Choose the most restrictive level that allows the tool to function.
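
That selection can be encoded as a lookup from task type to the minimum workable level, defaulting to the most restrictive. A sketch with illustrative task names:

```python
# Most restrictive network level that still lets each task type function.
# Task names are illustrative, not a fixed taxonomy.
TASK_NETWORK_LEVEL = {
    "code_analysis": 0,        # local data only
    "file_processing": 0,
    "hostname_validation": 1,  # DNS only
    "internal_api_call": 2,    # allow-listed endpoints
    "web_research": 3,         # full network through a logging proxy
}

def network_level(task_type: str) -> int:
    """Return the network level for a task; unknown tasks get no network."""
    # Default to level 0: an unclassified task must not reach the network.
    return TASK_NETWORK_LEVEL.get(task_type, 0)
```

The default is the important part: a task nobody classified gets no network, not full network.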

Filesystem Controls#

Read-only root filesystem. The container’s filesystem is immutable. The agent can read installed tools and libraries but cannot modify them or install new ones.

Writable tmpfs only. Mount /tmp as a tmpfs with a size limit. The agent can write temporary files for computation but they disappear when the container exits and cannot fill up disk.

Bind-mount specific directories. If the agent needs to read project files, mount the project directory read-only into the container. Never mount the host’s root filesystem.

docker run --rm \
  --read-only \
  --tmpfs /tmp:size=64m \
  -v /home/user/project:/workspace:ro \
  sandbox:latest sh -c "cd /workspace && run-analysis"

Time Limits#

Every sandbox must enforce a wall-clock time limit. Without one, a misbehaving agent can consume resources indefinitely.

Container-level timeout:

docker run --rm --stop-timeout=5 ...
# Combined with:
timeout 60 docker run ...

In-process timeout:

import asyncio

async def sandboxed_execution(command: str, timeout: int = 60) -> str:
    proc = await asyncio.create_subprocess_exec(
        "docker", "run", "--rm", "--network=none", "sandbox:latest",
        "sh", "-c", command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
        return stdout.decode()
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        raise TimeoutError(f"Sandbox execution exceeded {timeout}s limit")

Set aggressive defaults (30-60 seconds for most tool calls) and allow callers to request longer timeouts for known slow operations (test suites, builds). Never allow unlimited execution time.

Decision Summary#

Start with the question: what is the worst thing this agent could do if it were fully compromised?

  • If the worst case is reading some extra files: process-level restrictions are sufficient.
  • If the worst case is modifying local project files: a container with read-only mounts and no network covers it.
  • If the worst case is accessing internal services or other users’ data: containers with gVisor, strict network controls, and capability dropping.
  • If the worst case is compromising the host kernel: Firecracker microVMs.

Every layer of isolation adds latency and operational complexity. Choose the minimum isolation level that makes the worst case acceptable. Then verify it works by actually trying to break out of the sandbox – if you do not test the boundary, you do not have a boundary.
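
That last step can be automated: keep a list of probes that should fail inside a properly configured sandbox and run them on every image or configuration change. A sketch (run_fn is whatever executes a command in your sandbox and returns its exit code; the probe commands assume the no-network, read-only setup described above):

```python
from typing import Callable

# Each probe SHOULD fail (non-zero exit) inside the sandbox.
ESCAPE_PROBES = [
    "curl -s --max-time 2 http://example.com",  # network should be off
    "touch /usr/bin/owned",                     # rootfs should be read-only
    "cat /etc/shadow",                          # secrets should be unreadable
]

def verify_boundary(run_fn: Callable[[str], int]) -> list[str]:
    """Return the probes that unexpectedly succeeded (exit code 0)."""
    return [probe for probe in ESCAPE_PROBES if run_fn(probe) == 0]
```

Wire verify_boundary into CI with your real sandbox runner; a non-empty return value means the boundary has a hole and the build should fail.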