Docker Sandbox for Agent Tools
Isolated execution of claude, codex, and other agent tools using Docker Desktop's docker sandbox (v0.11.0+). Uses existing Claude Max and ChatGPT Pro subscriptions — no API key billing.
ADR: ADR-0023
Prerequisites
- Docker Desktop running (OrbStack works)
- docker sandbox version returns ≥0.11.0
- Auth secrets stored in agent-secrets:
  - claude_setup_token: from claude setup-token (1-year token, Max subscription)
  - codex_auth_json: contents of ~/.codex/auth.json (ChatGPT Pro subscription)
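The version prerequisite can be checked programmatically. The sketch below is a hypothetical helper (neither function exists in the project) and assumes the output of docker sandbox version contains a dotted x.y.z version somewhere in its text; the exact output format is not documented here.

```typescript
// Hypothetical helper, not part of utils.ts.
// Extracts the first x.y.z version found in a string.
function parseVersion(text: string): [number, number, number] | null {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (match === null) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Returns true when the version found in `text` is >= the minimum (default 0.11.0).
// Returns false when no version can be found at all.
function meetsMinimum(
  text: string,
  minimum: [number, number, number] = [0, 11, 0],
): boolean {
  const found = parseVersion(text);
  if (found === null) return false;
  const [major, minor, patch] = found;
  const [minMajor, minMinor, minPatch] = minimum;
  if (major !== minMajor) return major > minMajor;
  if (minor !== minMinor) return minor > minMinor;
  return patch >= minPatch;
}
```

Feeding it the captured stdout of docker sandbox version lets a loop fall back to host mode early instead of failing at create time.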
Quick Reference
# Create a sandbox
docker sandbox create --name my-sandbox claude /path/to/project
# Run a command in it
docker sandbox exec -e "CLAUDE_CODE_OAUTH_TOKEN=..." -w /path/to/project my-sandbox \
claude -p "implement the feature" --output-format text --dangerously-skip-permissions
# List sandboxes
docker sandbox ls
# Remove
docker sandbox rm my-sandbox
Auth Setup (One-Time)
Claude (Max subscription)
Run interactively on the host (needs browser for OAuth):
claude setup-token
This opens a browser, completes OAuth, and prints a token like sk-ant-oat01-.... Valid for 1 year.
Store it:
secrets add claude_setup_token --value "sk-ant-oat01-..."
Use in sandbox:
TOKEN=$(secrets lease claude_setup_token --ttl 1h --raw)
docker sandbox exec -e "CLAUDE_CODE_OAUTH_TOKEN=$TOKEN" my-sandbox claude auth status
# → loggedIn: true, authMethod: oauth_token
Codex (ChatGPT Pro subscription)
Authenticate codex locally (needs browser):
codex # Select "Sign in with ChatGPT", complete OAuth
The auth file at ~/.codex/auth.json is portable (not host-tied). Store it:
secrets add codex_auth_json --value "$(cat ~/.codex/auth.json)"
Inject into sandbox:
AUTH=$(secrets lease codex_auth_json --ttl 1h --raw)
docker sandbox exec my-sandbox bash -c "mkdir -p ~/.codex && cat > ~/.codex/auth.json << 'EOF'
${AUTH}
EOF"
Token Refresh
| Token | Lifetime | Refresh |
|---|---|---|
| claude_setup_token | 1 year | Run claude setup-token again, update secret |
| codex_auth_json | Until subscription change | Re-run codex login if auth fails, update secret |
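Because claude_setup_token has a fixed lifetime, a loop could warn before it lapses. This is a minimal sketch that assumes the issue date was recorded when the secret was stored (the source does not say that it is) and treats "1 year" as 365 days. The function names and the 14-day margin are illustrative.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days remaining before the token expires (negative once expired).
// Assumes a 365-day lifetime unless told otherwise.
function daysUntilExpiry(issuedAt: Date, now: Date, lifetimeDays = 365): number {
  const expiresAt = issuedAt.getTime() + lifetimeDays * MS_PER_DAY;
  return Math.floor((expiresAt - now.getTime()) / MS_PER_DAY);
}

// True when the token is inside the refresh margin or already expired.
// The 14-day default margin is an arbitrary illustrative choice.
function shouldRefresh(issuedAt: Date, now: Date, marginDays = 14): boolean {
  return daysUntilExpiry(issuedAt, now) <= marginDays;
}
```

The codex_auth_json secret has no fixed lifetime, so a check like this does not apply to it; an auth failure is the only signal.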
Agent Loop Integration
Pre-warm Pattern
Create sandbox(es) at loop start, reuse for all stories, destroy at loop end.
PLANNER (loop start)
├── docker sandbox create --name loop-{loopId}-claude claude {workDir}
├── docker sandbox create --name loop-{loopId}-codex codex {workDir} # if needed
└── inject auth into both
IMPLEMENTOR / TEST-WRITER / REVIEWER (per story)
└── docker sandbox exec -w {workDir} -e CLAUDE_CODE_OAUTH_TOKEN=... loop-{loopId}-{tool} \
{tool command}
# ~90ms overhead, workspace changes visible on host immediately
COMPLETE / CANCEL (loop end)
├── docker sandbox rm loop-{loopId}-claude
└── docker sandbox rm loop-{loopId}-codex
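The start and end of the lifecycle above can be expressed as argv lists built from one naming function, so create and remove always agree on the sandbox name. This is a sketch only: sandboxName and lifecycleCommands are hypothetical names, and the command shapes are copied from the diagram rather than from a verified CLI reference.

```typescript
// Naming scheme taken from the diagram: loop-{loopId}-{tool}.
function sandboxName(loopId: string, tool: "claude" | "codex"): string {
  return `loop-${loopId}-${tool}`;
}

// Builds one create command (loop start) and one rm command (loop end) per tool.
// Each command is an argv array, ready to pass to a process spawner without a shell.
function lifecycleCommands(
  loopId: string,
  tools: Array<"claude" | "codex">,
  workDir: string,
): { create: string[][]; remove: string[][] } {
  const create = tools.map((tool) => [
    "docker", "sandbox", "create", "--name", sandboxName(loopId, tool), tool, workDir,
  ]);
  const remove = tools.map((tool) => [
    "docker", "sandbox", "rm", sandboxName(loopId, tool),
  ]);
  return { create, remove };
}
```

Running the remove list from a finally block (or the equivalent cancel handler) keeps sandboxes from leaking when a loop is cancelled mid-story.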
Timing
| Operation | Time |
|---|---|
| Create (cached image) | ~14s |
| Exec (warm sandbox) | ~90ms |
| Stop | ~11s |
| Remove | ~150ms |
Net overhead per loop: ~14s create + ~90ms × N stories. For a loop running 5-10 stories at 5-15 min each (25-150 min of work), that is about 15s in total, which is negligible.
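The overhead arithmetic can be made concrete with a small calculation. The constants are the approximate timings from the table; the function names are illustrative.

```typescript
// Approximate timings from the table above, in seconds.
const CREATE_SECONDS = 14;
const EXEC_SECONDS = 0.09;

// Sandbox overhead for one loop: one create plus one exec per story.
function sandboxOverheadSeconds(stories: number): number {
  return CREATE_SECONDS + EXEC_SECONDS * stories;
}

// Overhead as a fraction of the time spent doing the actual work.
function overheadFraction(stories: number, minutesPerStory: number): number {
  const workSeconds = stories * minutesPerStory * 60;
  return sandboxOverheadSeconds(stories) / workSeconds;
}
```

For the shortest loop in the quoted range (5 stories at 5 minutes each), this gives about 14.45s of overhead against 1500s of work, just under 1%.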
Workspace Mount
The workspace is bidirectional — same path on host and in sandbox:
- File created in sandbox → visible on host at same path
- File created on host → visible in sandbox
- Git operations work normally (host sees sandbox changes, sandbox sees host commits)
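Since only the workspace path is shared, it can help to verify that a path handed to an agent will be visible on both sides. The check below is a hypothetical helper that assumes absolute POSIX-style paths with no "." or ".." segments and does not resolve symlinks.

```typescript
// Returns true when `path` is the workspace directory itself or lies beneath it.
// Assumes absolute POSIX-style paths that are already normalized.
function isInWorkspace(path: string, workDir: string): boolean {
  const root = workDir.replace(/\/+$/, ""); // drop trailing slashes
  // Appending "/" prevents "/a/project-other" from matching "/a/project".
  return path === root || path.indexOf(root + "/") === 0;
}
```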
Sandbox Templates
| Template | Tools Included |
|---|---|
| claude | claude 2.1.42, git, node 20, npm |
| codex | codex 0.101.0, git, node 20, npm |
Neither includes bun. If bun is needed, use host-mode fallback or install it post-create.
Env Vars
Pass via docker sandbox exec -e:
docker sandbox exec \
-e "CLAUDE_CODE_OAUTH_TOKEN=$TOKEN" \
-e "NODE_ENV=development" \
-w /path/to/project \
my-sandbox \
claude -p "prompt" --output-format text --dangerously-skip-permissions
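When the exec call is made from code rather than a shell, building the arguments as an array avoids quoting problems with prompts and token values. A minimal sketch, assuming the flag order shown in the command above (-e and -w before the sandbox name); buildExecArgs and ExecOptions are hypothetical names.

```typescript
interface ExecOptions {
  env?: Record<string, string>;
  workDir?: string;
}

// Builds the arguments that follow `docker` for a sandbox exec call.
// Each env entry becomes its own `-e KEY=value` pair, in insertion order.
function buildExecArgs(
  sandbox: string,
  command: string[],
  opts: ExecOptions = {},
): string[] {
  const args: string[] = ["sandbox", "exec"];
  const env = opts.env ?? {};
  for (const key of Object.keys(env)) {
    args.push("-e", `${key}=${env[key]}`);
  }
  if (opts.workDir !== undefined) args.push("-w", opts.workDir);
  return [...args, sandbox, ...command];
}
```

The result is intended to be passed to a process spawner that takes an argv array, so a prompt containing quotes or spaces stays a single argument.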
Network Control
Sandboxes have network access by default. Restrict with proxy rules:
# Allow only API endpoints
docker sandbox network proxy my-sandbox --policy deny
docker sandbox network proxy my-sandbox --allow-host api.anthropic.com
docker sandbox network proxy my-sandbox --allow-host api.openai.com
Fallback to Host Mode
If Docker is unavailable:
# Check availability
docker info >/dev/null 2>&1 || echo "Docker not available"
# Force host mode
export AGENT_LOOP_HOST=1
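The fallback decision can be sketched as a pure function. It assumes that only the value "1" forces host mode, matching the export above; how other values are treated is not specified in this document. The environment is passed in as a parameter (for example process.env) to keep the function testable.

```typescript
// Decides where agent tools should run.
// Assumption: AGENT_LOOP_HOST forces host mode only when set to exactly "1".
function chooseExecutionMode(
  env: Record<string, string | undefined>,
  dockerAvailable: boolean,
): "host" | "sandbox" {
  if (env["AGENT_LOOP_HOST"] === "1") return "host"; // explicit override wins
  return dockerAvailable ? "sandbox" : "host"; // otherwise follow Docker availability
}
```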
Saving Custom Templates
If you install additional tools in a sandbox, save it as a template:
# Install tools
docker sandbox exec my-sandbox bash -c 'npm i -g @anthropic-ai/claude-code @openai/codex'
# Save as template
docker sandbox save my-sandbox my-agent-template:v1
# Use the template for future sandboxes
docker sandbox create --name fast-sandbox -t my-agent-template:v1 claude /path/to/project
Implementation in utils.ts
New Functions (ADR-0023)
// Create sandbox for a loop
async function createLoopSandbox(
loopId: string,
tool: "claude" | "codex",
workDir: string
): Promise<string> // returns sandbox name
// Execute command in existing sandbox
async function execInSandbox(
sandboxName: string,
command: string[],
opts: { env?: Record<string, string>; workDir?: string; timeout?: number }
): Promise<{ exitCode: number; output: string }>
// Destroy loop sandbox(es)
async function destroyLoopSandbox(loopId: string): Promise<void>
Replacing spawnTool()
Current spawnTool() in implement.ts checks AGENT_LOOP_HOST and isDockerAvailable(). Update it to:
- Check if sandbox loop-{loopId}-{tool} exists (created by planner)
- If yes → execInSandbox() with auth env vars
- If no → fall back to spawnToolHost() (current host-mode behavior)
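The branching described above can be sketched as a decision function that the updated spawnTool() would call before doing any work. This is an illustration under stated assumptions, not the project's implementation: resolveDispatch is a hypothetical name, and the list of existing sandboxes is assumed to come from parsing docker sandbox ls, whose output format is not covered here.

```typescript
type Dispatch =
  | { kind: "sandbox"; sandboxName: string } // caller runs execInSandbox()
  | { kind: "host" };                        // caller runs spawnToolHost()

// Picks the sandbox path only when the planner-created sandbox exists.
function resolveDispatch(
  loopId: string,
  tool: "claude" | "codex",
  existingSandboxes: string[],
): Dispatch {
  const name = `loop-${loopId}-${tool}`;
  return existingSandboxes.indexOf(name) >= 0
    ? { kind: "sandbox", sandboxName: name }
    : { kind: "host" };
}
```

Keeping the decision separate from the execution makes the fallback behavior easy to unit test without Docker.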
Troubleshooting
"Not logged in" in sandbox
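Auth not injected. For Claude, confirm the exec call passes -e CLAUDE_CODE_OAUTH_TOKEN; for Codex, confirm ~/.codex/auth.json exists inside the sandbox. Check: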
docker sandbox exec my-sandbox bash -c 'claude auth status'
docker sandbox exec my-sandbox bash -c 'cat ~/.codex/auth.json | head -3'
Sandbox creation slow
First pull downloads ~500MB image. Subsequent creates use cached image (~14s). Use docker sandbox save to create a pre-configured template.
File not visible between host and sandbox
Only the workspace path is mounted. Files outside the workspace directory are not shared.
"docker sandbox: command not found"
Docker Desktop must be running. Check version: docker sandbox version. Requires Docker Desktop 4.40+ with sandbox extension.