docker-pilot

Safe, intelligent Docker container management — fleet status, lifecycle operations, cleanup, compose stacks, troubleshooting, and security hardening. Classifies every command by risk level (READ / RISKY / DESTRUCTIVE) with mandatory confirmation gates. Use when managing Docker containers, images, volumes, networks, compose stacks, or debugging container issues.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "docker-pilot" with this command: npx skills add wahajahmed010/docker-pilot

Docker Pilot 🚢

Safe, intelligent Docker management. Not just a command reference — an operational guide that classifies risk, protects critical services, and formats output for chat.

When to Use

Use when the task involves Docker, Dockerfiles, containers, images, Compose, volumes, networking, debugging, or any container lifecycle operation. This is the default Docker skill — apply it whenever Docker work appears.

Companion Skills

This skill extends the existing ClawHub docker skill (v1.0.4 by ivangdavila). Install both for full coverage:

  • clawhub install docker — Dockerfile patterns, image building, security hardening reference
  • clawhub install docker-pilot — Operational management, safety rails, fleet view, troubleshooting

Safety Architecture ⚠️

Every Docker command is classified by risk level. Follow these rules without exception.

🟢 READ (Safe — Can Always Run)

No side effects. Use freely.

docker ps                                          # Running containers
docker ps -a                                       # All containers (including stopped)
docker ps --format '{{json .}}'                    # JSON output (parseable)
docker images                                       # All images
docker images --filter "dangling=true"             # Dangling images only
docker system df                                   # Disk usage overview
docker system df -v                                # Detailed disk usage
docker logs --tail 50 CONTAINER                     # Recent logs
docker logs --since 1h CONTAINER                    # Last hour of logs
docker inspect CONTAINER                            # Full container config (JSON)
docker stats --no-stream                            # Resource snapshot (not streaming)
docker network ls                                   # List networks
docker network inspect NETWORK                      # Network details
docker volume ls                                    # List volumes
docker volume inspect VOLUME                        # Volume details
docker history IMAGE                                # Image layer history
docker diff CONTAINER                               # Filesystem changes in container
docker port CONTAINER                               # Port mappings
docker top CONTAINER                                # Processes in container
docker events --since 1h                            # Recent daemon events

Parsing tip: For structured data, use --format '{{json .}}'. It prints one JSON object per line, so pipe it through python3 -m json.tool --json-lines. docker inspect returns a JSON array, so index [0] when you want a single container.

🟡 RISKY (Modifies State — Show Impact First)

Requires showing the user what will change before executing.

docker stop CONTAINER           # Cuts service — show uptime first
docker start CONTAINER          # Starts stopped container
docker restart CONTAINER        # Brief outage — confirm first
docker pull IMAGE               # Network + disk usage — check free space
docker tag SOURCE TARGET        # Namespace change — confirm intended tag
docker network create/connect   # Topology change — check port conflicts
docker volume create             # Low risk; creates an empty named volume
docker update --restart=always CONTAINER  # Changes restart behavior (good practice)
docker container rename         # May break scripts — check dependencies
docker compose up -d            # Starts/modifies stack — show diff first
docker compose stop             # Stops stack — show what's running
docker compose restart          # Restarts stack — brief outage

Rule: Before any 🟡 command, show:

  1. Current state (what's running, what will be affected)
  2. Expected impact (downtime, resource usage)
  3. Ask for confirmation

🔴 DESTRUCTIVE (Irreversible — Mandatory Confirmation)

NEVER run any of these without first:

  1. Showing exactly what will be destroyed
  2. Getting explicit confirmation from the user in the chat

Never chain destructive commands: docker rm $(docker ps -aq) is FORBIDDEN.

docker rm CONTAINER              # Deletes container — check volumes, networks first
docker rmi IMAGE                 # Deletes image — check dependent containers
docker volume rm VOLUME          # DATA LOSS — show contents, confirm twice
docker system prune              # Removes stopped containers, unused networks, dangling images, build cache
docker system prune -a           # Removes ALL unused images — full audit required
docker system prune --volumes    # Removes unused volumes — DATA LOSS
docker compose down -v           # Destroys volumes — triple confirm
docker network rm NETWORK        # Breaks attached containers — show list
docker rm -f CONTAINER           # Force-remove running container — dangerous
docker exec CONTAINER rm -rf /   # Destructive inside container — catch pattern
docker swarm leave --force       # Dissolves swarm — catastrophic
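
The three tiers above can be encoded as a lookup that runs before any command is executed. The following is a minimal sketch, not a complete classifier: classify_risk is a hypothetical helper name, its patterns cover only part of the lists above, and anything it does not recognize is deliberately treated as RISKY.

```shell
#!/bin/sh
# classify_risk CMD: print READ, RISKY, or DESTRUCTIVE for a docker command line.
# Hypothetical helper; the patterns cover only part of the lists above.
classify_risk() {
  case "$1" in
    "docker rm "*|"docker rmi "*)                         echo DESTRUCTIVE ;;
    "docker volume rm "*|"docker network rm "*)           echo DESTRUCTIVE ;;
    "docker system prune"*|"docker swarm leave"*)         echo DESTRUCTIVE ;;
    "docker compose down"*" -v"*)                         echo DESTRUCTIVE ;;
    "docker exec "*" rm "*)                               echo DESTRUCTIVE ;;
    "docker stop "*|"docker start "*|"docker restart "*)  echo RISKY ;;
    "docker pull "*|"docker tag "*|"docker update "*)     echo RISKY ;;
    "docker compose up"*|"docker compose stop"*)          echo RISKY ;;
    "docker compose restart"*)                            echo RISKY ;;
    "docker ps"*|"docker images"*|"docker logs "*)        echo READ ;;
    "docker inspect "*|"docker system df"*)               echo READ ;;
    "docker stats --no-stream"*)                          echo READ ;;
    # Anything unrecognized is treated as state-changing, to stay cautious.
    *)                                                    echo RISKY ;;
  esac
}
```

Usage: classify_risk 'docker restart web' prints RISKY. Pass the command as a single-quoted string so the shell does not expand it before classification.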

Confirmation pattern:

⚠️ DESTRUCTIVE OPERATION
Will remove: [list items]
Impact: [data loss / service disruption / etc.]
Type "confirm" to proceed:
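
As an illustration, the confirmation pattern can be wrapped in a small gate that proceeds only on the exact reply "confirm". This is a sketch under assumptions: confirm_destructive is a hypothetical name, and reading the reply from standard input stands in for however the chat front end actually delivers the user's answer.

```shell
#!/bin/sh
# confirm_destructive ITEMS IMPACT
# Prints the warning, reads one line from stdin, and returns 0 only if the
# reply is exactly "confirm". Hypothetical helper for illustration.
confirm_destructive() {
  printf '⚠️ DESTRUCTIVE OPERATION\n'
  printf 'Will remove: %s\n' "$1"
  printf 'Impact: %s\n' "$2"
  printf 'Type "confirm" to proceed: '
  IFS= read -r reply || reply=""
  [ "$reply" = "confirm" ]
}
```

Usage: confirm_destructive "volume app-data" "data loss" && docker volume rm app-data. Any other reply, including "yes" or an empty line, leaves the volume untouched.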

🛡️ Protected Services

Some services are critical infrastructure. Never stop, restart, or remove these without explicit override:

# Default protected services (customize per deployment)
protected_services:
  - adguardhome      # DNS for entire network — stopping breaks internet
  - unbound          # DNS resolver
  - nginx            # Reverse proxy — stopping breaks all web services
  - traefik          # Reverse proxy
  - pihole           # DNS/ad-blocking

Rule: Before stopping a protected service, check DNS fallback:

# List the resolvers the host is configured to use
grep '^nameserver' /etc/resolv.conf
# If the only resolver listed is the protected DNS container's address, there is no fallback.
# In that case WARN USER: "Stopping this will break DNS resolution"
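
One way to enforce the protected list is a wrapper that refuses to stop a protected name. The sketch below assumes the default list shown above; PROTECTED_SERVICES, is_protected, and safe_stop are hypothetical names, and a real deployment would load the list from its own config.

```shell
#!/bin/sh
# Space-separated list of protected container names (defaults from above).
PROTECTED_SERVICES="adguardhome unbound nginx traefik pihole"

# is_protected NAME: return 0 if NAME is on the protected list.
is_protected() {
  for svc in $PROTECTED_SERVICES; do
    if [ "$svc" = "$1" ]; then return 0; fi
  done
  return 1
}

# safe_stop NAME: refuse protected services, otherwise run docker stop.
safe_stop() {
  if is_protected "$1"; then
    echo "REFUSED: $1 is protected; explicit override required" >&2
    return 1
  fi
  docker stop "$1"
}
```

Usage: safe_stop nginx prints the refusal and returns 1 without contacting the daemon, while safe_stop verdaccio falls through to docker stop.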

Fleet Status 📊

The primary interface for understanding what's running. Use this format for all status reports in chat:

Fleet Overview (Telegram-Formatted)

🐳 Docker Fleet — 5 containers

🟢 adguardhome     Up 4 days    43MB   DNS/ad-blocking  [PROTECTED]
🟢 buck-dashboard  Up 8 days    120MB  System dashboard
🟢 verdaccio       Up 21 days   58MB   NPM registry
🟢 mockserver      Up 21 days   42MB   API mocking
🟢 gitbox          Up 21 days   35MB   Git server

📦 Images: 45 total (37 dangling, ~3GB reclaimable)
💾 Disk: 68GB/233GB used (31%)
🔧 Compose: NOT INSTALLED

Commands to Generate Fleet View

# Container status and image
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Image}}'

# Resource usage snapshot
docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}'

# Image count and dangling count (-q prints IDs only, so the header row is not counted)
docker images -q | wc -l
docker images --filter "dangling=true" -q | wc -l

# Disk usage
docker system df

# Check if compose is installed
docker compose version 2>/dev/null || docker-compose version 2>/dev/null || echo "NOT INSTALLED"
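
The status commands above can be turned into the chat layout with a small filter. This is a rough sketch: format_fleet is a hypothetical name, it expects tab-separated name and status lines on standard input, and it fills in only the icon, name, status, and [PROTECTED] columns (memory and role would come from docker stats and the service map).

```shell
#!/bin/sh
# format_fleet: read "NAME<TAB>STATUS" lines and print one chat line each.
# Containers whose status starts with "Up" get a green icon, others red.
format_fleet() {
  awk -F '\t' -v protected="${PROTECTED_SERVICES:-adguardhome unbound nginx traefik pihole}" '
    BEGIN {
      n = split(protected, p, " ")
      for (i = 1; i <= n; i++) prot[p[i]] = 1
    }
    {
      icon = ($2 ~ /^Up/) ? "🟢" : "🔴"
      tag = ($1 in prot) ? "  [PROTECTED]" : ""
      printf "%s %-16s %s%s\n", icon, $1, $2, tag
    }'
}
```

Usage: docker ps -a --format '{{.Names}}\t{{.Status}}' | format_fleet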

Service Map

Map container names to functional roles. Maintain this in a local config:

# ~/.openclaw/workspace/docker-pilot/services.yaml (create if needed)
services:
  adguardhome:
    role: "DNS/ad-blocking"
    critical: true
    protected: true
    port: 53
    network: host
  buck-dashboard:
    role: "System dashboard"
    critical: false
    port: 8080
    network: bridge
  verdaccio:
    role: "NPM registry"
    critical: false
    port: 4873
    network: bridge
  mockserver:
    role: "API mocking"
    critical: false
    port: 1080
    network: bridge
  gitbox:
    role: "Git server"
    critical: false
    port: 8081
    network: bridge

Compose Setup 🔧

If docker compose is not installed, install it first:

# Check current status
docker compose version 2>/dev/null || echo "NOT INSTALLED"

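# Install the compose plugin (no daemon restart needed)
# docker-compose-v2 is the Ubuntu package name; Docker's own apt repository ships it as docker-compose-plugin
sudo apt install docker-compose-v2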

# Verify
docker compose version

Why compose matters: Without compose, every container is a docker run command with 10+ flags that must be memorized or scripted. Compose gives you declarative, version-controlled, reproducible deployments.


Cleanup Playbook 🧹

Run this when disk usage is high or when docker system df shows bloat.

Step 1: Audit (Always READ first)

# Show what's reclaimable
docker system df

# Dangling images (tagged <none>)
docker images --filter "dangling=true"

# Stopped containers
docker ps --filter "status=exited" --filter "status=created"

# User-defined networks (candidates only; inspect each one for attached containers)
docker network ls --filter "type=custom"

# Unused volumes
docker volume ls --filter "dangling=true"

# Build cache size (the summary view has a "Build Cache" row)
docker system df | grep "Build Cache"

Step 2: Safe Cleanup (No data loss)

# Remove dangling images (no running container uses them)
docker image prune

# Remove stopped containers
docker container prune

# Remove unused networks
docker network prune

# Remove build cache
docker builder prune

Step 3: Aggressive Cleanup (⚠️ Confirm first)

# Remove ALL unused images (not just dangling)
docker image prune -a
# ⚠️ CONFIRM: "This removes every image not used by any container, running or stopped. Next pull will re-download."

# Remove unused volumes (DATA LOSS RISK; Docker 23+ prunes only anonymous volumes unless --all is given)
docker volume prune
# ⚠️ CONFIRM: "This deletes volume data. Show volume contents first."
# Before: docker volume inspect VOLUME_NAME
# Show contents: docker run --rm -v VOLUME_NAME:/mnt alpine ls -la /mnt

# Nuclear option
docker system prune -a --volumes
# ⚠️ DOUBLE CONFIRM: "This removes everything not used by a running container including volumes."

Step 4: Verify

docker system df
docker ps
docker images

Health Checks 🩺

Add Health Checks to Running Containers

# Check whether a container has a health check configured
docker inspect --format='{{json .Config.Healthcheck}}' CONTAINER

# docker update cannot change health checks. To add one, recreate the
# container with its original image and options plus the health flags:
docker run -d --name CONTAINER \
  --health-cmd="curl -f http://localhost:8080/ || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  IMAGE

Common Health Check Commands

# HTTP endpoint
curl -f http://localhost:PORT/ || exit 1

# TCP port
nc -z localhost PORT || exit 1

# DNS (for AdGuard)
dig +short google.com @localhost || exit 1

# Process check
pgrep -x PROCESS_NAME || exit 1

Restart Policies

# Set restart policy (so the container comes back after a reboot without a manual start)
docker update --restart=always CONTAINER

# Check current policy
docker inspect --format='{{.HostConfig.RestartPolicy.Name}}' CONTAINER

# Policies:
#   no          — Never restart (default)
#   on-failure  — Restart only on non-zero exit
#   always      — Always restart, including on daemon start
#   unless-stopped — Always restart except when manually stopped

Log Management 📋

Configure Log Rotation (Prevents Disk Fill)

# Add log limits to existing container (requires recreate)
docker run --log-opt max-size=10m --log-opt max-file=3 ...

# Global daemon config: /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Smart Log Reading

# Last 50 lines
docker logs --tail 50 CONTAINER

# Last hour
docker logs --since 1h CONTAINER

# Follow for 30 seconds, then stop (don't leave a stream running)
timeout 30 docker logs -f --since 5m CONTAINER

# Search for errors
docker logs CONTAINER 2>&1 | grep -i "error\|exception\|fail\|fatal" | tail -20

# JSON log format (if the container writes one JSON object per line)
docker logs --since 1h CONTAINER 2>&1 | python3 -m json.tool --json-lines | grep "error"

Troubleshooting Runbooks 🔍

Container Won't Start

# 1. Check exit code
docker inspect --format='{{.State.ExitCode}}' CONTAINER
# Common codes: 0=graceful, 1=app error, 125=docker run itself failed, 137=SIGKILL (often OOM killed), 139=segfault

# 2. Check logs
docker logs --tail 50 CONTAINER

# 3. Check if OOM killed
docker inspect --format='{{.State.OOMKilled}}' CONTAINER

# 4. Check resource limits
docker inspect --format='{{.HostConfig.Memory}}' CONTAINER

# 5. Try interactive debug
docker run --rm -it --entrypoint /bin/sh IMAGE
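
For quick triage, the exit-code table in step 1 can be wrapped in a lookup. This is an illustrative sketch: explain_exit_code is a hypothetical name, and the fallback relies on the shell convention that codes above 128 mean "128 plus the signal number".

```shell
#!/bin/sh
# explain_exit_code CODE: print a short human-readable meaning.
explain_exit_code() {
  case "$1" in
    0)   echo "graceful exit" ;;
    1)   echo "application error" ;;
    125) echo "docker run itself failed" ;;
    126) echo "command found but could not be invoked" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (often the OOM killer)" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM" ;;
    *)
      # Codes above 128 conventionally mean 128 + signal number.
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "application-specific exit code"
      fi ;;
  esac
}
```

Usage: explain_exit_code "$(docker inspect --format='{{.State.ExitCode}}' CONTAINER)"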

Port Conflict

# Find what's using a port
ss -tlnp | grep :PORT
# or
lsof -i :PORT

# Check if it's a Docker container
docker ps --filter "publish=PORT"

# Fix: change host port mapping or stop conflicting service

Disk Full

# 1. Check Docker disk usage
docker system df -v

# 2. Check host disk
df -h /var/lib/docker

# 3. Quick reclaim (safe)
docker image prune
docker container prune
docker builder prune

# 4. If still full (confirm first!)
docker image prune -a  # Remove ALL unused images

Image Pull Failure

# 1. Check network
curl -I https://registry-1.docker.io/v2/

# 2. Check auth
docker login

# 3. Check rate limits (Docker Hub)
# Limits change over time; historically anonymous 100 pulls/6hr and free authenticated accounts 200 pulls/6hr. Check the current policy.

# 4. Try specific digest instead of tag
docker pull image@sha256:DIGEST

Crash Loop

# 1. See restart count
docker inspect --format='{{.RestartCount}}' CONTAINER

# 2. Read crash logs
docker logs --tail 100 CONTAINER

# 3. Common causes:
#    - Missing env vars: look for "required" or "must set" in logs
#    - File permissions: look for "permission denied"
#    - Port conflict: look for "address already in use"
#    - OOM: check docker inspect State.OOMKilled
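
The log patterns listed under common causes can be checked mechanically. The sketch below is only a first-pass filter: diagnose_crash_logs and log_has are hypothetical names, the matching is a case-insensitive text search, and the OOM case is left to docker inspect as described above.

```shell
#!/bin/sh
# log_has TEXT PATTERN: return 0 if TEXT matches the extended regex PATTERN.
log_has() {
  printf '%s\n' "$1" | grep -qiE "$2"
}

# diagnose_crash_logs: read logs on stdin, print one line per likely cause.
diagnose_crash_logs() {
  logs=$(cat)
  found=0
  if log_has "$logs" 'required|must set'; then
    echo "possible missing env var"; found=1
  fi
  if log_has "$logs" 'permission denied'; then
    echo "possible file permission problem"; found=1
  fi
  if log_has "$logs" 'address already in use'; then
    echo "possible port conflict"; found=1
  fi
  if [ "$found" -eq 0 ]; then
    echo "no known pattern found"
  fi
}
```

Usage: docker logs --tail 100 CONTAINER 2>&1 | diagnose_crash_logs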

Network Issues

# Containers can't reach each other
# Default bridge has NO DNS — use custom network
docker network create mynet
docker network connect mynet CONTAINER

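# Container can't reach host
# Docker Desktop: host.docker.internal resolves automatically
# Linux (Docker 20.10+): map the name to the host gateway when starting the container
docker run --add-host=host.docker.internal:host-gateway ...
# Do NOT expose the daemon on tcp://0.0.0.0:2375; that gives unauthenticated root-level access to the host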

# DNS not resolving in container
docker exec CONTAINER cat /etc/resolv.conf
docker exec CONTAINER nslookup google.com

Compose Stacks 📦

Creating a Compose File

# docker-compose.yml — declarative, version-controlled, reproducible
# (the top-level "version" key is obsolete in Compose v2 and can be omitted)

services:
  app:
    image: myapp:1.0
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - NODE_ENV=production
    volumes:
      - app-data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "0.5"
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

volumes:
  app-data:

Compose Lifecycle

# Start stack
docker compose up -d

# View stack status
docker compose ps

# View logs
docker compose logs -f --tail 50

# Restart single service
docker compose restart app

# Pull and recreate (update)
docker compose pull && docker compose up -d

# Stop and remove containers and networks (named volumes are kept)
docker compose down

# Stop AND remove volumes (⚠️ DATA LOSS)
docker compose down -v

Compose Traps

  • depends_on waits for container start, NOT service ready — use condition: service_healthy
  • .env file must be next to docker-compose.yml — wrong directory = silently ignored
  • Volume mounts overwrite container files — empty host dir = empty container dir
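  • docker compose run does NOT publish the service's ports unless --service-ports is passed (it does start depends_on services unless --no-deps is given)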
  • YAML anchors don't work across files — use multiple compose files instead
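
To make the first trap concrete, the sketch below writes a minimal compose file in which app waits for cache to report healthy, not merely started. Everything in it is illustrative: write_healthy_stack is a hypothetical helper, and the redis:7-alpine cache service and its redis-cli ping check are assumptions chosen only because that image ships its own health probe.

```shell
#!/bin/sh
# write_healthy_stack FILE: write an example compose file to FILE.
# "app" starts only after "cache" passes its health check.
write_healthy_stack() {
  cat > "$1" <<'EOF'
services:
  cache:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
  app:
    image: myapp:1.0
    depends_on:
      cache:
        condition: service_healthy
EOF
}
```

Usage: write_healthy_stack compose.yml && docker compose -f compose.yml config validates the file without starting anything.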

Security Hardening 🔒

Container Security

# Run as non-root (always prefer this)
docker run --user 1000:1000 ...

# Drop all capabilities, add only what's needed
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE ...

# Read-only root filesystem
docker run --read-only --tmpfs /tmp ...

# Resource limits (always set these)
docker run -m 512m --cpus=0.5 ...

# No new privileges
docker run --security-opt=no-new-privileges ...

Image Security

# Pin versions (never use :latest in production)
docker pull nginx:1.25.3-alpine

# Scan for vulnerabilities
docker scout cves IMAGE

# Verify image integrity
docker pull image@sha256:DIGEST

NEVER Do These

  • docker run --privileged — disables ALL security
  • -v /:/host — mounts entire host filesystem
  • --pid=host — can see/kill host processes
  • --network=host on non-DNS containers — unnecessary exposure
  • Secrets in ENV or ARG — visible in docker inspect and docker history
  • docker rm $(docker ps -aq) — chained destructive command
  • docker system prune -a without audit first
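
Several of these rules can be checked on containers that are already running. The sketch below is a partial audit under assumptions: audit_flags and audit_container are hypothetical names, only four settings are examined, and the fields are read with separate docker inspect calls so that an empty user value is not lost.

```shell
#!/bin/sh
# audit_flags PRIVILEGED USER READONLY PIDMODE
# Prints one line per risky setting; prints nothing if all four look fine.
audit_flags() {
  if [ "$1" = "true" ]; then echo "WARN: --privileged is set"; fi
  case "$2" in
    ""|root|0|root:*|0:*) echo "WARN: runs as root (no non-root user set)" ;;
  esac
  if [ "$3" != "true" ]; then echo "INFO: root filesystem is writable"; fi
  if [ "$4" = "host" ]; then echo "WARN: --pid=host is set"; fi
  return 0
}

# audit_container NAME: read the four fields from docker inspect (READ tier).
audit_container() {
  priv=$(docker inspect --format '{{.HostConfig.Privileged}}' "$1") || return 1
  user=$(docker inspect --format '{{.Config.User}}' "$1")
  ro=$(docker inspect --format '{{.HostConfig.ReadonlyRootfs}}' "$1")
  pid=$(docker inspect --format '{{.HostConfig.PidMode}}' "$1")
  audit_flags "$priv" "$user" "$ro" "$pid"
}
```

Usage: audit_container CONTAINER. The check is read-only, so it needs no confirmation gate.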

Resource Monitoring 📈

Quick Health Check

# One-liner fleet health
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

# Resource usage
docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}'

# Per-container disk usage
docker system df -v

# Host resources
df -h /var/lib/docker
free -h

Alert Thresholds

Metric              Warning    Critical   Action
Disk usage          >80%       >90%       Run cleanup playbook
Memory              >80%       >95%       Add limits or restart heavy containers
Container restarts  >3/hour    >10/hour   Check logs, likely crash loop
Dangling images     >10        >30        Run image prune
Log file size       >100MB     >1GB       Add log rotation
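
The thresholds can be applied with a small comparison helper. This is a sketch with hypothetical names (alert_level, disk_percent); it treats every threshold as "strictly greater than", works on whole numbers only, and assumes /var/lib/docker as the default path to measure.

```shell
#!/bin/sh
# alert_level VALUE WARN CRIT: print OK, WARNING, or CRITICAL.
# A value must exceed a threshold to trigger it (85 vs 80 -> WARNING).
alert_level() {
  if [ "$1" -gt "$3" ]; then
    echo CRITICAL
  elif [ "$1" -gt "$2" ]; then
    echo WARNING
  else
    echo OK
  fi
}

# disk_percent [PATH]: print the used percentage of the filesystem
# holding PATH as a bare integer, using POSIX df output.
disk_percent() {
  df -P "${1:-/var/lib/docker}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Usage: alert_level "$(disk_percent)" 80 90 reports the disk row of the table; alert_level 12 10 30 would report WARNING for twelve dangling images.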

Dockerfile Patterns 📝

Layer Cache Optimization

# ✅ GOOD — requirements rarely change, code changes often
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

# ❌ BAD — invalidates cache on every code change
COPY . .
RUN pip install -r requirements.txt

Multi-Stage Build

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
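# Full install: the build step usually needs devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies before node_modules is copied to the final stage
RUN npm run build && npm prune --omit=dev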

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

Image Size Traps

  • Multi-stage: a forgotten --from=builder makes COPY read from the build context instead of the builder stage
  • COPY . . before RUN npm install = cache invalidated on every code change
  • ADD extracts archives automatically — use COPY unless you need extraction
  • rm -rf /var/lib/apt/lists in separate RUN = space not reclaimed (layers)
  • .git copied = megabytes of bloat — use .dockerignore

ARG vs ENV

  • ARG only available during build, visible in docker history — NEVER for secrets
  • ENV persists at runtime — use for configuration
  • ARG with empty override uses default, not empty string
  • ARG must be re-declared after each FROM in multi-stage

Telegram Formatting Guide 📱

When reporting Docker status in Telegram, use this format:

Fleet Status

🐳 **Docker Fleet** — 5 running

🟢 **adguardhome** — DNS/ad-blocking [PROTECTED]
   Up 4 days · 43MB RAM · :53

🟢 **buck-dashboard** — Dashboard
   Up 8 days · 120MB RAM · :8080

🟢 **verdaccio** — NPM registry
   Up 21 days · 58MB RAM · :4873

🟡 **mockserver** — API mocking
   Up 21 days · 42MB RAM · :1080

🟢 **gitbox** — Git server
   Up 21 days · 35MB RAM · :8081

📦 37 dangling images (3GB reclaimable)
💾 68GB/233GB disk (31%)

Alert Format

⚠️ **Container Alert**

🔴 **mockserver** — Exited (1) 2min ago
Last log: `Connection refused on port 1080`

Restart? (3 restarts in last hour)

Cleanup Report

🧹 **Docker Cleanup**

Removed:
- 12 dangling images (450MB)
- 3 stopped containers
- 1 unused network

Reclaimed: **1.2GB**
Current disk: 62GB/233GB (27%)

Quick Reference Card

Task             Command
Fleet status     docker ps --format 'table {{.Names}}\t{{.Status}}'
Resource usage   docker stats --no-stream
Disk usage       docker system df
Container logs   docker logs --tail 50 CONTAINER
Inspect JSON     docker inspect CONTAINER | python3 -m json.tool
Find dangling    docker images --filter "dangling=true" -q | wc -l
Safe cleanup     docker image prune && docker container prune && docker builder prune
Health check     docker inspect --format='{{.State.Health.Status}}' CONTAINER
Restart policy   docker update --restart=always CONTAINER
Compose up       docker compose up -d
Compose logs     docker compose logs -f --tail 50

First-Run Setup

When this skill is activated for the first time on a new machine:

  1. Check compose: docker compose version — if missing, install it
  2. Scan fleet: docker ps -a + docker system df — understand current state
  3. Set restart policies: docker update --restart=unless-stopped for all running containers
  4. Configure log rotation: Add max-size/max-file to daemon.json or per-container
  5. Clean up: Run safe cleanup (image prune, container prune, builder prune)
  6. Build service map: Document what each container does
  7. Set up monitoring: Consider a cron to check fleet health periodically
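
Steps 1 and 2, plus a look at current restart policies, are read-only and can be gathered in one pass before anything is changed. The sketch below assumes a hypothetical function name, first_run_audit, and uses only READ-tier commands; the state-changing steps (3 to 5) still go through the confirmation rules above.

```shell
#!/bin/sh
# first_run_audit: print compose status, containers, restart policies,
# and disk usage. Uses only READ-tier commands; changes nothing.
first_run_audit() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found" >&2
    return 1
  fi
  echo "== Compose =="
  docker compose version 2>/dev/null || echo "NOT INSTALLED"
  echo "== Containers =="
  docker ps -a --format '{{.Names}}\t{{.Status}}'
  echo "== Restart policies =="
  for c in $(docker ps -q); do
    docker inspect --format '{{.Name}} {{.HostConfig.RestartPolicy.Name}}' "$c"
  done
  echo "== Disk =="
  docker system df
}
```

Usage: first_run_audit > fleet-baseline.txt keeps a record of the starting state to compare against after cleanup.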

Credits

Built on top of the docker skill by ivangdavila (v1.0.4). This skill adds:

  • 🛡️ Safety architecture (READ/RISKY/DESTRUCTIVE classification with confirmation gates)
  • 📊 Fleet status view with Telegram formatting
  • 🔍 Troubleshooting runbooks (crash loops, disk full, port conflicts, DNS)
  • 🧹 Step-by-step cleanup playbook
  • 🩺 Health check and restart policy configuration
  • 📋 Log management and rotation
  • 🛡️ Protected services list (never stop AdGuard without DNS fallback)
  • 📦 Compose setup guide and lifecycle management
  • 🔒 Security hardening checklist
  • 🚀 First-run setup guide
