Terminal Automation with pilotty
CRITICAL: Argument Positioning
All flags (--name, -s, --format, etc.) MUST come BEFORE positional arguments:
# CORRECT - flags before command/arguments
pilotty spawn --name myapp vim file.txt
pilotty key -s myapp Enter
pilotty snapshot -s myapp --format text
# WRONG - flags after command (they get passed to the app, not pilotty!)
pilotty spawn vim file.txt --name myapp # FAILS: --name goes to vim
pilotty key Enter -s myapp # FAILS: -s goes nowhere useful
This is the #1 cause of agent failures. When in doubt: flags first, then command/args.
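One way to make the flags-first rule hard to violate is to route session-scoped calls through a small wrapper that assembles the argument order itself. The sketch below is a hypothetical helper (pt is not part of pilotty); it relies only on the --name and -s flags shown above.

```shell
# Hypothetical wrapper (not part of pilotty): builds each call so that
# flags always precede positional arguments.
# Usage: pt <subcommand> <session> [args...]
pt() {
  local sub="$1" session="$2"
  shift 2
  if [ "$sub" = "spawn" ]; then
    # spawn names its session with --name; everything after is the app command
    pilotty spawn --name "$session" "$@"
  else
    # other session-scoped subcommands target a session with -s
    pilotty "$sub" -s "$session" "$@"
  fi
}
```

With this in place, pt spawn myapp vim file.txt expands to pilotty spawn --name myapp vim file.txt, and pt key myapp Enter expands to pilotty key -s myapp Enter.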
Quick start
pilotty spawn vim file.txt # Start TUI app in managed session
pilotty wait-for "file.txt" # Wait for app to be ready
pilotty snapshot # Get screen state with UI elements
pilotty key i # Enter insert mode
pilotty type "Hello, World!" # Type text
pilotty key Escape # Exit insert mode
pilotty kill # End session
Core workflow
- Spawn: pilotty spawn <command> starts the app in a background PTY
- Wait: pilotty wait-for <text> ensures the app is ready
- Snapshot: pilotty snapshot returns screen state with detected UI elements
- Understand: Parse elements[] to identify buttons, inputs, toggles
- Interact: Use keyboard commands (key, type) to navigate and interact
- Re-snapshot: Check content_hash to detect screen changes
Commands
Session management
pilotty spawn <command> # Start TUI app (e.g., pilotty spawn htop)
pilotty spawn --name myapp <cmd> # Start with custom session name (--name before command)
pilotty kill # Kill default session
pilotty kill -s myapp # Kill specific session
pilotty list-sessions # List all active sessions
pilotty daemon # Manually start daemon (usually auto-starts)
pilotty shutdown # Stop daemon and all sessions
pilotty examples # Show end-to-end workflow example
Screen capture
pilotty snapshot # Full JSON with text content and elements
pilotty snapshot --format compact # JSON without text field
pilotty snapshot --format text # Plain text with cursor indicator
pilotty snapshot -s myapp # Snapshot specific session
# Wait for screen to change (eliminates need for sleep!)
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH # Block until screen changes
pilotty snapshot --await-change $HASH --settle 50 # Wait for 50ms stability
Input
pilotty type "hello" # Type text at cursor
pilotty type -s myapp "text" # Type in specific session
pilotty key Enter # Press Enter
pilotty key Ctrl+C # Send interrupt
pilotty key Escape # Send Escape
pilotty key Tab # Send Tab
pilotty key F1 # Function key
pilotty key Alt+F # Alt combination
pilotty key Up # Arrow key
pilotty key -s myapp Ctrl+S # Key in specific session
# Key sequences (space-separated, sent in order)
pilotty key "Ctrl+X m" # Emacs chord: Ctrl+X then m
pilotty key "Escape : w q Enter" # vim :wq sequence
pilotty key --delay 50 "a b c" # Send a, b, c with 50ms delay (flag before the sequence)
pilotty key -s myapp "Tab Tab Enter" # Sequence in specific session
Interaction
pilotty click 5 10 # Click at row 5, col 10
pilotty click -s myapp 10 20 # Click in specific session
pilotty scroll up # Scroll up 1 line
pilotty scroll down 5 # Scroll down 5 lines
pilotty scroll -s myapp up 10 # Scroll in specific session
Terminal control
pilotty resize 120 40 # Resize terminal to 120 cols x 40 rows
pilotty resize -s myapp 80 24 # Resize specific session
pilotty wait-for "Ready" # Wait for text to appear (30s default)
pilotty wait-for -r "Error" # Wait for regex pattern
pilotty wait-for -t 5000 "Done" # Wait with 5s timeout
pilotty wait-for -s editor "~" # Wait in specific session
Global options
| Option | Description |
|---|---|
| -s, --session <name> | Target specific session (default: "default") |
| --format <fmt> | Snapshot format: full, compact, text |
| -t, --timeout <ms> | Timeout for wait-for and await-change (default: 30000) |
| -r, --regex | Treat wait-for pattern as regex |
| --name <name> | Session name for spawn command |
| --delay <ms> | Delay between keys in a sequence (default: 0, max: 10000) |
| --await-change <hash> | Block snapshot until content_hash differs |
| --settle <ms> | Wait for screen to be stable for this many ms (default: 0) |
Environment variables
PILOTTY_SESSION="mysession" # Default session name
PILOTTY_SOCKET_DIR="/tmp/pilotty" # Override socket directory
RUST_LOG="debug" # Enable debug logging
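If most commands in a script target one session, setting PILOTTY_SESSION can replace the repeated -s flag. A minimal sketch, assuming (as the comment above states) that the variable sets the default session for every subcommand; with_session is a hypothetical helper name.

```shell
# Hypothetical helper: run one command with PILOTTY_SESSION set for that call only.
# Usage: with_session <session> <command> [args...]
with_session() {
  local session="$1"
  shift
  PILOTTY_SESSION="$session" "$@"
}

# Example (commented out so that sourcing this file has no side effects):
# with_session editor pilotty key Escape
# with_session editor pilotty snapshot --format text
```

Scoping the variable to a single call avoids leaking a session name into later, unrelated commands in the same shell.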
Snapshot Output
The snapshot command returns structured JSON with detected UI elements:
{
"snapshot_id": 42,
"size": { "cols": 80, "rows": 24 },
"cursor": { "row": 5, "col": 10, "visible": true },
"text": "Settings:\n [x] Notifications [ ] Dark mode\n [Save] [Cancel]",
"elements": [
{ "kind": "toggle", "row": 1, "col": 2, "width": 3, "text": "[x]", "confidence": 1.0, "checked": true },
{ "kind": "toggle", "row": 1, "col": 20, "width": 3, "text": "[ ]", "confidence": 1.0, "checked": false },
{ "kind": "button", "row": 2, "col": 2, "width": 6, "text": "[Save]", "confidence": 0.8 },
{ "kind": "button", "row": 2, "col": 10, "width": 8, "text": "[Cancel]", "confidence": 0.8 }
],
"content_hash": 12345678901234567890
}
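The sample content_hash above has 20 digits, which is larger than 2^53. jq versions before 1.7 parse numbers as double-precision floats and may round such a value, in which case the string passed to --await-change would no longer equal the real hash. The sketch below sidesteps this by extracting the digits as text with sed. It assumes the JSON layout shown above (a numeric content_hash field); content_hash_raw is a hypothetical helper name.

```shell
# Hypothetical helper: print the content_hash from snapshot JSON on stdin
# as an exact digit string, without going through a JSON number parser.
content_hash_raw() {
  sed -n 's/.*"content_hash": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (requires a running session):
# HASH=$(pilotty snapshot | content_hash_raw)
```

If your jq is 1.7 or newer, or the hashes you see fit within 2^53, the jq-based extraction used elsewhere in this document should behave the same way.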
Use --format text for a plain text view with cursor indicator:
--- Terminal 80x24 | Cursor: (5, 10) ---
bash-3.2$ [_]
The [_] shows cursor position. Use the text content to understand screen state and navigate with keyboard commands.
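When only the cursor position is needed, the header line of the text format can be parsed directly. A minimal sketch, assuming the header always has the form shown above and that the pair is (row, col), matching the cursor object in the JSON sample; cursor_from_text is a hypothetical name.

```shell
# Hypothetical helper: read `pilotty snapshot --format text` output on stdin
# and print "<row> <col>" taken from the header line.
cursor_from_text() {
  sed -n '1s/.*Cursor: (\([0-9][0-9]*\), *\([0-9][0-9]*\)).*/\1 \2/p'
}

# Example (requires a running session):
# pilotty snapshot --format text | cursor_from_text
```

The helper prints nothing if the first line does not match, which callers can treat as "format not recognized".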
Element Detection
pilotty automatically detects interactive UI elements in terminal applications. Elements provide read-only context to help understand UI structure.
Element Kinds
| Kind | Detection Patterns | Confidence | Fields |
|---|---|---|---|
| toggle | [x], [ ], [*], ☑, ☐ | 1.0 | checked: bool |
| button | Inverse video, [OK], <Cancel>, (Submit) | 1.0 / 0.8 | focused: bool (if true) |
| input | Cursor position, ____ underscores | 1.0 / 0.6 | focused: bool (if true) |
Element Fields
| Field | Type | Description |
|---|---|---|
| kind | string | Element type: button, input, or toggle |
| row | number | Row position (0-based from top) |
| col | number | Column position (0-based from left) |
| width | number | Width in terminal cells (CJK chars = 2) |
| text | string | Text content of the element |
| confidence | number | Detection confidence (0.0-1.0) |
| focused | bool | Whether element has focus (only present if true) |
| checked | bool | Toggle state (only present for toggles) |
Confidence Levels
| Confidence | Meaning |
|---|---|
| 1.0 | High confidence: Cursor position, inverse video, checkbox patterns |
| 0.8 | Medium confidence: Bracket patterns [OK], <Cancel> |
| 0.6 | Lower confidence: Underscore input fields ____ |
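Since lower-confidence detections come from looser patterns, it can help to filter elements by a threshold before relying on them. The jq sketch below assumes the elements array and confidence field documented above; confident_elements is a hypothetical helper, and 0.8 is an arbitrary default.

```shell
# Hypothetical helper: read snapshot JSON on stdin and print, one per line,
# the elements whose confidence is at least the given threshold (default 0.8).
confident_elements() {
  local min="${1:-0.8}"
  jq -c --argjson min "$min" '.elements[] | select(.confidence >= $min)'
}

# Example (requires a running session):
# pilotty snapshot | confident_elements 1.0
```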
Wait for Screen Changes (Recommended)
Stop guessing sleep durations! Use --await-change to wait for the screen to actually update:
# Capture baseline hash
HASH=$(pilotty snapshot | jq '.content_hash')
# Perform action
pilotty key Enter
# Wait for screen to change (blocks until hash differs)
pilotty snapshot --await-change $HASH
# Or wait for screen to stabilize (for apps that render progressively)
pilotty snapshot --await-change $HASH --settle 100
Flags:
| Flag | Description |
|---|---|
| --await-change <HASH> | Block until content_hash differs from this value |
| --settle <MS> | After change detected, wait for screen to be stable for MS |
| -t, --timeout <MS> | Maximum wait time (default: 30000) |
Why this is better than sleep:
- sleep 1 is a guess - too short causes race conditions, too long slows automation
- --await-change waits exactly as long as needed - no more, no less
- --settle handles apps that render progressively (show partial, then complete)
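The capture, act, await sequence can be wrapped in one function so callers cannot skip a step. A sketch under the assumptions of this section (snapshot prints JSON containing content_hash, and --await-change blocks until it differs); press_and_wait is a hypothetical name.

```shell
# Hypothetical helper: send keys to a session, then block until the screen changes.
# Usage: press_and_wait <session> <keys> [settle_ms]
press_and_wait() {
  local session="$1" keys="$2" settle="${3:-50}" before
  # 1. Capture the baseline hash
  before=$(pilotty snapshot -s "$session" | jq -r '.content_hash') || return 1
  [ -n "$before" ] || return 1
  # 2. Perform the action
  pilotty key -s "$session" "$keys" || return 1
  # 3. Print the snapshot returned once the hash differs and the screen has settled
  pilotty snapshot -s "$session" --await-change "$before" --settle "$settle"
}
```

For example, press_and_wait myapp Enter 100 sends Enter to the myapp session and prints the snapshot taken after the change.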
Waiting for Streaming AI Responses
AI-powered TUIs such as opencode stream their responses, so the screen keeps updating as tokens arrive. Use a longer --settle time when interacting with them:
# 1. Capture hash before sending prompt
HASH=$(pilotty snapshot -s myapp | jq -r '.content_hash')
# 2. Type prompt and submit
pilotty type -s myapp "write me a poem about ai agents"
pilotty key -s myapp Enter
# 3. Wait for streaming response to complete
# - Use longer settle (2-3s) since AI apps pause between chunks
# - Extend timeout for long responses (60s+)
pilotty snapshot -s myapp --await-change "$HASH" --settle 3000 -t 60000
# 4. Response may be scrolled - scroll up if needed to see full output
pilotty scroll -s myapp up 10
pilotty snapshot -s myapp --format text
Key parameters for streaming:
- --settle 2000-3000: AI responses have pauses between chunks; 2-3 seconds ensures streaming is truly done
- -t 60000: Extend timeout beyond the 30s default for longer generations
- The settle timer resets on each screen change, so it naturally waits until streaming stops
Manual Change Detection
For manual polling (not recommended), use content_hash directly:
# Get initial state
SNAP1=$(pilotty snapshot)
HASH1=$(echo "$SNAP1" | jq -r '.content_hash')
# Perform action
pilotty key Tab
# Check if screen changed
SNAP2=$(pilotty snapshot)
HASH2=$(echo "$SNAP2" | jq -r '.content_hash')
if [ "$HASH1" != "$HASH2" ]; then
echo "Screen changed - re-analyze elements"
fi
Using Elements Effectively
Elements are read-only context for understanding the UI. Use keyboard navigation for reliable interaction:
# 1. Get snapshot to understand UI structure
pilotty snapshot | jq '.elements'
# Output shows toggles (checked/unchecked) and buttons with positions
# 2. Navigate and interact with keyboard (reliable approach)
pilotty key Tab # Move to next element
pilotty key Space # Toggle checkbox
pilotty key Enter # Activate button
# 3. Verify state changed
pilotty snapshot | jq '.elements[] | select(.kind == "toggle")'
Key insight: Use elements to understand WHAT is on screen, use keyboard to interact with it.
Navigation Approach
pilotty uses keyboard-first navigation, just like a human would:
# 1. Take snapshot to see the screen
pilotty snapshot --format text
# 2. Navigate using keyboard
pilotty key Tab # Move to next element
pilotty key Enter # Activate/select
pilotty key Escape # Cancel/back
pilotty key Up # Move up in list/menu
pilotty key Space # Toggle checkbox
# 3. Type text when needed
pilotty type "search term"
pilotty key Enter
# 4. Click at coordinates for mouse-enabled TUIs
pilotty click 5 10 # Click at row 5, col 10
Key insight: Parse the snapshot text and elements to understand what's on screen, then use keyboard commands to navigate. This works reliably across all TUI applications.
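For mouse-enabled TUIs, the row and col of a detected element can be fed to the click command. The sketch below is one possible way to combine the two, not a documented pilotty feature: click_element is a hypothetical helper, and clicking the element's first cell is an assumption that may not suit every app.

```shell
# Hypothetical helper: click the first element whose text matches exactly.
# Usage: click_element <session> <text>
click_element() {
  local session="$1" text="$2" pos row col
  # Look up "<row> <col>" of the first matching element
  pos=$(pilotty snapshot -s "$session" |
    jq -r --arg t "$text" 'first(.elements[] | select(.text == $t)) | "\(.row) \(.col)"')
  [ -n "$pos" ] || { echo "element not found: $text" >&2; return 1; }
  row=${pos% *}
  col=${pos#* }
  pilotty click -s "$session" "$row" "$col"
}
```

Keyboard navigation remains the more portable choice; a helper like this only makes sense once you have confirmed the app responds to mouse clicks.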
Example: Edit file with vim
# 1. Spawn vim
pilotty spawn --name editor vim /tmp/hello.txt
# 2. Wait for vim to load and capture baseline hash
pilotty wait-for -s editor "hello.txt"
HASH=$(pilotty snapshot -s editor | jq '.content_hash')
# 3. Enter insert mode
pilotty key -s editor i
# 4. Type content
pilotty type -s editor "Hello from pilotty!"
# 5. Wait for screen to update, then exit (no sleep needed!)
pilotty snapshot -s editor --await-change $HASH --settle 50
pilotty key -s editor "Escape : w q Enter"
# 6. Verify session ended
pilotty list-sessions
Alternative using individual keys:
pilotty key -s editor Escape
pilotty type -s editor ":wq"
pilotty key -s editor Enter
Example: Dialog checklist interaction
# 1. Spawn dialog checklist (--name before command)
pilotty spawn --name opts dialog --checklist "Select features:" 12 50 4 \
"notifications" "Push notifications" on \
"darkmode" "Dark mode theme" off \
"autosave" "Auto-save documents" on \
"telemetry" "Usage analytics" off
# 2. Wait for dialog to render (use --settle, not sleep!)
pilotty snapshot -s opts --settle 200 # Wait for initial render to stabilize
# 3. Get snapshot and examine elements, capture hash
SNAP=$(pilotty snapshot -s opts)
echo "$SNAP" | jq '.elements[] | select(.kind == "toggle")'
HASH=$(echo "$SNAP" | jq '.content_hash')
# 4. Navigate to "darkmode" and toggle it
pilotty key -s opts Down # Move to second option
pilotty key -s opts Space # Toggle it on
# 5. Wait for change and verify
pilotty snapshot -s opts --await-change $HASH | jq '.elements[] | select(.kind == "toggle") | {text, checked}'
# 6. Confirm selection
pilotty key -s opts Enter
# 7. Clean up
pilotty kill -s opts
Example: Form filling with elements
# 1. Spawn a form application
pilotty spawn --name form my-form-app
# 2. Get snapshot to understand form structure
pilotty snapshot -s form | jq '.elements'
# Shows inputs, toggles, and buttons with positions for click command
# 3. Type into the first input (likely already focused)
pilotty type -s form "myusername"
# 4. Tab to password field
pilotty key -s form Tab
pilotty type -s form "mypassword"
# 5. Tab to remember me and toggle
pilotty key -s form Tab
pilotty key -s form Space
# 6. Tab to Login and activate
pilotty key -s form Tab
pilotty key -s form Enter
# 7. Check result
pilotty snapshot -s form --format text
Example: Monitor with htop
# 1. Spawn htop
pilotty spawn --name monitor htop
# 2. Wait for display
pilotty wait-for -s monitor "CPU"
# 3. Take snapshot to see current state
pilotty snapshot -s monitor --format text
# 4. Send commands
pilotty key -s monitor F9 # Kill menu
pilotty key -s monitor q # Quit
# 5. Kill session
pilotty kill -s monitor
Example: Interact with AI TUI (opencode, etc.)
AI-powered TUIs stream responses, requiring special handling:
# 1. Spawn the AI app
pilotty spawn --name ai opencode
# 2. Wait for the prompt to be ready
pilotty wait-for -s ai -t 15000 "Ask anything"
# 3. Capture baseline hash
HASH=$(pilotty snapshot -s ai | jq -r '.content_hash')
# 4. Type prompt and submit
pilotty type -s ai "explain the architecture of this codebase"
pilotty key -s ai Enter
# 5. Wait for streaming response to complete
# - settle=3000: Wait 3s of no changes to ensure streaming is done
# - timeout=60000: Allow up to 60s for long responses
pilotty snapshot -s ai --await-change "$HASH" --settle 3000 -t 60000 --format text
# 6. If response is long and scrolled, scroll up to see full output
pilotty scroll -s ai up 20
pilotty snapshot -s ai --format text
# 7. Clean up
pilotty kill -s ai
Gotchas with AI apps:
- Use --settle 2000-3000 because AI responses pause between chunks
- Extend timeout with -t 60000 for complex prompts
- Long responses may scroll the terminal; use scroll up to see the beginning
- The settle timer resets on each screen update, so it waits for true completion
Sessions
Each session is isolated with its own:
- PTY (pseudo-terminal)
- Screen buffer
- Child process
# Run multiple apps (--name must come before the command)
pilotty spawn --name monitoring htop
pilotty spawn --name editor vim file.txt
# Target specific session
pilotty snapshot -s monitoring
pilotty key -s editor Ctrl+S
# List all
pilotty list-sessions
# Kill specific
pilotty kill -s editor
The first session spawned without --name is automatically named default.
Important: The --name flag must come before the command. Everything after the command is passed as arguments to that command.
Daemon Architecture
pilotty uses a background daemon for session management:
- Auto-start: Daemon starts on first command
- Auto-stop: Shuts down after 5 minutes with no sessions
- Session cleanup: Sessions removed when process exits (within 500ms)
- Shared state: Multiple CLI calls share sessions
You rarely need to manage the daemon manually.
Error Handling
Errors include actionable suggestions:
{
"code": "SESSION_NOT_FOUND",
"message": "Session 'abc123' not found",
"suggestion": "Run 'pilotty list-sessions' to see available sessions"
}
{
"code": "SPAWN_FAILED",
"message": "Failed to spawn process: command not found",
"suggestion": "Check that the command exists and is in PATH"
}
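In scripts, the suggestion field can be surfaced when a command fails. A sketch assuming the error object has the three fields shown above; whether pilotty writes it to stdout or stderr is not stated here, so the helper simply reads whatever JSON it is given on stdin. format_error is a hypothetical name.

```shell
# Hypothetical helper: turn an error object on stdin into a two-line message.
format_error() {
  jq -r '"[\(.code)] \(.message)\nSuggestion: \(.suggestion)"'
}

# Example with a literal error object:
# echo '{"code":"X","message":"m","suggestion":"s"}' | format_error
```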
Common Patterns
Reliable action + wait (recommended)
# The pattern: capture hash, act, await change
HASH=$(pilotty snapshot | jq '.content_hash')
pilotty key Enter
pilotty snapshot --await-change $HASH --settle 50
# This replaces fragile patterns like:
# pilotty key Enter && sleep 1 && pilotty snapshot # BAD: guessing
Wait then act
pilotty spawn my-app
pilotty wait-for "Ready" # Ensure app is ready
pilotty snapshot # Then snapshot
Check state before action
# Print any visible error lines; press Enter only if none are found
pilotty snapshot --format text | grep "Error" || pilotty key Enter
Check for specific element
# Check if the first toggle is checked (first() avoids truncating pretty-printed JSON with head)
pilotty snapshot | jq -c 'first(.elements[] | select(.kind == "toggle")) | {text, checked}'
# Find element at specific position
pilotty snapshot | jq '.elements[] | select(.row == 5 and .col == 10)'
Retry on timeout
pilotty wait-for -t 5000 "Ready" || {
pilotty snapshot --format text # Check what's on screen
# Adjust approach based on actual state
}
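The same idea extends to a bounded retry loop. A sketch assuming, as the || above implies, that wait-for exits non-zero on timeout; wait_with_retry and its defaults are hypothetical.

```shell
# Hypothetical helper: retry wait-for a limited number of times, dumping the
# screen to stderr after each timeout so the caller can see what went wrong.
# Usage: wait_with_retry <session> <text> [attempts] [timeout_ms]
wait_with_retry() {
  local session="$1" text="$2" attempts="${3:-3}" timeout_ms="${4:-5000}" i=1
  while [ "$i" -le "$attempts" ]; do
    if pilotty wait-for -s "$session" -t "$timeout_ms" "$text"; then
      return 0
    fi
    echo "attempt $i/$attempts timed out; current screen:" >&2
    pilotty snapshot -s "$session" --format text >&2
    i=$((i + 1))
  done
  return 1
}
```

A call such as wait_with_retry editor "hello.txt" 3 5000 returns 0 as soon as the text appears and 1 after three timeouts.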
Deep-dive Documentation
For detailed patterns and edge cases, see:
| Reference | Description |
|---|---|
| references/session-management.md | Multi-session patterns, isolation, cleanup |
| references/key-input.md | Complete key combinations reference |
| references/element-detection.md | Detection rules, confidence, patterns |
Ready-to-use Templates
Executable workflow scripts:
| Template | Description |
|---|---|
| templates/vim-workflow.sh | Edit file with vim, save, exit |
| templates/dialog-interaction.sh | Handle dialog/whiptail prompts |
| templates/multi-session.sh | Parallel TUI orchestration |
| templates/element-detection.sh | Element detection demo |
Usage:
./templates/vim-workflow.sh /tmp/myfile.txt "File content here"
./templates/dialog-interaction.sh
./templates/multi-session.sh
./templates/element-detection.sh