Mobile Automation with agent-device
For agent-driven exploration: use refs. For deterministic replay scripts: use selectors.
Quick start
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device wait text "Camera"
agent-device alert wait 10000
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device close
If not installed, run:
npx -y agent-device
Core workflow
- Open an app or deep link: `open [app|url] [url]` (`open` handles target selection plus boot/activation in the normal flow).
- Snapshot: run `snapshot` to get refs from the accessibility tree.
- Interact using refs (`press @ref`, `fill @ref "text"`; `click` is an alias of `press`).
- Re-snapshot after navigation or UI changes.
- Close the session when done.
Commands
Navigation
agent-device boot # Ensure target is booted/ready without opening app
agent-device boot --platform ios # Boot iOS target
agent-device boot --platform android # Boot Android emulator/device target
agent-device open [app|url] [url] # Boot device/simulator; optionally launch app or deep link URL
agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime)
agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only)
agent-device open "myapp://home" --platform android # Android deep link
agent-device open "https://example.com" --platform ios # iOS deep link (opens in browser)
agent-device open MyApp "myapp://screen/to" --platform ios # iOS deep link in app context
agent-device close [app] # Close app or just end session
agent-device reinstall <app> <path> # Uninstall + install app in one command
agent-device session list # List active sessions
boot requires either an active session or an explicit selector (--platform, --device, --udid, or --serial).
boot is a fallback, not a regular step: use it when starting a new session only if open cannot find/connect to an available target.
Snapshot (page analysis)
agent-device snapshot # Full XCTest accessibility tree snapshot
agent-device snapshot -i # Interactive elements only (recommended)
agent-device snapshot -c # Compact output
agent-device snapshot -d 3 # Limit depth
agent-device snapshot -s "Camera" # Scope to label/identifier
agent-device snapshot --raw # Raw node output
agent-device diff snapshot # Structural diff against previous session baseline
XCTest is the iOS snapshot engine: it is fast and complete, and it requires no Accessibility permission.
Snapshot diff notes:
- The first `diff snapshot` call initializes the baseline for the current session.
- Subsequent `diff snapshot` calls compare the current UI to the prior baseline and then update the baseline.
- Use this for compact change tracking between adjacent UI states.
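The baseline behavior above can be modeled in a few lines of Python. This is a conceptual sketch only, not agent-device's implementation: it assumes a snapshot can be treated as a list of text lines, the class name `DiffBaseline` is hypothetical, and the diff itself comes from the standard-library `difflib.SequenceMatcher`.

```python
import difflib
from typing import List, Optional


class DiffBaseline:
    """Hypothetical model of per-session `diff snapshot` baseline handling."""

    def __init__(self) -> None:
        # No baseline exists until the first diff call in a session.
        self._baseline: Optional[List[str]] = None

    def diff(self, snapshot_lines: List[str]) -> List[str]:
        current = list(snapshot_lines)
        if self._baseline is None:
            # First call: initialize the baseline; there is nothing to compare yet.
            self._baseline = current
            return []
        changes: List[str] = []
        matcher = difflib.SequenceMatcher(a=self._baseline, b=current, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                changes.extend("- " + line for line in self._baseline[i1:i2])
            if tag in ("replace", "insert"):
                changes.extend("+ " + line for line in current[j1:j2])
        # Compare first, then make the current UI the new baseline.
        self._baseline = current
        return changes


session = DiffBaseline()
session.diff(['button "Sign In"', 'text "Welcome"'])  # first call: []
print(session.diff(['button "Sign Out"', 'text "Welcome"']))
# ['- button "Sign In"', '+ button "Sign Out"']
```

Because the baseline moves forward on every call, each diff only reports what changed since the immediately preceding UI state, which is what keeps the output compact.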
Find (semantic)
agent-device find "Sign In" click
agent-device find text "Sign In" click
agent-device find label "Email" fill "user@example.com"
agent-device find value "Search" type "query"
agent-device find role button click
agent-device find id "com.example:id/login" click
agent-device find "Settings" wait 10000
agent-device find "Settings" exists
Settings helpers
agent-device settings wifi on
agent-device settings wifi off
agent-device settings airplane on
agent-device settings airplane off
agent-device settings location on
agent-device settings location off
agent-device settings faceid match
agent-device settings faceid nonmatch
agent-device settings faceid enroll
agent-device settings faceid unenroll
Note: on iOS, the wifi/airplane helpers toggle status bar indicators, not the actual network state.
`settings airplane off` clears status bar overrides.
iOS settings helpers are simulator-only.
For Face ID, use `match`/`nonmatch` as the canonical command values; when describing intent, think of them as validate/invalidate outcomes.
Logs (token-efficient debugging)
Use the detailed logs workflow reference:
skills/agent-device/references/logs.md
Recommended minimum:
agent-device logs doctor
agent-device logs start
agent-device logs path
App state
agent-device appstate
- Android: `appstate` reports the live foreground package/activity.
- iOS: `appstate` is session-scoped and reports the app tracked by the active session on the target device.
- For iOS `appstate`, ensure a matching session exists (for example, `open --session <name> --platform ios --device "<name>" <app>`).
Interactions (use @refs from snapshot)
agent-device press @e1 # Canonical tap command (`click` is an alias)
agent-device focus @e2
agent-device fill @e2 "text" # Clear then type (Android: verifies value and retries once on mismatch)
agent-device type "text" # Type into focused field without clearing
agent-device press 300 500 # Tap by coordinates
agent-device press 300 500 --count 12 --interval-ms 45
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
agent-device press @e1 --count 5 # Repeat taps on the same target
agent-device press @e1 --count 5 --double-tap # Use double-tap gesture per iteration
agent-device swipe 540 1500 540 500 120
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
agent-device longpress 300 500 800 # Long press on iOS and Android
agent-device scroll down 0.5
agent-device pinch 2.0 # Zoom in 2x (iOS simulator only)
agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator only)
agent-device back
agent-device home
agent-device app-switcher
agent-device wait 1000
agent-device wait text "Settings"
agent-device is visible 'id="settings_anchor"' # selector assertions for deterministic checks
agent-device is text 'id="header_title"' "Settings"
agent-device alert get
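The gesture-series flags in the listing above can also be assembled programmatically. The Python sketch below builds an argv list for `agent-device press` and enforces the documented rule that `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`. The helper name `build_press_args` is hypothetical; it only emits flags shown in this document and does not run the command.

```python
from typing import List, Optional, Tuple, Union


def build_press_args(
    target: Union[str, Tuple[int, int]],
    count: Optional[int] = None,
    interval_ms: Optional[int] = None,
    hold_ms: Optional[int] = None,
    jitter_px: Optional[int] = None,
    double_tap: bool = False,
) -> List[str]:
    """Hypothetical helper: argv for an `agent-device press` gesture series."""
    # Documented constraint: --double-tap cannot be combined with --hold-ms or --jitter-px.
    if double_tap and (hold_ms is not None or jitter_px is not None):
        raise ValueError("--double-tap cannot be combined with --hold-ms or --jitter-px")

    args = ["agent-device", "press"]
    if isinstance(target, tuple):
        x, y = target
        args += [str(x), str(y)]  # coordinate target: press x y
    else:
        args.append(target)  # @ref or selector target, passed through as-is

    for flag, value in (
        ("--count", count),
        ("--interval-ms", interval_ms),
        ("--hold-ms", hold_ms),
        ("--jitter-px", jitter_px),
    ):
        if value is not None:
            args += [flag, str(value)]
    if double_tap:
        args.append("--double-tap")
    return args


print(build_press_args((300, 500), count=6, hold_ms=120, interval_ms=30, jitter_px=2))
```

The resulting list can be passed straight to `subprocess.run(..., check=True)`; validating the flag combination before launching avoids a round trip that would only fail on the CLI side.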
Get information
agent-device get text @e1
agent-device get attrs @e1
agent-device screenshot out.png
Deterministic replay and updating
agent-device open App --relaunch # Fresh app process restart in the current session
agent-device open App --save-script # Save session script (.ad) on close (default path)
agent-device open App --save-script ./workflows/app-flow.ad # Save to custom file path
agent-device replay ./session.ad # Run deterministic replay from .ad script
agent-device replay -u ./session.ad # Update selector drift and rewrite .ad script in place
replay reads .ad recordings.
--relaunch controls launch semantics; --save-script controls recording. Combine only when both are needed.
The --save-script value is a file path; parent directories are created automatically.
If a bare value could be mistaken for another argument, attach it with = (--save-script=workflow.ad) or make it an explicit path (./workflow.ad).
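A wrapper script can sidestep the bare-value ambiguity by always emitting the `--save-script=<path>` form, and can switch between maintenance runs (`replay -u`) and CI runs (plain `replay`) with an explicit parameter. The helpers below are a hypothetical Python sketch (`open_args` and `replay_args` are illustrative names) that only builds argv lists from the flags documented above.

```python
from typing import List, Union


def open_args(app: str, save_script: Union[bool, str] = False, relaunch: bool = False) -> List[str]:
    """Hypothetical helper: argv for `agent-device open` with optional recording."""
    args = ["agent-device", "open", app]
    if relaunch:
        args.append("--relaunch")  # fresh app process; independent of recording
    if save_script is True:
        args.append("--save-script")  # record to the default path on close
    elif isinstance(save_script, str):
        # The '=' form keeps the path attached to the flag, so a bare name such as
        # "workflow.ad" cannot be mistaken for a positional argument.
        args.append(f"--save-script={save_script}")
    return args


def replay_args(script: str, maintenance: bool = False) -> List[str]:
    """Hypothetical helper: `replay -u` for maintenance runs, plain `replay` for CI."""
    args = ["agent-device", "replay"]
    if maintenance:
        args.append("-u")  # update selector drift and rewrite the .ad script in place
    args.append(script)
    return args


print(open_args("App", save_script="workflow.ad"))
print(replay_args("./session.ad"))
```

Keeping `maintenance` off by default means a script never rewrites its own .ad recording unless the caller asks for it.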
Fast batching (JSON steps)
Use batch when an agent already has a known short sequence and wants fewer orchestration round trips.
agent-device batch \
--session sim \
--platform ios \
--udid 00008150-001849640CF8401C \
--steps-file /tmp/batch-steps.json \
--json
Inline JSON works for small payloads:
agent-device batch --steps '[{"command":"open","positionals":["settings"]},{"command":"wait","positionals":["100"]}]'
Step format:
[
{ "command": "open", "positionals": ["settings"], "flags": {} },
{ "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
{ "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
{ "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
]
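For anything beyond a one-liner, generating the steps file is less error-prone than hand-escaping JSON. The Python sketch below builds the four steps from the step format above in the documented `{command, positionals, flags}` shape, lets `json.dumps` produce the `\"` escaping inside selector strings, writes the file, and assembles the `batch --steps-file` argv. The helper names `step` and `label` and the temp-file location are assumptions for illustration; `label` does not escape double quotes inside the label text.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def step(command: str, *positionals: str, flags: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: one batch step in the {command, positionals, flags} shape."""
    return {"command": command, "positionals": list(positionals), "flags": dict(flags or {})}


def label(text: str) -> str:
    """Hypothetical helper: a label selector; embedded double quotes are not escaped."""
    return f'label="{text}"'


steps: List[Dict[str, Any]] = [
    step("open", "settings"),
    step("wait", label("Privacy & Security"), "3000"),
    step("click", label("Privacy & Security")),
    step("get", "text", label("Tracking")),
]

# json.dumps adds the \" escaping around selector values when serializing.
steps_path = Path(tempfile.gettempdir()) / "batch-steps.json"
steps_path.write_text(json.dumps(steps, indent=2), encoding="utf-8")

argv = [
    "agent-device", "batch",
    "--session", "sim",
    "--platform", "ios",
    "--steps-file", str(steps_path),
    "--json",
]
```

Add `--udid` (or another selector) to `argv` when more than one target could match, as in the batch example above.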
Batch best practices:
- Batch one screen-local flow at a time.
- Add sync guards (`wait`, `is exists`) after mutating steps (`open`, `click`, `fill`, `swipe`).
- Treat prior refs/snapshot assumptions as stale after UI mutations.
- Prefer `--steps-file` over inline JSON.
- Keep batches moderate (about 5-20 steps).
- Use failure context (`step`, `partialResults`) to replan from the failed step.
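The sync-guard practice can be applied mechanically when generating steps. In this hypothetical Python sketch, each action is paired with a selector that should be present once the action completes, and a `wait` step (selector plus timeout in ms, as in the step format above) is inserted after every mutating command. The mutating set mirrors the list above plus `press`, which is assumed to behave like its alias `click`; the 3000 ms default and the helper name `with_sync_guards` are illustrative, and the 5-20 step range is only checked as a soft warning.

```python
import warnings
from typing import Any, Dict, List, Optional, Tuple

Step = Dict[str, Any]

# Mutating commands from the best practices, plus `press` (alias of `click`).
MUTATING = {"open", "click", "press", "fill", "swipe"}


def with_sync_guards(actions: List[Tuple[Step, Optional[str]]], timeout_ms: int = 3000) -> List[Step]:
    """Hypothetical helper: follow each mutating step with a `wait` on its expected selector."""
    steps: List[Step] = []
    for action, expect in actions:
        steps.append(action)
        if action["command"] in MUTATING:
            if expect is None:
                raise ValueError(f"mutating step {action['command']!r} needs a selector to wait for")
            steps.append({"command": "wait", "positionals": [expect, str(timeout_ms)], "flags": {}})
    if not 5 <= len(steps) <= 20:
        # Soft guideline only: batches of about 5-20 steps.
        warnings.warn(f"batch has {len(steps)} steps, outside the suggested 5-20 range")
    return steps


guarded = with_sync_guards([
    ({"command": "open", "positionals": ["settings"], "flags": {}}, 'label="Privacy & Security"'),
    ({"command": "click", "positionals": ['label="Privacy & Security"'], "flags": {}}, 'label="Tracking"'),
    ({"command": "get", "positionals": ["text", 'label="Tracking"'], "flags": {}}, None),
])
```

Requiring an expected selector for every mutating step forces the author to state what the next screen should look like, which is also the information needed to replan when a batch fails partway.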
Stale accessibility tree note:
- Rapid mutations can outrun accessibility tree updates.
- Mitigate with explicit waits and phase splitting (navigate, verify/extract, cleanup).
Trace logs (XCTest)
agent-device trace start # Start trace capture
agent-device trace start ./trace.log # Start trace capture to path
agent-device trace stop # Stop trace capture
agent-device trace stop ./trace.log # Stop and move trace log
Devices and apps
agent-device devices
agent-device apps --platform ios # iOS simulator + iOS device, includes default/system apps
agent-device apps --platform ios --all # explicit include-all (same as default)
agent-device apps --platform ios --user-installed
agent-device apps --platform android # includes default/system apps
agent-device apps --platform android --all # explicit include-all (same as default)
agent-device apps --platform android --user-installed
Best practices
- `press` is the canonical tap command; `click` is an alias with the same behavior.
- `press` (and `click`) accepts `x y`, `@ref`, and selector targets.
- `press`/`click` support gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`, `--double-tap`.
- `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`.
- `swipe` supports coordinate + timing controls and repeat patterns: `swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern`.
- `swipe` timing is platform-safe: Android uses the requested duration; iOS uses normalized safe timing to avoid long-press side effects.
- `longpress` is coordinate-based and supported on iOS and Android.
- Pinch (`pinch <scale> [x y]`) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out.
- Snapshot refs are the core mechanism for interactive agent flows.
- Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
- Prefer `snapshot -i` to reduce output size.
- Prefer scoped snapshots (`-s "<label>"` or `-s @ref`) for screen-local tasks.
- Add `-d <depth>` when only upper tree levels matter; avoid full-tree snapshots by default.
- Use `diff snapshot` after mutations to detect structural changes with less output than a full re-read.
- Refresh refs immediately after navigation/modal/list mutations before issuing the next ref-targeted action.
- Use `--raw` only for debugging parser/tree edge cases; avoid it in normal agent loops because of its size.
- On iOS, snapshots use XCTest and do not require Accessibility permission.
- If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state.
- `open <app|url> [url]` can be used within an existing session to switch apps or open deep links. `open <app>` updates the session's app bundle context; `open <app> <url>` opens a deep link on iOS.
- Use `open <app> --relaunch` during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
- Use `--session <name>` for parallel sessions; avoid device contention.
- Use `--activity <component>` on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine it with URL opens.
- On iOS devices, `http(s)://` URLs fall back to Safari automatically; custom scheme URLs require an active app in the session.
- The iOS physical-device runner requires Xcode signing/provisioning; optional overrides: `AGENT_DEVICE_IOS_TEAM_ID`, `AGENT_DEVICE_IOS_SIGNING_IDENTITY`, `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`.
- The default daemon request timeout is `45000` ms. For slow physical-device setup/build, increase `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (for example, `120000`).
- For daemon startup troubleshooting, follow the stale-metadata hints for `~/.agent-device/daemon.json` and `~/.agent-device/daemon.lock`.
- Use `fill` when you want clear-then-type semantics.
- Use `type` when you want to append/enter text without clearing.
- On Android, prefer `fill` for important fields; it verifies the entered text and retries once when the IME reorders characters.
- If you use deterministic replay scripts, run `replay -u` during maintenance runs to update selector drift in them. Use plain `replay` in CI.
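When a wrapper script drives agent-device against slow physical devices, the timeout override can be set per invocation instead of globally. This Python sketch copies the environment and sets `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (documented above; default 45000 ms). It assumes the variable is honored when present in the environment of the `agent-device` process; the helper names are hypothetical, and `run_agent_device` is only defined here, since calling it requires agent-device to be installed.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def agent_device_env(timeout_ms: int = 120_000, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy the environment and raise the daemon request timeout."""
    env = dict(os.environ if base is None else base)
    # The documented default is 45000 ms; slow physical-device setup may need more.
    env["AGENT_DEVICE_DAEMON_TIMEOUT_MS"] = str(timeout_ms)
    return env


def run_agent_device(args: List[str], timeout_ms: int = 120_000) -> subprocess.CompletedProcess:
    """Hypothetical helper: run one agent-device command with the raised timeout."""
    return subprocess.run(
        ["agent-device", *args],
        env=agent_device_env(timeout_ms),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # keep stdout/stderr for logs
        text=True,
    )
```

For example, `run_agent_device(["open", "MyApp", "--platform", "ios"])` would launch the app with a 120000 ms request timeout while leaving the caller's own environment untouched.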