Mobile Automation with agent-device
For exploration, use snapshot refs. For deterministic replay, use selectors. For structured exploratory QA bug hunts and reporting, use ../dogfood/SKILL.md.
Start Here (Read This First)
Use this skill as a router, not a full manual.
- Pick one mode:
  - Normal interaction flow
  - Debug/crash flow
  - Replay maintenance flow
- Run one canonical flow below.
- Open references only if blocked.
Decision Map
- No target context yet: devices -> pick target -> open.
- Normal UI task: open -> snapshot -i -> press/fill -> diff snapshot -i -> close
- Debug/crash: open <app> -> logs clear --restart -> reproduce -> network dump -> logs path -> targeted grep
- Replay drift: replay -u <path> -> verify updated selectors
- Remote multi-tenant run: allocate lease -> point client at remote daemon base URL -> run commands with tenant isolation flags -> heartbeat/release lease
- Device-scope isolation run: set iOS simulator set / Android allowlist -> run selectors within scope only
Target Selection Rules
- iOS local QA: use simulators unless the task explicitly requires a physical device.
- iOS local QA in mixed simulator/device environments: run ensure-simulator first and pass --device, --udid, or --ios-simulator-device-set on later commands.
- Android local QA: use install or reinstall for .apk/.aab files, then relaunch by installed package name.
- Android React Native + Metro flows: set runtime hints with runtime set before open <package> --relaunch.
- In mixed-device environments, always pin the exact target with --serial, --device, --udid, or an isolation scope.
- For session-bound automation runs, prefer a pre-bound session/platform instead of repeating selectors on every command: set AGENT_DEVICE_SESSION and AGENT_DEVICE_PLATFORM, and the daemon will enforce the shared lock policy across CLI, typed client, and RPC entry points.
- Use --session-lock reject|strip (or AGENT_DEVICE_SESSION_LOCK) only when you need to override the default reject behavior. Lock mode applies to nested batch steps too.
Canonical Flows
1) Normal Interaction Flow
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device close
1a) Local iOS Simulator QA Flow
agent-device ensure-simulator --platform ios --device "iPhone 16" --boot
agent-device open MyApp --platform ios --device "iPhone 16" --session qa-ios --relaunch
agent-device snapshot -i
agent-device press @e3
agent-device close
Use this when a physical iPhone is also connected and you want deterministic simulator-only automation.
1b) Android React Native + Metro QA Flow
agent-device reinstall MyApp /path/to/app-debug.apk --platform android --serial emulator-5554
agent-device runtime set --session qa-android --platform android --metro-host 10.0.2.2 --metro-port 8081
agent-device open com.example.myapp --platform android --serial emulator-5554 --session qa-android --relaunch
agent-device snapshot -i
agent-device close
Do not use open <apk|aab> --relaunch on Android. Install/reinstall binaries first, then relaunch by package.
1c) Session-Bound Automation Flow
export AGENT_DEVICE_SESSION=qa-ios
export AGENT_DEVICE_PLATFORM=ios
export AGENT_DEVICE_SESSION_LOCK=strip
agent-device open MyApp --relaunch
agent-device snapshot -i
agent-device batch --steps-file /tmp/qa-steps.json --json
agent-device close
Use this for orchestrators that must preserve one bound session/device across many plain CLI calls without a wrapper script. In strip mode, conflicting selectors such as --target, --device, --udid, --serial, and isolation-scope overrides are ignored instead of retargeting the run.
1d) Android Emulator Session-Bound Flow
export AGENT_DEVICE_SESSION=qa-android
export AGENT_DEVICE_PLATFORM=android
agent-device reinstall MyApp /path/to/app-debug.apk --serial emulator-5554
agent-device --session-lock reject open com.example.myapp --relaunch
agent-device snapshot -i
agent-device close --shutdown
Use this when an Android emulator session must stay pinned while an agent or test runner issues plain CLI commands over time.
2) Debug/Crash Flow
agent-device open MyApp --platform ios
agent-device logs clear --restart
agent-device network dump 25
agent-device logs path
Logging is off by default. Enable only for debugging windows.
logs clear --restart requires an active app session (open <app> first).
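The last step of the debug flow (a targeted grep over the session log) can be sketched as a small helper. This is a minimal sketch: the function name grep_log_errors and the default pattern are hypothetical, and the usage lines assume that logs path prints the log file path on stdout.

```shell
# Print matching lines (with line numbers) from a log file.
# The default pattern is an illustrative assumption; tune it per app.
grep_log_errors() {
  log_file=$1
  pattern=${2:-'error|exception|fatal|crash'}
  if [ ! -r "$log_file" ]; then
    echo "log file not readable: $log_file" >&2
    return 2
  fi
  # -n: line numbers, -i: case-insensitive, -E: extended regex
  grep -n -i -E "$pattern" "$log_file"
}

# Usage (assumes `logs path` prints the path on stdout):
#   grep_log_errors "$(agent-device logs path)"
#   grep_log_errors "$(agent-device logs path)" 'timeout|refused'
```

Keeping the pattern as an argument lets one reproduction be inspected several ways without re-running the app.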
3) Replay Maintenance Flow
agent-device replay -u ./session.ad
4) Remote Tenant Lease Flow (HTTP JSON-RPC)
# Client points directly at the remote daemon HTTP base URL.
export AGENT_DEVICE_DAEMON_BASE_URL=http://mac-host.example:4310
export AGENT_DEVICE_DAEMON_AUTH_TOKEN=<token>
# Allocate lease
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
-H "content-type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"jsonrpc":"2.0","id":"alloc-1","method":"agent_device.lease.allocate","params":{"runId":"run-123","tenantId":"acme","ttlMs":60000}}'
# Use lease in tenant-isolated command execution
agent-device \
--tenant acme \
--session-isolation tenant \
--run-id run-123 \
--lease-id <lease-id> \
session list --json
# Heartbeat and release
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
-H "content-type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"jsonrpc":"2.0","id":"hb-1","method":"agent_device.lease.heartbeat","params":{"leaseId":"<lease-id>","ttlMs":60000}}'
curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
-H "content-type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"jsonrpc":"2.0","id":"rel-1","method":"agent_device.lease.release","params":{"leaseId":"<lease-id>"}}'
Notes:
- AGENT_DEVICE_DAEMON_BASE_URL makes the CLI skip local daemon discovery/startup and call the remote HTTP daemon directly.
- AGENT_DEVICE_DAEMON_AUTH_TOKEN is sent in both the JSON-RPC request token and HTTP auth headers.
- In remote daemon mode, --debug does not tail a local daemon.log; inspect logs on the remote host instead.
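The three curl calls in the lease flow differ only in id, method, and params, so they can be factored into two helpers. This is a sketch under assumptions: rpc_payload and rpc_call are hypothetical names, the params argument must already be valid JSON, and the id and method are inserted without escaping.

```shell
# Build a JSON-RPC 2.0 request body: rpc_payload <id> <method> <params-json>
# $3 is passed through as-is, so it must already be a valid JSON object.
rpc_payload() {
  printf '{"jsonrpc":"2.0","id":"%s","method":"%s","params":%s}\n' "$1" "$2" "$3"
}

# POST a request body to the remote daemon, reusing the endpoint and
# headers from the lease flow. Reads the token from the exported variable.
rpc_call() {
  curl -sS "${AGENT_DEVICE_DAEMON_BASE_URL}/rpc" \
    -H "content-type: application/json" \
    -H "Authorization: Bearer ${AGENT_DEVICE_DAEMON_AUTH_TOKEN}" \
    -d "$1"
}

# Usage:
#   rpc_call "$(rpc_payload hb-1 agent_device.lease.heartbeat \
#     '{"leaseId":"<lease-id>","ttlMs":60000}')"
```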
Command Skeleton (Minimal)
Session and navigation
agent-device devices
agent-device devices --platform ios --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device devices --platform android --android-device-allowlist emulator-5554,device-1234
agent-device ensure-simulator --device "iPhone 16" --ios-simulator-device-set /tmp/tenant-a/simulators
agent-device ensure-simulator --device "iPhone 16" --runtime com.apple.CoreSimulator.SimRuntime.iOS-18-4 --ios-simulator-device-set /tmp/tenant-a/simulators --boot
agent-device open [app|url] [url]
agent-device open [app] --relaunch
agent-device close [app]
agent-device install <app> <path-to-binary>
agent-device reinstall <app> <path-to-binary>
agent-device session list
Use boot only as fallback when open cannot find/connect to a ready target.
If the workspace repeats the same selectors or device/session flags, prefer a checked-in agent-device.json or --config <path> over repeating them inline.
Environment-level defaults follow the same fields via AGENT_DEVICE_* names, so persistent host-specific values belong there rather than in committed project config.
That includes bound-session defaults such as sessionLock / AGENT_DEVICE_SESSION_LOCK when automation should consistently reject or strip conflicting device routing flags.
For Android emulators by AVD name, use boot --platform android --device <avd-name>.
For Android emulators without GUI, add --headless.
Use --target mobile|tv with --platform (required) to pick phone/tablet vs TV targets (AndroidTV/tvOS).
For Android React Native + Metro flows, install or reinstall the APK first, set runtime hints with runtime set, then use open <package> --relaunch; do not use open <apk|aab> --relaunch.
For local iOS QA in mixed simulator/device environments, use ensure-simulator and pass --device or --udid so automation does not attach to a physical device by accident.
For session-bound automation, prefer AGENT_DEVICE_SESSION + AGENT_DEVICE_PLATFORM; that bound-session default now enables lock mode automatically.
Isolation scoping quick reference:
- --ios-simulator-device-set <path> scopes iOS simulator discovery + command execution to one simulator set.
- --android-device-allowlist <serials> scopes Android discovery/selection to comma/space separated serials.
- Scope is applied before selectors (--device, --udid, --serial); out-of-scope selectors fail with DEVICE_NOT_FOUND.
- With iOS simulator-set scope enabled, iOS physical devices are not enumerated.
- In bound-session strip mode, conflicting per-call scope/selectors are ignored and the configured binding is restored for the request. Batch steps still inherit the parent --platform when they do not set their own.
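Because out-of-scope selectors fail with DEVICE_NOT_FOUND, an orchestrator may want to check a serial against the allowlist before issuing a command. The helper below is a client-side sketch only, not the tool's own implementation; the name serial_in_allowlist is hypothetical, and it accepts the same comma/space separated format described above.

```shell
# Return 0 if serial $1 appears in allowlist $2 (comma- and/or space-separated).
serial_in_allowlist() {
  serial=$1
  # Normalize commas to spaces, then rely on shell word splitting.
  for allowed in $(printf '%s' "$2" | tr ',' ' '); do
    [ "$allowed" = "$serial" ] && return 0
  done
  return 1
}

# Usage:
#   if serial_in_allowlist emulator-5554 "$AGENT_DEVICE_ANDROID_DEVICE_ALLOWLIST"; then
#     agent-device open com.example.myapp --platform android --serial emulator-5554
#   fi
```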
Simulator provisioning quick reference:
- Use ensure-simulator to create or reuse a named iOS simulator inside a device set before starting a session.
- --device <name> is required (e.g. "iPhone 16 Pro").
- --runtime <id> pins the runtime; omit to use the newest compatible one.
- --boot boots it immediately.
- Returns udid, device, runtime, ios_simulator_device_set, created, booted.
- Idempotent: safe to call repeatedly; reuses an existing matching simulator by default.
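Since ensure-simulator returns a udid, later commands can pin that exact simulator. The sketch below pulls one string field out of JSON on stdin. It assumes the fields listed above appear as plain JSON string values when the command is run with --json (an assumption, not confirmed here); json_string_field is a hypothetical helper, and a real JSON parser is preferable where one is available.

```shell
# Extract the first string value of field $1 from JSON on stdin.
# Sed-based sketch: handles simple string fields only, not nested quoting.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (assumes --json output containing a "udid" string field):
#   udid=$(agent-device ensure-simulator --device "iPhone 16" --boot --json \
#     | json_string_field udid)
#   agent-device open MyApp --platform ios --udid "$udid" --relaunch
```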
TV quick reference:
- AndroidTV: open/apps use TV launcher discovery automatically.
- TV target selection works on emulators/simulators and connected physical devices (AndroidTV + AppleTV).
- tvOS: runner-driven interactions and snapshots are supported (snapshot, wait, press, fill, get, scroll, back, home, app-switcher, record and related selector flows).
- tvOS back/home/app-switcher map to Siri Remote actions (menu, home, double-home) in the runner.
- tvOS follows iOS simulator-only command semantics for helpers like pinch, settings, and push.
Snapshot and targeting
agent-device snapshot -i
agent-device diff snapshot -i
agent-device find "Sign In" click
agent-device press @e1
agent-device fill @e2 "text"
agent-device is visible 'id="anchor"'
press is the canonical tap command; click is an alias.
Utilities
agent-device appstate
agent-device clipboard read
agent-device clipboard write "token"
agent-device keyboard status
agent-device keyboard dismiss
agent-device perf --json
agent-device network dump [limit] [summary|headers|body|all]
agent-device push <bundle|package> <payload.json|inline-json>
agent-device trigger-app-event screenshot_taken '{"source":"qa"}'
agent-device get text @e1
agent-device screenshot out.png
agent-device settings permission grant notifications
agent-device settings permission reset camera
agent-device trace start
agent-device trace stop ./trace.log
Batch (when sequence is already known)
agent-device batch --steps-file /tmp/batch-steps.json --json
Performance Check
- Use agent-device perf --json (or metrics --json) after open.
- For detailed metric semantics, caveats, and interpretation guidance, see references/perf-metrics.md.
Guardrails (High Value Only)
- Re-snapshot after UI mutations (navigation/modal/list changes).
- Prefer snapshot -i; scope/depth only when needed.
- Use refs for discovery, selectors for replay/assertions.
- find "<query>" click --json returns { ref, locator, query, x, y }, all derived from the matched snapshot node. Do not rely on these fields from raw press/click responses for observability; use find instead.
- Use fill for clear-then-type semantics; use type for focused append typing.
- Use install for in-place app upgrades (keep app data when platform permits), and reinstall for deterministic fresh-state runs.
- App binary format support for install/reinstall: Android .apk/.aab, iOS .app/.ipa.
- Android .aab requires bundletool in PATH, or AGENT_DEVICE_BUNDLETOOL_JAR=<path-to-bundletool-all.jar> with java in PATH.
- Android .aab optional: set AGENT_DEVICE_ANDROID_BUNDLETOOL_MODE=<mode> to control bundletool build-apks --mode (default: universal).
- iOS .ipa: extract/install from Payload/*.app; when multiple app bundles are present, <app> is used as a bundle id/name hint.
- iOS appstate is session-scoped; Android appstate is live foreground state. iOS responses include device_udid and ios_simulator_device_set for isolation verification.
- iOS open responses include device_udid and ios_simulator_device_set to confirm which simulator handled the session.
- Clipboard helpers: clipboard read / clipboard write <text> are supported on Android and iOS simulators; iOS physical devices are not supported yet.
- Android keyboard helpers: keyboard status|get|dismiss report keyboard visibility/type and dismiss via keyevent when visible.
- network dump is best-effort and parses HTTP(s) entries from the session app log file.
- Biometric settings: iOS simulator supports settings faceid|touchid <match|nonmatch|enroll|unenroll>; Android supports settings fingerprint <match|nonmatch> where runtime tooling is available.
- For AndroidTV/tvOS selection, always pair --target with --platform (ios, android, or apple alias); target-only selection is invalid.
- push simulates notification delivery:
  - iOS simulator uses APNs-style payload JSON.
  - Android uses broadcast action + typed extras (string/boolean/number).
- trigger-app-event requires app-defined deep-link hooks and URL template configuration (AGENT_DEVICE_APP_EVENT_URL_TEMPLATE or platform-specific variants).
- trigger-app-event requires an active session or explicit selectors (--platform, --device, --udid, --serial); on iOS physical devices, custom-scheme triggers require active app context.
- Canonical trigger behavior and caveats are documented in website/docs/docs/commands.md under App event triggers.
- Permission settings are app-scoped and require an active session app: settings permission <grant|deny|reset> <camera|microphone|photos|contacts|notifications> [full|limited]
- iOS simulator permission alerts: use alert wait then alert accept/dismiss; accept/dismiss retry internally for up to 2 s so you do not need manual sleeps. See references/permissions.md.
- full|limited mode applies only to iOS photos; other targets reject mode.
- On Android, non-ASCII fill/type may require an ADB keyboard IME on some system images; only install IME APKs from trusted sources and verify checksum/signature.
- If using --save-script, prefer explicit path syntax (--save-script=flow.ad or ./flow.ad).
- For tenant-isolated remote runs, always pass --tenant, --session-isolation tenant, --run-id, and --lease-id together.
- Use short lease TTLs and heartbeat only while work is active; release leases immediately after run completion/failure.
- Env equivalents for scoped runs: AGENT_DEVICE_IOS_SIMULATOR_DEVICE_SET (compat IOS_SIMULATOR_DEVICE_SET) and AGENT_DEVICE_ANDROID_DEVICE_ALLOWLIST (compat ANDROID_DEVICE_ALLOWLIST).
- For explicit remote client mode, prefer AGENT_DEVICE_DAEMON_BASE_URL / --daemon-base-url instead of relying on local daemon metadata or loopback-only ports.
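The guardrail that the four tenant-isolation flags always travel together can be enforced in a wrapper rather than by convention. The following is a sketch with a hypothetical helper name, tenant_flags. It assumes the tenant, run id, and lease id contain no whitespace, because the output is word-split at the call site.

```shell
# Print the four tenant-isolation flags, or fail if any value is missing.
# Arguments: <tenant> <run-id> <lease-id>
tenant_flags() {
  if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "tenant, run id, and lease id are all required" >&2
    return 1
  fi
  printf '%s\n' "--tenant $1 --session-isolation tenant --run-id $2 --lease-id $3"
}

# Usage (flags must be built before the command runs, so check the status):
#   flags=$(tenant_flags acme run-123 "$LEASE_ID") || exit 1
#   agent-device $flags session list --json
```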
Common Failure Patterns
- Failed to access Android app sandbox for /path/app-debug.apk: Android relaunch/runtime-hint flow received an APK path instead of an installed package name. Use reinstall first, then open <package> --relaunch.
- mkdir: Needs 1 argument while writing ReactNativeDevPrefs.xml: likely an older agent-device build or stale global install is still using the shell-based Android runtime-hint writer. Verify the exact binary being invoked.
- Failed to terminate iOS app: the flow may have selected a physical iPhone or an unavailable iOS target. Re-run with ensure-simulator, then pin the simulator with --device or --udid.
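The first failure above can be caught before the command runs by checking whether the launch target looks like a binary path. This is a minimal sketch; is_android_binary_path is a hypothetical helper that only inspects the file extension.

```shell
# Return 0 when $1 looks like an Android binary path rather than a package name.
is_android_binary_path() {
  case "$1" in
    *.apk|*.aab) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if is_android_binary_path "$target"; then
#     echo "reinstall the binary first, then open by package name" >&2
#   else
#     agent-device open "$target" --platform android --relaunch
#   fi
```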
Security and Trust Notes
- Prefer a preinstalled agent-device binary over on-demand package execution.
- If install is required, pin an exact version (for example: npx --yes agent-device@<exact-version> --help).
- Signing/provisioning environment variables are optional, sensitive, and only for iOS physical-device setup.
- Logs/artifacts are written under ~/.agent-device; replay scripts write to explicit paths you provide.
- For remote daemon mode, prefer AGENT_DEVICE_DAEMON_SERVER_MODE=http|dual on the host plus client-side AGENT_DEVICE_DAEMON_BASE_URL, with AGENT_DEVICE_HTTP_AUTH_HOOK and tenant-scoped lease admission where needed.
- Keep logging off unless debugging and use least-privilege/isolated environments for autonomous runs.
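Pinning an exact version can be checked mechanically before any on-demand install. This sketch uses a hypothetical helper, is_exact_version, and deliberately accepts only plain MAJOR.MINOR.PATCH strings; it rejects tags such as latest, ranges such as ^1.2.3, and also pre-release suffixes, which is stricter than strictly necessary.

```shell
# Return 0 only for a plain MAJOR.MINOR.PATCH version string.
is_exact_version() {
  printf '%s\n' "$1" | grep -q -E '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Usage ($AGENT_DEVICE_VERSION is a hypothetical variable set by your pipeline):
#   if is_exact_version "$AGENT_DEVICE_VERSION"; then
#     npx --yes "agent-device@${AGENT_DEVICE_VERSION}" --help
#   else
#     echo "refusing to run an unpinned agent-device version" >&2
#   fi
```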
Common Mistakes
- Mixing debug flow into normal runs (keep logs off unless debugging).
- Continuing to use stale refs after screen transitions.
- Using URL opens with Android --activity (unsupported combination).
- Treating boot as default first step instead of fallback.