Cartographer
Named after Inō Tadataka, who spent 17 years walking Japan's coastline to produce its first accurate map. This skill walks your app's UI so you don't have to.
Uses agent-device CLI to snapshot, navigate, and explore iOS/Android app UIs, then formats
results into structured catalogs usable for test automation (XCUITest, Espresso, Appium, etc.).
Prerequisites
agent-deviceinstalled:- npm:
npm install -g agent-device - mise:
mise use -g npm:agent-device
- npm:
- Simulator/emulator running with the target app installed
- App launched and in the desired state (e.g., logged in)
Check readiness:
which agent-device && echo "Ready" || echo "Install: npm install -g agent-device OR mise use -g npm:agent-device"
Discover available devices and apps:
agent-device devices # list simulators/emulators
agent-device apps # list installed apps on running device
Note: open accepts human-readable app names (e.g., Settings, Hulu) as well as bundle IDs / package names.
Three Modes
| Mode | When to use | Time |
|---|---|---|
| Snapshot | "What elements are on this screen?" | ~5 sec |
| Navigate + Snapshot | "Show me the Search tab locators" | ~15 sec |
| Explore | "Map the whole app's screens and navigation" | 2–10 min |
Always start with the simplest mode that answers the user's question.
Mode 1: Snapshot Current Screen
The daily driver. Captures every interactive element on whatever screen is currently visible.
Steps
-
Open session (if not already open):
iOS:
agent-device open com.hulu.plus --platform iosAndroid:
agent-device open com.hulu.plus --platform android --serial <device-serial>If the user hasn't specified a bundle/package name, ask — or use
agent-device appsto list installed apps. Both platforms use reverse-domain identifiers (e.g.,com.hulu.plus). Android uses--serialfor device targeting.openalso accepts human-readable names:agent-device open Hulu --platform ios. If the app is already open, useagent-device open --platform iosto attach. -
Snapshot interactive elements:
agent-device snapshot -i --json-i= interactive only (buttons, cells, fields, switches).--json= structured output. -
Scroll to discover off-screen elements (if element count seems low or the screen is scrollable):
agent-device scroll # scroll down agent-device snapshot -i --json # capture newly visible elementsRepeat until two consecutive snapshots return the same elements.
-
Parse and format into the element catalog format (see
references/output-formats.md). -
Present results as a clean table grouped by element type, AND offer JSON export.
Variants
Full snapshot (all elements, not just interactive):
agent-device snapshot --json
Scoped to a container (e.g., "what's in the tab bar"):
agent-device snapshot -i -s "Tab Bar" --json
Mode 2: Navigate + Snapshot
For when the user knows which screen they want but doesn't want to manually navigate there.
Steps
-
Open session:
iOS:
agent-device open <bundle_id> --platform iosAndroid:
agent-device open <package_name> --platform android --serial <device-serial> -
Navigate to the target screen:
Tab navigation (most common):
agent-device snapshot -i --json # See current state + tab bar refs agent-device press @<tab_ref> # Tap the target tab agent-device snapshot -i --json # Confirm arrivalDeep navigation (screen behind taps):
agent-device snapshot -i --json # See what's tappable agent-device press @<element_ref> # Tap to navigate agent-device snapshot -i --json # See new screenFind and tap by text:
agent-device find "Settings" clickScroll to find off-screen elements:
agent-device scrollintoview "Sign In" agent-device find "Sign In" clickCombine navigate + snapshot with
batchto reduce round-trips:agent-device batch --steps '[ {"command": "press", "positionals": ["@<tab_ref>"]}, {"command": "snapshot", "flags": {"-i": true, "--json": true}} ]' -
Snapshot the target screen and format results.
Navigation tips
- Always re-snapshot after any interaction — refs go stale after UI changes
- For tabs, look for elements with role
[button]inside a[tab-bar]container agent-device backfor standard back navigation
Mode 3: Explore (BFS Navigation Map)
Full app mapping. Systematically discovers screens and maps navigation flows.
Before starting, confirm
- Which app? (bundle ID / package name)
- Starting screen? (usually Home)
- Depth limit? (default: 2)
- Max screens? (default: 15)
- Max elements to tap per screen? (default: 10)
- Time limit? (default: 5 minutes)
Safety Rules — NEVER tap elements matching these
Blocked identifiers (case-insensitive substring):
log_out, logout, sign_out, delete, remove_account,
subscribe, purchase, buy, upgrade, cancel_subscription, deactivate
Blocked labels (case-insensitive substring):
log out, sign out, delete account, subscribe,
purchase, upgrade, cancel subscription
If in doubt, skip the element. Log what was skipped.
Exploration Algorithm
VISITED = {}
EDGES = []
QUEUE = [(start_screen, depth=0)]
while QUEUE not empty AND within limits:
current = QUEUE.pop(0)
if current.depth >= max_depth: skip
1. snapshot -i --json → interactive elements
2. Fingerprint the screen (see below)
3. Store in VISITED
4. Select up to N tappable elements:
- Prioritize WITH identifiers over without
- Prioritize buttons over cells
- Deduplicate: cell_item_0..49 → tap one
- Skip dangerous elements
5. For each selected element:
a. Record pre-tap fingerprint
b. agent-device press @<ref>
c. Wait, then snapshot again
d. Fingerprint new state
Same screen → continue
Known screen → record edge, go back
NEW screen → capture, record edge, enqueue, go back
e. agent-device back
f. Verify return (compare fingerprint)
g. If lost: perform tab reset (see below), then break
Fingerprinting
Use agent-device diff to compare snapshots — no output means the screen is unchanged:
agent-device diff # compare current snapshot to baseline; no output = same screen
For richer fingerprinting when diff alone is insufficient, compare manually:
- Navigation bar title
- Selected tab
- Structural signature (top-level element types + counts)
- Anchor identifiers (distinctive IDs that mark specific screens)
Same fingerprint = same screen. Structural, not content-based — dynamic content won't cause false positives.
Tab Reset
Tab reset = navigate to a known safe state using the home screen or app root:
agent-device home # go to device home screen
agent-device open <bundle_id> --platform ios # re-enter app at root (iOS)
# or
agent-device open <package_name> --platform android --serial <device-serial> # (Android)
Use tab reset when agent-device back fails or leaves the app in an unknown state. If reset also fails, log the stuck state and move on to the next queued element.
After exploration
Export all three artifacts:
element_catalog.jsonnavigation_map.mermaidexploration_summary.txt
For exact schemas, read references/output-formats.md.
Element Roles
When classifying elements from snapshots:
| Role | Meaning | Typical element types |
|---|---|---|
nav | Navigation trigger — tapping likely changes screen | button, cell, link (hittable) |
tab | Tab bar item — top-level navigation | button inside tab-bar |
back | Back/close/dismiss button | button with back/close/dismiss/done in id/label |
input | Text input field | text-field, secure-text-field, search-field |
toggle | State toggle | switch, segmented-control, slider |
scroll | Scrollable container | collection-view, table, scroll-view |
content | Static/informational | static-text, image, non-interactive |
dangerous | Must never tap | matches blocked patterns |
Response Guidelines
- Show identifiers prominently — the user's #1 need is locator strings
- Group by element type — buttons together, cells together, fields together
- Flag elements without identifiers — these are gaps the dev team should fix
- Mode 1 & 2: formatted table in chat + offer JSON export
- Mode 3: always export files (JSON + Mermaid + summary)
- Keep it practical — locators, not lectures
Example response format
## Search Screen — 42 interactive elements
### Buttons (12)
| Identifier | Label | Hittable |
|---|---|---|
| tab_bar_home_button | Home | ✅ |
| tab_bar_search_button | Search | ✅ |
### Text Fields (1)
| Identifier | Label | Placeholder |
|---|---|---|
| search_bar | Search | Search for shows... |
### ⚠️ Elements without identifiers (11)
- [button] label="See All"
- [cell] label="Trending Now"
Error Handling
| Problem | Solution |
|---|---|
agent-device not installed | npm install -g agent-device or mise use -g npm:agent-device |
| No simulator running | xcrun simctl boot "iPhone 16" |
| App not installed | Tell user bundle ID not found; run agent-device apps to check |
| 0 elements returned | App not in foreground — agent-device open <bundle_id> |
agent-device back fails in explore | agent-device home then agent-device open <bundle_id> |
| Session dies | agent-device open <bundle_id> to restart |
When done: agent-device close