Agent Browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Works with: Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini, opencode.

Quick Navigation

Topic          Reference
Installation   installation.md
Commands       commands.md
Refs           refs.md
Advanced       advanced.md

When to Use

Automating browser tasks in AI agent workflows
Web scraping with AI-friendly output
Testing web applications with LLM agents
Managing multiple browser sessions with isolated auth

Core Concepts

Refs (Element References)

The snapshot command returns an accessibility tree in which each element has a unique ref such as @e1 or @e2. Refs are:

Deterministic - ref points to exact element from snapshot
Fast - no DOM re-query needed
AI-friendly - LLMs can reliably parse and use refs
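Because refs follow a visible pattern, a script can sanity-check a value before passing it to a command. The following is a minimal shell sketch, assuming every ref is @e followed by one or more digits (the text above only shows @e1 and @e2 as examples). is_ref is a hypothetical helper, not part of agent-browser.

```shell
# Succeed only for strings shaped like the refs shown above (@e1, @e2, ...).
# Assumption: a ref is "@e" followed by digits only; the source shows examples, not a grammar.
is_ref() {
  case "$1" in
    @e*[!0-9]*) return 1 ;;   # a non-digit character appears after "@e"
    @e[0-9]*)   return 0 ;;   # "@e" followed by digits only
    *)          return 1 ;;   # anything else, including a bare "@e"
  esac
}
```

A caller might guard an action with: is_ref "$target" && agent-browser click "$target"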

Architecture

Client-daemon architecture:

Rust CLI - parses commands, communicates with daemon
Daemon - runs the browser automation engine:
  Default: Node.js daemon (Playwright)
  Native (v0.16.0+): native Rust daemon (direct Chrome DevTools Protocol), enabled via --native, AGENT_BROWSER_NATIVE=1, or "native": true
  Lightpanda (v0.17.0+): Lightpanda engine, selected via --engine lightpanda or AGENT_BROWSER_ENGINE=lightpanda (implies native mode)
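The flags in the list above can be summarized as a small lookup. This is an illustrative shell sketch: engine_flags and the mode labels default, native, and lightpanda are hypothetical names chosen here, and only the flags themselves come from the list.

```shell
# Print the CLI flag(s) for a daemon mode from the list above.
# The mode names are labels for this sketch only, not agent-browser options.
engine_flags() {
  case "$1" in
    default)    ;;                               # Node.js daemon: no extra flag
    native)     echo "--native" ;;               # native Rust daemon (v0.16.0+)
    lightpanda) echo "--engine lightpanda" ;;    # Lightpanda engine (v0.17.0+)
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

The environment variables AGENT_BROWSER_NATIVE=1 and AGENT_BROWSER_ENGINE=lightpanda are the equivalent route described above. Where the flag must sit on the command line is not stated here, so check the command reference before composing a full invocation.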

Daemon starts automatically and persists between commands. Startup errors are surfaced directly (v0.17.0+).

v0.8.6 improves daemon reliability by cleaning stale socket/PID files and retrying transient connection errors.

Quick Example

Navigate and get snapshot

agent-browser open example.com
agent-browser snapshot                      # Get accessibility tree with refs
agent-browser click @e2                     # Click by ref from snapshot
agent-browser fill @e3 "test@example.com"   # Fill input by ref
agent-browser get text @e1                  # Get text by ref
agent-browser screenshot page.png           # Save screenshot
agent-browser close

AI Workflow Pattern

Optimal workflow for AI agents:

1. Navigate and get snapshot

agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

2. AI identifies target refs from snapshot

3. Execute actions using refs

agent-browser click @e2
agent-browser fill @e3 "input text"

4. Get new snapshot if page changed

agent-browser snapshot -i --json
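The four steps above split naturally into an observe phase and an act phase, with the agent choosing a ref in between. The sketch below wraps each phase in a shell function using only the commands shown above. The function names and the AGENT_BROWSER_BIN override (handy for dry runs with echo) are assumptions made for this example, not features of agent-browser.

```shell
# ab: run the CLI. AGENT_BROWSER_BIN is a hypothetical override for dry runs.
ab() {
  "${AGENT_BROWSER_BIN:-agent-browser}" "$@"
}

# Step 1: open the page, then take a snapshot for the agent to parse.
observe() {
  ab open "$1" && ab snapshot -i --json
}

# Steps 3 and 4: click the ref the agent chose in step 2, then re-snapshot
# so that later actions use refs from the current page state.
act_and_refresh() {
  ab click "$1" && ab snapshot -i --json
}
```

A driver would call observe example.com, hand the output to the model, and then call act_and_refresh with the ref the model returns.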

Headed Mode (Debugging)

agent-browser open example.com --headed

Local File Access (v0.9.1)

agent-browser open file:///path/to/doc.pdf --allow-file-access

Cursor-Aware Snapshots (v0.9.1)

agent-browser snapshot -C
agent-browser snapshot --cursor

Session Persistence (v0.10.0)

Automatically save and restore cookies/localStorage across restarts with a named session:

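agent-browser --session-name myapp open myapp.com   # first run: cookies/localStorage are saved under "myapp"
agent-browser --session-name myapp open myapp.com   # after a restart: the same name restores them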

State management commands:

agent-browser state list
agent-browser state show myapp
agent-browser state rename myapp myapp-prod
agent-browser state clear myapp-prod
agent-browser state cleanup
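Since persistence depends on passing the same session name every time, a thin wrapper can keep the name in one place. This is a minimal sketch: with_session and the AGENT_BROWSER_BIN override are hypothetical, and the flag placement mirrors the --session-name example above.

```shell
# Run any subcommand under one named session.
# Usage: with_session NAME SUBCOMMAND [ARGS...]
with_session() {
  name=$1
  shift
  # AGENT_BROWSER_BIN is a hypothetical override, useful for dry runs with echo.
  "${AGENT_BROWSER_BIN:-agent-browser}" --session-name "$name" "$@"
}
```

For example, with_session myapp open myapp.com runs the same command as the session example above.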

Release Updates (v0.12.0–v0.14.0)

Added keyboard commands for raw keyboard input at the currently focused element (no selector needed).
Added persistent color scheme selection via --color-scheme and AGENT_BROWSER_COLOR_SCHEME.
Improved IPC reliability (EAGAIN/backpressure-aware writes) and lowered default Playwright timeout to 25s (configurable via AGENT_BROWSER_DEFAULT_TIMEOUT).
Improved CDP reconnection and fixed state load when no browser is running.
Reduced --annotate warning noise when the flag isn’t explicitly passed.

New Tab Clicks (v0.10.0)

agent-browser click @e12 --new-tab

Mobile Safari (iOS)

agent-browser -p ios device list
agent-browser -p ios open https://example.com --device "iPhone 15"
agent-browser tap 200 400
agent-browser swipe 200 600 200 200 500

JSON Output

Use --json for machine-readable output:

agent-browser snapshot --json
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
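When every call in a script should be machine-readable, the flag can be appended in one place. The helper below is a hypothetical convenience, not part of the CLI. It follows the trailing --json position used in the examples above, and AGENT_BROWSER_BIN is an assumed override for dry runs.

```shell
# Append --json to any agent-browser subcommand.
# Usage: ab_json SUBCOMMAND [ARGS...]
ab_json() {
  "${AGENT_BROWSER_BIN:-agent-browser}" "$@" --json
}
```

For example, ab_json get text @e1 runs agent-browser get text @e1 --json. The shape of the JSON itself is not described in this section, so a parser should be written against real output.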

Critical Prohibitions

Do not use CSS/XPath selectors when refs are available (use @e1, @e2, etc.)
Do not forget to close sessions when done
Do not assume element positions without taking a fresh snapshot
Do not use old refs after page navigation or content changes (re-snapshot)
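The rule about closing sessions is easy to break when a script exits early on an error. One way to guard against that, sketched here under the assumption that a single agent-browser close is the right cleanup for the script, is a shell EXIT trap. with_close and AGENT_BROWSER_BIN are hypothetical names for this example.

```shell
# Run a command, then always issue "close", even if the command fails.
# Usage: with_close COMMAND [ARGS...]
with_close() {
  (
    # The trap fires when this subshell exits, whatever the exit status.
    trap '"${AGENT_BROWSER_BIN:-agent-browser}" close' EXIT
    "$@"
    exit $?   # propagate the wrapped command's status to the caller
  )
}
```

A script could pass its main function to with_close so that the browser is closed on both success and failure paths.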

Common Commands

Navigation

agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close

Interaction

agent-browser click <sel>
agent-browser click <sel> --new-tab
agent-browser fill <sel> <text>
agent-browser press <key>
agent-browser hover <sel>
agent-browser select <sel> <val>
agent-browser download <sel> <path>   # v0.7+

Info

agent-browser get text <sel>
agent-browser get url
agent-browser get title
agent-browser is visible <sel>

Snapshots & Screenshots

agent-browser snapshot -i --json
agent-browser screenshot [path]

Links

Documentation
Changelog
GitHub
npm
