Browser Automation with agent-browser

Core Workflow

Every browser automation follows this pattern:

Navigate: agent-browser open <url>
Snapshot: agent-browser snapshot -i (get element refs like @e1 , @e2 )
Interact: Use refs to click, fill, select
Re-snapshot: After navigation or DOM changes, get fresh refs

agent-browser open https://example.com/form agent-browser snapshot -i

Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result

Command Chaining

Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

Chain open + wait + snapshot in one call

agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i

Chain multiple interactions

agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

Navigate and capture

agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png

When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).

Essential Commands

Navigation

agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser

Snapshot

agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) agent-browser snapshot -s "#selector" # Scope to CSS selector

Interaction (use @refs from snapshot)

agent-browser click @e1 # Click element agent-browser click @e1 --new-tab # Click and open in new tab agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser keyboard type "text" # Type at current focus (no selector) agent-browser keyboard inserttext "text" # Insert without key events agent-browser scroll down 500 # Scroll page agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container

Get information

agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title

Wait

agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds

Downloads

agent-browser download @e1 ./file.pdf # Click element to trigger download agent-browser wait --download ./output.zip # Wait for any download to complete agent-browser --download-path ./downloads open <url> # Set default download directory

Capture

agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser screenshot --annotate # Annotated screenshot with numbered element labels agent-browser pdf output.pdf # Save as PDF

Diff (compare page states)

agent-browser diff snapshot # Compare current vs last snapshot agent-browser diff snapshot --baseline before.txt # Compare current vs saved file agent-browser diff screenshot --baseline before.png # Visual pixel diff agent-browser diff url <url1> <url2> # Compare two pages agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy agent-browser diff url <url1> <url2> --selector "#main" # Scope to element

Common Patterns

Form Submission

agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle

Authentication with State Persistence

Login once and save state

agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json

Reuse in future sessions

agent-browser state load auth.json agent-browser open https://app.example.com/dashboard

Data Extraction

agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 # Get specific element text agent-browser get text body > page.txt # Get all page text

JSON output for parsing

agent-browser snapshot -i --json agent-browser get text @e1 --json

Ref Lifecycle (Important)

Refs (@e1 , @e2 , etc.) are invalidated when the page changes. Always re-snapshot after:

Clicking links or buttons that navigate
Form submissions
Dynamic content loading (dropdowns, modals)

agent-browser click @e5 # Navigates to new page agent-browser snapshot -i # MUST re-snapshot agent-browser click @e1 # Use new refs

Annotated Screenshots (Vision Mode)

Use --annotate to take a screenshot with numbered labels overlaid on interactive elements.

agent-browser screenshot --annotate

Output includes the image path and a legend:

[1] @e1 button "Submit"

[2] @e2 link "Home"

[3] @e3 textbox "Email"

agent-browser click @e2 # Click using ref from annotated screenshot

JavaScript Evaluation

Simple expressions

agent-browser eval 'document.title'

Complex JS: use --stdin with heredoc (RECOMMENDED)

agent-browser eval --stdin <<'EVALEOF' JSON.stringify( Array.from(document.querySelectorAll("img")) .filter(i => !i.alt) .map(i => ({ src: i.src.split("/").pop(), width: i.width })) ) EVALEOF

Session Management and Cleanup

Always close your browser session when done:

agent-browser close # Close default session agent-browser --session agent1 close # Close specific session

agent-browser

Safety Notice

Copy this and send it to your AI assistant to learn

Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

Chain open + wait + snapshot in one call

Chain multiple interactions

Navigate and capture

Navigation

Snapshot

Interaction (use @refs from snapshot)

Get information

Wait

Downloads

Capture

Diff (compare page states)

Login once and save state

Reuse in future sessions

JSON output for parsing

Output includes the image path and a legend:

[1] @e1 button "Submit"

[2] @e2 link "Home"

[3] @e3 textbox "Email"

Simple expressions

Complex JS: use --stdin with heredoc (RECOMMENDED)

Source Transparency

Related Skills

agent-browser

dogfood

electron

agent-browser