agent-browser

Automate headless browser interactions using agent-browser CLI. Use when scraping web pages, automating form submissions, testing web apps, or performing browser automation tasks. Works with element refs (@e1, @e2) optimized for AI agent reasoning.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "agent-browser" with this command: npx skills add ckorhonen/claude-skills/ckorhonen-claude-skills-agent-browser

agent-browser CLI

A headless browser automation CLI designed for AI agents, with fast Rust-based execution and element refs optimized for LLM reasoning.

Overview

agent-browser provides programmatic browser control through a CLI that's purpose-built for AI agent workflows. It uses deterministic element references (@e1, @e2, etc.) from accessibility trees, making it ideal for LLM-based automation where consistent element targeting is critical.

Key Features:

  • Fast Rust-based CLI with Node.js fallback
  • Element refs (@e1, @e2) for stable LLM reasoning
  • Session isolation for parallel browser instances
  • JSON output for programmatic parsing
  • Accessibility tree snapshots for element discovery
  • CDP connection support for existing browser instances

When to Use

  • Automating web interactions (form filling, clicking, navigation)
  • Scraping web content with accessibility tree parsing
  • Testing web applications programmatically
  • Multi-agent scenarios requiring isolated browser sessions
  • Screenshot capture and PDF generation
  • Monitoring web page state changes

Prerequisites

  • Node.js >= 18 or Bun runtime
  • Chromium (installed via agent-browser install)

Installation

# Install globally with Bun
bun install -g agent-browser

# Or install globally with npm (Node.js >= 18)
npm install -g agent-browser

# Download Chromium
agent-browser install

# Linux with system dependencies
agent-browser install --with-deps

Verify installation:

agent-browser --version

Quick Start

# Navigate to a page
agent-browser open https://example.com

# Get accessibility snapshot with element refs
agent-browser snapshot -i  # -i = interactive elements only

# Click an element by ref
agent-browser click @e2

# Fill a form field
agent-browser fill @e3 "test@example.com"

# Take a screenshot
agent-browser screenshot output.png

Core Workflow

1. Open Page and Snapshot

# Open URL
agent-browser open https://example.com

# Get interactive elements with refs
agent-browser snapshot -i --json

The snapshot returns elements like:

@e1 link "Home"
@e2 textbox "Email"
@e3 button "Submit"
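
When a script needs the ref for a known control, the text snapshot can be searched by role and accessible name. The sketch below assumes every line has exactly the `@ref role "name"` shape shown above; `find_ref` is a hypothetical helper, not part of agent-browser, and real snapshot output may be indented or carry extra attributes.

```shell
# find_ref ROLE NAME: read snapshot text on stdin, print the first matching ref.
# Hypothetical helper; assumes lines look like: @e2 textbox "Email"
find_ref() {
  awk -v role="$1" -v name="$2" '
    $1 ~ /^@e[0-9]+$/ && $2 == role {
      # Recover the quoted name from the rest of the line.
      label = $0
      sub(/^[^"]*"/, "", label)   # drop everything up to the opening quote
      sub(/"[^"]*$/, "", label)   # drop the closing quote and anything after it
      if (label == name) { print $1; found = 1; exit }
    }
    END { exit found ? 0 : 1 }
  '
}
```

Piping the sample snapshot above into `find_ref textbox "Email"` prints `@e2`; the function exits non-zero when nothing matches, so a script can stop before clicking a stale or missing ref.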

2. Interact Using Refs

# Click by ref
agent-browser click @e2

# Type into an element by ref (does not clear existing text first)
agent-browser type @e2 "hello@example.com"

# Fill (clears first, then types)
agent-browser fill @e2 "hello@example.com"

# Press keyboard key
agent-browser press Enter

3. Verify State

# Check element visibility
agent-browser is visible @e3

# Get element text
agent-browser get text @e1

# Get page URL
agent-browser get url

Command Reference

Navigation

Command       Description
open <url>    Navigate to URL
back          Go back in history
forward       Go forward in history
reload        Reload current page
close         Close browser

Interactions

Command              Description
click <ref>          Click element
dblclick <ref>       Double-click element
type <ref> <text>    Type text into element
fill <ref> <text>    Clear and fill element
press <key>          Press keyboard key (Enter, Tab, Control+a)
hover <ref>          Hover over element
focus <ref>          Focus element
check <ref>          Check checkbox
uncheck <ref>        Uncheck checkbox
select <ref> <val>   Select dropdown option
scroll <dir> [px]    Scroll (up/down/left/right)
wait <ref|ms>        Wait for element or milliseconds

Information

Command             Description
snapshot            Get accessibility tree with refs
snapshot -i         Interactive elements only
snapshot -c         Compact (remove empty elements)
snapshot -d <n>     Limit tree depth
get text <ref>      Get element text content
get html <ref>      Get element HTML
get value <ref>     Get input value
get url             Get current page URL
get title           Get page title
screenshot [path]   Take screenshot
screenshot --full   Full page screenshot
pdf <path>          Save page as PDF

State Checks

Command             Description
is visible <ref>    Check if element is visible
is enabled <ref>    Check if element is enabled
is checked <ref>    Check if checkbox is checked

Find Elements

# Find by role and click
agent-browser find role button click --name Submit

# Find by text
agent-browser find text "Sign In" click

# Find by label
agent-browser find label "Email" fill "test@example.com"

# Find by placeholder
agent-browser find placeholder "Search..." type "query"

Session Management

Sessions provide isolated browser instances for parallel execution:

# Use named session
agent-browser --session login-flow open https://app.com

# Different session for another task
agent-browser --session checkout open https://app.com/cart

# List active sessions
agent-browser session list

# Environment variable (persistent across commands)
export AGENT_BROWSER_SESSION=my-session
agent-browser open https://example.com
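
Because each session is isolated, independent pages can be opened concurrently from one script. A minimal sketch using shell background jobs follows; `open_in_sessions` and the `AB` override variable are hypothetical names introduced here, and the `page-N` session names are arbitrary.

```shell
# open_in_sessions URL...: open each URL in its own named session, in parallel.
# AB is a hypothetical override for the CLI path (handy for dry runs); it
# defaults to the agent-browser binary on PATH.
open_in_sessions() {
  i=0
  for url in "$@"; do
    i=$((i + 1))
    "${AB:-agent-browser}" --session "page-$i" open "$url" &
  done
  wait   # block until every background open has finished
}

# Usage:
#   open_in_sessions https://app.com https://app.com/cart
```

A bare `wait` does not report which background job failed, so a stricter script would record each job's PID and wait on them one at a time.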

JSON Output

Use --json for machine-readable output:

# Snapshot as JSON
agent-browser snapshot -i --json

# Get element info as JSON
agent-browser get text @e1 --json

# Parse in scripts
agent-browser get url --json | jq -r '.url'

Advanced Features

CDP Connection

Connect to an existing Chrome instance:

# Start Chrome with remote debugging enabled (macOS path shown)
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Connect agent-browser
agent-browser --cdp 9222 snapshot

Video Recording

# Start recording
agent-browser record start recording.webm https://example.com

# Perform actions...
agent-browser click @e1
agent-browser fill @e2 "test"

# Stop and save
agent-browser record stop

Network Interception

# Block requests to URL pattern
agent-browser network route "**/analytics/*" --abort

# Mock API response
agent-browser network route "**/api/user" --body '{"name": "Test"}'

# View captured requests
agent-browser network requests

Custom Headers

# Set auth headers
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
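
A literal token in a script is easy to commit by accident. One option is to build the header JSON from an environment variable so the token stays out of the script file; `auth_headers` and `API_TOKEN` below are hypothetical names, and the sketch assumes the token contains no characters that need JSON escaping.

```shell
# auth_headers TOKEN: print a JSON object suitable for the --headers option.
# Hypothetical helper; performs no JSON escaping, so TOKEN must not contain
# double quotes or backslashes.
auth_headers() {
  printf '{"Authorization": "Bearer %s"}\n' "$1"
}

# Usage (API_TOKEN is an assumed environment variable):
#   agent-browser --headers "$(auth_headers "$API_TOKEN")" open https://api.example.com
```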

Browser Extensions

# Load extension
agent-browser --extension /path/to/extension open https://example.com

AI Agent Patterns

Form Automation

#!/bin/bash
# Login automation script

agent-browser open https://app.com/login

# Get form elements
agent-browser snapshot -i --json > elements.json

# Fill credentials
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"

# Submit
agent-browser click @e4

# Wait for navigation
agent-browser wait 2000

# Verify login
agent-browser get url --json | jq -r '.url'
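
The script above embeds the password as a literal. A hedged alternative is to read credentials from the environment and fail early when they are missing; `require_credentials`, `APP_USER`, and `APP_PASSWORD` are hypothetical names chosen for this sketch, not agent-browser settings.

```shell
# require_credentials: abort with a message if either variable is unset or empty.
# APP_USER and APP_PASSWORD are hypothetical names, not agent-browser settings.
require_credentials() {
  : "${APP_USER:?APP_USER must be set}"
  : "${APP_PASSWORD:?APP_PASSWORD must be set}"
}

# Usage in the login script (refs taken from the snapshot, as above):
#   require_credentials
#   agent-browser fill @e2 "$APP_USER"
#   agent-browser fill @e3 "$APP_PASSWORD"
```

In a non-interactive shell the `${VAR:?message}` expansion prints the message and exits, so the browser is never driven with empty credentials.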

Data Extraction

#!/bin/bash
# Scrape product data

agent-browser open https://shop.com/products

# Get page content
agent-browser snapshot --json > snapshot.json

# Extract specific element text (a CSS selector is used here instead of a snapshot ref)
agent-browser get text '[data-testid="price"]' --json

# Screenshot for verification
agent-browser screenshot products.png

Multi-Page Workflow

#!/bin/bash
SESSION="checkout-flow"

# Step 1: Add to cart
agent-browser --session "$SESSION" open https://shop.com/product/123
agent-browser --session "$SESSION" snapshot -i   # find the ref of the add-to-cart button
agent-browser --session "$SESSION" click @e5     # example ref; use the one from your snapshot
agent-browser --session "$SESSION" wait 1000

# Step 2: Go to checkout
agent-browser --session "$SESSION" open https://shop.com/checkout
agent-browser --session "$SESSION" snapshot -i   # re-snapshot after the page changes

# Step 3: Fill shipping (example refs for the name and address fields)
agent-browser --session "$SESSION" fill @e2 "John Doe"
agent-browser --session "$SESSION" fill @e3 "123 Main St"

# Cleanup
agent-browser --session "$SESSION" close

Options Reference

Option                     Description
--session <name>           Isolated browser session
--json                     JSON output format
--headed                   Show browser window (not headless)
--cdp <port>               Connect via Chrome DevTools Protocol
--headers <json>           HTTP headers for requests
--proxy <url>              Proxy server
--executable-path <path>   Custom browser executable
--extension <path>         Load browser extension
--full, -f                 Full page screenshot
--debug                    Debug output

Environment Variables

Variable                        Description
AGENT_BROWSER_SESSION           Default session name
AGENT_BROWSER_EXECUTABLE_PATH   Custom browser path
AGENT_BROWSER_STREAM_PORT       WebSocket streaming port

Troubleshooting

"Browser not found"

# Install Chromium
agent-browser install

Element ref not found

# Refresh snapshot after page changes
agent-browser snapshot -i

# Check if element is visible
agent-browser is visible @e1

Session conflicts

# List active sessions
agent-browser session list

# Use unique session names
agent-browser --session unique-$(date +%s) open https://example.com

Headless issues

# Run with visible browser for debugging
agent-browser --headed open https://example.com

Timeout waiting for element

# Explicit wait before interaction
agent-browser wait 2000
agent-browser wait @e1  # Wait for element to appear
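
`wait @e1` covers waiting for an element, but a fixed `wait 2000` is either wasteful or too short for other conditions such as a URL change. A common alternative is to poll with a bounded number of attempts. The helper below is a generic shell sketch, not an agent-browser feature; `retry_until` and `on_dashboard` are hypothetical names, and `on_dashboard` assumes `agent-browser get url` prints the bare URL when `--json` is omitted.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds.
# Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Hypothetical condition: succeeds once the current URL contains "/dashboard".
# Assumes `get url` without --json prints only the URL.
on_dashboard() {
  case "$(agent-browser get url)" in
    */dashboard*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: retry_until 10 1 on_dashboard || echo "login did not complete" >&2
```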

Best Practices

  • Always snapshot first: Get element refs before interacting
  • Use --json for parsing: Machine-readable output for scripts
  • Session isolation: Use --session for parallel workflows
  • Explicit waits: Add wait commands after navigation/clicks
  • Interactive snapshots: Use -i flag to reduce noise
  • Verify state: Check element visibility before clicking
  • Screenshot on failure: Capture state when automation fails
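
The "screenshot on failure" practice can be sketched as a small wrapper around each step. This assumes agent-browser commands exit non-zero when they fail, which the source does not state; `run_step` and the `AB` override variable are hypothetical names, and the `failure-<timestamp>.png` filename is arbitrary.

```shell
# run_step COMMAND...: run one automation step; if it fails, save a screenshot
# and return the step's original exit code.
# AB is a hypothetical override for the CLI path; it defaults to agent-browser.
run_step() {
  step_status=0
  "$@" || step_status=$?
  if [ "$step_status" -ne 0 ]; then
    shot="failure-$(date +%s).png"
    echo "step failed (exit $step_status): $*; saving $shot" >&2
    # Best effort: a failed screenshot must not mask the original error.
    "${AB:-agent-browser}" screenshot "$shot" || true
  fi
  return "$step_status"
}

# Usage:
#   run_step agent-browser click @e4 || exit 1
```

If the step runs in a named session, the screenshot call needs the same session; exporting AGENT_BROWSER_SESSION before the script starts is one way to keep the two aligned.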

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
