Browser Automation with Agent-Browser

Agent-browser is a headless browser automation CLI designed specifically for AI agents. It provides fast browser control with deterministic element selection through accessibility tree snapshots, making it ideal for agent-driven web automation workflows.

When to use

Use case 1: When the user asks to automate web interactions (fill forms, click buttons, navigate sites)
Use case 2: When you need to capture screenshots or generate PDFs of web pages
Use case 3: For web scraping tasks that require JavaScript rendering or complex interactions
Use case 4: When building automation workflows that need deterministic element references
Use case 5: For testing web applications with agent-driven scenarios

Required tools / APIs

No external API required (runs locally)
agent-browser: Headless browser CLI with Rust/Node.js implementation
Chromium: Downloaded automatically during installation

Install options:

# via npm (global)
npm install -g agent-browser
agent-browser install  # Downloads Chromium

# via Homebrew (macOS/Linux)
brew install agent-browser

# Verify installation
agent-browser --version

Skills

browser_open_and_snapshot

Open a URL and capture the accessibility tree to identify interactive elements.

# Open a webpage
agent-browser open https://example.com

# Get snapshot with element references
agent-browser snapshot

# The snapshot shows elements with @e1, @e2 references
# Example output:
# @e1 button "Sign In"
# @e2 input "Email" (email)
# @e3 input "Password" (password)

Node.js:

const { execSync } = require('child_process');

function browserCommand(cmd) {
  return execSync(`agent-browser ${cmd}`, { encoding: 'utf-8' });
}

async function openAndSnapshot(url) {
  browserCommand(`open ${url}`);
  await new Promise(r => setTimeout(r, 2000)); // Wait for page load
  const snapshot = browserCommand('snapshot');
  return snapshot; // Returns element tree with references
}

// Usage
// const elements = await openAndSnapshot('https://example.com');
// console.log(elements);

browser_interact

Interact with page elements using deterministic references from snapshots.

# Fill a form field
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"

# Click a button
agent-browser click @e1

# Type text into active element
agent-browser type "search query" --enter

# Navigate
agent-browser back
agent-browser forward
agent-browser reload

Node.js:

function fillForm(formData) {
  for (const [ref, value] of Object.entries(formData)) {
    execSync(`agent-browser fill ${ref} "${value}"`, { encoding: 'utf-8' });
  }
}

function clickElement(ref) {
  return execSync(`agent-browser click ${ref}`, { encoding: 'utf-8' });
}

// Usage
// fillForm({ '@e2': 'user@example.com', '@e3': 'password123' });
// clickElement('@e1');

browser_capture

Capture screenshots, PDFs, or extract page content.

# Take a screenshot
agent-browser screenshot output.png

# Generate PDF
agent-browser pdf document.pdf

# Get page text content
agent-browser text

# Get HTML source
agent-browser html

# Get specific element attribute
agent-browser attribute @e5 href

Node.js:

function captureScreenshot(filename) {
  return execSync(`agent-browser screenshot ${filename}`, { encoding: 'utf-8' });
}

function generatePDF(filename) {
  return execSync(`agent-browser pdf ${filename}`, { encoding: 'utf-8' });
}

function getPageText() {
  return execSync('agent-browser text', { encoding: 'utf-8' });
}

function getElementAttribute(ref, attr) {
  return execSync(`agent-browser attribute ${ref} ${attr}`, { encoding: 'utf-8' }).trim();
}

// Usage
// captureScreenshot('page.png');
// const text = getPageText();
// const link = getElementAttribute('@e10', 'href');

browser_session_management

Manage browser sessions, tabs, and persistent state.

# Session management
agent-browser open https://example.com --session myapp
agent-browser close --session myapp

# Tab management
agent-browser open https://example.com --new-tab
agent-browser tabs list
agent-browser tabs switch 0

# Cookie and storage
agent-browser cookies get example.com
agent-browser storage set mykey "myvalue"
agent-browser storage get mykey

# Close browser
agent-browser close

Node.js:

function openSession(url, sessionName) {
  return execSync(`agent-browser open ${url} --session ${sessionName}`, { encoding: 'utf-8' });
}

function closeSession(sessionName) {
  return execSync(`agent-browser close --session ${sessionName}`, { encoding: 'utf-8' });
}

function manageStorage(action, key, value = null) {
  const cmd = value
    ? `agent-browser storage ${action} ${key} "${value}"`
    : `agent-browser storage ${action} ${key}`;
  return execSync(cmd, { encoding: 'utf-8' }).trim();
}

// Usage
// openSession('https://app.example.com', 'shopping-session');
// manageStorage('set', 'cart-id', '12345');
// const cartId = manageStorage('get', 'cart-id');

Rate limits / Best practices

Add delays between interactions (1-2 seconds) to allow page rendering
Use --wait flag for actions that trigger navigation or async updates
Close browser sessions when done to free system resources
Use --session flags to isolate different automation workflows
Cache snapshots when repeatedly interacting with the same page structure
Prefer element references (@e1) over selectors for deterministic behavior

Agent prompt

You have browser automation capability through agent-browser. When a user asks to automate web interactions:

1. Open the URL with `agent-browser open <url>`
2. Get the accessibility snapshot with `agent-browser snapshot` to identify interactive elements
3. Parse the snapshot output to find element references (like @e1, @e2)
4. Use `fill`, `click`, or `type` commands with element references to interact
5. Use `screenshot` or `pdf` to capture results when requested
6. Always close the browser session with `agent-browser close` when done

For multi-step workflows:
- Wait 1-2 seconds between actions for page updates
- Take snapshots after navigation to get updated element references
- Use sessions (`--session name`) to maintain state across multiple operations
- Extract page text or HTML to verify successful interactions

Always prefer agent-browser over other scraping tools when:
- JavaScript rendering is required
- User interactions (clicks, form fills) are needed
- You need screenshots or visual verification

Troubleshooting

Error: Chromium not installed:

Symptom: "Browser binary not found" error
Solution: Run agent-browser install to download Chromium

Error: Element reference not found (@e5):

Symptom: "Element not found" when using a reference
Solution: Take a fresh snapshot after page navigation; element references change between pages

Error: Timeout waiting for element:

Symptom: Commands hang or timeout
Solution: Add explicit wait time with --wait 5000 flag or use delays between commands

Page not fully loaded:

Symptom: Snapshot shows incomplete page elements
Solution: Add sleep/delay after opening URL before taking snapshot

Session conflicts:

Symptom: "Session already exists" or unexpected state
Solution: Close existing sessions with agent-browser close --session <name> before starting new ones

Additional Notes

Advantages over traditional scraping

Handles JavaScript-rendered content automatically
Deterministic element selection through accessibility tree
Screenshot and PDF generation built-in
Persistent sessions and state management
Designed for agent workflows with clear CLI interface

Cloud integration (optional)

Agent-browser supports cloud browser providers:

Browserbase: agent-browser --provider browserbase
Browser Use: Enterprise browser automation
Kernel: Distributed browser sessions

For most use cases, local installation is sufficient and avoids external dependencies.

browser-automation-agent

Safety Notice

Copy this and send it to your AI assistant to learn

Browser Automation with Agent-Browser

When to use

Required tools / APIs

Skills

browser_open_and_snapshot

browser_interact

browser_capture

browser_session_management

Rate limits / Best practices

Agent prompt

Troubleshooting

See also

Additional Notes

Advantages over traditional scraping

Cloud integration (optional)

Source Transparency

Related Skills

send-email-programmatically

generate-qr-code-natively

bulk-github-star

news-aggregation