Browser Use

Autonomous browser automation for AI agents. Two tools: agent-browser (a Playwright-based CLI for step-by-step control) and browser-use (an autonomous Python agent that decides what to do on each page). Use them to navigate, click, fill forms, scrape data, manage sessions, and run complex multi-step browser tasks.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "Browser Use" with this command: npx skills add quentintou/openclaw-skill-browser-use/quentintou-openclaw-skill-browser-use-browser-use

Browser Use — Autonomous Browser Automation

Two complementary tools for browser automation:

| Tool | Best for | How it works |
| --- | --- | --- |
| agent-browser | Step-by-step control, scraping, form filling | CLI commands, you drive each action |
| browser-use | Complex autonomous tasks | Python agent that decides actions itself |

Quick Start

agent-browser (recommended for most tasks)

# Navigate and inspect
agent-browser open "https://example.com"
agent-browser snapshot -i          # Get interactive elements with @refs

# Interact using refs
agent-browser click @e3            # Click element
agent-browser fill @e2 "text"      # Fill input (clears first)
agent-browser press Enter          # Press key

# Extract data
agent-browser get text @e1         # Get element text
agent-browser get attr @e1 href    # Get attribute
agent-browser screenshot /tmp/p.png # Screenshot

# Done
agent-browser close

browser-use (autonomous agent)

# Run a full autonomous browsing task
browser-use-agent "Find the pricing for Notion and compare plans"

The agent will navigate, click, read pages, and return a structured result.

agent-browser — Full Reference

Navigation

agent-browser open <url>           # Navigate to URL
agent-browser back                 # Go back
agent-browser forward              # Go forward
agent-browser reload               # Reload page
agent-browser close                # Close browser

Snapshot (page analysis)

agent-browser snapshot             # Full accessibility tree
agent-browser snapshot -i          # Interactive elements only (recommended)
agent-browser snapshot -c          # Compact output
agent-browser snapshot -d 3        # Limit depth to 3
agent-browser snapshot -s "#main"  # Scope to CSS selector
agent-browser snapshot -i --json   # JSON output for parsing
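
When a script rather than a person reads the snapshot, the --json form can be parsed directly. The following is a minimal sketch, assuming agent-browser is on PATH and writes the JSON document to stdout. The helper name `snapshot_json` and its `run` parameter are hypothetical and not part of agent-browser. The shape of the returned JSON is not documented here, so the sketch returns it unmodified.

```python
import json
import subprocess

def snapshot_json(run=None):
    """Return the parsed output of `agent-browser snapshot -i --json`.

    `run` is an injectable callable (argv list -> stdout string) so the
    function can be exercised without a browser; by default it shells out.
    """
    argv = ["agent-browser", "snapshot", "-i", "--json"]
    if run is None:
        # Assumption: agent-browser is on PATH and prints the JSON to stdout.
        def run(cmd):
            return subprocess.run(
                cmd, capture_output=True, text=True, check=True
            ).stdout
    return json.loads(run(argv))
```

Inspect the parsed value once (for example with `print(snapshot_json())`) before relying on any particular key.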

Interactions (use @refs from snapshot)

agent-browser click @e1            # Click
agent-browser dblclick @e1         # Double-click
agent-browser fill @e2 "text"      # Clear and type (use this for inputs)
agent-browser type @e2 "text"      # Type without clearing
agent-browser press Enter          # Press key
agent-browser press Control+a      # Key combination
agent-browser hover @e1            # Hover
agent-browser check @e1            # Check checkbox
agent-browser uncheck @e1          # Uncheck checkbox
agent-browser select @e1 "value"   # Select dropdown option
agent-browser scroll down 500      # Scroll page
agent-browser scrollintoview @e1   # Scroll element into view
agent-browser drag @e1 @e2         # Drag and drop
agent-browser upload @e1 file.pdf  # Upload files

Extract Data

agent-browser get text @e1         # Get element text
agent-browser get html @e1         # Get innerHTML
agent-browser get value @e1        # Get input value
agent-browser get attr @e1 href    # Get attribute
agent-browser get title            # Page title
agent-browser get url              # Current URL
agent-browser get count ".item"    # Count matching elements

Wait

agent-browser wait @e1             # Wait for element
agent-browser wait 2000            # Wait milliseconds
agent-browser wait --text "Done"   # Wait for text to appear
agent-browser wait --url "/dash"   # Wait for URL pattern
agent-browser wait --load networkidle  # Wait for network idle

Screenshots, PDF & Recording

agent-browser screenshot path.png      # Save screenshot
agent-browser screenshot --full        # Full page screenshot
agent-browser pdf output.pdf           # Save as PDF
agent-browser record start ./demo.webm # Start recording
agent-browser record stop              # Stop and save

Sessions (parallel browsers)

agent-browser --session s1 open "https://site1.com"
agent-browser --session s2 open "https://site2.com"
agent-browser session list

State (persist auth/cookies)

agent-browser state save auth.json     # Save session (cookies, storage)
agent-browser state load auth.json     # Restore session
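
A common pattern is to log in once, save the state, and load it on later runs. The sketch below assumes agent-browser is on PATH. `ensure_login` is a hypothetical helper, and `login` stands for whatever site-specific open/fill/click steps the caller supplies.

```python
import subprocess
from pathlib import Path

def _shell(*args):
    # Assumption: agent-browser is on PATH.
    subprocess.run(["agent-browser", *args], check=True)

def ensure_login(state_file, login, run=_shell):
    """Restore saved auth state if the file exists, otherwise log in and save it.

    `login` is a caller-supplied callable that performs the site-specific
    login steps. Returns "loaded" or "saved" to show which path was taken.
    """
    path = Path(state_file)
    if path.exists():
        run("state", "load", str(path))
        return "loaded"
    login()
    run("state", "save", str(path))
    return "saved"
```

If the restored session turns out to be expired, delete the state file and call the helper again so that it logs in and saves fresh state.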

Cookies & Storage

agent-browser cookies                  # Get all cookies
agent-browser cookies set name value   # Set cookie
agent-browser cookies clear            # Clear cookies
agent-browser storage local            # Get all localStorage
agent-browser storage local set k v    # Set value

Tabs & Frames

agent-browser tab                      # List tabs
agent-browser tab new [url]            # New tab
agent-browser tab 2                    # Switch to tab
agent-browser frame "#iframe"          # Switch to iframe
agent-browser frame main               # Back to main frame

Browser Settings

agent-browser set viewport 1920 1080
agent-browser set device "iPhone 14"
agent-browser set geo 37.7749 -122.4194
agent-browser set offline on
agent-browser set media dark

JavaScript

agent-browser eval "document.title"    # Run JS in page context

browser-use — Autonomous Agent

For complex tasks where you want the agent to figure out the browsing steps:

browser-use-agent "Your task description here"

Custom Script (advanced)

# Run via: /opt/browser-use/bin/python3 script.py
import asyncio
import os

from browser_use import Agent, Browser
from langchain_anthropic import ChatAnthropic

async def run():
    browser = Browser()
    llm = ChatAnthropic(
        model='claude-sonnet-4-20250514',
        api_key=os.environ['ANTHROPIC_API_KEY'],  # raises KeyError if unset
    )
    agent = Agent(
        task="Compare pricing on 3 competitor sites",
        llm=llm,
        browser=browser,
    )
    try:
        # max_steps caps how many actions the agent may take
        return await agent.run(max_steps=15)
    finally:
        # Close the browser even if the run fails
        await browser.close()

# Print the result instead of discarding it
print(asyncio.run(run()))

You can swap the LLM for any LangChain-compatible chat model (OpenAI, Anthropic, etc.).
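
One way to make the model swappable is a small factory that picks the chat model by name. This is a sketch under assumptions: `make_llm` is a hypothetical helper, "gpt-4o" is only an example model name, and the `langchain_openai` package must be installed separately. Whether a given model works also depends on the installed browser-use version.

```python
import os

def make_llm(provider="anthropic"):
    """Return a LangChain chat model for the given provider name.

    Imports are deferred so only the chosen provider's package must be installed.
    """
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(
            model="claude-sonnet-4-20250514",
            api_key=os.environ["ANTHROPIC_API_KEY"],
        )
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        # "gpt-4o" is an example; use whichever model your account can access.
        return ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
    raise ValueError(f"unknown provider: {provider!r}")
```

The returned object can then be passed as the `llm` argument when constructing the agent, in place of the hard-coded model.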

Standard Workflow

# 1. Open page
agent-browser open "https://example.com"

# 2. Snapshot to see what's on the page
agent-browser snapshot -i

# 3. Interact with elements using @refs from snapshot
agent-browser fill @e1 "search query"
agent-browser click @e2

# 4. Wait for new page to load
agent-browser wait --load networkidle

# 5. Re-snapshot (refs change after navigation!)
agent-browser snapshot -i

# 6. Extract what you need
agent-browser get text @e5

# 7. Close when done
agent-browser close
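
Steps 3 to 5 of the workflow above (interact, wait, re-snapshot) recur after every navigation, so they are a natural unit to wrap when scripting. This is a minimal sketch, assuming agent-browser is on PATH and prints results to stdout. `submit_and_resnapshot` is a hypothetical helper name, not something either tool provides.

```python
import subprocess

def _shell(*args):
    # Assumption: agent-browser is on PATH and prints results to stdout.
    done = subprocess.run(
        ["agent-browser", *args], capture_output=True, text=True, check=True
    )
    return done.stdout.strip()

def submit_and_resnapshot(input_ref, text, submit_ref, run=_shell):
    """Fill an input, click a control, wait for the load, and return a fresh snapshot.

    The refs passed in must come from a snapshot of the current page; the
    returned snapshot carries the new refs to use for the next interaction.
    """
    run("fill", input_ref, text)
    run("click", submit_ref)
    run("wait", "--load", "networkidle")
    return run("snapshot", "-i")
```

A caller would first run `open` and `snapshot -i`, choose the refs from that output, and then read the refs for the extraction step from the value this function returns.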

Important Rules

  1. Always snapshot -i after navigation — refs change on every page load
  2. Use fill not type for inputs — fill clears existing text first
  3. Wait after clicks that trigger navigation: wait --load networkidle
  4. Close the browser when done: agent-browser close
  5. Google/Bing block headless browsers (CAPTCHA) — use DuckDuckGo or web_search instead
  6. Save auth state for sites requiring login — state save/load
  7. Use --json when you need machine-parseable output
  8. Use sessions for parallel browsing — --session <name>

Troubleshooting

  • Element not found: Re-run snapshot -i to get current refs
  • Page not loaded: Add wait --load networkidle after navigation
  • CAPTCHA on search engines: Use DuckDuckGo or the web_search tool instead
  • Auth expired: Re-login and state save again
  • Display errors: The install script sets up Xvfb for headless rendering

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
