Browser Automation
This skill provides comprehensive guidance for browser automation using the browser_subagent tool.
Overview
The browser_subagent enables autonomous browser control with automatic session recording. All interactions are captured as WebP videos saved to the artifacts directory.
Core Capabilities
-
Navigation - Open URLs, navigate pages, handle redirects
-
Interaction - Click, type, scroll, hover, drag-and-drop
-
Extraction - Read DOM content, capture screenshots, scrape data
-
Verification - Test user flows, validate UI changes
-
Recording - Automatic video capture of all sessions
When to Use
Use browser_subagent (vs read_url_content ) when:
-
JavaScript execution is required
-
User interaction is needed (forms, clicks, navigation)
-
Authentication or session state is required
-
Dynamic content loads after page render
-
Visual verification or screenshots are needed
-
Recording demonstrations or tutorials
Use read_url_content for static HTML content where JavaScript isn't needed.
Tool Parameters
TaskName (required)
Human-readable title for the browser task.
-
Should be properly capitalized
-
Example: "Testing Login Flow", "Scraping Product Data"
-
Avoid URLs or technical jargon
Task (required)
Detailed instructions for the browser subagent. Be explicit about:
-
What to do
-
When to stop
-
What information to return
Critical: The subagent is autonomous and one-shot. Provide comprehensive instructions upfront.
RecordingName (required)
Filename for the video recording.
-
All lowercase with underscores
-
Maximum 3 words
-
Describes what the recording contains
-
Example: login_flow_demo , checkout_process
Best Practices
- Clear Task Instructions
Bad: "Check the website"
Task: Go to example.com and check it
Good: Specific, with clear completion criteria
Task: Navigate to https://example.com, wait for the page to fully load, verify that the main heading contains "Welcome", capture a screenshot of the page, then return the page title and the text content of the main heading.
- Return Conditions
Always specify what the subagent should return:
Task: Navigate to the product page at https://shop.example.com/products/123 and extract the following data:
- Product title
- Price
- Availability status
- Number of reviews
Return this information in a structured format when complete.
- Error Handling
Instruct the subagent how to handle failures:
Task: Attempt to log in to https://app.example.com with username "testuser" and password "testpass123". If login succeeds, navigate to the dashboard and return the user's display name. If login fails, capture a screenshot of the error message and return the error text.
- Wait Conditions
Specify wait conditions for dynamic content:
Task: Navigate to https://example.com/search, type "widgets" into the search box, click the search button, and wait until the results list appears (look for element with class "search-results"). Once results load, count the number of result items and return that count.
- Multi-Step Flows
Break down complex flows into clear steps:
Task: Complete the following checkout flow:
- Navigate to https://shop.example.com
- Click "Add to Cart" on the first product
- Click the cart icon in the top right
- Click "Proceed to Checkout"
- Fill in the shipping form with test data
- Capture a screenshot of the order summary
- Return the total price shown on the order summary
Common Patterns
Authentication Testing
TaskName: "Testing User Login" Task: Navigate to https://app.example.com/login, enter "user@example.com" in the email field, enter "password123" in the password field, click the "Sign In" button, wait for navigation to complete. If login succeeds and you see a dashboard, return "Login successful". If there's an error message, return the error text. RecordingName: login_test
Data Scraping
TaskName: "Scraping Product Listings" Task: Navigate to https://shop.example.com/products, wait for all product cards to load, then extract the title and price from each product card. Return a list of products with their titles and prices. If pagination exists, only scrape the first page. RecordingName: product_scrape
Form Submission
TaskName: "Submitting Contact Form" Task: Navigate to https://example.com/contact, fill in the form with:
- Name: "Test User"
- Email: "test@example.com"
- Message: "This is a test message" Then click the submit button and wait for the confirmation message. Return the confirmation message text. RecordingName: contact_form
Screenshot Capture
TaskName: "Capturing Homepage Design" Task: Navigate to https://example.com, wait for complete page load including all images, scroll to show the full page layout, capture a full-page screenshot, and return confirmation that the screenshot was saved. RecordingName: homepage_capture
UI Verification
TaskName: "Verifying Responsive Layout" Task: Navigate to https://example.com, resize the browser window to mobile width (375px), capture a screenshot, then resize to desktop width (1920px), capture another screenshot. Return the dimensions used and confirm both screenshots were captured. RecordingName: responsive_check
Element Selectors
The browser subagent can find elements using:
-
CSS selectors
-
Text content
-
ARIA labels
-
Position/proximity
-
Visual descriptions
Be specific when describing elements:
Good:
-
"Click the blue 'Submit' button at the bottom of the form"
-
"Type into the input field labeled 'Email Address'"
-
"Click the first product card in the grid"
Avoid:
-
"Click the button" (which button?)
-
"Fill in the field" (which field?)
Waiting Strategies
Wait for Navigation
After clicking "Submit", wait for the page to navigate to the success page.
Wait for Elements
Wait until the spinner disappears and the results table is visible.
Wait for Content
Wait until the product count shows a number greater than 0.
Fixed Delays (use sparingly)
Wait 3 seconds for animations to complete.
Recording Best Practices
Naming Convention
-
Use lowercase with underscores
-
Be descriptive but concise
-
Maximum 3 words
-
Examples:
-
login_flow
-
checkout_test
-
nav_demo
-
form_submit
Recording Purpose
Recordings are automatically saved and useful for:
-
Debugging failed automation
-
Demonstrating user flows
-
Documenting test results
-
Creating tutorials
-
Reviewing UI behavior
Advanced Techniques
Session State
The browser maintains state during a single subagent execution:
-
Cookies persist across navigation
-
Login sessions remain active
-
Form data can carry forward
Multiple Tabs
If needed, the subagent can work with multiple tabs:
Task: Open https://example.com in the current tab, then open a new tab and navigate to https://example.com/compare. Switch between tabs to compare data from both pages.
File Downloads
Task: Navigate to https://example.com/downloads, click the "Download Report" button, wait for the download to complete, and return confirmation.
Iframes
Task: Navigate to https://example.com, locate the embedded iframe containing the video player, switch context to that iframe, then click the play button.
Error Recovery
If the browser tool encounters issues:
-
The subagent will report what went wrong
-
Read the error message carefully
-
Adjust your Task instructions
-
Try again with more specific instructions or wait conditions
Common issues:
-
Element not found: Be more specific about element description
-
Timeout: Add explicit wait conditions or increase wait time
-
Navigation failed: Check URL validity, network issues
Performance Tips
-
Be Specific: Clear selectors are faster than vague descriptions
-
Minimize Waits: Only wait when necessary; don't add arbitrary delays
-
Single Purpose: One task per browser_subagent call
-
Return Fast: Return as soon as the required information is collected
Examples
See examples/ directory for complete working examples:
-
examples/login_test.md
-
Authentication flow
-
examples/form_automation.md
-
Form submission
-
examples/data_extraction.md
-
Web scraping
-
examples/ui_testing.md
-
Visual verification
Limitations
-
Each subagent call is independent (no session sharing between calls)
-
Cannot execute arbitrary JavaScript (but can interact with page elements)
-
Video recordings use system resources (keep sessions focused)
-
Some sites may block automation (CAPTCHA, bot detection)
Integration with Workflows
Browser automation pairs well with:
-
Testing workflows: Automate E2E tests
-
Data collection: Scrape and process information
-
Documentation: Record user flows automatically
-
Verification: Validate deployments
When NOT to Use
Avoid browser automation when:
-
Static HTML scraping is sufficient → use read_url_content
-
API endpoints are available → use direct API calls
-
File processing is needed → use file manipulation tools
-
The task requires human judgment (CAPTCHA, visual verification)