cursor-ide-browser-skills

Cursor IDE Browser Automation

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "cursor-ide-browser-skills" with this command: npx skills add trotsky1997/my-claude-agent-skills/trotsky1997-my-claude-agent-skills-cursor-ide-browser-skills

Cursor IDE Browser Automation

Browser automation tool for Cursor IDE using MCP (Model Context Protocol) server cursor-ide-browser and accessibility snapshots for precise element interaction.

Core Mechanism

Accessibility Snapshot First: Always get a snapshot before interacting with elements. The snapshot provides structured page information with element references (ref ) needed for all interactions.

// Standard workflow browser_navigate(url="https://example.com") browser_snapshot() // Required: Get element references browser_click(element="Button", ref="ref-from-snapshot")

Essential Workflow

  • Navigate to target page

  • Snapshot to get element references (required before any interaction)

  • Convert to Markdown (⭐ Recommended) for easier searching, locating and reading

  • Search with grep in md to find information or locate interactive elements

  • Interact using refs from snapshot

  • Wait for dynamic content if needed

  • Verify with screenshots or console messages

Quick example:

browser_navigate(url="https://example.com") browser_snapshot() // Creates .log file mcp_snapshot-query_convert_to_markdown(file_path="snapshot.log") grep(pattern="button|登录", path="snapshot.md") // Find elements browser_click(element="Login", ref="ref-from-grep-results")

Key Tools

Navigation:

  • browser_navigate(url, position?)

  • Navigate to URL

  • browser_navigate_back()

  • Go back

Page Information:

  • browser_snapshot()

  • Required before interactions - Get accessibility tree with element refs

  • browser_take_screenshot(fullPage?, filename?)

  • Capture visual

  • browser_console_messages()

  • Get console logs

  • browser_network_requests()

  • Get network activity

Element Interaction:

  • browser_click(element, ref, doubleClick?, button?, modifiers?)

  • Click element

  • browser_type(element, ref, text, submit?, slowly?)

  • Type text

  • browser_hover(element, ref)

  • Hover

  • browser_select_option(element, ref, values)

  • Select dropdown

  • browser_press_key(key)

  • Press key (supports PageDown, PageUp, ArrowDown, ArrowUp, Space, End, Home for scrolling)

Synchronization:

  • browser_wait_for(text?, textGone?, time?)
  • Wait for text or time

Tab Management:

  • browser_tabs(action, index?, position?)
  • Manage tabs (list/new/close/select)

Element References

  • element : Human-readable description (for permission confirmation)

  • ref : Technical reference from snapshot (required for interaction)

  • Refs are page-state specific - get a new snapshot after navigation or page changes

Snapshot Files

Snapshots are automatically saved as YAML files:

  • Location: C:\Users{username}.cursor\browser-logs\snapshot-{timestamp}.log

  • Format: YAML accessibility tree with role , ref , name , children

  • Usage: Extract ref values for element interactions

Querying Snapshots

⭐ Recommended Workflow: Convert to Markdown + Grep

Best practice for finding information and locating interactive elements:

  • Get snapshot → Creates .log file

  • Convert to Markdown → More readable format with structured content

  • Use grep → Fast text search across the entire document

  • Extract refs → Use found refs for interactions

// Step 1: Get page snapshot browser_snapshot() // Creates: snapshot-2026-01-10T23-43-30-351Z.log

// Step 2: Convert to Markdown (RECOMMENDED) mcp_snapshot-query_convert_to_markdown( file_path="snapshot-2026-01-10T23-43-30-351Z.log", include_ref=true ) # save to snapshot-2026-01-10T23-43-30-351Z.md

// Step 3: Search with grep (much easier than querying raw YAML) grep(pattern="搜索|button|登录", path="snapshot.md", -i=true) grep(pattern="^\[.\]\(ref-|^\\.\\ `ref-", path="snapshot.md") // Find all links/buttons

// Step 4: Use found refs for interaction browser_click(element="Login button", ref="ref-found-from-grep")

Why this workflow is preferred:

  • ✅ More readable: Markdown format is human-friendly

  • ✅ Faster search: grep is more efficient than parsing YAML

  • ✅ Better context: See surrounding content with -C flag

  • ✅ Easy element discovery: Links and buttons clearly formatted

  • ✅ Preserves refs: All element references included for interaction

Alternative: Direct Query Tools

For programmatic element finding, use snapshot-query MCP tools:

Command line:

browser_snapshot() # Generate snapshot uvx snapshot-query snapshot.log find-name "search" # Find element

MCP tools:

mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="搜索") mcp_snapshot-query_find_by_role(file_path="snapshot.log", role="button") mcp_snapshot-query_find_by_text(file_path="snapshot.log", text="登录") mcp_snapshot-query_find_by_regex(file_path="snapshot.log", pattern="\d+\s*ft", field="name") mcp_snapshot-query_find_by_name_bm25(file_path="snapshot.log", name="search query", top_k=5) mcp_snapshot-query_count_elements(file_path="snapshot.log") mcp_snapshot-query_get_element_path(file_path="snapshot.log", ref="ref-xxx") mcp_snapshot-query_extract_all_refs(file_path="snapshot.log")

Integrated workflow:

browser_snapshot() // Creates snapshot file // Query snapshot to find element ref const result = mcp_snapshot-query_find_by_name(file_path="snapshot.log", name="Login") browser_click(element="Login", ref=result.ref) // Use ref from query

⭐ snapshot-query works with OCR results too:

The snapshot-query tools can process OCR results from fast-paddleocr-mcp . After OCR processing, you get a .snapshot.log file that can be queried just like browser snapshots:

// OCR generates webpage.png.snapshot.log mcp_fast-paddleocr-mcp_ocr_image(image_path="webpage.png", language="ch")

// Query OCR results with snapshot-query mcp_snapshot-query_find_by_text( file_path="webpage.png.snapshot.log", text="8 ft", case_sensitive=false )

// Use regex to find measurements mcp_snapshot-query_find_by_regex( file_path="webpage.png.snapshot.log", pattern="\d+\s*ft|cm|meters?", field="name" )

// Semantic search for better results mcp_snapshot-query_find_by_name_bm25( file_path="webpage.png.snapshot.log", name="height measurement", top_k=5 )

// Convert to Markdown for analysis mcp_snapshot-query_convert_to_markdown( file_path="webpage.png.snapshot.log", include_ref=true )

See references/snapshot-query.md for complete snapshot-query documentation.

Common Patterns

Login flow:

browser_navigate(url="https://example.com/login") browser_snapshot() // Find username input ref from snapshot browser_type(element="Username", ref="ref-username", text="user") // Find password input ref from snapshot browser_type(element="Password", ref="ref-password", text="pass") // Find login button ref from snapshot browser_click(element="Login", ref="ref-login-btn") browser_wait_for(text="Welcome")

Search and extract (with Markdown workflow):

browser_navigate(url="https://www.baidu.com/s?wd=哈梅内伊有几个孩子") browser_snapshot() // Creates snapshot.log // Convert to Markdown for easier searching mcp_snapshot-query_convert_to_markdown( file_path="snapshot.log", include_ref=true ) // Search for information using grep grep(pattern="六名|6个|子女", path="snapshot.md", -i=true, -C=3) // Find interactive elements (links/buttons) grep(pattern="^\[.\]\(ref-|^\\.\\ `ref-", path="snapshot.md") // Click on found link using ref browser_click(element="Article link", ref="ref-45py92vjdrs") browser_wait_for(text="Results") browser_take_screenshot(filename="results.png")

Debug page issues:

browser_snapshot() browser_console_messages() // Check for errors browser_network_requests() // Check failed requests

Scrolling web pages:

browser_press_key("PageDown") // Scroll down one page browser_press_key("PageUp") // Scroll up one page browser_press_key("ArrowDown") // Scroll down line by line browser_press_key("ArrowUp") // Scroll up line by line browser_press_key("Space") // Scroll down one screen browser_press_key("End") // Scroll to bottom browser_press_key("Home") // Scroll to top browser_wait_for(time=1) // Wait after scrolling for content to load

OCR processing with fast-paddleocr-mcp:

// Take screenshot of webpage browser_take_screenshot(filename="webpage.png", fullPage=false)

// Process with OCR (generates .md and .snapshot.log files) mcp_fast-paddleocr-mcp_ocr_image( image_path="webpage.png", language="ch" // Use "ch" for Chinese+English, "en" for English only )

// Query OCR results with snapshot-query mcp_snapshot-query_find_by_text( file_path="webpage.png.snapshot.log", text="tallest", case_sensitive=false )

// Use BM25 semantic search for better results mcp_snapshot-query_find_by_name_bm25( file_path="webpage.png.snapshot.log", name="height tallest person", top_k=5 )

// Convert OCR snapshot to Markdown for easier analysis mcp_snapshot-query_convert_to_markdown( file_path="webpage.png.snapshot.log", include_ref=true )

Cross-verification workflow:

// Navigate to multiple sources for verification browser_navigate(url="https://source1.com/article") browser_snapshot() // Extract information from source 1

browser_navigate(url="https://source2.com/article") browser_snapshot() // Extract information from source 2

// Compare and verify information consistency // Prefer authoritative sources (Wikipedia, official records, etc.)

Important Notes

  • Always snapshot before interaction - Refs are required and page-specific

  • ⭐ Convert to Markdown first - Use convert_to_markdown

  • grep for finding information and elements (much easier than querying raw YAML)
  • Wait for dynamic content - Use browser_wait_for() for async operations

  • Refs expire - Get new snapshot after navigation or page changes

  • Multi-tab support - Use viewId parameter or browser_tabs() to manage tabs

  • Position control - Use position="side" when user mentions side panel

  • OCR limitations - OCR may merge adjacent text (e.g., "otherreliablesourcesccordingtoG"). Key information is usually extracted correctly, but verify important details

  • Cross-verification - For critical information, verify across multiple authoritative sources (Wikipedia, official records, etc.)

  • Tool combination - Combine browser automation + OCR + snapshot-query for comprehensive web content analysis

Best Practices & Lessons Learned

Workflow Optimization

  • Standard workflow: Navigate → Snapshot → Convert to Markdown → Search → Interact

  • OCR workflow: Screenshot → OCR → Query with snapshot-query → Extract information

  • Verification workflow: Multiple sources → Extract → Compare → Verify consistency

Tool Integration

  • Browser + OCR: Use browser_take_screenshot()
  • fast-paddleocr-mcp to extract text from visual content
  • OCR + snapshot-query: OCR generates .snapshot.log files that can be queried with all snapshot-query tools

  • Markdown + grep: Convert snapshots/OCR results to Markdown for easier searching

Key Insights

  • snapshot-query is universal: Works with both browser snapshots and OCR results

  • Markdown conversion is recommended: Much easier to search and read than raw YAML

  • BM25 semantic search: Use find_by_name_bm25() for better relevance when exact matches are unclear

  • Cross-verification: Always verify critical information from multiple authoritative sources

  • OCR accuracy: Works well for key information but may merge adjacent text - verify important details

Detailed Reference

  • Complete tool reference: See references/tools.md for all tools with full parameters

  • Examples and patterns: See references/examples.md for detailed workflows

  • Snapshot file format: See references/snapshot-format.md for YAML structure details

  • Snapshot querying: See references/snapshot-query.md for querying snapshot files

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

pixi

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

uv-python-manager

No summary provided by upstream source.

Repository SourceNeeds Review
Research

deepresearch

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

python-debugging

No summary provided by upstream source.

Repository SourceNeeds Review