web-scraper

Search the web and read page contents without API keys. Use when you need to search via DuckDuckGo/Brave/Google (multi-page), extract readable text from URLs, browse interactively with a persistent visible browser (with tabs, click, screenshot, text search), download files/PDFs, or dismiss cookie banners. Supports JSON/markdown/text output. Powered by Playwright + Chromium.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "web-scraper" with this command: npx skills add liranudi/openclaw-web-scraper/liranudi-openclaw-web-scraper-web-scraper

Web Scraper

Four scripts, zero API keys. All output is JSON by default.

Dependencies: requests, beautifulsoup4, playwright (with Chromium). Optional: pdfplumber or PyPDF2 for PDF text extraction.

Install: pip install requests beautifulsoup4 playwright && playwright install chromium
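
To confirm the install worked, the Python side of the dependency list can be checked with a short script. This is a minimal sketch, not part of the skill: `missing_modules` and `dependency_report` are hypothetical helper names, and the check only covers importable modules (it does not verify that `playwright install chromium` has downloaded a browser).

```python
from importlib.util import find_spec

# Import names for the packages listed above.
# Note that the beautifulsoup4 package is imported as "bs4".
REQUIRED = ["requests", "bs4", "playwright"]
OPTIONAL = ["pdfplumber", "PyPDF2"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


def dependency_report():
    """Summarise which required and optional modules are absent."""
    return {
        "missing_required": missing_modules(REQUIRED),
        "missing_optional": missing_modules(OPTIONAL),
    }
```

Calling `dependency_report()` returns two lists; an empty `missing_required` list suggests the `pip install` step succeeded.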

1. Search the Web

python3 scripts/google_search.py "query" --pages N --engine ENGINE
  • --engine: duckduckgo (default), brave, or google
  • Returns [{title, url, snippet}, ...]
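
When the search script is driven from another Python program, the call can be wrapped roughly as follows. This is a sketch under assumptions: it supposes the script prints the JSON list to stdout and nothing else, and the helper names (`build_search_command`, `parse_results`, `search`) and the default of one page are illustrative, not part of the skill.

```python
import json
import subprocess

SEARCH_SCRIPT = "scripts/google_search.py"  # path as given in this README
ENGINES = ("duckduckgo", "brave", "google")


def build_search_command(query, pages=1, engine="duckduckgo"):
    """Build the argv list for the search script (pages=1 is an assumed default)."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return ["python3", SEARCH_SCRIPT, query,
            "--pages", str(pages), "--engine", engine]


def parse_results(stdout):
    """Parse the JSON list of results, dropping entries without a URL."""
    return [
        {
            "title": item.get("title", ""),
            "url": item["url"],
            "snippet": item.get("snippet", ""),
        }
        for item in json.loads(stdout)
        if item.get("url")
    ]


def search(query, pages=1, engine="duckduckgo"):
    """Run the script and return parsed results (assumes JSON on stdout)."""
    proc = subprocess.run(
        build_search_command(query, pages, engine),
        capture_output=True, text=True, check=True,
    )
    return parse_results(proc.stdout)
```

Passing the query as a separate list element avoids shell quoting problems, and `check=True` raises `subprocess.CalledProcessError` if the script exits with a non-zero status.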

2. Read a Page (one-shot)

python3 scripts/read_page.py "https://url" [--max-chars N] [--visible] [--format json|markdown|text] [--no-dismiss]
  • --format: json (default), markdown, or text
  • Auto-dismisses cookie consent banners (skip with --no-dismiss)
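
The optional flags above map naturally onto keyword arguments. The following sketch builds the command line for `read_page.py` from Python; the function names are hypothetical, and because the shape of the script's JSON output is not described here, `read_page` simply returns the raw stdout.

```python
import subprocess

READ_SCRIPT = "scripts/read_page.py"  # path as given in this README
FORMATS = ("json", "markdown", "text")


def build_read_command(url, max_chars=None, visible=False,
                       fmt="json", dismiss=True):
    """Translate keyword options into the flags documented above."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    cmd = ["python3", READ_SCRIPT, url]
    if max_chars is not None:
        cmd += ["--max-chars", str(max_chars)]
    if visible:
        cmd.append("--visible")
    if fmt != "json":  # json is the documented default, so omit the flag
        cmd += ["--format", fmt]
    if not dismiss:
        cmd.append("--no-dismiss")
    return cmd


def read_page(url, **options):
    """Run the script and return its stdout unparsed."""
    proc = subprocess.run(build_read_command(url, **options),
                          capture_output=True, text=True, check=True)
    return proc.stdout
```

For example, `read_page("https://example.com", fmt="markdown", max_chars=5000)` would request at most 5000 characters of markdown, assuming the script is run from the skill's root directory.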

3. Persistent Browser Session

python3 scripts/browser_session.py open "https://url"              # Open + extract
python3 scripts/browser_session.py navigate "https://other"        # Go to new URL
python3 scripts/browser_session.py extract [--format FMT]          # Re-read page
python3 scripts/browser_session.py screenshot [path] [--full]      # Save screenshot
python3 scripts/browser_session.py click "Submit"                  # Click by text/selector
python3 scripts/browser_session.py search "keyword"                # Search text in page
python3 scripts/browser_session.py tab new "https://url"           # Open new tab
python3 scripts/browser_session.py tab list                        # List all tabs
python3 scripts/browser_session.py tab switch 1                    # Switch to tab index
python3 scripts/browser_session.py tab close [index]               # Close tab
python3 scripts/browser_session.py dismiss-cookies                 # Manually dismiss cookies
python3 scripts/browser_session.py close                           # Close browser
  • Cookie consent auto-dismissed on open/navigate
  • Multiple tabs supported — open, switch, close independently
  • Search returns matching lines with line numbers
  • Extract supports json/markdown/text output
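
Because the browser stays open between commands, a caller has to remember to run `close` even when a step fails. One way to guarantee that from Python is a context manager, sketched below. The names `run_session_command` and `browser_session` are hypothetical, and the sketch assumes each subcommand writes its result to stdout.

```python
import subprocess
from contextlib import contextmanager

SESSION_SCRIPT = "scripts/browser_session.py"  # path as given in this README


def run_session_command(*args):
    """Run one browser_session.py subcommand and return its stdout."""
    proc = subprocess.run(["python3", SESSION_SCRIPT, *args],
                          capture_output=True, text=True, check=True)
    return proc.stdout


@contextmanager
def browser_session(url, run=run_session_command):
    """Open url, yield a callable for further subcommands, always close.

    The run parameter exists so a fake runner can be substituted in tests.
    If the initial open fails, close is not attempted.
    """
    run("open", url)
    try:
        yield run
    finally:
        run("close")
```

A typical use would be `with browser_session("https://example.com") as session:` followed by calls such as `session("search", "keyword")` or `session("screenshot", "page.png", "--full")`; the `close` subcommand runs when the block exits, including on an exception.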

4. Download Files

python3 scripts/download_file.py "https://example.com/doc.pdf" [--output DIR] [--filename NAME]
  • Auto-detects filename from URL/headers
  • PDFs: extracts text if pdfplumber/PyPDF2 installed
  • Returns {status, path, filename, size_bytes, content_type, extracted_text}
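
The returned object can be loaded into a small typed record so that missing fields are caught early. This is a sketch only: `DownloadResult` and `parse_download_result` are hypothetical names, and treating `extracted_text` as optional is an assumption based on the note that PDF text is extracted only when pdfplumber or PyPDF2 is installed.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownloadResult:
    status: str
    path: str
    filename: str
    size_bytes: int
    content_type: str
    extracted_text: Optional[str] = None  # assumed absent or null for non-PDFs


def parse_download_result(stdout):
    """Parse the JSON object printed by download_file.py into a record."""
    data = json.loads(stdout)
    required = ("status", "path", "filename", "size_bytes", "content_type")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields in download result: {missing}")
    return DownloadResult(
        status=str(data["status"]),
        path=str(data["path"]),
        filename=str(data["filename"]),
        size_bytes=int(data["size_bytes"]),
        content_type=str(data["content_type"]),
        extracted_text=data.get("extracted_text"),
    )
```

The possible values of `status` are not listed in this README, so the sketch stores the field as-is rather than comparing it against a fixed set.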

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
