scrapling

Web scraping and data extraction with the Python Scrapling library. Use it to scrape static HTML pages, JavaScript-rendered pages (via Playwright), and anti-bot or Cloudflare-protected sites (via the stealth browser). Supports CSS selectors, XPath, adaptive DOM relocation so selectors survive site redesigns, and session-based scraping with cookie persistence; outputs JSON or Markdown. Use when asked to scrape a URL, extract text, links, tables, or prices from a webpage, crawl a site, or automate web data collection.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "scrapling" with this command: npx skills add PiyushZinc/scrapling-extract

Scrapling

Extract structured website data with resilient selection patterns, adaptive relocation, and the right Scrapling fetcher mode for each target.

Workflow

  1. Identify target type before writing code:
    • Use Fetcher for static pages and API-like HTML responses.
    • Use DynamicFetcher when JavaScript rendering is required.
    • Use StealthyFetcher when anti-bot protection or browser fingerprinting issues are likely.
  2. Choose output contract first:
    • Return JSON for pipelines/automation.
    • Return Markdown/text for summarization or RAG ingestion.
    • Keep stable field names even if selector strategy changes.
  3. Implement selectors in this order:
    • Start with CSS selectors and pseudo-elements (for example ::text, ::attr(href)).
    • Fall back to XPath for ambiguous DOM structure.
    • Enable adaptive relocation for brittle or changing pages.
  4. Add safety controls:
    • Respect target site terms and legal boundaries.
    • Add timeouts, retries, and explicit error handling.
    • Log status code, URL, and selector misses for debugging.
  5. Validate on at least 2 pages:
    • Test one happy path and one edge case page.
    • Confirm required fields are non-empty.
    • Keep extraction deterministic (no hidden random choices).
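Steps 2 and 5 of the workflow can stay Scrapling-agnostic. A stdlib-only sketch of a normalizer that enforces a stable output contract and fails loudly on empty required fields (the field names here are illustrative, not part of any Scrapling API):

```python
import json

# Illustrative contract; keep these names stable even if selectors change
REQUIRED_FIELDS = ("title", "url")

def to_contract(raw: dict, url: str) -> str:
    """Map raw extraction results onto stable field names and return JSON."""
    record = {
        "title": (raw.get("title") or "").strip(),
        "url": url,
        "links": raw.get("links") or [],
    }
    missing = [f for f in REQUIRED_FIELDS if not record[f]]
    if missing:
        # Fail loudly so selector misses surface during validation runs
        raise ValueError(f"empty required fields {missing} for {url}")
    return json.dumps(record, ensure_ascii=False)
```

Swapping CSS for XPath, or enabling adaptive relocation later, only changes how `raw` is produced; the JSON contract consumed downstream stays fixed.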

Quick Setup

  1. Install base package:
    • pip install scrapling
  2. Install fetchers when browser-based fetching is needed:
    • pip install "scrapling[fetchers]"
    • scrapling install
    • python3 -m playwright install (required for DynamicFetcher and StealthyFetcher)
  3. Install optional extras as needed:
    • pip install "scrapling[shell]" for shell + extract commands
    • pip install "scrapling[ai]" for MCP capabilities

Execution Patterns

Pattern: One-off terminal extraction

Use the Scrapling CLI for the fastest no-code extraction:

scrapling extract get "https://example.com" content.md --css-selector "main"

Pattern: Python extraction script

Use the bundled helper:

# Static page (default)
python scripts/extract_with_scrapling.py --url "https://example.com" --css "h1::text"

# JavaScript-rendered page
python scripts/extract_with_scrapling.py --url "https://example.com" --fetcher dynamic --css "h1::text"

# Anti-bot protected page
python scripts/extract_with_scrapling.py --url "https://example.com" --fetcher stealthy --css "h1::text"
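The bundled script itself is not shown here; as an illustration of what such a helper might look like, a minimal sketch assuming the `--url`/`--fetcher`/`--css` interface above. The fetch calls follow Scrapling's documented `Fetcher.get` / `DynamicFetcher.fetch` / `StealthyFetcher.fetch` entry points, but treat this as a sketch, not the bundled implementation:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Extract text from a URL with Scrapling")
    p.add_argument("--url", required=True)
    p.add_argument("--fetcher", choices=("static", "dynamic", "stealthy"), default="static")
    p.add_argument("--css", required=True, help="CSS selector, e.g. 'h1::text'")
    return p

def fetch_page(url: str, fetcher: str):
    # Lazy imports so the CLI can be inspected without the browser extras installed
    if fetcher == "dynamic":
        from scrapling.fetchers import DynamicFetcher
        return DynamicFetcher.fetch(url)
    if fetcher == "stealthy":
        from scrapling.fetchers import StealthyFetcher
        return StealthyFetcher.fetch(url)
    from scrapling.fetchers import Fetcher
    return Fetcher.get(url)

def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    page = fetch_page(args.url, args.fetcher)
    return str(page.css_first(args.css))
```

Keeping fetcher selection behind one flag means callers never change when a target moves from static HTML to JavaScript rendering.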

Pattern: Session-based scraping

Use session classes when cookies/state must persist across requests.

from scrapling.fetchers import FetcherSession

with FetcherSession() as session:
    # Cookies set by the login response persist for later requests in this session
    login_page = session.post("https://example.com/login", data={"user": "...", "pass": "..."})
    protected_page = session.get("https://example.com/dashboard")
    headline = protected_page.css_first("h1::text")

Use StealthySession or DynamicSession as drop-in replacements for anti-bot or JS-rendered targets.
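The retry and logging controls from workflow step 4 layer over any of these sessions. A stdlib-only sketch with an injectable fetch callable (the helper name is illustrative):

```python
import time

def get_with_retries(fetch, url, retries=3, backoff=0.5):
    """Call fetch(url), retrying with exponential backoff; re-raise the last error."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            # Log each failure so runs are debuggable, then back off before retrying
            print(f"attempt {attempt + 1}/{retries} failed for {url}: {exc}")
            time.sleep(backoff * (2 ** attempt))
    raise last_error
```

With a live session this would be called as `get_with_retries(session.get, "https://example.com/dashboard")`; the same wrapper works unchanged for stealthy or dynamic sessions.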

Pattern: DOM change resilience

Use auto_save=True on initial capture and retry with adaptive selection on later runs when selectors break.

from scrapling.fetchers import Fetcher

# First run: save the matched element's identity so adaptive relocation can work later
page = Fetcher.get("https://example.com")
price = page.css_first(".price::text", auto_save=True)

# Later runs: relocate the element automatically even if the DOM changed
# (older Scrapling releases call this parameter auto_match instead of adaptive)
page = Fetcher.get("https://example.com")
price = page.css_first(".price::text", adaptive=True)
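Selector misses (workflow step 4) are worth logging explicitly whichever fetcher is used. A small wrapper that works with any page object exposing `css_first` (the helper name is hypothetical, not a Scrapling API):

```python
def first_or_log(page, selector, default=""):
    """Return the first match for selector, or log the miss and return default."""
    value = page.css_first(selector)
    if value is None:
        # Record both the selector and the page so misses are traceable
        print(f"selector miss: {selector!r} on {getattr(page, 'url', '<unknown url>')}")
        return default
    return value
```

Returning a default instead of None keeps downstream field validation simple: an empty required field then fails the non-empty check from workflow step 5.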
