scrapling

Use Scrapling to scrape websites with adaptive parsing, Cloudflare bypass, and MCP support. It handles dynamic content and anti-bot detection, and provides clean HTML/JSON output.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "scrapling" with this command: npx skills add nanpaidashi/scrapling-ai

Scrapling Skill

Use the scrapling CLI to scrape websites with adaptive parsing and anti-bot bypass.

When to Use

USE this skill when you need to:

  • Scrape static or dynamic websites
  • Bypass Cloudflare, captcha, or bot detection
  • Extract structured data (HTML/JSON) from web pages
  • Handle JavaScript-rendered content
  • Get clean HTML without extra scripts/CSS

When NOT to Use

DON'T use this skill for:

  • Simple HTTP requests → use web_fetch
  • Full browser automation → use browser tool
  • API-based data → use direct API calls
  • Local file processing → use file tools

Setup

# Install CLI
pipx install scrapling
scrapling --version

Common Commands

Basic Scrape

# Get clean HTML
scrapling https://example.com -o html

# Get JSON structure
scrapling https://example.com -o json

# Save to file
scrapling https://example.com -o output.html

With Headers/Timeouts

# Custom headers
scrapling https://example.com --headers "User-Agent: Mozilla/5.0"

# Timeout (seconds)
scrapling https://slow-site.com --timeout 30

Extract Specific Elements

# XPath extraction
scrapling https://example.com -e "//div[@class='content']" -o html

# CSS selector
scrapling https://example.com -e "div.content" -o html
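
The two selectors above are not strictly equivalent when an element carries more than one class. A minimal sketch using only Python's standard library (not Scrapling's own parser; the sample markup is invented for illustration) contrasts the exact attribute match in the XPath with the class-token match that `div.content` implies in CSS:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: one div with a single class, one with two.
doc = ET.fromstring(
    "<html><body>"
    "<div class='content'>one</div>"
    "<div class='content wide'>two</div>"
    "</body></html>"
)

# XPath-style exact match: the class attribute must equal "content" exactly.
exact = [d.text for d in doc.findall(".//div[@class='content']")]

# CSS-style token match: "content" only has to appear in the class list.
token = [
    d.text
    for d in doc.iter("div")
    if "content" in (d.get("class") or "").split()
]

print(exact)  # ['one']
print(token)  # ['one', 'two']
```

If the target elements may have extra classes, the CSS form (or an XPath using `contains(...)`) is usually the safer choice.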

JSON Output with Fields

# Extract title, meta description
scrapling https://example.com \
  --fields 'title,meta_description' \
  -o json

MCP Integration

Scrapling supports MCP (Model Context Protocol) for AI agents:

# Start MCP server
scrapling mcp start

Then configure your agent to use the scrape tool via MCP.

Examples

Scrape News Article

scrapling https://example.com/news/article-123 \
  --fields 'title,author,publish_date,content' \
  -o json

Extract Product Data

scrapling https://shop.example.com/products \
  -e "//div[@class='product']" \
  -o html

Handle Cloudflare

# Scrapling attempts to bypass common protections automatically
scrapling https://protected-site.com -o html

Notes

  • Default timeout: 10 seconds
  • Auto-detects best output format (html/json/text)
  • Handles dynamic content via headless browser when needed
  • Be rate-limit friendly: add delays between requests
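
One way to add delays between requests is to drive the CLI from a small script. The sketch below is illustrative only: the `scrape_politely` helper and its parameters are hypothetical, and the command line is copied from the `-o json` form shown in this document rather than verified against a specific Scrapling release.

```python
import subprocess
import time


def scrape_politely(urls, delay_seconds=2.0, run=subprocess.run, sleep=time.sleep):
    """Scrape each URL in turn, pausing between consecutive requests.

    `run` and `sleep` are injectable so the loop can be exercised
    without touching the network (hypothetical design choice).
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first.
            sleep(delay_seconds)
        # Assumption: CLI form taken from this document's "Basic Scrape" section.
        proc = run(
            ["scrapling", url, "-o", "json"],
            capture_output=True,
            text=True,
            timeout=60,
        )
        # Store the raw output on success, None on a non-zero exit code.
        results[url] = proc.stdout if proc.returncode == 0 else None
    return results
```

Calling `scrape_politely(["https://example.com/a", "https://example.com/b"], delay_seconds=5)` would issue two requests roughly five seconds apart.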

JSON Output Format

{
  "title": "Page Title",
  "meta_description": "Description text",
  "content": "<clean HTML>",
  "links": ["http://...", "..."],
  "images": [{"src": "...", "alt": "..."}]
}
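
As a sketch of consuming this output downstream, the following Python reads a document of the shape above, resolves links against the page URL, and flags images without alt text. The `summarize_scrape` name, the sample values, and the assumption that links may be relative are all illustrative and not part of Scrapling itself.

```python
import json
from urllib.parse import urljoin

# Hypothetical sample following the JSON shape documented above.
SAMPLE = """
{
  "title": "Page Title",
  "meta_description": "Description text",
  "content": "<p>Hello</p>",
  "links": ["https://example.com/a", "/b"],
  "images": [{"src": "/logo.png", "alt": ""}, {"src": "/hero.jpg", "alt": "Hero"}]
}
"""


def summarize_scrape(raw_json, base_url):
    """Return the title, absolute links, and images that lack alt text."""
    data = json.loads(raw_json)
    # Resolve relative links against the page they were scraped from.
    links = [urljoin(base_url, href) for href in data.get("links", [])]
    # Collect image sources whose alt text is missing or empty.
    missing_alt = [
        img["src"] for img in data.get("images", []) if not img.get("alt")
    ]
    return {
        "title": data.get("title", ""),
        "links": links,
        "images_missing_alt": missing_alt,
    }


summary = summarize_scrape(SAMPLE, "https://example.com/news/")
print(summary["links"])  # ['https://example.com/a', 'https://example.com/b']
```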

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
