scrape

Extract data from any website using Scrapling's MCP tools. Bypasses Cloudflare, handles dynamic JS-rendered pages, supports CSS selectors to pre-filter content (saves tokens).

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "scrape" with this command: npx skills add davekilleen/dex/davekilleen-dex-scrape

scrape

Extract data from any website using Scrapling's MCP tools. Bypasses Cloudflare, handles dynamic JS-rendered pages, supports CSS selectors to pre-filter content (saves tokens).

Usage

/scrape <url> /scrape <url> with selector .article-content /scrape stealth <url> /scrape bulk <url1> <url2> <url3>

When to Use (Tool Selection)

Scenario Tool Why

Simple page, no anti-bot scrapling_get

Fastest. HTTP with browser TLS fingerprint

JS-rendered / SPA content scrapling_fetch

Uses real Chromium browser

Cloudflare / anti-bot protected scrapling_stealthy_fetch

Stealth mode, solves captchas

Multiple pages, same pattern scrapling_bulk_get / scrapling_bulk_fetch

Parallel processing

Implementation

Step 1: Parse User Intent

Determine from the user's request:

  • URL(s) to scrape

  • CSS selector (if specified — e.g., .main-content , #article , table.data )

  • Stealth needed? (Cloudflare sites, login-protected, anti-bot mentions)

  • Bulk? (multiple URLs)

Step 2: Choose the Right Scrapling MCP Tool

Default path (try in order, escalate on failure):

  • Start with scrapling_get — fast HTTP, handles most sites

  • If content is empty/blocked → escalate to scrapling_fetch (real browser)

  • If still blocked → escalate to scrapling_stealthy_fetch (stealth mode)

User explicitly asks for stealth: Go straight to scrapling_stealthy_fetch

Multiple URLs: Use the bulk_ variants for parallel processing

Step 3: Call the MCP Tool

Use the Scrapling MCP server tools. All tools accept:

  • url (required): The URL to scrape

  • css_selector (optional): CSS selector to extract specific elements — always use this when possible to reduce token consumption

Single page:

Call scrapling MCP tool: get Arguments: { "url": "<url>", "css_selector": "<selector if provided>" }

Stealth:

Call scrapling MCP tool: stealthy_fetch Arguments: { "url": "<url>", "css_selector": "<selector if provided>" }

Bulk:

Call scrapling MCP tool: bulk_get Arguments: { "urls": ["<url1>", "<url2>"], "css_selector": "<selector>" }

Step 4: Process Results

The MCP returns extracted content (HTML or text depending on selector).

If user wants raw data: Present it formatted If user wants summary: Summarize the extracted content If user wants vault storage: Save to 00-Inbox/Scrape - [Title].md :

[Page Title]

Source: [URL] Scraped: YYYY-MM-DD Selector: [CSS selector used, if any]

Content

[Extracted content]

Step 5: Handle Failures

Error Action

Empty content Escalate to next fetcher tier

Connection refused Check URL is valid

Cloudflare challenge Auto-escalate to stealthy_fetch

Timeout Retry with longer timeout, suggest fetch for slow JS sites

Smart Defaults

  • News/blog articles: Auto-suggest article, .post-content, .entry-content selectors

  • Product pages: Auto-suggest .product, .price, .description selectors

  • Tables: Auto-suggest table selector, offer to convert to markdown table

  • Lists: Auto-suggest ul, ol selectors

Examples

User: /scrape https://example.com/blog/ai-trends → Use scrapling_get, auto-detect article content

User: /scrape stealth https://protected-site.com/data → Use scrapling_stealthy_fetch with Cloudflare bypass

User: scrape this page and grab just the pricing table: https://saas.com/pricing → Use scrapling_get with css_selector="table" or ".pricing"

User: scrape these 5 competitor pages and compare their features → Use scrapling_bulk_get, extract feature lists, present comparison

Prerequisites

Scrapling must be installed with MCP support:

pip install "scrapling[ai]" scrapling install

The scrapling MCP server must be configured in .mcp.json .

Relationship to Other Tools

  • WebFetch (native): Basic URL fetch, no anti-bot, no selectors. Use for simple known-good pages.

  • Firecrawl MCP: Cloud-based, requires API key, good for crawling entire sites. Use when you need recursive crawl.

  • Scrapling: Local, free, stealthy, selector-based. Default choice for single-page or small-batch scraping.

  • Apify: Marketplace of specialized scrapers. Use for platform-specific extraction (LinkedIn, Twitter, etc.)

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

dex

No summary provided by upstream source.

Repository SourceNeeds Review
General

dex-plan

No summary provided by upstream source.

Repository SourceNeeds Review
Web3

helius-dflow

Build Solana trading applications combining DFlow trading APIs with Helius infrastructure. Covers spot swaps (imperative and declarative), prediction markets, real-time market streaming, Proof KYC, transaction submission via Sender, fee optimization, shred-level streaming via LaserStream, and wallet intelligence.

Archived SourceRecently Updated
General

neo

Browse websites, read web pages, interact with web apps, call website APIs, and automate web tasks. Use Neo when: user asks to check a website, read a web page, post on social media (Twitter/X), interact with any web app, look up information on a specific site, scrape data from websites, automate browser tasks, or when you need to call any website's API. Keywords: website, web page, browse, URL, http, API, twitter, tweet, post, scrape, web app, open site, check site, read page, social media, online service.

Archived SourceRecently Updated