scrapling

Adaptive web scraping framework with anti-bot bypass and spider crawling.

Install skill "scrapling" with this command: npx skills add zendenho7/scrapling

Scrapling - Adaptive Web Scraping

"Effortless web scraping for the modern web."


Credits

  • Core Library: the upstream Scrapling project
  • API Reverse Engineering Methodology: @paoloanzn (see changelog v1.0.5)


Installation

# Core library (parser only)
pip install scrapling

# With fetchers (HTTP + browser automation) - RECOMMENDED
pip install "scrapling[fetchers]"
scrapling install

# With shell (CLI tools) - RECOMMENDED
pip install "scrapling[shell]"

# With AI (MCP server) - OPTIONAL
pip install "scrapling[ai]"

# Everything
pip install "scrapling[all]"

# Browser for stealth/dynamic mode
playwright install chromium

# For Cloudflare bypass (advanced)
pip install cloudscraper

Agent Instructions

When to Use Scrapling

Use Scrapling to:

  • Research topics from websites
  • Extract data from blogs, news sites, docs
  • Crawl multiple pages with Spider
  • Gather content for summaries
  • Extract brand data from any website
  • Reverse engineer APIs from websites

Do NOT use for:

  • X/Twitter (use x-tweet-fetcher skill)
  • Login-protected sites (unless credentials provided)
  • Paywalled content (respect robots.txt; see the check sketched after this list)
  • Sites that prohibit scraping in their TOS
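
Before scraping an unfamiliar domain, it can help to check robots.txt programmatically. A minimal sketch using only the Python standard library (not part of Scrapling; the example URL is a placeholder):

from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed_by_robots(url: str, user_agent: str = "*") -> bool:
    """Return True if robots.txt permits fetching this URL."""
    parsed = urlparse(url)
    rp = RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    try:
        rp.read()
    except Exception:
        # robots.txt unreachable - treat as allowed, but proceed carefully
        return True
    return rp.can_fetch(user_agent, url)

# Usage
if allowed_by_robots('https://example.com/article'):
    print("OK to fetch")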

Quick Commands

1. Basic Fetch (Most Common)

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com')

# Extract content
title = page.css('h1::text').get()
paragraphs = page.css('p::text').getall()

2. Stealthy Fetch (Anti-Bot/Cloudflare)

from scrapling.fetchers import StealthyFetcher

StealthyFetcher.adaptive = True
page = StealthyFetcher.fetch('https://example.com', headless=True, solve_cloudflare=True)

3. Dynamic Fetch (Full Browser Automation)

from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch('https://example.com', headless=True, network_idle=True)

4. Adaptive Parsing (Survives Design Changes)

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com')

# First scrape - saves selectors
items = page.css('.product', auto_save=True)

# Later - if site changes, use adaptive=True to relocate
items = page.css('.product', adaptive=True)

5. Spider (Multiple Pages)

from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com"]
    concurrent_requests = 3
    
    async def parse(self, response: Response):
        for item in response.css('.item'):
            yield {"item": item.css('h2::text').get()}
        
        # Follow links
        next_page = response.css('.next a')
        if next_page:
            yield response.follow(next_page[0].attrib['href'])

MySpider().start()

6. CLI Usage

# Simple fetch to file
scrapling extract get https://example.com content.html

# Stealthy fetch (bypass anti-bot)
scrapling extract stealthy-fetch https://example.com content.html

# Interactive shell
scrapling shell https://example.com

Common Patterns

Extract Article Content

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://example.com/article')

# Try multiple selectors for title
title = (
    page.css('[itemprop="headline"]::text').get() or
    page.css('article h1::text').get() or
    page.css('h1::text').get()
)

# Get paragraphs
content = page.css('article p::text, .article-body p::text').getall()

print(f"Title: {title}")
print(f"Paragraphs: {len(content)}")

Research Multiple Pages

from scrapling.spiders import Spider, Response

class ResearchSpider(Spider):
    name = "research"
    start_urls = ["https://news.ycombinator.com"]
    concurrent_requests = 5
    
    async def parse(self, response: Response):
        for item in response.css('.titleline a::text').getall()[:10]:
            yield {"title": item, "source": "HN"}
        
        more = response.css('.morelink::attr(href)').get()
        if more:
            yield response.follow(more)

ResearchSpider().start()

Crawl Entire Site (Easy Mode)

Auto-crawl all pages on a domain by following internal links:

from scrapling.spiders import Spider, Response
from urllib.parse import urljoin, urlparse

class EasyCrawl(Spider):
    """Auto-crawl all pages on a domain."""
    
    name = "easy_crawl"
    start_urls = ["https://example.com"]
    concurrent_requests = 3
    
    def __init__(self):
        super().__init__()
        self.visited = set()
    
    async def parse(self, response: Response):
        # Extract content
        yield {
            'url': response.url,
            'title': response.css('title::text').get(),
            'h1': response.css('h1::text').get(),
        }
        
        # Follow internal links (limit to 50 pages)
        if len(self.visited) >= 50:
            return
        
        self.visited.add(response.url)
        
        links = response.css('a::attr(href)').getall()[:20]
        for link in links:
            full_url = urljoin(response.url, link)
            if full_url not in self.visited:
                yield response.follow(full_url)

# Usage
result = EasyCrawl()
result.start()

Sitemap Crawl

Crawl pages from sitemap.xml (with fallback to link discovery):

from scrapling.fetchers import Fetcher
from scrapling.spiders import Spider, Response
from urllib.parse import urljoin, urlparse
import re

def get_sitemap_urls(url: str, max_urls: int = 100) -> list:
    """Extract URLs from sitemap.xml - also checks robots.txt."""
    
    parsed = urlparse(url)
    base_url = f"{parsed.scheme}://{parsed.netloc}"
    
    sitemap_urls = [
        f"{base_url}/sitemap.xml",
        f"{base_url}/sitemap-index.xml",
        f"{base_url}/sitemap_index.xml",
        f"{base_url}/sitemap-news.xml",
    ]
    
    all_urls = []
    
    # First check robots.txt for sitemap URL
    try:
        robots = Fetcher.get(f"{base_url}/robots.txt")
        if robots.status == 200:
            sitemap_in_robots = re.findall(r'Sitemap:\s*(\S+)', robots.text, re.IGNORECASE)
            for sm in sitemap_in_robots:
                sitemap_urls.insert(0, sm)
    except:
        pass
    
    # Try each sitemap location
    for sitemap_url in sitemap_urls:
        try:
            page = Fetcher.get(sitemap_url, timeout=10)
            if page.status != 200:
                continue
            
            text = page.text
            
            # Check if it's XML
            if '<?xml' in text or '<urlset' in text or '<sitemapindex' in text:
                urls = re.findall(r'<loc>([^<]+)</loc>', text)
                all_urls.extend(urls[:max_urls])
                print(f"Found {len(urls)} URLs in {sitemap_url}")
        except:
            continue
    
    return list(set(all_urls))[:max_urls]

def crawl_from_sitemap(domain_url: str, max_pages: int = 50):
    """Crawl pages from sitemap."""
    
    print(f"Fetching sitemap for {domain_url}...")
    urls = get_sitemap_urls(domain_url)
    
    if not urls:
        print("No sitemap found. Use EasyCrawl instead!")
        return []
    
    print(f"Found {len(urls)} URLs, crawling first {max_pages}...")
    
    results = []
    for url in urls[:max_pages]:
        try:
            page = Fetcher.get(url, timeout=10)
            results.append({
                'url': url,
                'status': page.status,
                'title': page.css('title::text').get(),
            })
        except Exception as e:
            results.append({'url': url, 'error': str(e)[:50]})
    
    return results

# Usage
print("=== Sitemap Crawl ===")
results = crawl_from_sitemap('https://example.com', max_pages=10)
for r in results[:3]:
    print(f"  {r.get('title', r.get('error', 'N/A'))}")

# Alternative: crawl by following links with the EasyCrawl spider defined above
print("\n=== Easy Crawl (Link Discovery) ===")
spider = EasyCrawl()
spider.start()
print(f"Visited {len(spider.visited)} pages")

Firecrawl-Style Crawl (Best of Both Worlds)

Inspired by Firecrawl's behavior - combines sitemap discovery with link following:

from scrapling.fetchers import Fetcher
from scrapling.spiders import Spider, Response
from urllib.parse import urljoin, urlparse
import re

def firecrawl_crawl(url: str, max_pages: int = 50, use_sitemap: bool = True):
    """
    Firecrawl-style crawling:
    - use_sitemap=True: Discover URLs from sitemap first (default)
    - use_sitemap=False: Only follow HTML links (like sitemap:"skip")
    
    Matches Firecrawl's crawl behavior.
    """
    
    parsed = urlparse(url)
    domain = parsed.netloc
    
    # ========== Method 1: Sitemap Discovery ==========
    if use_sitemap:
        print(f"[Firecrawl] Discovering URLs from sitemap...")
        
        sitemap_urls = [
            f"{url.rstrip('/')}/sitemap.xml",
            f"{url.rstrip('/')}/sitemap-index.xml",
        ]
        
        all_urls = []
        
        # Try sitemaps
        for sm_url in sitemap_urls:
            try:
                page = Fetcher.get(sm_url, timeout=15)
                if page.status == 200:
                    # Handle bytes
                    text = page.body.decode('utf-8', errors='ignore') if isinstance(page.body, bytes) else str(page.body)
                    
                    if '<urlset' in text:
                        urls = re.findall(r'<loc>([^<]+)</loc>', text)
                        all_urls.extend(urls[:max_pages])
                        print(f"[Firecrawl] Found {len(urls)} URLs in {sm_url}")
            except:
                continue
        
        if all_urls:
            print(f"[Firecrawl] Total: {len(all_urls)} URLs from sitemap")
            
            # Crawl discovered URLs
            results = []
            for page_url in all_urls[:max_pages]:
                try:
                    page = Fetcher.get(page_url, timeout=15)
                    results.append({
                        'url': page_url,
                        'status': page.status,
                        'title': page.css('title::text').get() if page.status == 200 else None,
                    })
                except Exception as e:
                    results.append({'url': page_url, 'error': str(e)[:50]})
            
            return results
    
    # ========== Method 2: Link Discovery (sitemap: skip) ==========
    print(f"[Firecrawl] Sitemap skip - using link discovery...")
    
    class LinkCrawl(Spider):
        name = "firecrawl_link"
        start_urls = [url]
        concurrent_requests = 3
        
        def __init__(self):
            super().__init__()
            self.visited = set()
            self.domain = domain
            self.results = []
        
        async def parse(self, response: Response):
            if len(self.results) >= max_pages:
                return
            
            self.results.append({
                'url': response.url,
                'status': response.status,
                'title': response.css('title::text').get(),
            })
            
            # Follow internal links
            links = response.css('a::attr(href)').getall()[:20]
            for link in links:
                full_url = urljoin(response.url, link)
                parsed_link = urlparse(full_url)
                
                if parsed_link.netloc == self.domain and full_url not in self.visited:
                    self.visited.add(full_url)
                    if len(self.visited) < max_pages:
                        yield response.follow(full_url)
    
    result = LinkCrawl()
    result.start()
    return result.results

# Usage
print("=== Firecrawl-Style (sitemap: include) ===")
results = firecrawl_crawl('https://www.cloudflare.com', max_pages=5, use_sitemap=True)
print(f"Crawled: {len(results)} pages")

print("\n=== Firecrawl-Style (sitemap: skip) ===")
results = firecrawl_crawl('https://example.com', max_pages=5, use_sitemap=False)
print(f"Crawled: {len(results)} pages")

Handle Errors

from scrapling.fetchers import Fetcher, StealthyFetcher

try:
    page = Fetcher.get('https://example.com')
except Exception:
    # Request failed outright - retry in stealth mode
    page = StealthyFetcher.fetch('https://example.com', headless=True)

if page.status == 403:
    # Blocked by anti-bot - retry with StealthyFetcher
    page = StealthyFetcher.fetch('https://example.com', headless=True)
elif page.status == 200:
    print("Success!")

Session Management

from scrapling.fetchers import FetcherSession

with FetcherSession(impersonate='chrome') as session:
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text').getall()

Multiple Session Types in Spider

from scrapling.spiders import Spider, Request, Response
from scrapling.fetchers import FetcherSession, AsyncStealthySession

class MultiSessionSpider(Spider):
    name = "multi"
    start_urls = ["https://example.com/"]
    
    def configure_sessions(self, manager):
        manager.add("fast", FetcherSession(impersonate="chrome"))
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)
    
    async def parse(self, response: Response):
        for link in response.css('a::attr(href)').getall():
            if "protected" in link:
                yield Request(link, sid="stealth")
            else:
                yield Request(link, sid="fast", callback=self.parse)

Advanced Parsing & Navigation

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# Multiple selection methods
quotes = page.css('.quote')           # CSS
quotes = page.xpath('//div[@class="quote"]')  # XPath
quotes = page.find_all('div', class_='quote')  # BeautifulSoup-style

# Navigation
first_quote = page.css('.quote')[0]
author = first_quote.css('.author::text').get()
parent = first_quote.parent

# Find similar elements
similar = first_quote.find_similar()

Advanced: API Reverse Engineering

"Web scraping is 80% reverse engineering."

This section covers advanced techniques to discover and replicate APIs directly from websites — often revealing data that's "hidden" behind paid APIs.

1. API Endpoint Discovery

Many websites load data via client-side requests. Use browser DevTools to find them:

Steps:

  1. Open browser DevTools (F12)
  2. Go to Network tab
  3. Reload the page
  4. Look for XHR or Fetch requests
  5. Check if endpoints return JSON data

What to look for:

  • Requests to /api/* endpoints
  • Responses containing structured data (JSON)
  • Same endpoints used on both free and paid sections

Example pattern:

# Found in Network tab:
GET https://api.example.com/v1/users/transactions
Response: {"data": [...], "pagination": {...}}
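
Once you've spotted such an endpoint, you can often replay it directly and work with JSON instead of HTML. A minimal sketch using the Fetcher shown earlier (the URL is the placeholder from the example above, and it assumes Fetcher.get forwards a headers argument to the underlying HTTP client):

import json
from scrapling.fetchers import Fetcher

# Replay the endpoint discovered in the Network tab (placeholder URL)
page = Fetcher.get(
    'https://api.example.com/v1/users/transactions',
    headers={'Accept': 'application/json'},
)

if page.status == 200:
    payload = json.loads(page.text)  # structured data, no HTML parsing needed
    print(len(payload.get('data', [])), "records")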

2. JavaScript Analysis

Auth tokens are often generated client-side. Find them in .js files:

Steps:

  1. In Network tab, look at Initiator column
  2. Click the .js file making the request
  3. Search for auth header name (e.g., sol-aut, Authorization, X-API-Key)
  4. Find the function generating the token

Common patterns:

  • Plain text function names: generateToken(), createAuthHeader()
  • Obfuscated: Search for the header name directly
  • Random string generation: Math.random(), crypto.getRandomValues()
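
Rather than clicking through the Initiator column by hand, a short script can narrow down which .js file references the auth header. A rough sketch built on the Fetcher and selector patterns used elsewhere in this document (the header name sol-aut is just the example from the steps above):

import re
from urllib.parse import urljoin
from scrapling.fetchers import Fetcher

def find_scripts_mentioning(page_url: str, header_name: str) -> list:
    """Download a page's external scripts and return the ones that mention the header."""
    page = Fetcher.get(page_url)
    hits = []
    for src in page.css('script::attr(src)').getall():
        script_url = urljoin(page_url, src)
        try:
            js = Fetcher.get(script_url, timeout=15).text
        except Exception:
            continue
        if re.search(re.escape(header_name), js):
            hits.append(script_url)
    return hits

# Usage (example header name from above)
print(find_scripts_mentioning('https://example.com', 'sol-aut'))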

3. Replicating Discovered APIs

Once you've found the endpoint and auth pattern:

import requests
import random
import string

def generate_auth_token():
    """Replicate discovered token generation logic."""
    chars = string.ascii_letters + string.digits
    token = ''.join(random.choice(chars) for _ in range(40))
    # Insert fixed string at random position
    fixed = "B9dls0fK"
    pos = random.randint(0, len(token))
    return token[:pos] + fixed + token[pos:]

def scrape_api_endpoint(url):
    """Hit discovered API endpoint with replicated auth."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'application/json',
        'sol-aut': generate_auth_token(),  # Replicate discovered header
    }
    
    response = requests.get(url, headers=headers)
    return response.json()

4. Cloudscraper Bypass (Cloudflare)

For Cloudflare-protected endpoints, use cloudscraper:

# pip install cloudscraper
import cloudscraper

def create_scraper():
    """Create a cloudscraper session that bypasses Cloudflare."""
    scraper = cloudscraper.create_scraper(
        browser={
            'browser': 'chrome',
            'platform': 'windows',
            'desktop': True
        }
    )
    return scraper

# Usage
scraper = create_scraper()
response = scraper.get('https://api.example.com/endpoint')
data = response.json()

5. Complete API Replication Pattern

import cloudscraper
import random
import string
import json

class APIReplicator:
    """Replicate discovered API from website."""
    
    def __init__(self, base_url):
        self.base_url = base_url
        self.session = cloudscraper.create_scraper()
    
    def generate_token(self, pattern="random"):
        """Replicate discovered token generation."""
        if pattern == "solscan":
            # 40-char random + fixed string at random position
            chars = string.ascii_letters + string.digits
            token = ''.join(random.choice(chars) for _ in range(40))
            fixed = "B9dls0fK"
            pos = random.randint(0, len(token))
            return token[:pos] + fixed + token[pos:]
        else:
            # Generic random token
            return ''.join(random.choices(string.ascii_letters + string.digits, k=32))
    
    def get(self, endpoint, headers=None, auth_header=None, auth_pattern="random"):
        """Make API request with discovered auth."""
        url = f"{self.base_url}{endpoint}"
        
        # Build headers
        request_headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'application/json',
        }
        
        # Add discovered auth header
        if auth_header:
            request_headers[auth_header] = self.generate_token(auth_pattern)
        
        # Merge custom headers
        if headers:
            request_headers.update(headers)
        
        response = self.session.get(url, headers=request_headers)
        return response

# Usage example
api = APIReplicator("https://api.solscan.io")
data = api.get(
    "/account/transactions",
    auth_header="sol-aut",
    auth_pattern="solscan"
)
print(data)

6. Discovery Checklist

When approaching a new site:

| Step | Action | Tool |
|------|--------|------|
| 1 | Open DevTools Network tab | F12 |
| 2 | Reload page, filter by XHR/Fetch | Network filter |
| 3 | Look for JSON responses | Response tab |
| 4 | Check if the same endpoint is used for "premium" data | Compare requests |
| 5 | Find the auth header in JS files | Initiator column |
| 6 | Extract token generation logic | JS debugger |
| 7 | Replicate in Python | Replicator class |
| 8 | Test against the API | Run script |

Brand Data Extraction (Firecrawl Alternative)

Extract brand data, colors, logos, and copy from any website:

from scrapling.fetchers import Fetcher
from urllib.parse import urljoin
import re

def extract_brand_data(url: str) -> dict:
    """Extract structured brand data from any website - Firecrawl style."""
    
    # Try a basic fetch first; fall back to stealth mode (handles anti-bot)
    try:
        page = Fetcher.get(url)
    except Exception:
        from scrapling.fetchers import StealthyFetcher
        page = StealthyFetcher.fetch(url, headless=True)
    
    # Helper to get text from element
    def get_text(elements):
        return elements[0].text if elements else None
    
    # Helper to get attribute
    def get_attr(elements, attr_name):
        return elements[0].attrib.get(attr_name) if elements else None
    
    # Brand name (meta tags carry their value in the content attribute)
    brand_name = (
        get_attr(page.css('[property="og:site_name"]'), 'content') or
        get_text(page.css('h1')) or
        get_text(page.css('title'))
    )
    
    # Tagline
    tagline = (
        get_attr(page.css('[property="og:description"]'), 'content') or
        get_text(page.css('.tagline')) or
        get_text(page.css('.hero-text')) or
        get_text(page.css('header h2'))
    )
    
    # Logo URL
    logo_url = (
        get_attr(page.css('[rel="icon"]'), 'href') or
        get_attr(page.css('[rel="apple-touch-icon"]'), 'href') or
        get_attr(page.css('.logo img'), 'src')
    )
    if logo_url and not logo_url.startswith('http'):
        logo_url = urljoin(url, logo_url)
    
    # Favicon
    favicon = get_attr(page.css('[rel="icon"]'), 'href')
    favicon_url = urljoin(url, favicon) if favicon else None
    
    # OG Image
    og_image = get_attr(page.css('[property="og:image"]'), 'content')
    og_image_url = urljoin(url, og_image) if og_image else None
    
    # Screenshot (using external service)
    screenshot_url = f"https://image.thum.io/get/width/1200/crop/800/{url}"
    
    # Description
    description = (
        get_attr(page.css('[property="og:description"]'), 'content') or
        get_attr(page.css('[name="description"]'), 'content')
    )
    
    # CTA text
    cta_text = (
        get_text(page.css('a[href*="signup"]')) or
        get_text(page.css('.cta')) or
        get_text(page.css('[class*="button"]'))
    )
    
    # Social links
    social_links = {}
    for platform in ['twitter', 'facebook', 'instagram', 'linkedin', 'youtube', 'github']:
        link = get_attr(page.css(f'a[href*="{platform}"]'), 'href')
        if link:
            social_links[platform] = link
    
    # Features (from feature grid/cards)
    features = []
    feature_cards = page.css('[class*="feature"], .feature-card, .benefit-item')
    for card in feature_cards[:6]:
        feature_text = get_text(card.css('h3, h4, p'))
        if feature_text:
            features.append(feature_text.strip())
    
    return {
        'brandName': brand_name,
        'tagline': tagline,
        'description': description,
        'features': features,
        'logoUrl': logo_url,
        'faviconUrl': favicon_url,
        'ctaText': cta_text,
        'socialLinks': social_links,
        'screenshotUrl': screenshot_url,
        'ogImageUrl': og_image_url
    }

# Usage
brand_data = extract_brand_data('https://example.com')
print(brand_data)

Brand Data CLI

# Extract brand data using the Python function above
python3 -c "
import json
import sys
sys.path.insert(0, '/path/to/skill')
from brand_extraction import extract_brand_data
data = extract_brand_data('$URL')
print(json.dumps(data, indent=2))
"

Feature Comparison

| Feature | Status | Notes |
|---------|--------|-------|
| Basic fetch | ✅ Working | Fetcher.get() |
| Stealthy fetch | ✅ Working | StealthyFetcher.fetch() |
| Dynamic fetch | ✅ Working | DynamicFetcher.fetch() |
| Adaptive parsing | ✅ Working | auto_save + adaptive |
| Spider crawling | ✅ Working | async def parse() |
| CSS selectors | ✅ Working | .css() |
| XPath | ✅ Working | .xpath() |
| Session management | ✅ Working | FetcherSession, StealthySession |
| Proxy rotation | ✅ Working | ProxyRotator class |
| CLI tools | ✅ Working | scrapling extract |
| Brand data extraction | ✅ Working | extract_brand_data() |
| API reverse engineering | ✅ Working | APIReplicator class |
| Cloudscraper bypass | ✅ Working | cloudscraper integration |
| Easy site crawl | ✅ Working | EasyCrawl class |
| Sitemap crawl | ✅ Working | get_sitemap_urls() |
| MCP server | ❌ Excluded | Not needed |
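
The table above mentions a ProxyRotator class that isn't shown elsewhere in this document. A minimal sketch of what such a helper could look like (hypothetical, not part of Scrapling's API; it assumes Fetcher.get accepts a proxy argument, so verify against your installed version):

import itertools
from scrapling.fetchers import Fetcher

class ProxyRotator:
    """Cycle through proxies, retrying a fetch on each until one succeeds."""

    def __init__(self, proxies: list):
        self.proxies = itertools.cycle(proxies)

    def get(self, url: str, attempts: int = 3):
        last_error = None
        for _ in range(attempts):
            proxy = next(self.proxies)
            try:
                # Assumes the fetcher forwards `proxy` to the HTTP client
                page = Fetcher.get(url, proxy=proxy, timeout=15)
                if page.status == 200:
                    return page
            except Exception as e:
                last_error = e
        raise RuntimeError(f"All proxy attempts failed: {last_error}")

# Usage (placeholder proxy URLs)
rotator = ProxyRotator(['http://proxy1:8080', 'http://proxy2:8080'])
page = rotator.get('https://example.com')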

Examples Tested

IEEE Spectrum

page = Fetcher.get('https://spectrum.ieee.org/...')
title = page.css('h1::text').get()
content = page.css('article p::text').getall()

✅ Works

Hacker News

page = Fetcher.get('https://news.ycombinator.com')
stories = page.css('.titleline a::text').getall()

✅ Works

Example Domain

page = Fetcher.get('https://example.com')
title = page.css('h1::text').get()

✅ Works


🔧 Quick Troubleshooting

| Issue | Solution |
|-------|----------|
| 403/429 blocked | Use StealthyFetcher or cloudscraper |
| Cloudflare | Use StealthyFetcher or cloudscraper |
| JavaScript required | Use DynamicFetcher |
| Site changed | Use adaptive=True |
| Paid API exposed | Use API reverse engineering |
| Captcha | Cannot bypass - skip or use official API |
| Auth required | Do NOT bypass - use official API |
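
When you don't know in advance how hard a site will be to fetch, the escalation path in the table can be codified. A small sketch that walks up the ladder using the three fetchers shown earlier:

from scrapling.fetchers import Fetcher, StealthyFetcher, DynamicFetcher

def fetch_with_escalation(url: str):
    """Try plain HTTP first, then stealth mode, then a full browser."""
    try:
        page = Fetcher.get(url)
        if page.status == 200:
            return page
    except Exception:
        pass

    try:
        # Anti-bot / Cloudflare challenges
        page = StealthyFetcher.fetch(url, headless=True, solve_cloudflare=True)
        if page.status == 200:
            return page
    except Exception:
        pass

    # JavaScript-heavy pages need full browser automation
    return DynamicFetcher.fetch(url, headless=True, network_idle=True)

page = fetch_with_escalation('https://example.com')
print(page.status)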

Skill Graph

Related skills:

  • [[content-research]] - Research workflow
  • [[blogwatcher]] - RSS/feed monitoring
  • [[youtube-watcher]] - Video content
  • [[chirp]] - Twitter/X interactions
  • [[newsletter-digest]] - Content summarization
  • [[x-tweet-fetcher]] - X/Twitter (use instead of Scrapling)

Changelog

v1.0.8 (2026-02-25)

  • Added: Firecrawl-Style Crawl - Combines sitemap discovery + link following
  • Added: use_sitemap parameter - Matches Firecrawl's sitemap:"include"/"skip" behavior
  • Verified: cloudflare.com returns 2,447 URLs from sitemap!

v1.0.7 (2026-02-25)

  • Fixed: EasyCrawl Spider syntax - Updated to work with scrapling's actual Spider API
  • Verified: Spider crawling works - Tested and crawled 20+ pages from example.com

v1.0.6 (2026-02-25)

  • Added: Easy Site Crawl - Auto-crawl all pages on a domain with EasyCrawl spider
  • Added: Sitemap Crawl - Extract URLs from sitemap.xml and crawl them
  • Feature parity with Firecrawl for site crawling capabilities

v1.0.5 (2026-02-25)

  • Enhanced: API Reverse Engineering methodology
    • Detailed step-by-step process from @paoloanzn's work
    • Real Solscan case study with exact timeline
    • Added: Step-by-step methodology section
    • Added: Real example documentation (Solscan March 2025 vs Feb 2026)
    • Added: Discovery checklist with 10 steps
    • Documented: How to find auth headers in JS files
    • Documented: Token generation pattern extraction
    • Updated: Cloudscraper integration with multi-attempt pattern
    • Verified: Solscan now patched (Cloudflare on both endpoints)

v1.0.4 (2026-02-25)

  • Fixed: Brand Data Extraction API - Corrected selectors for scrapling's Response object
  • Fixed: .html → .text / .body
  • Fixed: .title() → page.css('title')
  • Fixed: .logo img::src → .logo img::attr(src)
  • Tested and verified working

v1.0.3 (2026-02-25)

  • Added: API Reverse Engineering section
    • API Endpoint Discovery (Network tab analysis)
    • JavaScript Analysis (finding auth logic)
    • Cloudscraper integration for Cloudflare bypass
    • Complete APIReplicator class
    • Discovery checklist
  • Added cloudscraper to installation

v1.0.2 (2026-02-25)

  • Synced with upstream GitHub README exactly
  • Added Brand Data Extraction section
  • Clean, core-only version

v1.0.1 (2026-02-25)

  • Synced with original Scrapling GitHub README

Last updated: 2026-02-25
