scrapeninja

ScrapeNinja API for web scraping. Use when the user mentions "ScrapeNinja", "scrape", "web scraping", or data extraction.


Installation

Install the skill with:

npx skills add vm0-ai/vm0-skills/vm0-ai-vm0-skills-scrapeninja

ScrapeNinja

High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.

Official docs: https://scrapeninja.net/docs/


When to Use

Use this skill when you need to:

  • Scrape websites with anti-bot protection (Cloudflare, Datadome)
  • Extract data without running a full browser (fast /scrape endpoint)
  • Render JavaScript-heavy pages (/scrape-js endpoint)
  • Use rotating proxies with geo selection (US, EU, Brazil, etc.)
  • Extract structured data with Cheerio extractors
  • Intercept AJAX requests
  • Take screenshots of pages

Prerequisites

  1. Get an API key from RapidAPI or APIRoad.
  2. Set the environment variable:

# For RapidAPI
export SCRAPENINJA_TOKEN="your-rapidapi-key"

# For APIRoad (use X-Apiroad-Key header instead)
export SCRAPENINJA_TOKEN="your-apiroad-key"
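All commands below assume the variable is exported in the current shell. A small guard like this (a sketch, not part of the API) fails fast when the key is missing instead of producing an opaque auth error:

```shell
# Fail fast if the API key variable from the step above is missing.
check_token() {
  if [ -z "${SCRAPENINJA_TOKEN:-}" ]; then
    echo "SCRAPENINJA_TOKEN is not set" >&2
    return 1
  fi
  echo "token present"
}
```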

How to Use

1. Basic Scrape (Non-JS, Fast)

High-performance scraping with Chrome TLS fingerprint, no JavaScript:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com"
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json \
  | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'

With custom headers and retries:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com",
  "headers": ["Accept-Language: en-US"],
  "retryNum": 3,
  "timeout": 15
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json

2. Scrape with JavaScript Rendering

For JavaScript-heavy sites (React, Vue, etc.):

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com",
  "waitForSelector": "h1",
  "timeout": 20
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json \
  | jq '{status: .info.statusCode, bodyLength: (.body | length)}'

With screenshot:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com",
  "screenshot": true
}

Then run:

# Get the screenshot (base64-encoded PNG, per the Response Format section) from the response
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json \
  | jq -r '.info.screenshot'
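The Response Format section lists screenshot as base64-encoded PNG. If your plan returns it that way, a small helper (a sketch; assumes jq and GNU base64, which uses -d, while BSD/macOS uses -D) writes it to disk:

```shell
# Decode the base64 .info.screenshot field of a saved response into a PNG file.
save_screenshot() {
  # $1: path to saved response JSON, $2: output PNG path
  jq -r '.info.screenshot' "$1" | base64 -d > "$2"
}
```

Usage: save the curl output to a file first, then `save_screenshot /tmp/response.json /tmp/page.png`.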

3. Geo-Based Proxy Selection

Use proxies from specific regions:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com",
  "geo": "eu"
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json \
  | jq .info

Available geos: us, eu, br (Brazil), fr (France), de (Germany), 4g-eu

4. Smart Retries

Retry on specific HTTP status codes or text patterns:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com",
  "retryNum": 3,
  "statusNotExpected": [403, 429, 503],
  "textNotExpected": ["captcha", "Access Denied"]
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json

5. Extract Data with Cheerio

Extract structured JSON using Cheerio extractor functions:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://news.ycombinator.com",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json \
  | jq '.extractor'
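Hand-escaping the extractor's quotes inside the JSON body is error-prone. One way around it (a sketch, assuming jq is installed; the h1 extractor here is a simplified example) is to keep the extractor as a raw shell string and let jq --arg do the JSON encoding:

```shell
# Build the request body from a raw extractor string; jq handles all
# JSON escaping of the embedded double quotes.
extractor='function(input, cheerio) { let $ = cheerio.load(input); return $("h1").text(); }'
jq -n --arg url "https://news.ycombinator.com" --arg ex "$extractor" \
  '{url: $url, extractor: $ex}' > /tmp/scrapeninja_request.json
```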

6. Intercept AJAX Requests

Capture XHR/fetch responses:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com",
  "catchAjaxHeadersUrlMask": "api/data"
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json \
  | jq '.info.catchedAjax'

7. Block Resources for Speed

Speed up JS rendering by blocking images and media:

Write to /tmp/scrapeninja_request.json:

{
  "url": "https://example.com",
  "blockImages": true,
  "blockMedia": true
}

Then run:

curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" \
  --header "Content-Type: application/json" \
  --header "X-RapidAPI-Key: $SCRAPENINJA_TOKEN" \
  -d @/tmp/scrapeninja_request.json
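Per the parameter table, screenshot defaults to true on /scrape-js, so blocking can be combined with screenshot: false for the leanest render (a sketch combining this section with Guideline 6):

```shell
# Fastest JS render: block heavy resources and skip the screenshot.
cat > /tmp/scrapeninja_request.json <<'EOF'
{
  "url": "https://example.com",
  "blockImages": true,
  "blockMedia": true,
  "screenshot": false
}
EOF
```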

API Endpoints

Endpoint        Description
/scrape         Fast non-JS scraping with Chrome TLS fingerprint
/scrape-js      Full Chrome browser with JS rendering
/v2/scrape-js   Enhanced JS rendering for protected sites (APIRoad only)

Request Parameters

Common Parameters (all endpoints)

Parameter           Type       Default      Description
url                 string     (required)   URL to scrape
headers             string[]   -            Custom HTTP headers
retryNum            int        1            Number of retry attempts
geo                 string     us           Proxy geo: us, eu, br, fr, de, 4g-eu
proxy               string     -            Custom proxy URL (overrides geo)
timeout             int        10/16        Timeout per attempt in seconds
textNotExpected     string[]   -            Text patterns that trigger a retry
statusNotExpected   int[]      [403, 502]   HTTP status codes that trigger a retry
extractor           string     -            Cheerio extractor function

JS Rendering Parameters (/scrape-js, /v2/scrape-js)

Parameter                 Type     Default     Description
waitForSelector           string   -           CSS selector to wait for
postWaitTime              int      -           Extra wait time after load (1-12 s)
screenshot                bool     true        Take a page screenshot
blockImages               bool     false       Block image loading
blockMedia                bool     false       Block CSS/font loading
catchAjaxHeadersUrlMask   string   -           URL pattern to intercept AJAX
viewport                  object   1920x1080   Custom viewport size

Response Format

{
  "info": {
    "statusCode": 200,
    "finalUrl": "https://example.com",
    "headers": ["content-type: text/html"],
    "screenshot": "base64-encoded-png",
    "catchedAjax": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "body": "...",
      "status": 200
    }
  },
  "body": "<html>...</html>",
  "extractor": { "extracted": "data" }
}
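Responses in this shape split cleanly with jq into metadata and page content; the sketch below creates a minimal sample response inline purely for illustration:

```shell
# Minimal sample response matching the shape above (illustration only).
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info":{"statusCode":200,"finalUrl":"https://example.com"},"body":"<html>ok</html>"}
EOF

# Separate the HTML body from the metadata for downstream tools.
jq -r '.body' /tmp/scrapeninja_response.json > /tmp/page.html
jq '{status: .info.statusCode, finalUrl: .info.finalUrl}' /tmp/scrapeninja_response.json
```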

Guidelines

  1. Start with /scrape: Use the fast non-JS endpoint first, only switch to /scrape-js if needed
  2. Retries: Set retryNum to 2-3 for unreliable sites
  3. Geo Selection: Use eu for European sites, us for American sites
  4. Extractors: Test extractors at https://scrapeninja.net/cheerio-sandbox/
  5. Blocked Sites: For Cloudflare/Datadome protected sites, use /v2/scrape-js via APIRoad
  6. Screenshots: Set screenshot: false to speed up JS rendering
  7. Rate Limits: Check your plan limits on RapidAPI/APIRoad dashboard
