
Firecrawl Scraper Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "firecrawl-scraper" with this command: npx skills add benedictking/firecrawl-scraper/benedictking-firecrawl-scraper-firecrawl-scraper


Trigger Conditions & Endpoint Selection

Choose Firecrawl endpoint based on user intent:

  • scrape: Need to extract content from a single web page (markdown, html, json, screenshot, pdf)

  • crawl: Need to crawl entire website with depth control and path filtering

  • map: Need to quickly get a list of all URLs on a website

  • batch-scrape: Need to scrape multiple URLs in parallel

  • crawl-status: Given a crawl job ID, check crawl progress/results (optional --wait)
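The selection rules above can be sketched as a simple keyword heuristic. This is a hypothetical helper for illustration, not part of the skill itself:

```javascript
// Hypothetical helper: map a user request to a Firecrawl endpoint,
// following the trigger conditions listed above.
function chooseEndpoint(request) {
  const r = request.toLowerCase();
  if (/\bjob id\b|status|progress/.test(r)) return "crawl-status";
  if (/\b(multiple|batch|list of) urls?\b/.test(r)) return "batch-scrape";
  if (/\b(all|every) urls?\b|sitemap/.test(r)) return "map";
  if (/\bcrawl\b|entire (web)?site|whole site/.test(r)) return "crawl";
  return "scrape"; // default: single-page extraction
}
```

A real dispatcher would rely on the model's intent understanding rather than keywords; the function only makes the decision table explicit.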

Recommended Architecture (Main Skill + Sub-skill)

This skill uses a two-phase architecture:

  • Main skill (current context): Understand user question → Choose endpoint → Assemble JSON payload

  • Sub-skill (forked context): Responsible only for executing the HTTP call, keeping large responses from wasting conversation-history tokens

Execution Method

Use the Task tool to invoke the firecrawl-fetcher sub-skill, passing the command as arguments and the JSON payload on stdin:

Task parameters:

  • subagent_type: Bash
  • description: "Call Firecrawl API"
  • prompt:

    cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs <scrape|crawl|map|batch-scrape|crawl-status> [--wait]
    { ...payload... }
    JSON

Payload Examples

  1. Scrape Single Page

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown", "links"],
  "onlyMainContent": true,
  "includeTags": [],
  "excludeTags": ["nav", "footer"],
  "waitFor": 0,
  "timeout": 30000
}
JSON

Available formats:

  • "markdown", "html", "rawHtml", "links", "images", "summary"

  • {"type": "json", "prompt": "Extract product info", "schema": {...}}

  • {"type": "screenshot", "fullPage": true, "quality": 85}

  2. Scrape with Actions (Page Interaction)

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "actions": [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "#load-more"},
    {"type": "wait", "milliseconds": 1000},
    {"type": "scroll", "direction": "down", "amount": 500}
  ]
}
JSON

Available actions:

  • wait, click, write, press, scroll, screenshot, scrape, executeJavascript

  3. Parse PDF

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/document.pdf",
  "formats": ["markdown"],
  "parsers": ["pdf"]
}
JSON

  4. Extract Structured JSON

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs scrape
{
  "url": "https://example.com/product",
  "formats": [
    {
      "type": "json",
      "prompt": "Extract product information",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"},
          "description": {"type": "string"}
        },
        "required": ["name", "price"]
      }
    }
  ]
}
JSON

  5. Crawl Entire Website

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "includePaths": ["^/docs/."],
  "excludePaths": ["^/blog/."],
  "maxDiscoveryDepth": 3,
  "limit": 100,
  "allowExternalLinks": false,
  "allowSubdomains": false
}
JSON

5.1) Crawl + Wait for Completion

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl --wait
{
  "url": "https://docs.example.com",
  "formats": ["markdown"],
  "limit": 100
}
JSON

  6. Map Website URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs map
{
  "url": "https://example.com",
  "search": "documentation",
  "limit": 5000
}
JSON

  7. Batch Scrape Multiple URLs

cat <<'JSON' | node .claude/skills/firecrawl-scraper/firecrawl-api.cjs batch-scrape
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "formats": ["markdown"]
}
JSON

  8. Check Crawl Status

node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id>

Wait for completion:

node .claude/skills/firecrawl-scraper/firecrawl-api.cjs crawl-status <crawl-id> --wait
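Wait mode can be approximated client-side by polling until the job reports completion. A minimal sketch, where checkStatus is a stand-in for an async call to the crawl-status endpoint (resolving to an object with a status field, an assumed shape):

```javascript
// Poll a status function until the crawl completes or times out.
// `checkStatus` is a placeholder for a real crawl-status call;
// it is assumed to resolve to an object like { status, data }.
async function waitForCrawl(checkStatus, { intervalMs = 2000, maxTries = 30 } = {}) {
  for (let i = 0; i < maxTries; i++) {
    const res = await checkStatus();
    if (res.status === "completed") return res;
    if (res.status === "failed") throw new Error("crawl failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for crawl");
}
```

The bundled --wait flag presumably does this internally; the sketch only shows the shape of the loop.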

Key Features

Formats

  • markdown: Clean markdown content

  • html: Parsed HTML

  • rawHtml: Original HTML

  • links: All links on page

  • images: All images on page

  • summary: AI-generated summary

  • json: Structured data extraction with schema

  • screenshot: Page screenshot (PNG)

Content Control

  • onlyMainContent: Extract only main content (default: true)

  • includeTags: CSS selectors to include

  • excludeTags: CSS selectors to exclude

  • waitFor: Wait time before scraping (ms)

  • maxAge: Cache duration (default: 48 hours)
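The defaults above can be made explicit when assembling a payload. A hypothetical builder (the field names follow the options listed; expressing the 48-hour maxAge in milliseconds is an assumption about the unit):

```javascript
// Hypothetical payload builder applying the documented defaults.
// maxAge is assumed to be in milliseconds (48 h = 172,800,000 ms).
function buildScrapePayload(url, overrides = {}) {
  return {
    url,
    formats: ["markdown"],
    onlyMainContent: true,        // documented default
    includeTags: [],
    excludeTags: [],
    waitFor: 0,
    maxAge: 48 * 60 * 60 * 1000,  // 48-hour cache window
    ...overrides,                 // caller settings win
  };
}
```

Spreading overrides last means any explicit user choice replaces the default, which mirrors how the payload examples earlier override individual fields.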

Actions (Browser Automation)

  • wait: Wait for specified time

  • click: Click element by selector

  • write: Input text into field

  • press: Press keyboard key

  • scroll: Scroll page

  • screenshot: Capture a screenshot mid-interaction

  • scrape: Scrape the page state mid-interaction

  • executeJavascript: Run custom JS

Crawl Options

  • includePaths: Regex patterns to include

  • excludePaths: Regex patterns to exclude

  • maxDiscoveryDepth: Maximum crawl depth

  • limit: Maximum pages to crawl

  • allowExternalLinks: Follow external links

  • allowSubdomains: Follow subdomains
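The include/exclude semantics can be illustrated with a small filter over discovered paths. This is only an illustration of how regex lists combine, not the crawler's actual implementation:

```javascript
// Illustration: apply includePaths/excludePaths regex lists to a
// set of discovered URL paths, as the crawl options describe.
// An empty includePaths list means "include everything".
function filterPaths(paths, { includePaths = [], excludePaths = [] } = {}) {
  const inc = includePaths.map((p) => new RegExp(p));
  const exc = excludePaths.map((p) => new RegExp(p));
  return paths.filter(
    (path) =>
      (inc.length === 0 || inc.some((re) => re.test(path))) &&
      !exc.some((re) => re.test(path))
  );
}
```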

Environment Variables & API Key

Two ways to configure the API key (priority: environment variable > .env):

  • Environment variable: FIRECRAWL_API_KEY

  • .env file: place it in .claude/skills/firecrawl-scraper/.env (can be copied from .env.example)
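The stated priority can be sketched as follows. This is a hypothetical resolver; the real script's loading logic may differ:

```javascript
// Hypothetical resolver honouring the documented priority:
// FIRECRAWL_API_KEY from the environment wins over a .env entry.
function resolveApiKey(env, dotenvText) {
  if (env.FIRECRAWL_API_KEY) return env.FIRECRAWL_API_KEY;
  const match = /^FIRECRAWL_API_KEY=(.*)$/m.exec(dotenvText || "");
  return match ? match[1].trim() : undefined;
}
```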

Response Format

All endpoints return JSON with:

  • success: Boolean indicating success

  • data: Extracted content (format depends on endpoint)

  • For crawl: returns a job ID; use crawl-status (or GET /v2/crawl/{id}) to check status
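A caller might branch on this shape as sketched below. Field names beyond success and data follow the description above; the crawl job's id field and the error field are assumptions to verify against actual output:

```javascript
// Sketch of handling a Firecrawl-style response. A crawl response
// is assumed to carry a job `id`; other endpoints return `data`.
function handleResponse(endpoint, res) {
  if (!res.success) throw new Error(res.error || "request failed");
  if (endpoint === "crawl" && res.id) {
    return { kind: "job", id: res.id }; // poll with crawl-status
  }
  return { kind: "data", data: res.data };
}
```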

