firecrawl

Use when scraping web pages, extracting content from JS-rendered sites or SPAs, running search-plus-scrape workflows, or mapping entire site URL trees. Produces clean LLM-friendly Markdown. Prefer over WebFetch when JavaScript execution is required. Keywords: web scraping, fetch URL, scrape website, search web, extract content, SPA, JS-rendered, site map, crawl, Firecrawl.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "firecrawl" with this command: npx skills add acedergren/agentic-tools/acedergren-agentic-tools-firecrawl

Firecrawl CLI

Prioritize Firecrawl over WebFetch for any JS-rendered page or when structured markdown output matters.

NEVER

  • Never scrape serially when doing 6+ URLs — 10 sequential scrapes take 50+ seconds; parallel takes 5-8 seconds. No error signals the problem; it just runs slowly.
  • Never read an entire .firecrawl/*.md output file into context without checking size first — scraped pages routinely exceed 5000 lines. Use wc -l then grep/head to extract what you need.
  • Never use Firecrawl for real-time data (stock prices, sports scores) — scraping is 10+ seconds stale and costs credits per request; use direct APIs.
  • Never use Firecrawl for sites with official SDKs/APIs (e.g., GitHub → use gh).
  • Never omit -o flag — without it, output goes to stdout only and isn't persisted. Credits wasted, re-scraping required.
  • Never skip firecrawl --status before authenticated scraping — silent auth failures return empty output, not errors.

Tool Selection Decision Tree

Need web content?
│
├─ Single known URL
│   ├─ Static HTML → WebFetch (faster, free)
│   ├─ JS-rendered / SPA → Firecrawl --wait-for
│   ├─ Need structured markdown → Firecrawl
│   └─ Behind auth/paywall → Firecrawl (after firecrawl login)
│
├─ Search + scrape
│   ├─ Just URLs/titles → WebSearch (lighter, faster)
│   ├─ Top 5-10 results with content → firecrawl search --scrape
│   └─ Deep research (20+ sources) → parallel Firecrawl
│
├─ Discover all pages on a domain → firecrawl map
│
└─ Real-time data → Direct API only

Scale Decision

Page countApproach
1–5Serial with -o flags
6–50Parallel with & and wait
50+xargs -P 10 with concurrency check first

Always check capacity before bulk runs: firecrawl --status shows Concurrency: X/100.

Core Commands

# Search the web
firecrawl search "query" -o .firecrawl/search.json --json
firecrawl search "query" --scrape -o .firecrawl/results.json --json
firecrawl search "AI news" --tbs qdr:d -o .firecrawl/today.json --json  # Past day

# Scrape single page
firecrawl scrape https://example.com -o .firecrawl/example.md
firecrawl scrape https://example.com --only-main-content -o .firecrawl/clean.md
firecrawl scrape https://spa.com --wait-for 3000 -o .firecrawl/spa.md

# Map a site
firecrawl map https://example.com -o .firecrawl/urls.txt
firecrawl map https://example.com --search "blog" -o .firecrawl/blog-urls.txt

Parallel Bulk Scraping

# Small batch — & with wait
firecrawl scrape site1.com -o .firecrawl/1.md &
firecrawl scrape site2.com -o .firecrawl/2.md &
firecrawl scrape site3.com -o .firecrawl/3.md &
wait

# Large batch — xargs
cat urls.txt | xargs -P 10 -I {} sh -c 'firecrawl scrape "{}" -o ".firecrawl/$(echo {} | md5).md"'

# Post-scrape extraction
grep "^# " .firecrawl/*.md              # All H1 headings
grep -l "keyword" .firecrawl/*.md       # Files matching keyword
jq -r '.data.web[].title' .firecrawl/*.json  # JSON title extraction

Authentication

firecrawl --status                  # Check auth and credit status
firecrawl login --browser           # Auto-opens browser — don't ask user to run manually
export FIRECRAWL_API_KEY=your_key   # Fallback if browser auth fails

Error Quick Reference

ErrorFirst checkFix
Not authenticatedfirecrawl --statusfirecrawl login --browser
Concurrency limitfirecrawl --status (shows X/100)wait for jobs, reduce -P value
Page failed to loadcurl -I URL (basic connectivity)Add --wait-for 5000; try --format html to inspect raw HTML
Output file emptyhead -20 output.mdAdd --only-main-content; try --include-tags article,main

Output Organization

Always write to .firecrawl/ directory (add to .gitignore):

.firecrawl/example.com.md
.firecrawl/search-ai-news.json
.firecrawl/docs-sitemap.txt

Load Reference Files When

Load references/cli-options.md when: troubleshooting 3+ unknown flags, header injection, cookie handling, sitemap modes, or custom user-agents.

Load references/output-processing.md when: building 3+ step transformation pipelines, parsing nested JSON from search results, or combining/deduplicating 10+ scraped files.

Do NOT load references for basic search/scrape/map with standard flags.

Arguments

$ARGUMENTS: Search query, URL, or scraping objective. Empty = ask what to scrape.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

turborepo

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

cloudflare-zero-trust

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

orchestrate

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

humanizer

No summary provided by upstream source.

Repository SourceNeeds Review