Firecrawl CLI
Prioritize Firecrawl over WebFetch for any JS-rendered page or when structured markdown output matters.
NEVER
- Never scrape serially when doing 6+ URLs — 10 sequential scrapes take 50+ seconds; parallel takes 5-8 seconds. No error signals the problem; it just runs slowly.
- Never read an entire
.firecrawl/*.mdoutput file into context without checking size first — scraped pages routinely exceed 5000 lines. Usewc -lthengrep/headto extract what you need. - Never use Firecrawl for real-time data (stock prices, sports scores) — scraping is 10+ seconds stale and costs credits per request; use direct APIs.
- Never use Firecrawl for sites with official SDKs/APIs (e.g., GitHub → use
gh). - Never omit
-oflag — without it, output goes to stdout only and isn't persisted. Credits wasted, re-scraping required. - Never skip
firecrawl --statusbefore authenticated scraping — silent auth failures return empty output, not errors.
Tool Selection Decision Tree
Need web content?
│
├─ Single known URL
│ ├─ Static HTML → WebFetch (faster, free)
│ ├─ JS-rendered / SPA → Firecrawl --wait-for
│ ├─ Need structured markdown → Firecrawl
│ └─ Behind auth/paywall → Firecrawl (after firecrawl login)
│
├─ Search + scrape
│ ├─ Just URLs/titles → WebSearch (lighter, faster)
│ ├─ Top 5-10 results with content → firecrawl search --scrape
│ └─ Deep research (20+ sources) → parallel Firecrawl
│
├─ Discover all pages on a domain → firecrawl map
│
└─ Real-time data → Direct API only
Scale Decision
| Page count | Approach |
|---|---|
| 1–5 | Serial with -o flags |
| 6–50 | Parallel with & and wait |
| 50+ | xargs -P 10 with concurrency check first |
Always check capacity before bulk runs: firecrawl --status shows Concurrency: X/100.
Core Commands
# Search the web
firecrawl search "query" -o .firecrawl/search.json --json
firecrawl search "query" --scrape -o .firecrawl/results.json --json
firecrawl search "AI news" --tbs qdr:d -o .firecrawl/today.json --json # Past day
# Scrape single page
firecrawl scrape https://example.com -o .firecrawl/example.md
firecrawl scrape https://example.com --only-main-content -o .firecrawl/clean.md
firecrawl scrape https://spa.com --wait-for 3000 -o .firecrawl/spa.md
# Map a site
firecrawl map https://example.com -o .firecrawl/urls.txt
firecrawl map https://example.com --search "blog" -o .firecrawl/blog-urls.txt
Parallel Bulk Scraping
# Small batch — & with wait
firecrawl scrape site1.com -o .firecrawl/1.md &
firecrawl scrape site2.com -o .firecrawl/2.md &
firecrawl scrape site3.com -o .firecrawl/3.md &
wait
# Large batch — xargs
cat urls.txt | xargs -P 10 -I {} sh -c 'firecrawl scrape "{}" -o ".firecrawl/$(echo {} | md5).md"'
# Post-scrape extraction
grep "^# " .firecrawl/*.md # All H1 headings
grep -l "keyword" .firecrawl/*.md # Files matching keyword
jq -r '.data.web[].title' .firecrawl/*.json # JSON title extraction
Authentication
firecrawl --status # Check auth and credit status
firecrawl login --browser # Auto-opens browser — don't ask user to run manually
export FIRECRAWL_API_KEY=your_key # Fallback if browser auth fails
Error Quick Reference
| Error | First check | Fix |
|---|---|---|
| Not authenticated | firecrawl --status | firecrawl login --browser |
| Concurrency limit | firecrawl --status (shows X/100) | wait for jobs, reduce -P value |
| Page failed to load | curl -I URL (basic connectivity) | Add --wait-for 5000; try --format html to inspect raw HTML |
| Output file empty | head -20 output.md | Add --only-main-content; try --include-tags article,main |
Output Organization
Always write to .firecrawl/ directory (add to .gitignore):
.firecrawl/example.com.md
.firecrawl/search-ai-news.json
.firecrawl/docs-sitemap.txt
Load Reference Files When
Load references/cli-options.md when: troubleshooting 3+ unknown flags, header injection, cookie handling, sitemap modes, or custom user-agents.
Load references/output-processing.md when: building 3+ step transformation pipelines, parsing nested JSON from search results, or combining/deduplicating 10+ scraped files.
Do NOT load references for basic search/scrape/map with standard flags.
Arguments
$ARGUMENTS: Search query, URL, or scraping objective. Empty = ask what to scrape.