Firecrawl Web Scraper Skill
Status: Production Ready ✅ Last Updated: 2025-10-24 Official Docs: https://docs.firecrawl.dev API Version: v2
What is Firecrawl?
Firecrawl is a Web Data API for AI that turns entire websites into LLM-ready markdown or structured data. It handles:
-
JavaScript rendering - Executes client-side JavaScript to capture dynamic content
-
Anti-bot bypass - Gets past CAPTCHA and bot detection systems
-
Format conversion - Outputs as markdown, JSON, or structured data
-
Screenshot capture - Saves visual representations of pages
-
Browser automation - Full headless browser capabilities
API Endpoints
- /v2/scrape
- Single Page Scraping
Scrapes a single webpage and returns clean, structured content.
Use Cases:
-
Extract article content
-
Get product details
-
Scrape specific pages
-
Convert HTML to markdown
Key Options:
-
formats : ["markdown", "html", "screenshot"]
-
onlyMainContent : true/false (removes nav, footer, ads)
-
waitFor : milliseconds to wait before scraping
-
actions : browser automation actions (click, scroll, etc.)
- /v2/crawl
- Full Site Crawling
Crawls all accessible pages from a starting URL.
Use Cases:
-
Index entire documentation sites
-
Archive website content
-
Build knowledge bases
-
Scrape multi-page content
Key Options:
-
limit : max pages to crawl
-
maxDepth : how many links deep to follow
-
allowedDomains : restrict to specific domains
-
excludePaths : skip certain URL patterns
- /v2/map
- URL Discovery
Maps all URLs on a website without scraping content.
Use Cases:
-
Find sitemap
-
Discover all pages
-
Plan crawling strategy
-
Audit website structure
- /v2/extract
- Structured Data Extraction
Uses AI to extract specific data fields from pages.
Use Cases:
-
Extract product prices and names
-
Parse contact information
-
Build structured datasets
-
Custom data schemas
Key Options:
-
schema : Zod or JSON schema defining desired structure
-
systemPrompt : guide AI extraction behavior
Authentication
Firecrawl requires an API key for all requests.
Get API Key
-
Sign up at https://www.firecrawl.dev
-
Go to dashboard → API Keys
-
Copy your API key (starts with fc- )
Store Securely
NEVER hardcode API keys in code!
.env file
FIRECRAWL_API_KEY=fc-your-api-key-here
.env.local (for local development)
FIRECRAWL_API_KEY=fc-your-api-key-here
Python SDK Usage
Installation
pip install firecrawl-py
Latest Version: firecrawl-py v4.5.0+
Basic Scrape
import os from firecrawl import FirecrawlApp
Initialize client
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
Scrape a single page
result = app.scrape_url( url="https://example.com/article", params={ "formats": ["markdown", "html"], "onlyMainContent": True } )
Access markdown content
markdown = result.get("markdown") print(markdown)
Crawl Multiple Pages
import os from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
Start crawl
crawl_result = app.crawl_url( url="https://docs.example.com", params={ "limit": 100, "scrapeOptions": { "formats": ["markdown"] } }, poll_interval=5 # Check status every 5 seconds )
Process results
for page in crawl_result.get("data", []): url = page.get("url") markdown = page.get("markdown") print(f"Scraped: {url}")
Extract Structured Data
import os from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
Define schema
schema = { "type": "object", "properties": { "company_name": {"type": "string"}, "product_price": {"type": "number"}, "availability": {"type": "string"} }, "required": ["company_name", "product_price"] }
Extract data
result = app.extract( urls=["https://example.com/product"], params={ "schema": schema, "systemPrompt": "Extract product information from the page" } )
print(result)
TypeScript/Node.js SDK Usage
Installation
npm install @mendable/firecrawl-js
or
pnpm add @mendable/firecrawl-js
or use the unscoped package:
npm install firecrawl
Latest Version: @mendable/firecrawl-js v4.4.1+ (or firecrawl v4.4.1+ )
Basic Scrape
import FirecrawlApp from '@mendable/firecrawl-js';
// Initialize client const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Scrape a single page const result = await app.scrapeUrl('https://example.com/article', { formats: ['markdown', 'html'], onlyMainContent: true });
// Access markdown content const markdown = result.markdown; console.log(markdown);
Crawl Multiple Pages
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Start crawl const crawlResult = await app.crawlUrl('https://docs.example.com', { limit: 100, scrapeOptions: { formats: ['markdown'] } });
// Process results
for (const page of crawlResult.data) {
console.log(Scraped: ${page.url});
console.log(page.markdown);
}
Extract Structured Data with Zod
import FirecrawlApp from '@mendable/firecrawl-js'; import { z } from 'zod';
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Define schema with Zod const schema = z.object({ company_name: z.string(), product_price: z.number(), availability: z.string() });
// Extract data const result = await app.extract({ urls: ['https://example.com/product'], schema: schema, systemPrompt: 'Extract product information from the page' });
console.log(result);
Common Use Cases
- Documentation Scraping
Scenario: Convert entire documentation site to markdown for RAG/chatbot
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
docs = app.crawl_url( url="https://docs.myapi.com", params={ "limit": 500, "scrapeOptions": { "formats": ["markdown"], "onlyMainContent": True }, "allowedDomains": ["docs.myapi.com"] } )
Save to files
for page in docs.get("data", []): filename = page["url"].replace("https://", "").replace("/", "_") + ".md" with open(f"docs/{filename}", "w") as f: f.write(page["markdown"])
- Product Data Extraction
Scenario: Extract structured product data for e-commerce
const schema = z.object({ title: z.string(), price: z.number(), description: z.string(), images: z.array(z.string()), in_stock: z.boolean() });
const products = await app.extract({ urls: productUrls, schema: schema, systemPrompt: 'Extract all product details including price and availability' });
- News Article Scraping
Scenario: Extract clean article content without ads/navigation
article = app.scrape_url( url="https://news.com/article", params={ "formats": ["markdown"], "onlyMainContent": True, "removeBase64Images": True } )
Get clean markdown
content = article.get("markdown")
Error Handling
Python
from firecrawl import FirecrawlApp from firecrawl.exceptions import FirecrawlException
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
try: result = app.scrape_url("https://example.com") except FirecrawlException as e: print(f"Firecrawl error: {e}") except Exception as e: print(f"Unexpected error: {e}")
TypeScript
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
try { const result = await app.scrapeUrl('https://example.com'); } catch (error) { if (error.response) { // API error console.error('API Error:', error.response.data); } else { // Network or other error console.error('Error:', error.message); } }
Rate Limits & Best Practices
Rate Limits
-
Free tier: 500 credits/month
-
Paid tiers: Higher limits based on plan
-
Credits consumed vary by endpoint and options
Best Practices
-
Use onlyMainContent: true to reduce credits and get cleaner data
-
Set reasonable limits on crawls to avoid excessive costs
-
Handle retries with exponential backoff for transient errors
-
Cache results locally to avoid re-scraping same content
-
Use map endpoint first to plan crawling strategy
-
Batch extract calls when processing multiple URLs
-
Monitor credit usage in dashboard
Cloudflare Workers Integration
⚠️ Important: SDK Compatibility
The Firecrawl SDK cannot run in Cloudflare Workers due to Node.js dependencies (specifically axios which uses Node.js http module). Workers require Web Standard APIs.
✅ Use the direct REST API with fetch instead (see example below).
Alternative: Self-host with workers-firecrawl - a Workers-native implementation (requires Workers Paid Plan, only implements /search endpoint).
Workers Example: Direct REST API
This example uses the fetch API to call Firecrawl directly - works perfectly in Cloudflare Workers:
interface Env { FIRECRAWL_API_KEY: string; SCRAPED_CACHE?: KVNamespace; // Optional: for caching results }
interface FirecrawlScrapeResponse { success: boolean; data: { markdown?: string; html?: string; metadata: { title?: string; description?: string; language?: string; sourceURL: string; }; }; }
export default { async fetch(request: Request, env: Env): Promise<Response> { if (request.method !== 'POST') { return Response.json({ error: 'Method not allowed' }, { status: 405 }); }
try {
const { url } = await request.json<{ url: string }>();
if (!url) {
return Response.json({ error: 'URL is required' }, { status: 400 });
}
// Check cache (optional)
if (env.SCRAPED_CACHE) {
const cached = await env.SCRAPED_CACHE.get(url, 'json');
if (cached) {
return Response.json({ cached: true, data: cached });
}
}
// Call Firecrawl API directly using fetch
const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
url: url,
formats: ['markdown'],
onlyMainContent: true,
removeBase64Images: true
})
});
if (!response.ok) {
const errorText = await response.text();
throw new Error(`Firecrawl API error (${response.status}): ${errorText}`);
}
const result = await response.json<FirecrawlScrapeResponse>();
// Cache for 1 hour (optional)
if (env.SCRAPED_CACHE && result.success) {
await env.SCRAPED_CACHE.put(
url,
JSON.stringify(result.data),
{ expirationTtl: 3600 }
);
}
return Response.json({
cached: false,
data: result.data
});
} catch (error) {
console.error('Scraping error:', error);
return Response.json(
{ error: error instanceof Error ? error.message : 'Unknown error' },
{ status: 500 }
);
}
} };
Environment Setup: Add FIRECRAWL_API_KEY in Wrangler secrets:
npx wrangler secret put FIRECRAWL_API_KEY
Optional KV Binding (for caching - add to wrangler.jsonc ):
{ "kv_namespaces": [ { "binding": "SCRAPED_CACHE", "id": "your-kv-namespace-id" } ] }
See templates/firecrawl-worker-fetch.ts for a complete production-ready example.
When to Use This Skill
✅ Use Firecrawl when:
-
Scraping modern websites with JavaScript
-
Need clean markdown output for LLMs
-
Building RAG systems from web content
-
Extracting structured data at scale
-
Dealing with bot protection
-
Need reliable, production-ready scraping
❌ Don't use Firecrawl when:
-
Scraping simple static HTML (use cheerio/beautifulsoup)
-
Have existing Puppeteer/Playwright setup working well
-
Working with APIs (use direct API calls instead)
-
Budget constraints (free tier has limits)
Common Issues & Solutions
Issue: "Invalid API Key"
Cause: API key not set or incorrect Fix:
Check env variable is set
echo $FIRECRAWL_API_KEY
Verify key format (should start with fc-)
Issue: "Rate limit exceeded"
Cause: Exceeded monthly credits Fix:
-
Check usage in dashboard
-
Upgrade plan or wait for reset
-
Use onlyMainContent: true to reduce credits
Issue: "Timeout error"
Cause: Page takes too long to load Fix:
result = app.scrape_url(url, params={"waitFor": 10000}) # Wait 10s
Issue: "Content is empty"
Cause: Content loaded via JavaScript after initial render Fix:
result = app.scrape_url(url, params={ "waitFor": 5000, "actions": [{"type": "wait", "milliseconds": 3000}] })
Advanced Features
Browser Actions
Perform interactions before scraping:
result = app.scrape_url( url="https://example.com", params={ "actions": [ {"type": "click", "selector": "button.load-more"}, {"type": "wait", "milliseconds": 2000}, {"type": "scroll", "direction": "down"} ] } )
Custom Headers
result = app.scrape_url( url="https://example.com", params={ "headers": { "User-Agent": "Custom Bot 1.0", "Accept-Language": "en-US" } } )
Webhooks for Long Crawls
Instead of polling, receive results via webhook:
crawl = app.crawl_url( url="https://docs.example.com", params={ "limit": 1000, "webhook": "https://your-domain.com/webhook" } )
Package Versions
Package Version Last Checked
firecrawl-py 4.5.0+ 2025-10-20
@mendable/firecrawl-js (or firecrawl) 4.4.1+ 2025-10-24
API Version v2 Current
Note: The Node.js SDK requires Node.js >=22.0.0 and cannot run in Cloudflare Workers. Use direct REST API calls in Workers (see Cloudflare Workers Integration section).
Official Documentation
-
Python SDK: https://docs.firecrawl.dev/sdks/python
-
Node.js SDK: https://docs.firecrawl.dev/sdks/node
-
API Reference: https://docs.firecrawl.dev/api-reference
-
Dashboard: https://www.firecrawl.dev/app
Next Steps After Using This Skill
-
Store scraped data: Use Cloudflare D1, R2, or KV to persist results
-
Build RAG system: Combine with Vectorize for semantic search
-
Add scheduling: Use Cloudflare Queues for recurring scrapes
-
Process content: Use Workers AI to analyze scraped data
Token Savings: ~60% vs manual integration Error Prevention: API authentication, rate limiting, format handling Production Ready: ✅