
Firecrawl Known Pitfalls


Overview

Real gotchas when using Firecrawl for web scraping and crawling. Firecrawl handles JavaScript rendering and anti-bot bypassing, but its async crawl model and credit-based pricing create specific failure modes.

Prerequisites

  • Firecrawl API key configured

  • Understanding of async job patterns

  • Awareness of credit-based billing model

Instructions

Step 1: Handle Async Crawl Jobs Properly

crawlUrl returns a job ID, not results. Polling too aggressively wastes credits and may trigger rate limits.

```javascript
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
```

```javascript
// BAD: assuming synchronous results
const result = await firecrawl.crawlUrl('https://example.com');
console.log(result.data); // This is a job object, not page data!
```

```javascript
// GOOD: use the async crawl with proper polling
const crawl = await firecrawl.asyncCrawlUrl('https://example.com', {
  limit: 50,
  scrapeOptions: { formats: ['markdown'] },
});

// Poll with a delay between checks
let status;
do {
  await new Promise((r) => setTimeout(r, 5000)); // 5 seconds between polls
  status = await firecrawl.checkCrawlStatus(crawl.id);
} while (status.status === 'scraping');
```
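The fixed five-second sleep in the loop above can be softened into an exponential backoff so long-running crawls poll less aggressively over time. A minimal sketch; the base delay, growth factor, and cap are illustrative choices, not values prescribed by Firecrawl:

```javascript
// Sketch: exponential backoff schedule for crawl polling.
// attempt 0 -> 5s, 1 -> 10s, 2 -> 20s, ... capped at 60s.
function backoffDelay(attempt, baseMs = 5000, capMs = 60000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```

In the loop, replace the fixed delay with `await new Promise((r) => setTimeout(r, backoffDelay(attempt++)))`.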

Step 2: Avoid Credit Burn on Large Sites

Firecrawl charges per page. Crawling without limits on large sites burns credits fast.

```javascript
// BAD: no limit on a site with 100K pages
await firecrawl.crawlUrl('https://docs.large-project.org'); // burns entire quota
```

```javascript
// GOOD: set explicit limits and use URL filters
await firecrawl.crawlUrl('https://docs.large-project.org', {
  limit: 100,
  includePaths: ['/api/', '/guides/'],
  excludePaths: ['/changelog/', '/blog/'],
  maxDepth: 3,
});
```
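Since billing is per page, it helps to estimate the worst-case spend before launching a crawl. A sketch assuming roughly one credit per scraped page; verify the actual rate against your plan:

```javascript
// Sketch: pre-flight estimate of worst-case credit spend.
// The 1-credit-per-page rate is an assumption; adjust to your plan.
function estimateCrawlCredits(limit, creditsPerPage = 1) {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error('Always set a positive page limit before crawling');
  }
  return limit * creditsPerPage;
}
```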

Step 3: Don't Assume Markdown Output by Default

Firecrawl can return HTML, markdown, links, or screenshots. Not specifying format returns raw HTML.

```javascript
// BAD: getting HTML when you wanted clean text
const result = await firecrawl.scrapeUrl('https://example.com');
// result.html exists but result.markdown may be absent
```

```javascript
// GOOD: specify output format explicitly
const result = await firecrawl.scrapeUrl('https://example.com', {
  formats: ['markdown', 'links'],
  onlyMainContent: true, // strips nav, footer, sidebars
});
console.log(result.markdown);
```
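Because the markdown field may simply be absent, a defensive accessor makes the mistake fail loudly instead of propagating `undefined` through your pipeline. This helper is an illustration, not part of the SDK:

```javascript
// Sketch: fail fast when the response has no markdown field.
function requireMarkdown(result) {
  if (typeof result?.markdown !== 'string' || result.markdown.length === 0) {
    throw new Error('No markdown in response; did you request formats: ["markdown"]?');
  }
  return result.markdown;
}
```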

Step 4: Handle JavaScript-Heavy Pages

Some SPAs need extra wait time for content to render. Default timeouts may capture loading states.

```javascript
// BAD: scraping an SPA with default settings
const result = await firecrawl.scrapeUrl('https://app.example.com/dashboard');
// Gets "Loading..." instead of actual content
```

```javascript
// GOOD: configure wait time for JS rendering
const result = await firecrawl.scrapeUrl('https://app.example.com/dashboard', {
  waitFor: 5000, // wait 5 seconds for JS to render
  formats: ['markdown'],
  onlyMainContent: true,
});
```
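To detect when a scrape still captured a loading state, a cheap heuristic on the returned markdown can trigger a retry with a larger `waitFor`. The placeholder patterns and length threshold below are assumptions to tune per site:

```javascript
// Sketch: heuristic check for content that never finished rendering.
function looksUnrendered(markdown, minLength = 200) {
  const placeholders = [/loading/i, /please enable javascript/i];
  if (!markdown || markdown.length < minLength) return true;
  // Only inspect the top of the page, where spinners usually appear
  return placeholders.some((p) => p.test(markdown.slice(0, 300)));
}
```

If it returns `true`, retry the scrape with a longer `waitFor` before giving up.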

Step 5: Respect robots.txt and Rate Limits

Firecrawl honors robots.txt by default. Disabling it risks IP bans and legal issues.

```javascript
// BAD: aggressive crawling that ignores site limits
await firecrawl.crawlUrl('https://example.com', {
  limit: 10000, // huge page budget
  // No delay between requests = potential IP ban
});
```

```javascript
// GOOD: respect site constraints
await firecrawl.crawlUrl('https://example.com', {
  limit: 200, // modest page budget
  maxDepth: 3,
  // Firecrawl handles rate limiting internally
});
```
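Firecrawl applies `includePaths`/`excludePaths` server-side, but it can be useful to pre-check a URL list locally before spending credits. This local version is an approximation of the prefix-matching semantics, for illustration only:

```javascript
// Sketch: local approximation of include/exclude path filtering.
// Exclusions win; an empty include list allows everything not excluded.
function passesPathFilters(url, { includePaths = [], excludePaths = [] } = {}) {
  const path = new URL(url).pathname;
  if (excludePaths.some((p) => path.startsWith(p))) return false;
  if (includePaths.length === 0) return true;
  return includePaths.some((p) => path.startsWith(p));
}
```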

Error Handling

| Issue | Cause | Solution |
| --- | --- | --- |
| Empty markdown | JS not rendered | Increase waitFor timeout |
| Credit depletion | No crawl limit set | Always set limit parameter |
| 402 Payment Required | Out of credits | Check balance before large crawls |
| Partial crawl results | Site blocks crawler | Use scrapeUrl for individual pages |
| Stale job status | Polling stopped early | Poll until completed or failed |
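The table above can be encoded as a lookup consulted in a catch handler. The symptom keys are a sketch mirroring the table, not an exhaustive model of the SDK's errors:

```javascript
// Sketch: map known failure symptoms to the remedies in the table above.
const REMEDIES = {
  emptyMarkdown: 'Increase waitFor so JavaScript can render',
  creditDepletion: 'Always set a limit parameter',
  402: 'Out of credits: check balance before large crawls',
  partialCrawl: 'Site blocks crawler: fall back to scrapeUrl per page',
  staleJob: 'Keep polling until status is completed or failed',
};

function remedyFor(symptom) {
  return REMEDIES[symptom] ?? 'Unknown symptom; inspect the raw API response';
}
```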

Examples

Safe Batch Scraping

```javascript
const urls = ['https://a.com', 'https://b.com', 'https://c.com'];
const results = await firecrawl.batchScrapeUrls(urls, {
  formats: ['markdown'],
  onlyMainContent: true,
});
```
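For large URL lists, splitting the input into smaller batches keeps each `batchScrapeUrls` call bounded. The batch size of 50 below is an arbitrary choice, not a documented Firecrawl limit:

```javascript
// Sketch: split a URL list into fixed-size batches.
function chunk(urls, size = 50) {
  const batches = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}
```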

Resources

  • Firecrawl Docs

  • Crawl vs Scrape

Output

  • Configuration files or code changes applied to the project

  • Validation report confirming correct implementation

  • Summary of changes made and their rationale

