firecrawl-architecture-variants

Firecrawl Architecture Variants

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "firecrawl-architecture-variants" with this command: npx skills add jeremylongshore/claude-code-plugins-plus-skills/jeremylongshore-claude-code-plugins-plus-skills-firecrawl-architecture-variants

Overview

Deployment architectures for Firecrawl web scraping at different scales. Firecrawl's async crawl model, credit-based billing, and JavaScript rendering influence which architecture fits, from simple on-demand page scraping to enterprise content ingestion pipelines.

Prerequisites

  • Firecrawl API configured

  • Clear scraping use case defined

  • Infrastructure for async job processing

Instructions

Step 1: On-Demand Scraping (Simple)

Best for: Single-page scraping, < 500 pages/day, content extraction.

User Request -> Backend -> Firecrawl scrapeUrl -> Parse Content -> Response

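// Assumes Express and the @mendable/firecrawl-js SDK (v1-style scrapeUrl)
import express from 'express';
import FirecrawlApp from '@mendable/firecrawl-js';

const app = express();
app.use(express.json()); // needed so req.body.url is populated
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

app.post('/extract', async (req, res) => {
  const result = await firecrawl.scrapeUrl(req.body.url, {
    formats: ['markdown'],
    onlyMainContent: true
  });
  if (!result.success) return res.status(502).json({ error: result.error });
  res.json({ content: result.markdown, title: result.metadata?.title });
});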
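Because every on-demand request spends a credit and blocks the caller for several seconds, it may be worth rejecting bad input before Firecrawl is called at all. The following is a minimal sketch, not part of the Firecrawl SDK: `parseScrapeTarget` and the `ScrapeTarget` type are hypothetical names, and the rule of accepting only absolute http(s) URLs is an assumption about what the endpoint should allow.

```typescript
// Hypothetical helper (not part of the Firecrawl SDK): validate the
// user-supplied URL before spending a credit on it.
type ScrapeTarget =
  | { ok: true; url: string }
  | { ok: false; reason: string };

function parseScrapeTarget(input: unknown): ScrapeTarget {
  if (typeof input !== "string" || input.trim() === "") {
    return { ok: false, reason: "url must be a non-empty string" };
  }
  let parsed: URL;
  try {
    // The URL constructor throws on relative or malformed input.
    parsed = new URL(input.trim());
  } catch {
    return { ok: false, reason: "url is not a valid absolute URL" };
  }
  // Assumption: only web pages are valid scrape targets.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "only http and https URLs are supported" };
  }
  return { ok: true, url: parsed.toString() };
}
```

In the handler above, a failed parse could be answered with a 400 response before `scrapeUrl` runs, so malformed requests never consume credits.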

Step 2: Scheduled Crawl Pipeline (Moderate)

Best for: Content monitoring, 500-10K pages/day, documentation indexing.

Scheduler (cron) -> Crawl Queue -> Firecrawl crawlUrl -> Result Store
                                                              |
                                                              v
                                                      Content Processor -> Search Index

// Scheduled crawler (assumes node-cron and an application-specific `db` layer)
cron.schedule('0 2 * * *', async () => { // Daily at 2 AM
  const sites = await db.getCrawlTargets();
  for (const site of sites) {
    // asyncCrawlUrl starts the crawl and returns a job id without waiting for results
    const crawl = await firecrawl.asyncCrawlUrl(site.url, {
      limit: site.maxPages,
      includePaths: site.paths
    });
    await db.saveCrawlJob({ siteId: site.id, jobId: crawl.id });
  }
});

// Separate worker polls for results
async function processCrawlResults() {
  const pending = await db.getPendingCrawlJobs();
  for (const job of pending) {
    const status = await firecrawl.checkCrawlStatus(job.jobId);
    // Jobs still in progress are skipped and picked up on the next run.
    // Failed jobs are not handled here and would stay pending.
    if (status.status === 'completed') {
      await indexPages(status.data);
      await db.markComplete(job.id);
    }
  }
}
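The worker above checks every pending job each time it runs. For long crawls, spacing the checks out can cut down on status calls. The sketch below shows one way to do that with a capped exponential backoff. `nextPollDelayMs` and `pollUntilDone` are hypothetical helpers, the 5-second base and 5-minute cap are arbitrary assumptions, and the `check` callback stands in for whatever wraps `checkCrawlStatus`.

```typescript
// Hypothetical helpers for spacing out crawl-status checks.
// The delay doubles per attempt and is capped: 5s, 10s, 20s, ... up to 5 min.
function nextPollDelayMs(attempt: number, baseMs = 5000, maxMs = 300000): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * 2 ** safeAttempt);
}

// Generic poll loop. `check` reports whether the job is finished and, if so,
// its result. `sleep` is injectable so the loop can be tested without waiting.
async function pollUntilDone<T>(
  check: () => Promise<{ done: boolean; value?: T }>,
  maxAttempts = 10,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T | undefined> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    if (result.done) return result.value;
    await sleep(nextPollDelayMs(attempt));
  }
  return undefined; // gave up; the caller decides whether to retry later
}
```

A `check` callback here might call `firecrawl.checkCrawlStatus(jobId)` and return `{ done: true, value: status.data }` once the status is `'completed'`.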

Step 3: Real-Time Content Pipeline (Scale)

Best for: Enterprise, 10K+ pages/day, AI training data, knowledge base.

URL Sources -> Priority Queue -> Firecrawl Workers -> Content Validation
                                                              |
                                                              v
                                                  Vector DB + Search Index
                                                              |
                                                              v
                                                      RAG / AI Pipeline

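class ContentPipeline {
  async ingest(urls: string[], priority: 'high' | 'normal' | 'low') {
    // Refuse the batch up front if it would exceed the daily credit budget
    const budget = this.creditTracker.canAfford(urls.length);
    if (!budget) throw new Error('Daily credit budget exceeded');

    const batch = await firecrawl.batchScrapeUrls(urls, {
      formats: ['markdown'],
      onlyMainContent: true
    });
    // In the v1 JS SDK, batchScrapeUrls resolves to a status object;
    // the scraped documents are under `data`, not the top-level value.
    if (!batch.success) throw new Error(`Batch scrape failed: ${batch.error}`);

    const validated = batch.data.filter(r => this.validateContent(r));
    await this.vectorStore.upsert(validated);
    this.creditTracker.record(urls.length);
    return { ingested: validated.length, rejected: urls.length - validated.length };
  }
}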
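The pipeline above depends on a `creditTracker` exposing `canAfford` and `record`, which the original leaves undefined. One possible in-memory shape is sketched below. Everything here is an assumption: the `CreditTracker` class name, the one-credit-per-URL accounting (actual cost depends on your plan and scrape options), the UTC-based daily reset, and the `remaining` helper.

```typescript
// Hypothetical in-memory tracker behind `creditTracker.canAfford` / `record`.
// Assumes one credit per URL, which may not match your plan or scrape options.
class CreditTracker {
  private readonly dailyBudget: number;
  private readonly now: () => Date;
  private used = 0;
  private day: string;

  // `now` is injectable so the daily reset can be tested with a fake clock.
  constructor(dailyBudget: number, now: () => Date = () => new Date()) {
    this.dailyBudget = dailyBudget;
    this.now = now;
    this.day = this.currentDay();
  }

  private currentDay(): string {
    return this.now().toISOString().slice(0, 10); // UTC date as YYYY-MM-DD
  }

  // Reset usage the first time the tracker is touched on a new UTC day.
  private rollOverIfNewDay(): void {
    const today = this.currentDay();
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
  }

  canAfford(credits: number): boolean {
    this.rollOverIfNewDay();
    return this.used + credits <= this.dailyBudget;
  }

  record(credits: number): void {
    this.rollOverIfNewDay();
    this.used += credits;
  }

  remaining(): number {
    this.rollOverIfNewDay();
    return Math.max(0, this.dailyBudget - this.used);
  }
}
```

State kept in process memory is lost on restart and is not shared between workers, so a multi-worker deployment would likely need the counter in a shared store instead.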

Decision Matrix

| Factor | On-Demand | Scheduled | Real-Time Pipeline |
|---|---|---|---|
| Volume | < 500 pages/day | 500-10K pages/day | 10K+ pages/day |
| Latency | Sync (2-10s) | Async (hours) | Async (minutes) |
| Use Case | Single page | Site monitoring | Knowledge base |
| Cost Control | Per-request | Per-crawl budget | Credit pipeline |
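The volume row of the matrix can be encoded as a small lookup, which may be handy in capacity-planning scripts. This is only a sketch: `chooseArchitecture` and the `Architecture` labels are hypothetical names, and because the matrix lists 10K in both the Scheduled and Real-Time ranges, assigning exactly 10,000 pages/day to the real-time pipeline is an arbitrary choice.

```typescript
type Architecture = "on-demand" | "scheduled" | "real-time-pipeline";

// Volume thresholds come from the decision matrix; treat them as rough guides.
function chooseArchitecture(pagesPerDay: number): Architecture {
  if (!Number.isFinite(pagesPerDay) || pagesPerDay < 0) {
    throw new RangeError("pagesPerDay must be a non-negative finite number");
  }
  if (pagesPerDay < 500) return "on-demand";
  if (pagesPerDay < 10000) return "scheduled";
  return "real-time-pipeline"; // 10K and above (boundary choice is an assumption)
}
```

Volume is only one factor; latency needs and use case from the other rows may still push a low-volume workload toward a scheduled or pipeline design.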

Error Handling

| Issue | Cause | Solution |
|---|---|---|
| Slow scraping in request path | Synchronous scrapeUrl | Move to async pipeline |
| Stale content | Infrequent crawling | Increase crawl frequency |
| Credit overrun | No budget tracking | Implement credit circuit breaker |
| Duplicate content | Re-crawling same pages | Dedup by URL hash before indexing |
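The "dedup by URL hash" remedy could look roughly like the sketch below. The names `urlKey` and `dedupeByUrl` are hypothetical, as are the normalization rules (drop the fragment, strip trailing slashes) and the choice of SHA-256. The sketch also assumes each page object carries a top-level `url` field; Firecrawl documents typically hold the source URL in their metadata, so a mapping step would probably be needed first.

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalization: the URL parser lowercases the host, and this
// drops the fragment and trailing slashes so trivially different URLs match.
function urlKey(rawUrl: string): string {
  const u = new URL(rawUrl);
  const path = u.pathname.replace(/\/+$/, "") || "/";
  const normalized = `${u.protocol}//${u.host}${path}${u.search}`;
  return createHash("sha256").update(normalized).digest("hex");
}

// Keeps the first page seen for each key. Pass a shared `seen` set to
// dedupe across batches within the same process.
function dedupeByUrl<T extends { url: string }>(
  pages: T[],
  seen: Set<string> = new Set<string>(),
): T[] {
  const fresh: T[] = [];
  for (const page of pages) {
    const key = urlKey(page.url);
    if (seen.has(key)) continue;
    seen.add(key);
    fresh.push(page);
  }
  return fresh;
}
```

Hashing the URL only catches re-crawls of the same address. Pages with identical content under different URLs would need a content hash instead.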

Examples

Architecture Selection

< 500 pages/day, user-facing: On-Demand
500-10K pages/day, batch processing: Scheduled Pipeline
10K+ pages/day, AI/ML ingestion: Real-Time Pipeline

Resources

  • Firecrawl API Docs

Output

  • Configuration files or code changes applied to the project

  • Validation report confirming correct implementation

  • Summary of changes made and their rationale
