firecrawl-policy-guardrails

Firecrawl Policy Guardrails

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant:

Install the skill "firecrawl-policy-guardrails" with this command:

```shell
npx skills add jeremylongshore/claude-code-plugins-plus-skills/jeremylongshore-claude-code-plugins-plus-skills-firecrawl-policy-guardrails
```

Overview

Policy enforcement for Firecrawl web scraping pipelines. Web scraping raises legal (robots.txt, ToS), ethical (rate limiting, attribution), and cost (credit burn) concerns that need automated guardrails.

Prerequisites

  • Firecrawl API configured

  • Understanding of web scraping legal considerations

  • Credit monitoring setup

Instructions

Step 1: Enforce Domain-Level Scraping Policies

Block scraping of sensitive or prohibited domains.

```typescript
const SCRAPE_POLICY = {
  blockedDomains: [
    'facebook.com', 'linkedin.com',   // ToS prohibit scraping
    'bank*.com', 'healthcare*.com',   // sensitive data
  ],
  maxPagesPerDomain: 500,             // cap pages crawled per domain
  requireRobotsTxt: true,
};

function validateScrapeTarget(url: string): void {
  const domain = new URL(url).hostname;
  for (const blocked of SCRAPE_POLICY.blockedDomains) {
    // Turn the '*' glob into a regex, escaping literal dots first
    const pattern = new RegExp(
      '^' + blocked.replace(/\./g, '\\.').replace(/\*/g, '.*') + '$'
    );
    if (pattern.test(domain)) {
      throw new PolicyViolation(`Domain ${domain} is blocked by scraping policy`);
    }
  }
}
```
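The policy above sets `requireRobotsTxt: true` but never shows the check itself. A minimal sketch of a robots.txt Disallow check is below; it is deliberately simplified (it ignores per-user-agent grouping and wildcard rules, which a real implementation would need), and the function names `parseDisallows` and `isPathAllowed` are illustrative, not part of Firecrawl:

```typescript
// Very simplified robots.txt handling: collects all Disallow paths,
// ignoring User-agent grouping and wildcard patterns.
function parseDisallows(robotsTxt: string): string[] {
  return robotsTxt
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.toLowerCase().startsWith('disallow:'))
    .map(line => line.slice('disallow:'.length).trim())
    .filter(path => path.length > 0);
}

function isPathAllowed(robotsTxt: string, path: string): boolean {
  return !parseDisallows(robotsTxt).some(rule => path.startsWith(rule));
}

const robots = 'User-agent: *\nDisallow: /private/\nDisallow: /tmp';
console.log(isPathAllowed(robots, '/public/page'));  // true
console.log(isPathAllowed(robots, '/private/data')); // false
```

In practice you would fetch `https://<domain>/robots.txt` once per domain and cache the parsed rules before authorizing each URL.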

Step 2: Credit Budget Enforcement

Prevent crawls from exceeding allocated credit budgets.

```typescript
class CrawlBudget {
  private dailyLimit: number;
  private usage: Map<string, number> = new Map();

  constructor(dailyLimit = 5000) {
    // default: 5000 pages/credits per day
    this.dailyLimit = dailyLimit;
  }

  authorize(estimatedPages: number): boolean {
    const today = new Date().toISOString().split('T')[0];
    const used = this.usage.get(today) || 0;
    if (used + estimatedPages > this.dailyLimit) {
      throw new PolicyViolation(
        `Daily limit exceeded: ${used} + ${estimatedPages} > ${this.dailyLimit}`
      );
    }
    return true;
  }

  record(pagesScraped: number) {
    const today = new Date().toISOString().split('T')[0];
    this.usage.set(today, (this.usage.get(today) || 0) + pagesScraped);
  }
}
```
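The intended call pattern is authorize-before, record-after. The sketch below restates a condensed version of the budget class (plus a bare `PolicyViolation` error, which this skill references but never defines) so it runs standalone:

```typescript
class PolicyViolation extends Error {}

// Condensed restatement of CrawlBudget for a self-contained example
class CrawlBudget {
  private usage = new Map<string, number>();
  constructor(private dailyLimit = 5000) {}

  authorize(estimatedPages: number): boolean {
    const today = new Date().toISOString().split('T')[0];
    const used = this.usage.get(today) ?? 0;
    if (used + estimatedPages > this.dailyLimit) {
      throw new PolicyViolation(
        `Daily limit exceeded: ${used} + ${estimatedPages} > ${this.dailyLimit}`
      );
    }
    return true;
  }

  record(pagesScraped: number): void {
    const today = new Date().toISOString().split('T')[0];
    this.usage.set(today, (this.usage.get(today) ?? 0) + pagesScraped);
  }
}

// Usage: authorize before the crawl, record actual pages after it
const budget = new CrawlBudget(100);
budget.authorize(80);   // ok: 0 + 80 <= 100
budget.record(80);
try {
  budget.authorize(30); // throws: 80 + 30 > 100
} catch (e) {
  console.log(e instanceof PolicyViolation); // true
}
```

Note that recording the *actual* page count (not the estimate) keeps the budget honest when a crawl returns fewer pages than requested.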

Step 3: Content Type Filtering

Only retain scraped content that matches expected types; discard binary files, media, and error pages.

```typescript
function validateScrapedContent(result: any): boolean {
  if (!result.markdown || result.markdown.length < 50) return false;
  const lower = result.markdown.toLowerCase();
  // Reject error pages (e.g. HTTP 403 responses rendered as HTML)
  if (lower.includes('403 forbidden') || lower.includes('access denied')) return false;
  // Reject login walls
  if (lower.includes('sign in to continue') || lower.includes('create an account')) return false;
  return true;
}
```
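The check above only inspects the markdown body, so binary files and media mentioned in this step are not actually filtered. Assuming scrape results expose a content type under `metadata.contentType` (an assumption; verify the field name against your Firecrawl SDK version), an allowlist filter might look like:

```typescript
// Hypothetical content-type allowlist; the `metadata.contentType`
// field name is an assumption, not a confirmed Firecrawl API detail.
const ALLOWED_TYPES = ['text/html', 'text/markdown', 'text/plain'];

function hasAllowedContentType(result: { metadata?: { contentType?: string } }): boolean {
  const ct = result.metadata?.contentType ?? '';
  // startsWith tolerates suffixes like "; charset=utf-8"
  return ALLOWED_TYPES.some(t => ct.startsWith(t));
}

console.log(hasAllowedContentType({ metadata: { contentType: 'text/html; charset=utf-8' } })); // true
console.log(hasAllowedContentType({ metadata: { contentType: 'application/pdf' } }));          // false
```

Run this before the markdown-quality check so PDFs, images, and other binaries are dropped early.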

Step 4: Rate Limiting Per Target Domain

Respect target site capacity even when Firecrawl allows faster crawling.

```typescript
const DOMAIN_RATE_LIMITS: Record<string, number> = {
  'docs.example.com': 2, // 2 pages/second
  'blog.example.com': 1, // 1 page/second
  'default': 5           // default rate
};

function getCrawlDelay(domain: string): number {
  const rate = DOMAIN_RATE_LIMITS[domain] || DOMAIN_RATE_LIMITS['default'];
  return 1000 / rate; // milliseconds between requests (1000 ms = 1 s)
}
```
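One way to apply that delay is a sequential loop that sleeps between pages. The sketch below restates the rate table and helper under different names (`RATE_LIMITS`, `delayFor`) so it runs standalone; `fetchPage` is a hypothetical stand-in for a single-page scrape call, not a Firecrawl API:

```typescript
// Restated rate table and delay helper for a self-contained sketch
const RATE_LIMITS: Record<string, number> = { 'docs.example.com': 2, default: 5 };

function delayFor(domain: string): number {
  return 1000 / (RATE_LIMITS[domain] ?? RATE_LIMITS['default']); // ms between requests
}

const sleep = (ms: number) => new Promise<void>(res => setTimeout(res, ms));

// fetchPage is a hypothetical stand-in for a per-page scrape call
async function crawlPaced(
  urls: string[],
  fetchPage: (u: string) => Promise<void>
): Promise<void> {
  for (const url of urls) {
    await fetchPage(url);
    await sleep(delayFor(new URL(url).hostname));
  }
}
```

Pacing sequentially like this trades throughput for politeness; a production crawler would typically keep one such queue per domain so slow sites do not stall fast ones.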

Error Handling

| Issue | Cause | Solution |
| --- | --- | --- |
| Legal risk from scraping | Blocked domain not filtered | Enforce domain blocklist |
| Credit overrun | No budget tracking | Implement daily credit caps |
| Junk data in pipeline | Error pages scraped | Validate content quality |
| Target site blocking IP | Too aggressive crawling | Enforce per-domain rate limits |
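The snippets in Steps 1 and 2 throw `PolicyViolation`, but this skill never defines it. A minimal sketch of such an error class, so the guardrail failures above are distinguishable from ordinary errors:

```typescript
class PolicyViolation extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'PolicyViolation';
  }
}

try {
  throw new PolicyViolation('Domain example.com is blocked by scraping policy');
} catch (e) {
  console.log(e instanceof PolicyViolation); // true
  console.log((e as Error).name);            // "PolicyViolation"
}
```

Catching this type specifically lets a pipeline skip the offending URL and continue, while letting unexpected errors propagate.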

Examples

Policy-Checked Crawl

```typescript
validateScrapeTarget(url);
budget.authorize(estimatedPages);
const results = await firecrawl.crawlUrl(url, { limit: estimatedPages });
const valid = results.filter(validateScrapedContent);
budget.record(valid.length);
```

Resources

  • Firecrawl Docs

  • Web Scraping Legal Guide

Output

  • Configuration files or code changes applied to the project

  • Validation report confirming correct implementation

  • Summary of changes made and their rationale

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • backtesting-trading-strategies

  • svg-icon-generator

  • performance-lighthouse-runner

  • mindmap-generator

All four are Coding skills from repository sources, flagged "Needs Review"; no summaries are provided by the upstream source.