firecrawl

Web scraping and content extraction using the Firecrawl API. Use it when users need to crawl websites, extract structured data, convert web pages to markdown, scrape multiple URLs, or build knowledge bases from web content. Supports single-page extraction, site-wide crawling, batch processing, and structured data extraction with CSS selectors.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "firecrawl" with this command: npx skills add antonia-sz/web-scraper-firecrawl

Firecrawl Skill

Web scraping powered by Firecrawl - turn websites into LLM-ready markdown.

Overview

Firecrawl provides APIs for:

  • Scrape - Single page extraction to markdown
  • Crawl - Entire site crawling with depth control
  • Map - URL discovery from a starting point
  • Batch - Multiple URL processing
  • Extract - Structured data extraction with schemas

Prerequisites

  1. Firecrawl API key - get a free-tier key at https://firecrawl.dev
  2. Python dependency: requests (install with pip install requests)

Configuration

Set the API key as an environment variable:

export FIRECRAWL_API_KEY="fc-your-api-key"
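
A minimal sketch of how a Python script might pick up this variable and turn it into request headers. The helper name auth_headers and the bearer-token header scheme are assumptions for illustration; they are not taken from this skill's source.

```python
import os


def auth_headers(env=None):
    """Build HTTP headers from FIRECRAWL_API_KEY (hypothetical helper)."""
    # Allow a dict to be injected so the function can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    # Assumption: the API accepts the key as a bearer token.
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early with a clear message when the variable is missing tends to be easier to debug than an authentication error returned by the API later on.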

Usage

Single Page Scraping

# Basic scrape
firecrawl scrape https://example.com

# With specific options
firecrawl scrape https://example.com --formats markdown,html --only-main-content

# Wait for JS rendering
firecrawl scrape https://spa-app.com --wait-for 2000
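
For readers calling the API from Python instead of the CLI, the options above could map onto a JSON request body roughly as follows. This is a sketch under assumptions: the function name build_scrape_payload is hypothetical, and the field names formats, onlyMainContent, and waitFor are assumed rather than confirmed by this skill's source.

```python
import json


def build_scrape_payload(url, formats=("markdown",), only_main_content=False,
                         wait_for_ms=None):
    """Assemble a JSON body for a single-page scrape (hypothetical helper)."""
    # Assumed field names; check the API reference before relying on them.
    payload = {"url": url, "formats": list(formats)}
    if only_main_content:
        payload["onlyMainContent"] = True
    if wait_for_ms is not None:
        # Time in milliseconds to wait for client-side rendering.
        payload["waitFor"] = int(wait_for_ms)
    return payload


if __name__ == "__main__":
    body = build_scrape_payload("https://spa-app.com", wait_for_ms=2000)
    print(json.dumps(body, indent=2))
```

Keeping payload construction separate from the network call makes the option mapping easy to check without an API key.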

Site Crawling

# Crawl entire site (up to limit)
firecrawl crawl https://docs.example.com --limit 50

# With depth control
firecrawl crawl https://blog.example.com --max-depth 2 --limit 100

# Include/exclude patterns
firecrawl crawl https://site.com --include "/blog/*" --exclude "/admin/*"

# Custom formats
firecrawl crawl https://docs.example.com --formats markdown,links
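
The include/exclude patterns above can be thought of as glob filters on the URL path. The sketch below illustrates that idea with Python's fnmatch module; it assumes exclude patterns take precedence over include patterns, which this document does not state, and url_allowed is a hypothetical helper, not part of the skill.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def url_allowed(url, include=(), exclude=()):
    """Return True if the URL path passes glob-style include/exclude filters."""
    path = urlparse(url).path or "/"
    # Assumption: an exclude match always wins over an include match.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # With no include patterns, every non-excluded URL is allowed.
    if include:
        return any(fnmatchcase(path, pattern) for pattern in include)
    return True
```

For example, url_allowed("https://site.com/blog/post-1", include=["/blog/*"], exclude=["/admin/*"]) passes the filter, while a URL under /admin/ is rejected.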

URL Mapping

# Discover all URLs from a site
firecrawl map https://example.com

# With search term
firecrawl map https://docs.python.org --search "tutorial"

Batch Processing

# Scrape multiple URLs
firecrawl batch urls.txt --output ./scraped/

# From JSON list
firecrawl batch urls.json --formats markdown --concurrency 5
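
The input formats are not specified here. The sketch below assumes that a .txt file holds one URL per line (blank lines and lines starting with # ignored) and that a .json file holds a flat array of URL strings; load_urls is a hypothetical helper written for illustration.

```python
import json
from pathlib import Path


def load_urls(path):
    """Read URLs from a text or JSON file, dropping blanks and duplicates."""
    text = Path(path).read_text(encoding="utf-8")
    if str(path).endswith(".json"):
        # Assumed layout: ["https://a.example", "https://b.example"]
        candidates = json.loads(text)
    else:
        # Assumed layout: one URL per line.
        candidates = [line.strip() for line in text.splitlines()]
    seen, urls = set(), []
    for url in candidates:
        if url and not url.startswith("#") and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Removing duplicates before a batch run avoids spending API quota on the same page twice.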

Structured Extraction

# Extract specific data using CSS selectors
firecrawl extract https://example.com/products \
  --schema '{"name": ".product-title", "price": ".price", "description": ".desc"}'

# Extract using a schema stored in a JSON file
firecrawl extract https://news.example.com/article --schema article-schema.json
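
The two commands above pass the schema either inline or as a file path. A script could accept both forms along the lines of the sketch below. The helper load_schema and its validation rules are assumptions for illustration, based only on the field-to-selector shape of the inline example.

```python
import json
from pathlib import Path


def load_schema(arg):
    """Return a {field: css_selector} dict from inline JSON or a JSON file."""
    # Treat anything that looks like a JSON object as inline; otherwise
    # interpret the argument as a path to a JSON file.
    if arg.lstrip().startswith("{"):
        text = arg
    else:
        text = Path(arg).read_text(encoding="utf-8")
    schema = json.loads(text)
    if not isinstance(schema, dict):
        raise ValueError("schema must be a JSON object")
    for field, selector in schema.items():
        if not isinstance(selector, str) or not selector.strip():
            raise ValueError(f"selector for {field!r} must be a non-empty string")
    return schema
```

Validating the schema before any request is sent catches typos such as an empty selector without consuming API calls.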

Output Formats

Markdown

Clean, LLM-ready markdown with:

  • Headings preserved
  • Links converted to markdown format
  • Images with alt text
  • Tables formatted as markdown tables

HTML

Raw or cleaned HTML

Links

Extracted link lists for further crawling

Screenshot

Page screenshot (if requested)

Use Cases

Knowledge Base Building

# Crawl documentation site
firecrawl crawl https://docs.framework.com --limit 200 -o ./kb/

# Merge into single file for RAG
cat ./kb/*.md > knowledge-base.md
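
Plain concatenation loses track of where each page ends and which file it came from. As an alternative, the sketch below merges the files in sorted order and marks each one with a source comment and a separator. The function name merge_markdown and the marker format are illustrative choices, not part of the skill.

```python
from pathlib import Path


def merge_markdown(src_dir, out_file):
    """Merge all .md files in src_dir into out_file; return the file count."""
    out_path = Path(out_file).resolve()
    parts = []
    for md_file in sorted(Path(src_dir).glob("*.md")):
        # Skip the output file itself if it lives in the same directory.
        if md_file.resolve() == out_path:
            continue
        body = md_file.read_text(encoding="utf-8").strip()
        # Record the origin so retrieved chunks can be traced back later.
        parts.append(f"<!-- source: {md_file.name} -->\n\n{body}\n")
    out_path.write_text("\n---\n\n".join(parts), encoding="utf-8")
    return len(parts)
```

Usage would look like merge_markdown("./kb", "knowledge-base.md").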

Research & Analysis

# Scrape competitor pricing
firecrawl batch competitors.txt --extract pricing-schema.json

# Monitor blog updates
firecrawl map https://blog.company.com --since 2024-01-01

Content Migration

# Export old CMS content
firecrawl crawl https://old-site.com --formats markdown,html -o ./export/

Scripts

All functionality is provided by scripts/firecrawl.py, which includes:

  • API authentication
  • Automatic rate limiting
  • Retry logic for failures
  • Progress tracking for large crawls
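
The script's internals are not shown here. As an illustration of what retry logic of this kind commonly looks like, the sketch below retries a callable with exponential backoff. The name with_retries, the default attempt count, and the delay schedule are assumptions, not the script's actual behavior.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on failure with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # A real script would likely narrow this to network errors.
            if attempt == attempts - 1:
                raise
            # Delay doubles on each retry: base, 2x base, 4x base, ...
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so the backoff schedule can be checked in tests without actually waiting.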

Integration

Works well with:

  • markdown-sync-pro - Sync scraped content to Notion/GitHub
  • arxiv-paper - Combine with academic paper downloads
  • maybe-finance - Scrape financial data for analysis

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Skrape

Ethical web data extraction with robots exclusion protocol adherence, throttled scraping requests, and privacy-compliant handling ("Scrape responsibly!").

General

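Ecom Monitor - E-commerce Data Analysis Assistant

E-commerce data analysis assistant - import and manage product price data, generate competitor analysis reports, and set price/inventory alerts. Suited to competitor price tracking, inventory management, and sales report generation.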

General

News Scraping and Briefing Generation for Key AI Companies

Collect, filter, classify AI industry news, generate Chinese titles and summaries, and export Excel and Word briefs based on company lists and sources.

General

Web to Excel

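A general-purpose skill for scraping structured data from web pages and filling it into any Excel file. Trigger scenarios: - The user says "help me fill the parameters from this web page into Excel", "scrape data from a website into a spreadsheet", "enter web page parameters into Excel", "crawl data and fill in Excel", or anything similar - The user provides a URL and an Excel file and asks for automatic scraping and filling - Batch scraping of data from multiple web pages...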
