crawl4ai-skill

Web crawling and scraping tool with LLM-optimized output. 网页爬虫爬取工具 | Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scraping. 智能搜索爬取 | Free, no API key required.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "crawl4ai-skill" with this command: npx skills add lancelin111/crawl4ai-skill/lancelin111-crawl4ai-skill-crawl4ai-skill

Crawl4AI Skill - Web Crawler & Scraper

Web Crawling 网页爬虫 | Web Scraping 网页爬取 | LLM 优化输出

智能网页爬虫和爬取工具,支持搜索、全站爬取、动态页面抓取。Free web crawler and scraper with LLM-optimized Markdown output.

核心功能 | Core Features

  • 🔍 Web Search 网页搜索 - DuckDuckGo search, 免 API key
  • 🕷️ Web Crawling 网页爬虫 - Site crawler, spider, sitemap 识别
  • 📝 Web Scraping 网页抓取 - Smart scraper, data extraction
  • 📄 LLM-Optimized Output - Fit Markdown, 省 Token 80%
  • Dynamic Page Scraping - JavaScript 渲染页面爬取

快速开始 | Quick Start

安装 | Installation

pip install crawl4ai-skill

Web Search | 网页搜索

# Search the web with DuckDuckGo
crawl4ai-skill search "python web scraping"

Web Scraping | 单页爬取

# Scrape a single web page
crawl4ai-skill crawl https://example.com

Web Crawling | 全站爬虫

# Crawl entire website / spider
crawl4ai-skill crawl-site https://docs.python.org --max-pages 50

使用场景 | Use Cases

场景 1:Web Crawler for Documentation | 文档站爬虫

# Crawl documentation site with spider
crawl4ai-skill crawl-site https://docs.fastapi.com --max-pages 100

爬虫效果 | Crawler Output:

  • ❌ 移除:导航栏、侧边栏、广告
  • ✅ 保留:标题、正文、代码块
  • 📊 Token:50,000 → 10,000(-80%)

场景 2:Search + Scrape | 搜索+爬取

# Search and scrape top results
crawl4ai-skill search-and-crawl "Vue 3 best practices" --crawl-top 3

场景 3:Dynamic Page Scraping | 动态页面抓取

JavaScript 渲染的页面爬取(雪球、知乎等):

# Scrape JavaScript-heavy pages
crawl4ai-skill crawl https://xueqiu.com/S/BIDU --wait-until networkidle --delay 2

命令参考 | Commands

命令 Command说明 Description
search <query>Web search 网页搜索
crawl <url>Web scraping 单页爬取
crawl-site <url>Web crawling 全站爬虫
search-and-crawl <query>Search + scrape 搜索并爬取

常用参数 | Common Options

# Web Search 搜索
--num-results 10          # Number of results

# Web Scraping 爬取
--format fit_markdown     # Output format
--output result.md        # Output file
--wait-until networkidle  # Wait strategy for dynamic pages
--delay 2                 # Additional wait time (seconds)
--wait-for ".selector"    # Wait for specific element

# Web Crawling 爬虫
--max-pages 100          # Max pages to crawl
--max-depth 3            # Max crawl depth

输出格式 | Output Formats

fit_markdown(推荐 Recommended)

智能提取,节省 80% Token。Smart extraction, save 80% tokens.

crawl4ai-skill crawl https://example.com --format fit_markdown

raw_markdown

保留完整结构。Preserve full structure.

crawl4ai-skill crawl https://example.com --format raw_markdown

为什么选择这个爬虫?| Why This Crawler?

免费爬虫 Free Crawler - 无需 API key,开箱即用
智能爬取 Smart Scraper - 自动去噪,提取核心内容
全站爬虫 Site Crawler - 支持 sitemap,递归爬取
动态爬取 Dynamic Scraping - JavaScript 渲染页面支持
搜索集成 Search Integration - DuckDuckGo 搜索内置


链接 | Links

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

search

Search the web and get relevant results optimized for LLM consumption.

Repository SourceNeeds Review
11.9K86tavily-ai
General

crawl

No summary provided by upstream source.

Repository SourceNeeds Review
General

web-scraping

No summary provided by upstream source.

Repository SourceNeeds Review
General

web-scraping

No summary provided by upstream source.

Repository SourceNeeds Review