# tavily-crawl

Crawl any website and save pages as local Markdown files. Ideal for downloading documentation, knowledge bases, or other web content for offline access and analysis.

## Safety Notice

This listing comes from the official public ClawHub registry. Review SKILL.md and any referenced scripts before running them.

## Installation

Install the skill with:

```shell
npx skills add matthew77/liang-tavily-crawl
```

## Tavily Crawl

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

## Authentication

Get your API key at https://tavily.com and add it to your OpenClaw config:

```json
{
  "skills": {
    "entries": {
      "tavily-crawl": {
        "enabled": true,
        "apiKey": "tvly-YOUR_API_KEY_HERE"
      }
    }
  }
}
```

Or set it as an environment variable:

```shell
export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"
```
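A minimal sketch of how a script like `crawl.mjs` might resolve the key, preferring the config entry and falling back to the environment variable. The config shape mirrors the snippet above; the helper name and warning are illustrative, not the script's actual internals:

```javascript
// Resolve the Tavily API key from an OpenClaw-style config object,
// falling back to the TAVILY_API_KEY environment variable.
// Illustrative helper; the real crawl.mjs may differ.
function resolveApiKey(config, env) {
  const fromConfig = config?.skills?.entries?.["tavily-crawl"]?.apiKey;
  const key = fromConfig || env.TAVILY_API_KEY;
  if (!key) {
    throw new Error(
      'No Tavily API key: set skills.entries["tavily-crawl"].apiKey or TAVILY_API_KEY'
    );
  }
  if (!key.startsWith("tvly-")) {
    // Tavily keys in the docs above start with "tvly-"; warn, don't fail.
    console.warn('Warning: key does not start with "tvly-"');
  }
  return key;
}
```

The config entry wins over the environment so per-skill settings stay authoritative.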

## Quick Start

### Using the Script

```shell
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 --limit 50
```

### Examples

```shell
# Basic crawl
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"

# Deeper crawl with limits
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 50

# Save to files
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --output ./docs

# Focused crawl with path filters
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 \
  --select "/docs/.*" --exclude "/blog/.*"

# With semantic instructions
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" \
  --instructions "Find API documentation" --chunks 3
```
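When `--output` is set, each crawled page presumably lands as a Markdown file. One common approach is deriving the filename from the URL path; this is an illustrative sketch, and the actual naming scheme in `crawl.mjs` may differ:

```javascript
// Derive a Markdown filename from a crawled URL's path.
// Illustrative only; crawl.mjs may name output files differently.
function pathToFilename(url) {
  const { pathname } = new URL(url);
  const slug = pathname
    .replace(/^\/+|\/+$/g, "")          // trim leading/trailing slashes
    .replace(/[^a-zA-Z0-9]+/g, "-")     // collapse separators to hyphens
    || "index";                          // site root becomes index.md
  return `${slug}.md`;
}
```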

## Options

| Option | Description | Default |
|--------|-------------|---------|
| `--depth <n>` | Crawl depth (1-5) | 1 |
| `--breadth <n>` | Links followed per page | 20 |
| `--limit <n>` | Total pages cap | 50 |
| `--output <dir>` | Save pages to directory | - |
| `--instructions <text>` | Natural-language guidance | - |
| `--chunks <n>` | Chunks per page (1-5; requires `--instructions`) | - |
| `--depth-mode <mode>` | Extract depth: `basic` or `advanced` | `basic` |
| `--select <pattern>` | Regex pattern for paths to include | - |
| `--exclude <pattern>` | Regex pattern for paths to exclude | - |
| `--timeout <sec>` | Max wait time (10-150 seconds) | 150 |
| `--json` | Output raw JSON | false |
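These flags map naturally onto a request body for Tavily's crawl API. The field names below (`max_depth`, `select_paths`, and so on) follow Tavily's documented crawl endpoint but should be checked against the current API docs; this is a sketch of the mapping, not the script's actual implementation:

```javascript
// Map the CLI options above onto a Tavily crawl request body.
// Field names are assumptions based on Tavily's crawl API docs;
// verify against the current documentation before relying on them.
function buildCrawlPayload(url, opts = {}) {
  const payload = { url };
  if (opts.depth != null) payload.max_depth = opts.depth;
  if (opts.breadth != null) payload.max_breadth = opts.breadth;
  if (opts.limit != null) payload.limit = opts.limit;
  if (opts.instructions) payload.instructions = opts.instructions;
  if (opts.select) payload.select_paths = [opts.select];
  if (opts.exclude) payload.exclude_paths = [opts.exclude];
  if (opts.depthMode) payload.extract_depth = opts.depthMode;
  return payload;
}
```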

## Depth vs. Performance

| Depth | Typical pages | Time |
|-------|---------------|--------------|
| 1 | 10-50 | Seconds |
| 2 | 50-500 | Minutes |
| 3 | 500-5,000 | Many minutes |

Start with `--depth 1` and increase only if needed.
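The growth in the table follows from the crawl frontier: each level can expand up to `--breadth` links per page, so the page count is bounded by a geometric series, capped by `--limit`. A rough estimator (illustrative, not part of the skill):

```javascript
// Rough upper bound on pages visited: 1 + b + b^2 + ... + b^d,
// capped by --limit. Real crawls visit fewer pages because of
// duplicate links and path filters.
function maxPages(depth, breadth, limit) {
  let total = 1;  // the start URL
  let level = 1;  // pages at the current depth
  for (let d = 1; d <= depth; d++) {
    level *= breadth;
    total += level;
    if (total >= limit) return limit;
  }
  return Math.min(total, limit);
}
```

With the defaults (`breadth` 20, `limit` 50), depth 1 can reach up to 21 pages while depth 2 already hits the 50-page cap, which is why raising `--limit` matters more than raising `--depth` past 2.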

## Crawl for Context vs. Data Collection

For agentic use (feeding results into an LLM's context): always pass `--instructions` together with `--chunks`. This returns only the relevant chunks instead of full pages, preventing context-window explosion.

For data collection (saving to files): omit `--chunks` to get the full page content.
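The difference amounts to a per-page post-processing step: with `--chunks`, only the first n relevant chunks survive; without it, the full content is kept for saving. The field names here are illustrative, not the script's exact output shape:

```javascript
// Shape per-page output: with a chunk count, keep only the first n
// relevant chunks; otherwise keep the full page content.
// Illustrative result shape, not crawl.mjs's actual output format.
function shapePage(page, chunks) {
  if (chunks) {
    return { url: page.url, chunks: (page.chunks || []).slice(0, chunks) };
  }
  return { url: page.url, content: page.rawContent };
}
```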

## Tips

- Always use `--chunks` for agentic workflows; it prevents context explosion when feeding results to LLMs
- Omit `--chunks` only for data collection, when saving full pages to files
- Start conservative (`--depth 1`, `--limit 20`) and scale up
- Use path patterns (`--select`, `--exclude`) to focus on relevant sections
- Always set a `--limit` to prevent runaway crawls

