sitemap-generator

Generate XML sitemaps by crawling a website or scanning local files. Auto-discovers pages via link extraction. Supports local HTML/MD file scanning with lastmod dates. Generates robots.txt with sitemap reference. Use when asked to create a sitemap, generate sitemap.xml, crawl a site for pages, create robots.txt, or prepare a site for SEO. Triggers on "sitemap", "sitemap.xml", "crawl site", "site map", "robots.txt", "SEO sitemap".

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "sitemap-generator" with this command: npx skills add charlie-morrison/cm-sitemap-generator

Sitemap Generator

Generate XML sitemaps by crawling a live website or scanning local files (HTML, HTM, MD, PHP).

Crawl a Website

python3 scripts/sitemap_gen.py https://example.com
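
Crawl mode is described under Features as breadth-first link discovery, limited to the same domain, with deduplication. The sketch below illustrates that idea only; it is not the code in scripts/sitemap_gen.py. The function name `crawl_bfs` and the `fetch_links` callback (a function that returns the raw hrefs found on a page) are hypothetical, introduced so the example runs without network access.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_bfs(start_url, fetch_links, max_pages=500):
    """Breadth-first discovery of same-domain URLs.

    fetch_links is a hypothetical callable: URL -> iterable of raw hrefs.
    """
    domain = urlparse(start_url).netloc
    start = urldefrag(start_url)[0]
    seen = {start}          # deduplication set
    queue = deque([start])  # FIFO queue gives breadth-first order
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop the #fragment.
            link = urldefrag(urljoin(url, href))[0]
            if urlparse(link).netloc != domain:
                continue  # same-domain only
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    # A tiny in-memory "site" stands in for real HTTP fetching.
    site = {
        "https://example.com/": ["/about", "/blog#top", "https://other.example/x"],
        "https://example.com/about": ["/"],
        "https://example.com/blog": ["/about"],
    }
    print(crawl_bfs("https://example.com/", lambda u: site.get(u, [])))
```

Passing the link fetcher in as a parameter keeps the traversal logic separate from HTTP handling, so the same loop can be exercised against a dictionary, as above, or against a real fetcher.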

Scan Local Files

python3 scripts/sitemap_gen.py --local ./public --base-url https://example.com
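
Local mode is described as scanning HTML/HTM/MD/PHP files and taking lastmod from each file's modification time. A minimal sketch of that approach follows. It is an illustration, not the tool's implementation: the name `scan_local` is hypothetical, and the URL mapping (file path appended to the base URL unchanged, with no special handling of index files) is an assumption.

```python
import os
from datetime import datetime, timezone

# Extensions treated as pages, taken from the Features list.
PAGE_EXTENSIONS = {".html", ".htm", ".md", ".php"}


def scan_local(root, base_url):
    """Return sorted (url, lastmod) pairs for page-like files under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in PAGE_EXTENSIONS:
                continue  # skip images, stylesheets, and other assets
            path = os.path.join(dirpath, name)
            # Assumption: the relative file path becomes the URL path as-is.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            mtime = os.path.getmtime(path)
            # Format the file mtime as a W3C date (YYYY-MM-DD) in UTC.
            lastmod = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")
            entries.append((base_url.rstrip("/") + "/" + rel, lastmod))
    return sorted(entries)
```

Calling `scan_local("./public", "https://example.com")` would yield one pair per matching file, ready to be written as `<loc>` and `<lastmod>` elements.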

Save to File

# Save sitemap.xml
python3 scripts/sitemap_gen.py https://example.com --output sitemap.xml

# Save sitemap.xml + robots.txt
python3 scripts/sitemap_gen.py https://example.com --output sitemap.xml --robots
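
The Features list says the generated robots.txt contains a User-agent line, an Allow line, and a Sitemap reference. The sketch below shows what such a file could look like. The function name `build_robots_txt` is hypothetical, and the specific values (`*` for the user agent, `/` for the allowed path) are assumptions; the tool's actual output may differ.

```python
def build_robots_txt(sitemap_url):
    """Return robots.txt text that allows all crawlers and points to the sitemap."""
    lines = [
        "User-agent: *",  # assumption: rules apply to every crawler
        "Allow: /",       # assumption: the whole site is crawlable
        "",
        "Sitemap: " + sitemap_url,  # must be an absolute URL
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("https://example.com/sitemap.xml"), end="")
```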

Output Formats

# XML (default — valid sitemap.xml)
python3 scripts/sitemap_gen.py https://example.com

# Text (human-readable summary + XML)
python3 scripts/sitemap_gen.py https://example.com --format text

# JSON (pages list + XML string)
python3 scripts/sitemap_gen.py https://example.com --format json

Options

Flag            Default  Description
--max-pages     500      Maximum pages to crawl
--timeout       10       Request timeout in seconds
--output / -o   stdout   Save sitemap.xml to file
--robots        off      Also generate robots.txt
--local         off      Scan local directory instead of crawling
--base-url      (none)   Base URL for local mode (required)
--verbose / -v  off      Show crawl progress

Features

  • Crawl mode: BFS link discovery, same-domain only, deduplication
  • Local mode: Scan HTML/HTM/MD/PHP files, auto-detect lastmod from file mtime
  • Smart filtering: Skips images, CSS, JS, PDFs, archives, media files
  • URL normalization: Removes fragments, normalizes trailing slashes
  • robots.txt generation: User-agent + Allow + Sitemap reference
  • Valid XML: Proper XML escaping, sitemaps.org schema
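
The filtering, normalization, and XML features listed above can be sketched together using only the standard library. This is an illustration under stated assumptions, not the tool's code: the names `normalize`, `is_page`, and `build_sitemap` are hypothetical, the extension list is a small sample, and stripping the trailing slash from non-root paths is one possible reading of "normalizes trailing slashes".

```python
from urllib.parse import urldefrag, urlparse
from xml.sax.saxutils import escape

# Sample of non-page extensions to skip (assumption: the real list is longer).
SKIP_EXTENSIONS = (".png", ".jpg", ".css", ".js", ".pdf", ".zip", ".mp4")


def normalize(url):
    """Drop the #fragment and strip a trailing slash from non-root paths."""
    parts = urlparse(urldefrag(url)[0])
    path = parts.path.rstrip("/") or "/"
    return parts._replace(path=path).geturl()


def is_page(url):
    """True unless the URL path ends with a skipped asset extension."""
    return not urlparse(url).path.lower().endswith(SKIP_EXTENSIONS)


def build_sitemap(urls):
    """Return sitemap XML for the given URLs, normalized and deduplicated."""
    pages = []
    for url in urls:
        url = normalize(url)
        if is_page(url) and url not in pages:
            pages.append(url)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in pages:
        # escape() turns &, <, and > into XML entities.
        lines.append("  <url><loc>" + escape(url) + "</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Escaping matters mostly for query strings: a URL such as `https://example.com/a?x=1&y=2` has to appear as `https://example.com/a?x=1&amp;y=2` inside `<loc>` for the file to be well-formed XML.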

Requirements

  • Python 3.6+
  • No external dependencies (stdlib only)
