robots-txt-gen

Generate, validate, and analyze robots.txt files for websites. Use when creating robots.txt from scratch, validating existing robots.txt syntax, checking if a URL is allowed/blocked by robots.txt rules, or generating robots.txt for common platforms (WordPress, Next.js, Django, Rails). Also use when auditing crawl directives or debugging search engine indexing issues.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "robots-txt-gen" with this command: npx skills add Johnnywang2001/robots-txt-gen

robots-txt-gen

Generate, validate, and test robots.txt files from the command line.

Quick Start

# Generate a robots.txt for a platform
python3 scripts/robots_txt_gen.py generate --preset nextjs --sitemap https://example.com/sitemap.xml

# Validate an existing robots.txt
python3 scripts/robots_txt_gen.py validate --file robots.txt

# Validate a remote robots.txt
python3 scripts/robots_txt_gen.py validate --url https://example.com/robots.txt

# Test if a URL is allowed for a user-agent
python3 scripts/robots_txt_gen.py test --file robots.txt --url /admin/dashboard --agent Googlebot

# Generate with custom rules
python3 scripts/robots_txt_gen.py generate --allow "/" --disallow "/admin" --disallow "/api" --disallow "/private" --sitemap https://example.com/sitemap.xml --agent "*"
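For example, the last command above would plausibly produce output along these lines (illustrative — exact ordering and formatting depend on the script):

```
User-agent: *
Allow: /
Disallow: /admin
Disallow: /api
Disallow: /private

Sitemap: https://example.com/sitemap.xml
```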

Commands

generate

Create a robots.txt file with custom rules or platform presets.

Options:

  • --preset <name> — Use a platform preset: wordpress, nextjs, django, rails, laravel, static, spa, ecommerce
  • --agent <name> — User-agent (default: *). Repeat for multiple agents.
  • --allow <path> — Allow path. Repeatable.
  • --disallow <path> — Disallow path. Repeatable.
  • --sitemap <url> — Sitemap URL. Repeatable.
  • --crawl-delay <seconds> — Crawl delay directive.
  • --block-ai — Add rules to block common AI crawlers (GPTBot, ChatGPT-User, CCBot, Google-Extended, anthropic-ai, etc.)
  • --output <file> — Write to file instead of stdout.

validate

Check a robots.txt file for syntax errors and best-practice warnings.

Options:

  • --file <path> — Local file to validate.
  • --url <url> — Remote robots.txt URL to fetch and validate.

test

Test whether a specific URL path is allowed or disallowed for a given user-agent.

Options:

  • --file <path> — robots.txt file to test against.
  • --url <path> — URL path to test (e.g., /admin/login).
  • --agent <name> — User-agent to test as (default: Googlebot).
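The allow/disallow matching that `test` performs can be approximated with Python's standard `urllib.robotparser` — a sketch only, since the script's own matcher may differ (for instance, `robotparser` has limited wildcard support):

```python
from urllib import robotparser

# Hypothetical rules, similar to what `generate --disallow /admin` emits.
RULES = """\
User-agent: *
Disallow: /admin
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# Longest/first matching rule wins: /admin/dashboard is blocked, /blog is not.
print(rp.can_fetch("Googlebot", "/admin/dashboard"))
print(rp.can_fetch("Googlebot", "/blog/post"))
```

Paths are matched as prefixes, so `Disallow: /admin` also blocks `/admin/dashboard`.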

Platform Presets

| Preset | What it blocks | Notes |
| --- | --- | --- |
| wordpress | /wp-admin/, /wp-includes/, query params | Allows /wp-admin/admin-ajax.php |
| nextjs | /_next/static/, /api/, /.next/ | Standard Next.js paths |
| django | /admin/, /static/admin/, /media/private/ | Django admin and private media |
| rails | /admin/, /assets/, /tmp/ | Rails conventions |
| laravel | /admin/, /storage/, /vendor/ | Laravel conventions |
| static | Nothing blocked | Simple allow-all with sitemap |
| spa | /api/, /assets/ | Single-page app pattern |
| ecommerce | /cart/, /checkout/, /account/, /search? | Prevents crawling user sessions |

AI Crawler Blocking

The --block-ai flag adds disallow rules for known AI training crawlers:

  • GPTBot, ChatGPT-User (OpenAI)
  • Google-Extended (Google AI)
  • CCBot (Common Crawl)
  • anthropic-ai, ClaudeBot (Anthropic)
  • Bytespider (ByteDance)
  • FacebookBot (Meta)
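Illustratively, the flag emits one per-agent block for each crawler, along these lines (a sketch — the exact agent list comes from the script):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```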

