robots-txt-gen
Generate, validate, and test robots.txt files from the command line.
Quick Start
# Generate a robots.txt for a platform
python3 scripts/robots_txt_gen.py generate --preset nextjs --sitemap https://example.com/sitemap.xml
# Validate an existing robots.txt
python3 scripts/robots_txt_gen.py validate --file robots.txt
# Validate a remote robots.txt
python3 scripts/robots_txt_gen.py validate --url https://example.com/robots.txt
# Test if a URL is allowed for a user-agent
python3 scripts/robots_txt_gen.py test --file robots.txt --url /admin/dashboard --agent Googlebot
# Generate with custom rules
python3 scripts/robots_txt_gen.py generate --allow "/" --disallow "/admin" --disallow "/api" --disallow "/private" --sitemap https://example.com/sitemap.xml --agent "*"
Commands
generate
Create a robots.txt file with custom rules or platform presets.
Options:
--preset <name>— Use a platform preset:wordpress,nextjs,django,rails,laravel,static,spa,ecommerce--agent <name>— User-agent (default:*). Repeat for multiple agents.--allow <path>— Allow path. Repeatable.--disallow <path>— Disallow path. Repeatable.--sitemap <url>— Sitemap URL. Repeatable.--crawl-delay <seconds>— Crawl delay directive.--block-ai— Add rules to block common AI crawlers (GPTBot, ChatGPT-User, CCBot, Google-Extended, anthropic-ai, etc.)--output <file>— Write to file instead of stdout.
validate
Check a robots.txt file for syntax errors and best-practice warnings.
Options:
--file <path>— Local file to validate.--url <url>— Remote robots.txt URL to fetch and validate.
test
Test whether a specific URL path is allowed or disallowed for a given user-agent.
Options:
--file <path>— robots.txt file to test against.--url <path>— URL path to test (e.g.,/admin/login).--agent <name>— User-agent to test as (default:Googlebot).
Platform Presets
| Preset | What it blocks | Notes |
|---|---|---|
wordpress | /wp-admin/, /wp-includes/, query params | Allows /wp-admin/admin-ajax.php |
nextjs | /_next/static/, /api/, /.next/ | Standard Next.js paths |
django | /admin/, /static/admin/, /media/private/ | Django admin and private media |
rails | /admin/, /assets/, /tmp/ | Rails conventions |
laravel | /admin/, /storage/, /vendor/ | Laravel conventions |
static | Nothing blocked | Simple allow-all with sitemap |
spa | /api/, /assets/ | Single-page app pattern |
ecommerce | /cart/, /checkout/, /account/, /search? | Prevents crawling user sessions |
AI Crawler Blocking
The --block-ai flag adds disallow rules for known AI training crawlers:
- GPTBot, ChatGPT-User (OpenAI)
- Google-Extended (Google AI)
- CCBot (Common Crawl)
- anthropic-ai (Anthropic)
- Bytespider (ByteDance)
- ClaudeBot (Anthropic)
- FacebookBot (Meta)