taxonomy-builder

Taxonomy Builder (router, compatibility mode)

Build outline/taxonomy.yml from papers/core_set.csv.

P0 compatibility note:

The output contract stays the same (outline/taxonomy.yml, YAML list, >=2 levels, concrete descriptions).
Curated domain taxonomies now live in assets/domain_packs/*.yaml instead of Python prose.
scripts/run.py stays a deterministic scaffold/helper: detect domain pack -> load pack when available -> otherwise fall back to the generic builder.
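
The detect -> load -> fall back flow can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/run.py: the function names, the keyword-based detection, and the in-memory `packs` mapping are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the detect rules in assets/domain_packs/*.yaml:
# domain name -> keywords that select the pack (assumed rule format).
Packs = Dict[str, List[str]]


def detect_domain(hint_text: str, packs: Packs) -> Optional[str]:
    """Return the first domain whose keywords appear in the hints, else None."""
    lowered = hint_text.lower()
    for domain in sorted(packs):  # sorted so selection is deterministic
        if any(keyword.lower() in lowered for keyword in packs[domain]):
            return domain
    return None


def build_taxonomy(
    hint_text: str,
    packs: Packs,
    load_pack: Callable[[str], list],
    generic_builder: Callable[[], list],
) -> list:
    """Use a domain pack when one is detected, otherwise the generic builder."""
    domain = detect_domain(hint_text, packs)
    if domain is not None:
        return load_pack(domain)
    return generic_builder()
```

Passing `load_pack` and `generic_builder` in as callables keeps the routing decision separate from file I/O, which makes the fallback path easy to exercise on its own.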

Load Order

references/overview.md
references/taxonomy_principles.md
If a domain pack applies, read its references/domain_pack_<domain>.md and assets/domain_packs/<domain>.yaml
Otherwise read references/archetypes_generic.md
Calibrate naming/description quality with references/examples_good.md and references/examples_bad.md

Current compatibility packs:

Inputs

Outputs

Asset contract

assets/taxonomy_schema.json: machine-readable shape for domain packs / output expectations
assets/domain_packs/*.yaml: compatibility domain packs for supported domains
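
A check of the output expectations (a list, at least two levels, concrete descriptions) might look like the sketch below. It operates on already-parsed data, so it needs no YAML library. The keys `name`, `description`, and `children`, the placeholder markers, and the reading of ">=2 levels" as "every top-level node has at least one child" are assumptions, not fields or rules confirmed by assets/taxonomy_schema.json.

```python
PLACEHOLDER_MARKERS = ("todo", "tbd", "placeholder")  # assumed markers


def contract_problems(taxonomy):
    """Return a list of human-readable contract violations (empty if none)."""
    if not isinstance(taxonomy, list) or not taxonomy:
        return ["taxonomy must be a non-empty list"]

    problems = []

    def visit(node, path):
        if not isinstance(node, dict):
            problems.append(f"{path}: node must be a mapping")
            return
        here = f"{path}/{node.get('name') or '<unnamed>'}"
        description = str(node.get("description") or "").strip()
        is_placeholder = any(m in description.lower() for m in PLACEHOLDER_MARKERS)
        if not description or is_placeholder:
            problems.append(f"{here}: description missing or placeholder")
        for child in node.get("children") or []:
            visit(child, here)

    for top in taxonomy:
        visit(top, "")
        # Assumed interpretation of ">=2 levels": each top-level node has children.
        if isinstance(top, dict) and not top.get("children"):
            problems.append(f"/{top.get('name') or '<unnamed>'}: needs at least one child")
    return problems
```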

Script role

Use scripts/run.py only for deterministic help:

When to refine manually

Refine the generated taxonomy before marking the unit DONE if:

Quick start

python .codex/skills/taxonomy-builder/scripts/run.py --help
python .codex/skills/taxonomy-builder/scripts/run.py --workspace <workspace_dir>

Execution notes

When running in compatibility mode, scripts/run.py currently reads:

papers/core_set.csv as the required corpus input
papers/papers_dedup.jsonl when present for extra title/abstract signals
GOAL.md, queries.md, and DECISIONS.md as optional domain/profile hints during pack selection
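
Gathering these inputs from a workspace could look like the sketch below. Only the file names come from this document. The function name, the returned dictionary shape, and the assumption that the three hint files sit at the workspace root are hypothetical.

```python
import csv
import json
from pathlib import Path


def read_inputs(workspace):
    """Collect required and optional inputs from a workspace directory."""
    ws = Path(workspace)

    # Required: the core paper set.
    core_path = ws / "papers" / "core_set.csv"
    if not core_path.is_file():
        raise FileNotFoundError(f"required input missing: {core_path}")
    with core_path.open(newline="", encoding="utf-8") as handle:
        core_rows = list(csv.DictReader(handle))

    # Optional: one JSON record per line, used for extra title/abstract signals.
    dedup_records = []
    dedup_path = ws / "papers" / "papers_dedup.jsonl"
    if dedup_path.is_file():
        with dedup_path.open(encoding="utf-8") as handle:
            dedup_records = [json.loads(line) for line in handle if line.strip()]

    # Optional: free-text hints for pack selection (location is assumed).
    hints = {}
    for name in ("GOAL.md", "queries.md", "DECISIONS.md"):
        hint_path = ws / name
        if hint_path.is_file():
            hints[name] = hint_path.read_text(encoding="utf-8")

    return {"core_rows": core_rows, "dedup_records": dedup_records, "hints": hints}
```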

Script

Quick Start

python .codex/skills/taxonomy-builder/scripts/run.py --workspace <workspace_dir>

All Options

Examples

python .codex/skills/taxonomy-builder/scripts/run.py --workspace workspaces/<ws>

Troubleshooting

If the wrong domain pack is chosen, inspect GOAL.md, queries.md, and the pack detect rules before changing Python.
If outline/taxonomy.yml already contains a real non-placeholder taxonomy, the script intentionally returns without overwriting it.
If no pack matches, the script falls back to the generic builder.
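
The no-overwrite behaviour can be pictured with a guard like the one below. How the script actually decides that an existing taxonomy is a placeholder is not documented here, so the marker strings and both function names are assumptions.

```python
from pathlib import Path

PLACEHOLDER_MARKERS = ("todo", "tbd", "placeholder", "scaffold")  # assumed markers


def is_placeholder(text):
    """Treat empty files and files containing a marker as placeholders."""
    stripped = text.strip()
    if not stripped:
        return True
    lowered = stripped.lower()
    return any(marker in lowered for marker in PLACEHOLDER_MARKERS)


def write_if_placeholder(path, new_text):
    """Write new_text unless a real taxonomy already exists; report what happened."""
    target = Path(path)
    if target.is_file() and not is_placeholder(target.read_text(encoding="utf-8")):
        return False  # real taxonomy present: leave it untouched
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(new_text, encoding="utf-8")
    return True
```

Under this sketch, a `False` return corresponds to the case where the script returns without overwriting outline/taxonomy.yml.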

Source Transparency