Schema Normalizer (NO PROSE)
Purpose: close a common failure mode in skills-first pipelines: schema drift across JSONL artifacts.
When fields are inconsistent (missing ids/titles, mixed citation-key formats), downstream skills start doing best-effort joins and fragile parsing. This skill makes the interface explicit and deterministic.
Inputs
-
outline/outline.yml (source of truth for section/subsection ids + titles)
-
Optional (for citation-key sanity): citations/ref.bib
-
Default JSONL artifacts to normalize (arxiv-survey(-latex) C4 bridge):
-
outline/subsection_briefs.jsonl
-
outline/chapter_briefs.jsonl
-
outline/evidence_bindings.jsonl
-
outline/evidence_drafts.jsonl
-
outline/anchor_sheet.jsonl
-
Optional (run after writer packs are generated):
-
outline/writer_context_packs.jsonl
Outputs
-
output/SCHEMA_NORMALIZATION_REPORT.md (always written; PASS/FAIL + what changed)
-
The processed JSONL files are normalized in place (a .bak.* is created if changes are applied).
What gets normalized
- IDs + titles (join keys)
For any record with sub_id: "<H2>.<H3>" :
-
Ensure section_id exists (derived from the prefix before the dot)
-
Ensure title , section_title exist (filled from outline/outline.yml )
For any record with section_id: "<H2>" :
- Ensure section_title exists (filled from outline/outline.yml )
- Citation key format (reduce parsing drift)
Within these C2-C4 JSONL artifacts, normalize citation keys so they are raw BibTeX keys (no @ prefix):
- "citations": ["smith2023", "jones2024"]
Notes:
-
Final prose still uses Markdown citations: [@smith2023] .
-
This skill does not add/remove citations; it only normalizes formatting.
When to run
Recommended placement in arxiv-survey(-latex) :
- Run after evidence-draft
- anchor-sheet and before writer-context-pack
- evidence-selfloop .
- This ensures outline/evidence_drafts.jsonl and outline/anchor_sheet.jsonl are schema-stable before drafting packs are built.
Failure modes
-
If outline/outline.yml is missing or cannot be parsed, the skill FAILs.
-
If any target JSONL contains invalid JSON lines, the skill reports them and FAILs (do not proceed on corrupted artifacts).
Script (optional)
Quick Start
-
python .codex/skills/schema-normalizer/scripts/run.py --help
-
Normalize the C4 bridge artifacts:
-
python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws>
All Options
-
--workspace <dir>
-
--unit-id <U###>
-
--inputs <semicolon-separated>
-
--outputs <semicolon-separated>
-
--checkpoint <C#>
Examples
Normalize the default C4 artifacts (ids/titles + citations format):
-
python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs outline/outline.yml;citations/ref.bib;outline/subsection_briefs.jsonl;outline/chapter_briefs.jsonl;outline/evidence_bindings.jsonl;outline/evidence_drafts.jsonl;outline/anchor_sheet.jsonl --outputs output/SCHEMA_NORMALIZATION_REPORT.md
Normalize writer packs too (if you are running this after writer-context-pack ):
- python .codex/skills/schema-normalizer/scripts/run.py --workspace workspaces/<ws> --inputs outline/outline.yml;citations/ref.bib;outline/writer_context_packs.jsonl --outputs output/SCHEMA_NORMALIZATION_REPORT.md