Paragraph Curator (select -> evaluate -> subset -> fuse)
Purpose: turn “keep rewriting and getting longer” into a controlled convergence step.
This skill adds a decision layer between “draft paragraphs” and “polish voice”:
-
keep the best paragraphs
-
merge redundant ones
-
rewrite for clearer argument moves
-
expand only when coverage is missing (using existing evidence cards)
This is a content-structure pass (not a style pass). Run style-harmonizer and opener-variator after curation.
Inputs
Required:
-
sections/ (especially H3 bodies: sections/S<sub_id>.md )
-
outline/writer_context_packs.jsonl (what each H3 must cover + allowed citations)
-
output/ARGUMENT_SKELETON.md (single source of truth for terminology + premises)
Recommended:
-
output/SECTION_ARGUMENT_SUMMARIES.jsonl (paragraph moves + outputs)
-
output/SECTION_LOGIC_REPORT.md (paragraph linkage risks)
-
output/WRITER_SELFLOOP_TODO.md (style smells / scope/citation warnings)
Outputs
-
Updated sections/*.md (same filenames; body-only; no headings)
-
output/PARAGRAPH_CURATION_REPORT.md (short; PASS/FAIL + what changed)
-
Create sections/paragraphs_curated.refined.ok when done (empty file; pipeline contract signal)
What this skill optimizes (rubric)
You are not trying to “shorten”. You are trying to increase information density while keeping the section verifiable.
Score each paragraph on a simple 0-2 rubric:
Criterion 0 (bad) 1 (ok) 2 (good)
Coverage does not match any required axis/card matches one axis, thin directly executes a must-use card/comparison
Novelty repeats nearby content partially redundant adds a distinct comparison/insight
Move clarity unclear what it does move exists, weak output clear move + reusable output
Consistency premise/term drift vs skeleton minor mismatch fully aligned with Consistency Contract
Citation hygiene uncited when it should be; cite-dump vibe acceptable citations are local and anchored (not just tail)
Fusion readiness cannot merge; tangled mergeable with edits clean unit that can be fused or kept
Decision labels:
-
KEEP : keep mostly as-is
-
REWRITE : keep content, rewrite for clearer move/output
-
FUSE : merge with neighbor(s) and rewrite into one stronger paragraph
-
REPLACE : keep the slot, but rewrite using existing evidence cards (when coverage is missing)
Paragraph budget (profile-aware)
Default per-H3 target:
-
draft_profile=survey : 10-12 paragraphs
-
draft_profile=deep : 11-13 paragraphs
If you exceed the budget, do not delete content blindly. Prefer FUSE (merge redundancy) and make the fused paragraph denser.
Must-have coverage checklist (per H3)
Each H3 must contain at least:
-
1x Definition/Setup (only if this H3 introduces a new term/protocol field)
-
2x concrete Contrast paragraphs (A-vs-B comparisons; not just “many papers do...”)
-
1x Evaluation anchor paragraph (task + metric + constraint/budget/tool access; cite-backed)
-
1x cross-paper Synthesis paragraph (what generalizes, what does not; cite-backed)
-
1x Boundary/Failure paragraph (limitations; threats to validity; cite-backed when possible)
-
1x Local conclusion (a reusable takeaway used downstream)
If any item is missing, use REPLACE to write that paragraph from the writer context pack (do not invent new facts).
Workflow (minimal)
-
Pick the target set
-
Start with the H3 bodies listed in output/SECTION_LOGIC_REPORT.md , plus any H3 flagged in output/WRITER_SELFLOOP_TODO.md as repetitive/template-y, plus any H3 that keeps growing across edits.
-
Work file-by-file: each target is a concrete sections/S<sub_id>.md .
-
Build a paragraph inventory (scratch only; do not paste into the paper)
-
If output/SECTION_ARGUMENT_SUMMARIES.jsonl exists, use its per-paragraph moves /output as the first draft of your inventory, then reconcile with the actual text. For each paragraph, write one line:
-
P<i> :: move(s) -> output (1 sentence) :: citations (keys)
-
Apply the rubric and label each paragraph
-
Mark KEEP/REWRITE/FUSE/REPLACE .
-
If two adjacent paragraphs repeat the same axis, FUSE .
-
For any paragraph you plan to change (REWRITE /REPLACE /FUSE ), draft 2-3 candidate rewrites in parallel (different angles: contrast-first / protocol-first / synthesis-first).
-
Score candidates quickly with the rubric; keep one winner (or fuse two if they cover complementary axes).
-
Keep citation keys unchanged while sampling; you are choosing surface form + structure, not changing the evidence set.
-
Construct the curated set
-
Use outline/writer_context_packs.jsonl to enforce must-have coverage (paragraph_plan/must_use/comparison_cards/limitation_hooks) without inventing new content.
-
Enforce the must-have coverage checklist.
-
Enforce the paragraph budget by fusing redundancy rather than deleting substance.
-
Fuse + rewrite (keep citation keys fixed) Rules that keep the pipeline stable:
-
Do not add/remove citation keys; when fusing, carry citations forward and re-anchor them to the right sentence.
-
Do not move citations across subsections.
-
Avoid adjacent citation blocks (e.g., [@a] [@b] ) and duplicate keys in one block (e.g., [@a; @a] ).
-
When fusing, it is often faster to write two fused candidates (one contrast-heavy, one synthesis-heavy) and pick the better one.
-
Write the report + marker
-
output/PARAGRAPH_CURATION_REPORT.md should be short and actionable:
-
- Status: PASS|FAIL
-
per H3: paragraph count before/after; what was fused; any remaining gaps
-
(minimal) how many candidates you tried for the main rewrites (e.g., 2-3), so future passes can see whether this was a real selection step
-
Create sections/paragraphs_curated.refined.ok .
Routing rules
-
If you cannot fill a missing must-have paragraph without new evidence: stop and route upstream (evidence-selfloop / C3-C4). Do not pad.
-
If you feel forced to change a definition or evaluation premise: update output/ARGUMENT_SKELETON.md# Consistency Contract first, then rerun argument-selfloop .
-
If the only issue is surface cadence/openers: do not overwork curation; run style-harmonizer / opener-variator .
Done checklist
-
Each targeted H3 stays within its paragraph budget (survey 10-12; deep 11-13) without losing required moves.
-
Redundant paragraphs are fused into denser, clearer ones (not just deleted).
-
No citation keys were added/removed; citation shape is reader-facing (no adjacent blocks, no dup keys).
-
output/PARAGRAPH_CURATION_REPORT.md exists and is understandable.
-
sections/paragraphs_curated.refined.ok exists.
Script
Quick Start
- python .codex/skills/paragraph-curator/scripts/run.py --workspace workspaces/<ws>
All Options
-
--workspace <dir> (required)
-
--unit-id <U###>
-
--inputs <semicolon-separated>
-
--outputs <semicolon-separated>
-
--checkpoint <C#>
Examples
-
Curate paragraphs in a survey workspace:
-
python .codex/skills/paragraph-curator/scripts/run.py --workspace workspaces/survey-llm-agents