Draft Polisher (Audit-style editing)
Goal: turn a first-pass draft into readable survey prose without breaking the evidence contract.
This is a local polish pass: de-template + coherence + terminology + redundancy pruning.
Note: if the main issue is structural redundancy from section accumulation, push the change upstream to sections/ and use paragraph-curator before merge. draft-polisher should not be the primary place where you decide which paragraphs to keep.
Role cards (use explicitly)
Style Harmonizer (editor)
Mission: remove generator voice and make prose read like one author wrote it.
Do:
- Delete narration openers and slide navigation; replace them with argument bridges.
- Vary rhythm; remove repeated template stems.
- Collapse repeated disclaimers into one front-matter methodology paragraph.
Avoid:
- Adding or removing citation keys.
- Moving citations across subsections.
Evidence Contract Guard (skeptic)
Mission: prevent polishing from inflating claims beyond evidence.
Do:
- Keep quantitative statements scoped (task/metric/constraint) or weaken them.
- Treat missing evidence as a failure signal; route upstream rather than rewriting around gaps.
Avoid:
- Overconfident language when evidence is abstract-only.
Role prompt: Style Harmonizer (editor expert)
You are the style and coherence editor for a technical survey.
Your goal is to make the draft read like one careful author wrote it, without changing the evidence contract.
Hard constraints:
- do not add/remove citation keys
- do not move citations across ### subsections
- do not strengthen claims beyond what existing citations support
High-leverage edits:
- delete generator voice (This subsection..., Next we move..., We now turn...)
- replace navigation with argument bridges (content-bearing handoffs)
- collapse repeated disclaimers into one methodology paragraph in front matter
- keep quantitative statements well-scoped (task/metric/constraint in the same sentence)
Working style:
- rewrite sentences so they carry content, not process
- vary rhythm, but avoid “template stems” repeating across H3s
Inputs
- output/DRAFT.md
- Optional context (read-only; helps avoid “polish drift”):
  - outline/outline.yml
  - outline/subsection_briefs.jsonl
  - outline/evidence_drafts.jsonl
  - citations/ref.bib
Outputs
- output/DRAFT.md (in-place refinement)
- output/citation_anchors.prepolish.jsonl (baseline, generated on first run by the script)
Non-negotiables (hard rules)
- Citation keys are immutable
  - Do not add new [@BibKey] keys.
  - Do not delete citation markers.
  - If citations/ref.bib exists, do not introduce any key that is not defined there.
- Citation anchoring is immutable
  - Do not move citations across ### subsections.
  - If you must restructure across subsections, stop and push the change upstream (outline/briefs/evidence), then regenerate.
- No evidence inflation
  - If a sentence sounds stronger than the evidence level (abstract-only), rewrite it into a qualified statement.
  - When in doubt, check the subsection’s evidence pack in outline/evidence_drafts.jsonl and keep claims aligned to snippets.
- Citation shape normalization
  - Merge adjacent citation blocks in the same sentence (avoid [@a] [@b]).
  - Deduplicate keys inside one block (avoid [@a; @a]).
  - Avoid tail-only citation dumps: keep some citations in the claim sentence itself (mid-sentence), not only at the paragraph end.
- Quantitative claim hygiene
  - If you keep a number, ensure the sentence also states (without guessing): task type + metric definition + relevant constraint (budget/cost/tool access), and the citation is embedded in that sentence.
  - Avoid ambiguous model naming (e.g., “GPT-5”) unless the cited paper uses that exact label; otherwise use the paper’s naming or a neutral description.
- No pipeline voice
  - Remove scaffolding phrases like:
    - “We use the following working claim …”
    - “The main axes we track are …”
    - “abstracts are treated as verification targets …”
    - “Method note (evidence policy): …” (avoid labels; rewrite as plain survey methodology)
    - “this run is …” (rewrite as survey methodology: “This survey is …”)
    - “Scope and definitions / Design space / Evaluation practice …”
    - “Next, we move from …”
    - “We now turn to …”
    - “From <X> to <Y>, ...” (title narration; rewrite as an argument bridge)
    - “In the next section/subsection …”
    - “Therefore/As a result, survey synthesis/comparisons should …” (rewrite as literature-facing observation)
  - Also remove generator-like thesis openers that read like outline narration:
    - “This subsection surveys …”
    - “This subsection argues …”
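The citation shape rules above (merge adjacent blocks, deduplicate keys) are mechanical enough to sketch in code. The following is a minimal illustration, not the polisher script's own implementation: the function name `normalize_citation_shape` is hypothetical, and it assumes citation blocks contain only keys separated by semicolons (no page locators or prefixes).

```python
import re

# A run of one or more citation blocks separated only by spaces or tabs,
# e.g. "[@a] [@b; @a]". Assumes blocks hold keys only (no locators or prefixes).
CITE_RUN = re.compile(r"\[@[^\[\]]+\](?:[ \t]*\[@[^\[\]]+\])*")
CITE_KEY = re.compile(r"@[^\s;\[\]]+")


def normalize_citation_shape(text: str) -> str:
    """Merge adjacent citation blocks and drop duplicate keys, keeping order."""

    def merge(match: re.Match) -> str:
        keys = CITE_KEY.findall(match.group(0))
        unique = list(dict.fromkeys(keys))  # order-preserving de-duplication
        return "[" + "; ".join(unique) + "]"

    return CITE_RUN.sub(merge, text)


if __name__ == "__main__":
    print(normalize_citation_shape("Planners trade cost for depth [@a] [@b; @a]."))
```

Because the rewrite only regroups keys that are already present, the set of cited keys stays unchanged, which is consistent with the immutable-keys rule. Tail-only citation dumps are not addressed here; moving a citation into the claim sentence is an editorial judgment.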
Three passes (recommended)
Pass 1 — Subsection polish (structure + de-template)
Best-of-2 micro-polish (recommended):
- For any sentence/paragraph you touch, draft 2 candidate rewrites, then keep the better one.
- Choose with a simple rubric: move clarity, no template stem, citations stay anchored, and citation shape stays reader-facing (no adjacent cite blocks / dup keys).
- Do not keep both candidates. Pick one and move on (the goal is convergence, not endless rewriting).
Role split:
- Editor: rewrites sentences for clarity and flow.
- Skeptic: deletes any generic/template sentence.
Targets:
- Each H3 reads like: tension → contrast → evidence → limitation.
- Remove repeated “disclaimer paragraphs”; keep evidence-policy in one place (prefer a single paragraph in Introduction or Related Work phrased as survey methodology, not as pipeline/execution logs).
- Use outline/outline.yml (if present) to avoid heading drift during edits.
- If present, use outline/subsection_briefs.jsonl to keep each H3’s scope/RQ consistent while improving flow.
- Do a quick “pattern sweep” (semantic, not mechanical):
  - delete outline narration: This subsection ..., In this subsection ...
  - delete slide navigation: Next, we move from ..., We now turn to ..., In the next section ...
  - delete title narration: From <X> to <Y>, ...
  - replace with: content claims + argument bridges + organization sentences (no new facts/citations)
- If citation-injector was used, smooth any budget-injection sentences so they read paper-like:
  - Keep the citation keys unchanged.
  - Avoid list-injection stems (e.g., “A few representative references include …”, “Notable lines of work include …”, “Concrete examples ... include ...”).
  - Prefer integrating the added citations into an existing argument sentence, or rewrite as a short parenthetical “e.g., ...” clause tied to the subsection’s lens (no new facts).
- Vary phrasing; avoid repeating the same opener stem across many H3s.
- Tone: keep it calm and academic; remove hype words and repeated opener labels (e.g., a literal “Key takeaway:” across many H3s).
- Reduce repeated synthesis stems (e.g., many paragraphs starting with “Taken together, ...”); vary synthesis phrasing and keep it content-bearing.
  - Treat repeated "Taken together," as a generator-voice smell. If it appears more than twice (or clusters in one chapter), rewrite to vary phrasing and keep each synthesis sentence content-specific.
  - Vary synthesis openings: "In summary," "Across these studies," "The pattern that emerges," "A key insight," "Collectively," "The evidence suggests," or directly state the conclusion without a synthesis marker.
  - Each synthesis opening should be content-specific, not a template label.
Rewrite recipe for subsection openers (paper voice, no new facts):
- Delete: This subsection surveys/argues... / In this subsection, we...
- Replace with a compact opener that does 2–3 of these (no labels; vary across subsections):
  - Content claim: the subsection-specific tension/trade-off (optionally with 1–2 embedded citations)
  - Why it matters: link the claim to evaluation/engineering constraints (benchmark/protocol/cost/tool access)
  - Preview: what you will contrast next and on what lens (A vs B; then evaluation anchors; then limitations)
- Example skeletons (paraphrase; don’t reuse verbatim):
  - Tension-first: A central tension is ...; ...; we contrast ...
  - Decision-first: For builders, the crux is ...; ...
  - Lens-first: Seen through the lens of ..., ...
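The pattern sweep in Pass 1 is meant to be semantic, but a small helper can surface candidate sentences for a human (or model) editor to review. The sketch below is illustrative only: `flag_generator_voice` is a hypothetical name, the stem list is a subset of the phrases listed in this document, and the naive sentence splitter will misfire on abbreviations.

```python
import re

# Opener stems taken from the pattern-sweep list; extend as needed.
STEMS = [
    r"This subsection (?:surveys|argues)\b",
    r"In this subsection\b",
    r"Next, we move from\b",
    r"We now turn to\b",
    r"In the next (?:section|subsection)\b",
]
PATTERNS = [re.compile(stem) for stem in STEMS]


def flag_generator_voice(text: str, max_taken_together: int = 2):
    """Return (flagged_sentences, too_many_taken_together).

    Flags sentences that START with a known stem; it never rewrites anything.
    """
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = [s for s in sentences if any(p.match(s) for p in PATTERNS)]
    count = sum(1 for s in sentences if s.startswith("Taken together,"))
    return flagged, count > max_taken_together


if __name__ == "__main__":
    sample = "We now turn to memory. Retrieval cost dominates latency [@a]."
    print(flag_generator_voice(sample))
```

A flagged sentence is only a prompt for rewriting; the replacement still has to be an argument bridge or content claim written by the editor, with no new facts or citations.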
Pass 2 — Terminology normalization
Role split:
- Taxonomist: chooses canonical terms and synonym policy.
- Integrator: applies consistent replacements across the draft.
Targets:
- One concept = one name across sections.
- Headings, tables, and prose use the same canonical terms.
Pass 3 — Redundancy pruning (global repetition)
Role split:
- Compressor: collapses repeated boilerplate.
- Narrative keeper: ensures removing repetition does not break the argument chain.
Targets:
- Cross-section repeated intros/outros are removed.
- Only subsection-specific content remains inside subsections.
Script
Quick Start
- python .codex/skills/draft-polisher/scripts/run.py --help
- python .codex/skills/draft-polisher/scripts/run.py --workspace workspaces/<ws>
All Options
- --workspace <dir> : workspace root
- --unit-id <U###> : unit id (optional; for logs)
- --inputs <semicolon-separated> : override inputs (rare; prefer defaults)
- --outputs <semicolon-separated> : override outputs (rare; prefer defaults)
- --checkpoint <C#> : checkpoint id (optional; for logs)
Examples
First polish pass (creates the anchoring baseline output/citation_anchors.prepolish.jsonl):
- python .codex/skills/draft-polisher/scripts/run.py --workspace workspaces/<ws>
Reset the anchoring baseline (only if you intentionally accept citation drift):
- Delete output/citation_anchors.prepolish.jsonl, then rerun the polisher.
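To make the idea of an anchoring baseline concrete, here is a minimal sketch of how citation drift across ### subsections could be detected. It is an assumption-laden illustration: the real schema of output/citation_anchors.prepolish.jsonl is not described here, so the sketch compares two in-memory mappings instead, and the function names `citation_anchors` and `anchor_drift` are hypothetical.

```python
import re

CITE_BLOCK = re.compile(r"\[[^\[\]]*@[^\[\]]*\]")
CITE_KEY = re.compile(r"@([^\s;,\[\]]+)")


def citation_anchors(markdown: str) -> dict:
    """Map each '### ' heading to the set of citation keys found under it."""
    anchors = {}
    current = "(before first H3)"
    for line in markdown.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            continue
        for block in CITE_BLOCK.findall(line):
            anchors.setdefault(current, set()).update(CITE_KEY.findall(block))
    return anchors


def anchor_drift(baseline: dict, polished: dict) -> dict:
    """Report, per heading, keys that disappeared or newly appeared."""
    drift = {}
    for heading in sorted(set(baseline) | set(polished)):
        before = baseline.get(heading, set())
        after = polished.get(heading, set())
        if before != after:
            drift[heading] = {"missing": before - after, "unexpected": after - before}
    return drift


if __name__ == "__main__":
    before = "### Planning\nPlanners help [@a; @b].\n### Memory\nStores help [@c].\n"
    after = "### Planning\nPlanners help [@a].\n### Memory\nStores help [@b; @c].\n"
    print(anchor_drift(citation_anchors(before), citation_anchors(after)))
```

An empty result means every key stayed under its original heading. Note that this sketch keys on heading text, so a renamed heading would also show up as drift.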
Acceptance checklist
- No TODO/TBD/FIXME/(placeholder).
- No … or ... truncation.
- No repeated boilerplate sentence across many subsections.
- Citation anchoring passes (no cross-subsection drift).
- Each H3 has at least one cross-paper synthesis paragraph (>=2 citations).
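Several checklist items can be approximated with a quick self-check before handing the draft on. The sketch below is a rough aid under stated assumptions, not the polisher's actual gate: `check_draft` is a hypothetical helper, paragraphs are assumed to be separated by blank lines, and "synthesis paragraph" is approximated as any paragraph citing at least two distinct keys.

```python
import re

PLACEHOLDER = re.compile(r"\b(?:TODO|TBD|FIXME)\b|\(placeholder\)")
ELLIPSIS = re.compile(r"…|\.\.\.")
CITE_KEY = re.compile(r"@([^\s;,\[\]]+)")


def check_draft(markdown: str) -> list:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    if PLACEHOLDER.search(markdown):
        problems.append("placeholder marker present")
    if ELLIPSIS.search(markdown):
        problems.append("ellipsis truncation present")

    # re.split with a capture group yields [preamble, title1, body1, title2, body2, ...].
    parts = re.split(r"(?m)^### +(.+)$", markdown)
    for title, body in zip(parts[1::2], parts[2::2]):
        body = re.split(r"(?m)^#{1,2} ", body)[0]  # stop at the next H1/H2
        paragraphs = re.split(r"\n\s*\n", body)
        if not any(len(set(CITE_KEY.findall(p))) >= 2 for p in paragraphs):
            problems.append(f"H3 '{title.strip()}' lacks a paragraph with >=2 citations")
    return problems


if __name__ == "__main__":
    draft = "### Planning\n\nSearch-based planners [@a] outpace reactive ones [@b].\n"
    print(check_draft(draft))
```

The repeated-boilerplate and anchoring items are not covered here; the first needs cross-subsection comparison and the second needs the pre-polish baseline.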
Troubleshooting
Issue: polishing causes citation drift across subsections
Fix:
- Keep citations inside the same ### subsection; if restructuring is intentional, delete output/citation_anchors.prepolish.jsonl and regenerate a new baseline.
Issue: draft polishing is requested before writing approval
Fix:
- Record the relevant approval in DECISIONS.md (typically Approve C2) before doing prose-level edits.