draft-polisher

Draft Polisher (Audit-style editing)

Goal: turn a first-pass draft into readable survey prose without breaking the evidence contract.

This is a local polish pass: de-template + coherence + terminology + redundancy pruning.

Note: if the main issue is structural redundancy from section accumulation, push the change upstream to sections/ and use paragraph-curator before merge. draft-polisher should not be the primary place where you decide which paragraphs to keep.

Role cards (use explicitly)

Style Harmonizer (editor)

Mission: remove generator voice and make prose read like one author wrote it.

Do:

Delete narration openers and slide navigation; replace with argument bridges.
Vary rhythm; remove repeated template stems.
Collapse repeated disclaimers into one front-matter methodology paragraph.

Avoid:

Adding or removing citation keys.
Moving citations across subsections.

Evidence Contract Guard (skeptic)

Mission: prevent polishing from inflating claims beyond evidence.

Do:

Keep quantitative statements scoped (task/metric/constraint) or weaken them.
Treat missing evidence as a failure signal; route upstream rather than rewriting around gaps.

Avoid:

Overconfident language when evidence is abstract-only.

Role prompt: Style Harmonizer (editor expert)

You are the style and coherence editor for a technical survey.

Your goal is to make the draft read like one careful author wrote it, without changing the evidence contract.

Hard constraints:

do not add/remove citation keys
do not move citations across ### subsections
do not strengthen claims beyond what existing citations support

High-leverage edits:

delete generator voice (This subsection..., Next we move..., We now turn...)
replace navigation with argument bridges (content-bearing handoffs)
collapse repeated disclaimers into one methodology paragraph in front matter
keep quantitative statements well-scoped (task/metric/constraint in the same sentence)

Working style:

rewrite sentences so they carry content, not process
vary rhythm, but avoid “template stems” repeating across H3s

Inputs

output/DRAFT.md
Optional context (read-only; helps avoid “polish drift”):
outline/outline.yml
outline/subsection_briefs.jsonl
outline/evidence_drafts.jsonl
citations/ref.bib

Outputs

output/DRAFT.md (in-place refinement)
output/citation_anchors.prepolish.jsonl (baseline, generated on first run by the script)

Non-negotiables (hard rules)

Citation keys are immutable
Do not add new [@BibKey] keys.
Do not delete citation markers.
If citations/ref.bib exists, do not introduce any key that is not defined there.
Citation anchoring is immutable
Do not move citations across ### subsections.
If you must restructure across subsections, stop and push the change upstream (outline/briefs/evidence), then regenerate.
No evidence inflation
If a sentence sounds stronger than the evidence level (abstract-only), rewrite it into a qualified statement.
When in doubt, check the subsection’s evidence pack in outline/evidence_drafts.jsonl and keep claims aligned to snippets.
Citation shape normalization
Merge adjacent citation blocks in the same sentence (avoid [@a] [@b]).
Deduplicate keys inside one block (avoid [@a; @a]).
Avoid tail-only citation dumps: keep some citations in the claim sentence itself (mid-sentence), not only paragraph end.
Quantitative claim hygiene
If you keep a number, ensure the sentence also states (without guessing): task type + metric definition + relevant constraint (budget/cost/tool access), and the citation is embedded in that sentence.
Avoid ambiguous model naming (e.g., “GPT-5”) unless the cited paper uses that exact label; otherwise use the paper’s naming or a neutral description.
No pipeline voice
Remove scaffolding phrases like:
“We use the following working claim …”
“The main axes we track are …”
“abstracts are treated as verification targets …”
“Method note (evidence policy): …” (avoid labels; rewrite as plain survey methodology)
“this run is …” (rewrite as survey methodology: “This survey is …”)
“Scope and definitions / Design space / Evaluation practice …”
“Next, we move from …”
“We now turn to …”
“From <X> to <Y>, ...” (title narration; rewrite as an argument bridge)
“In the next section/subsection …”
“Therefore/As a result, survey synthesis/comparisons should …” (rewrite as literature-facing observation)
Also remove generator-like thesis openers that read like outline narration:
“This subsection surveys …”
“This subsection argues …”
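The citation shape normalization rule above (merge adjacent blocks, deduplicate keys inside a block) is mechanical enough to sketch in code. The following is a minimal illustration, not the skill's own implementation: the function name `normalize_citation_shape` is hypothetical, and it assumes pandoc-style `[@key; @key]` blocks on a single line. It never adds a key and never drops a distinct key.

```python
import re

# A run of two or more citation blocks separated only by spaces/tabs, e.g. "[@a] [@b; @c]".
ADJACENT_BLOCKS = re.compile(r"\[@[^\]]+\](?:[ \t]*\[@[^\]]+\])+")
# One citation block; the group captures the inside, e.g. "@a; @b".
SINGLE_BLOCK = re.compile(r"\[(@[^\]]+)\]")


def _split_keys(block_body):
    """Split the inside of one block ("@a; @b") into bare keys."""
    return [part.strip().lstrip("@") for part in block_body.split(";") if part.strip()]


def _dedupe(keys):
    """Drop repeated keys while preserving first-seen order."""
    seen, ordered = set(), []
    for key in keys:
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


def normalize_citation_shape(text):
    """Merge adjacent citation blocks and deduplicate keys inside each block."""

    def rebuild(match):
        keys = []
        for body in SINGLE_BLOCK.findall(match.group(0)):
            keys.extend(_split_keys(body))
        return "[" + "; ".join("@" + key for key in _dedupe(keys)) + "]"

    # Pass 1 merges adjacent blocks; pass 2 dedupes any remaining single block.
    text = ADJACENT_BLOCKS.sub(rebuild, text)
    return SINGLE_BLOCK.sub(rebuild, text)
```

Restricting the separator to spaces and tabs keeps the merge from reaching across a line break, so a block at the end of one paragraph is never fused with a block that opens the next.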

Three passes (recommended)

Pass 1 — Subsection polish (structure + de-template)

Best-of-2 micro-polish (recommended):

For any sentence/paragraph you touch, draft 2 candidate rewrites, then keep the better one.
Choose with a simple rubric: more clarity, no template stem, citations stay anchored, and citation shape stays reader-facing (no adjacent citation blocks or duplicate keys).
Do not keep both candidates. Pick one and move on (the goal is convergence, not endless rewriting).

Role split:

Editor: rewrites sentences for clarity and flow.
Skeptic: deletes any generic/template sentence.

Targets:

Each H3 reads like: tension → contrast → evidence → limitation.
Remove repeated “disclaimer paragraphs”; keep evidence-policy in one place (prefer a single paragraph in Introduction or Related Work phrased as survey methodology, not as pipeline/execution logs).
Use outline/outline.yml (if present) to avoid heading drift during edits.
If present, use outline/subsection_briefs.jsonl to keep each H3’s scope/RQ consistent while improving flow.
Do a quick “pattern sweep” (semantic, not mechanical):
delete outline narration: “This subsection ...”, “In this subsection ...”
delete slide navigation: “Next, we move from ...”, “We now turn to ...”, “In the next section ...”
delete title narration: “From <X> to <Y>, ...”
replace with: content claims + argument bridges + organization sentences (no new facts/citations)
If citation-injector was used, smooth any budget-injection sentences so they read paper-like:
Keep the citation keys unchanged.
Avoid list-injection stems (e.g., “A few representative references include …”, “Notable lines of work include …”, “Concrete examples ... include ...”).
Prefer integrating the added citations into an existing argument sentence, or rewrite them as a short parenthetical “e.g., ...” clause tied to the subsection’s lens (no new facts).
Vary phrasing; avoid repeating the same opener stem across many H3s.
Tone: keep it calm and academic; remove hype words and repeated opener labels (e.g., a literal “Key takeaway:” across many H3s).
Reduce repeated synthesis stems (e.g., many paragraphs starting with “Taken together, ...”); vary synthesis phrasing and keep it content-bearing.
Treat repeated "Taken together," as a generator-voice smell. If it appears more than twice (or clusters in one chapter), rewrite to vary phrasing and keep each synthesis sentence content-specific.
Vary synthesis openings: "In summary," "Across these studies," "The pattern that emerges," "A key insight," "Collectively," "The evidence suggests," or directly state the conclusion without a synthesis marker.
Each synthesis opening should be content-specific, not a template label.
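The pattern sweep is meant to be semantic, but a mechanical first pass can help locate candidates for a human or model to rewrite. The sketch below is an illustrative aid, not part of the skill's script: the function names are hypothetical, the stem list is copied from the phrases named above, and the threshold of two for "Taken together," follows the rule stated in the targets.

```python
import re

# Stems taken from the phrases listed in this document; extend as needed.
GENERATOR_STEMS = [
    r"\bThis subsection (?:surveys|argues)\b",
    r"\bIn this subsection\b",
    r"\bNext, we move from\b",
    r"\bWe now turn to\b",
    r"\bIn the next (?:section|subsection)\b",
    r"\bKey takeaway:",
]
STEM_RE = re.compile("|".join(GENERATOR_STEMS), re.IGNORECASE)


def flag_generator_voice(draft):
    """Return (line_number, line) pairs for lines containing a listed stem."""
    return [
        (number, line.strip())
        for number, line in enumerate(draft.splitlines(), start=1)
        if STEM_RE.search(line)
    ]


def overused_taken_together(draft, limit=2):
    """True when "Taken together," appears more than `limit` times."""
    return len(re.findall(r"\bTaken together,", draft)) > limit
```

A hit only marks a sentence for review. The rewrite still has to replace the stem with a content claim or an argument bridge, which a regex cannot judge.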

Rewrite recipe for subsection openers (paper voice, no new facts):

Delete: This subsection surveys/argues... / In this subsection, we...
Replace with a compact opener that does 2–3 of these (no labels; vary across subsections):
Content claim: the subsection-specific tension/trade-off (optionally with 1–2 embedded citations)
Why it matters: link the claim to evaluation/engineering constraints (benchmark/protocol/cost/tool access)
Preview: what you will contrast next and on what lens (A vs B; then evaluation anchors; then limitations)
Example skeletons (paraphrase; don’t reuse verbatim):
Tension-first: A central tension is ...; ...; we contrast ...
Decision-first: For builders, the crux is ...; ...
Lens-first: Seen through the lens of ..., ...

Pass 2 — Terminology normalization

Role split:

Taxonomist: chooses canonical terms and synonym policy.
Integrator: applies consistent replacements across the draft.

Targets:

One concept = one name across sections.
Headings, tables, and prose use the same canonical terms.
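Once the Taxonomist has fixed a synonym policy, the Integrator's replacement step can be sketched as a mapping from variants to canonical terms. This is a minimal sketch under stated assumptions: the `CANONICAL` entries are hypothetical examples, `normalize_terms` is not a function of the skill's script, and citation blocks are skipped so that a key containing a variant word is never rewritten.

```python
import re

# Hypothetical synonym policy: lowercase variant -> canonical term.
CANONICAL = {
    "tool calling": "tool use",
    "agentic": "agent-based",
}

# Capturing group so re.split keeps the citation blocks in the result.
CITATION_BLOCK = re.compile(r"(\[@[^\]]+\])")


def normalize_terms(text, mapping=CANONICAL):
    """Replace variants with canonical terms, leaving citation blocks untouched."""
    # Longest variants first so a shorter variant never pre-empts a longer one.
    variants = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b", re.IGNORECASE
    )

    def swap(match):
        found = match.group(0)
        canonical = mapping[found.lower()]
        # Preserve sentence-initial capitalization.
        return canonical[0].upper() + canonical[1:] if found[0].isupper() else canonical

    parts = CITATION_BLOCK.split(text)
    # Odd-indexed parts are the citation blocks themselves; pass them through.
    return "".join(
        part if index % 2 else pattern.sub(swap, part)
        for index, part in enumerate(parts)
    )
```

Blind replacement can still change meaning where a variant is used in a distinct sense, so the output is worth reviewing against the subsection briefs before it is accepted.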

Pass 3 — Redundancy pruning (global repetition)

Role split:

Compressor: collapses repeated boilerplate.
Narrative keeper: ensures removing repetition does not break the argument chain.

Targets:

Cross-section repeated intros/outros are removed.
Only subsection-specific content remains inside subsections.

Script

Quick Start

python .codex/skills/draft-polisher/scripts/run.py --help
python .codex/skills/draft-polisher/scripts/run.py --workspace workspaces/<ws>

All Options

--workspace <dir> : workspace root
--unit-id <U###> : unit id (optional; for logs)
--inputs <semicolon-separated> : override inputs (rare; prefer defaults)
--outputs <semicolon-separated> : override outputs (rare; prefer defaults)
--checkpoint <C#> : checkpoint id (optional; for logs)

Examples

First polish pass (creates the anchoring baseline output/citation_anchors.prepolish.jsonl):

python .codex/skills/draft-polisher/scripts/run.py --workspace workspaces/<ws>

Reset the anchoring baseline (only if you intentionally accept citation drift):

Delete output/citation_anchors.prepolish.jsonl, then rerun the polisher.

Acceptance checklist

No TODO/TBD/FIXME/(placeholder) markers.
No … or ... truncation.
No repeated boilerplate sentence across many subsections.
Citation anchoring passes (no cross-subsection drift).
Each H3 has at least one cross-paper synthesis paragraph (>=2 citations).
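Three of the checklist items (placeholder markers, truncation markers, and a synthesis paragraph per H3) can be approximated with a short pre-check. The sketch below is an assumption-laden illustration rather than the script's actual gate: the function names are hypothetical, H3s are detected by a leading `### `, and ">=2 citations" is read as at least two distinct keys in one paragraph.

```python
import re

PLACEHOLDER_RE = re.compile(r"\b(?:TODO|TBD|FIXME)\b|\(placeholder\)")
TRUNCATION_RE = re.compile(r"…|\.\.\.")
BLOCK_RE = re.compile(r"\[(@[^\]]+)\]")
KEY_RE = re.compile(r"@([\w:.\-]+)")


def split_h3(draft):
    """Map each '### ' heading title to the body text that follows it."""
    sections, current = {}, None
    for line in draft.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            sections[current] = []
        elif line.startswith("#"):
            current = None  # any other heading closes the open H3
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body) for title, body in sections.items()}


def has_synthesis_paragraph(body, min_keys=2):
    """True if some paragraph cites at least `min_keys` distinct keys."""
    for paragraph in re.split(r"\n\s*\n", body):
        keys = set()
        for block in BLOCK_RE.findall(paragraph):
            keys.update(KEY_RE.findall(block))
        if len(keys) >= min_keys:
            return True
    return False


def acceptance_report(draft):
    """Return a list of failures; an empty list means these checks pass."""
    failures = []
    if PLACEHOLDER_RE.search(draft):
        failures.append("placeholder marker present")
    if TRUNCATION_RE.search(draft):
        failures.append("truncation marker present")
    for title, body in split_h3(draft).items():
        if not has_synthesis_paragraph(body):
            failures.append(f"no synthesis paragraph with >=2 citations: {title}")
    return failures
```

Counting keys does not establish that a paragraph actually synthesizes across papers, and repeated boilerplate and citation anchoring are not covered here, so a clean report is a necessary condition at best.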

Troubleshooting

Issue: polishing causes citation drift across subsections

Fix:

Keep citations inside the same ### subsection; if restructuring is intentional, delete output/citation_anchors.prepolish.jsonl and regenerate a new baseline.
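To see where drift occurred before deciding whether to reset the baseline, it can help to compare the citation keys under each ### heading in two versions of the draft. The following sketch works directly on two markdown strings and does not read output/citation_anchors.prepolish.jsonl, whose schema is not described here. The function names are hypothetical, and citation blocks are assumed to sit on a single line.

```python
import re

BLOCK_RE = re.compile(r"\[(@[^\]]+)\]")
KEY_RE = re.compile(r"@([\w:.\-]+)")


def anchors_by_h3(draft):
    """Map each '### ' heading title to the set of citation keys under it."""
    anchors, current = {}, None
    for line in draft.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            anchors.setdefault(current, set())
        elif line.startswith("#"):
            current = None  # any other heading closes the open H3
        elif current is not None:
            for block in BLOCK_RE.findall(line):
                anchors[current].update(KEY_RE.findall(block))
    return anchors


def citation_drift(before, after):
    """Report keys that left or entered each H3 between two draft versions."""
    old, new = anchors_by_h3(before), anchors_by_h3(after)
    drift = {}
    for title in sorted(set(old) | set(new)):
        removed = old.get(title, set()) - new.get(title, set())
        added = new.get(title, set()) - old.get(title, set())
        if removed or added:
            drift[title] = {"removed": sorted(removed), "added": sorted(added)}
    return drift
```

A key that shows up as removed in one subsection and added in another is a cross-subsection move. A key that is only added or only removed points to a violation of the key immutability rule instead.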

Issue: draft polishing is requested before writing approval

Fix:

Record the relevant approval in DECISIONS.md (typically Approve C2) before doing prose-level edits.
