Paper Notes
Produce consistent, searchable paper notes that later steps (claims, visuals, writing) can reliably synthesize.
This is still NO PROSE: keep notes as bullets / short fields, not narrative paragraphs.
Load Order
Always read:
-
references/overview.md
-
references/note_schema.md
Read by task:
-
references/limitation_taxonomy.md when writing or reviewing limitations (avoid boilerplate)
-
references/result_extraction_examples.md when extracting key_results (good vs bad examples)
-
references/source_text_hygiene.md when result/limitation fields still preserve paper self-narration or author-result wrappers
Machine-readable assets:
-
assets/note_schema.json — JSONL record schema for validation
-
assets/evidence_tags.json — evidence bank tagging categories (extensible without code changes)
-
assets/source_text_hygiene.json — note-field source sentence cleanup policy
Script Boundary
Use scripts/run.py only for:
-
deterministic scaffold generation from core_set + metadata
-
priority selection based on mapping coverage
-
evidence bank construction from structured note fields
Do not treat run.py as the place for:
-
paper-specific limitation prose (use references/limitation_taxonomy.md for guidance)
-
domain-specific evaluation heuristics hidden in code
-
reader-facing narrative text
Role cards (prompt-level guidance)
Close Reader
-
Mission: extract what is specific and checkable (setup, method, metrics, limits).
-
Do: name concrete tasks/benchmarks and what the paper actually measures.
-
Avoid: generic summary boilerplate that could fit any paper.
Results Recorder
-
Mission: capture evaluation anchors that later writing needs.
-
Do: record task + metric + constraints (budget/tool access) whenever available.
-
Avoid: copying numbers without the evaluation setting that makes them meaningful.
-
Avoid: promoting artifact introductions (X enables ... , our framework features ... ) into key_results .
-
Avoid: promoting benchmark-positioning, field-motivation, or author-navigation lines (we apply ... and show ... , we then discuss how ... ) into key_results .
Limitation Logger
-
Mission: capture the caveats that change interpretation.
-
Do: write paper-specific limitations (protocol mismatch, missing ablations, threat model gaps).
-
Avoid: repeated generic limitations like “may not generalize” without specifics.
When to use
-
After you have a core set (and ideally a mapping) and need evidence-ready notes.
-
Before writing a survey draft.
Inputs
-
papers/core_set.csv
-
Optional: outline/mapping.tsv (to prioritize)
-
Optional: papers/fulltext_index.jsonl
- papers/fulltext/*.txt (if running in fulltext mode)
Outputs
-
papers/paper_notes.jsonl (JSONL; one record per paper)
-
papers/evidence_bank.jsonl (JSONL; addressable evidence snippets derived from notes; A150++ target: >=7 items/paper on average)
Decision: evidence depth
-
If you have extracted text (papers/fulltext/*.txt ) → enrich key papers using fulltext snippets and set evidence_level: "fulltext" .
-
If you only have abstracts (default) → keep long-tail notes abstract-level, but still fully enrich high-priority papers (see below).
Workflow (heuristic)
Uses: outline/mapping.tsv , papers/fulltext_index.jsonl .
-
Ensure coverage: every paper_id in papers/core_set.csv must have one JSONL record.
-
Use mapping to choose high-priority papers:
-
heavily reused across subsections
-
pinned classics (ReAct/Toolformer/Reflexion… if in scope)
-
For high-priority papers, capture:
-
3–6 summary bullets (what’s new, what problem setting, what’s the loop)
-
method (mechanism and architecture; what differs from baselines)
-
key_results (benchmarks/metrics; include numbers if available)
-
limitations (specific assumptions/failure modes; avoid generic boilerplate)
-
For long-tail papers:
-
keep summary bullets short (abstract-derived is OK)
-
still include at least one limitation, but make it specific when possible
-
Assign a stable bibkey for each paper for citation generation.
Quality checklist
Coverage: every paper_id in papers/core_set.csv appears in papers/paper_notes.jsonl .
High-priority papers have non-TODO method/results/limitations.
Limitations are not copy-pasted across many papers.
evidence_level is set correctly (abstract vs fulltext ).
Evidence bank: papers/evidence_bank.jsonl exists and is dense enough for A150++ (>=7 items/paper on average).
Helper script (optional)
Quick Start
-
python .codex/skills/paper-notes/scripts/run.py --help
-
python .codex/skills/paper-notes/scripts/run.py --workspace <workspace_dir>
All Options
- See --help (this helper is intentionally minimal)
Examples
-
Generate notes, then optionally enrich priority=high papers:
-
Run the helper once, then refine papers/paper_notes.jsonl (e.g., add full-text details for key papers and diversify limitations).
Notes
-
The helper writes deterministic metadata/abstract-level notes and marks key papers with priority=high .
-
In pipeline.py --strict it will be blocked if high-priority notes are incomplete (missing method/key_results/limitations) or contain placeholders.
Troubleshooting
Common Issues
Issue: High-priority notes still look like scaffolds
Symptom:
- Quality gate reports missing method/key_results or TODO placeholders.
Causes:
- Notes were generated from abstracts only; key papers weren’t enriched.
Solutions:
-
Fully enrich priority=high papers: method , ≥1 key_results , ≥3 summary_bullets , ≥1 concrete limitations .
-
If you need full text evidence, run pdf-text-extractor in fulltext mode for key papers.
Issue: Repeated limitations across many papers
Symptom:
- Quality gate reports repeated limitation boilerplate.
Causes:
- Copy-pasted limitations instead of paper-specific failure modes/assumptions.
Solutions:
- Replace boilerplate with paper-specific limitations (setup, data, evaluation gaps, failure cases).
Recovery Checklist
-
papers/paper_notes.jsonl covers all papers/core_set.csv paper_ids.
-
≥80% of priority=high notes satisfy method/results/limitations completeness.
-
No TODO remains in high-priority notes.