section-mapper

Create a paper→subsection map that supports evidence building and later synthesis.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "section-mapper" with this command: npx skills add willoscar/research-units-pipeline-skills/willoscar-research-units-pipeline-skills-section-mapper

Section Mapper

Create a paper→subsection map that supports evidence building and later synthesis.

Good mapping is diverse (avoids reusing the same paper everywhere) and explainable (short semantic “why”, not just keyword overlap).

When to use

  • You have outline/outline.yml and a papers/core_set.csv and need coverage per subsection.

  • You want to identify weak-signal subsections early (so you can adjust scope or add papers).

Inputs

  • papers/core_set.csv

  • outline/outline.yml

Outputs

  • outline/mapping.tsv

  • outline/mapping_report.md (diagnostics: reuse hotspots, weak-signal subsections)

Freeze marker (explicit)

To prevent accidental overwrites after you refine mapping rationales:

  • Create outline/mapping.refined.ok .

If you rerun the script without this marker, it will back up the previous mapping to a timestamped file:

  • outline/mapping.tsv.bak.<timestamp>

Workflow (heuristic)

  • Start from the outline subsections (each subsection should be “mappable”).

  • For each subsection, pick enough papers to support evidence-first writing (A150++ default: 28; smaller runs: ~12–20; lightweight: ~3–6) that are:

  • representative (canonical / frequently-cited)

  • complementary (different design choices, different eval setups)

  • not overly reused elsewhere unless truly foundational

  • Fill why with a short semantic rationale (one line is enough), e.g.:

  • mechanism: “decouples planner/executor; tool calling API”

  • evaluation: “interactive web tasks; strong tool error analysis”

  • safety: “agentic jailbreak surface; mitigation study”

  • After initial mapping, scan for:

  • subsections with <3 papers → either broaden, merge, or expand retrieval

  • a few papers mapped everywhere → diversify; reserve “foundational” papers for only the truly relevant parts

Quality checklist

  • outline/mapping.tsv exists and is non-empty.

  • Most subsections have ≥3 mapped papers (or a clear exception noted in why ).

  • why is semantic (not just matched_terms=... ).

  • No single paper dominates unrelated subsections.

Helper script (optional)

Quick Start

  • python .codex/skills/section-mapper/scripts/run.py --help

  • python .codex/skills/section-mapper/scripts/run.py --workspace <workspace_dir> --per-subsection 28

All Options

  • --per-subsection <n> : target mapped papers per subsection

  • --diversity-penalty <float> : penalize repeated reuse of the same paper across many subsections

  • --soft-limit <n> / --hard-limit <n> : caps for per-paper reuse (0 = auto)

Examples

  • Higher diversity (reduce over-reuse):

  • python .codex/skills/section-mapper/scripts/run.py --workspace <ws> --per-subsection 4 --diversity-penalty 0.25

  • Tighter reuse caps:

  • python .codex/skills/section-mapper/scripts/run.py --workspace <ws> --per-subsection 3 --soft-limit 6 --hard-limit 10

Notes

  • Writes outline/mapping_report.md diagnostics.

  • In pipeline.py --strict , mapping may be blocked until generic why rationales are replaced with semantic ones.

Troubleshooting

Common Issues

Issue: outline/mapping.tsv is empty or low-coverage

Symptom:

  • Mapping has few rows, or many subsections have <3 papers.

Causes:

  • Core set is too small or outline is too fine-grained.

Solutions:

  • Increase core set size (rerun dedupe-rank with larger --core-size ).

  • Merge weak-signal subsections or broaden the scope/queries.

Issue: Mapping over-reuses the same papers

Symptom:

  • Quality gate reports repeated papers across many unrelated subsections.

Causes:

  • Diversity penalty too low; limited core set.

Solutions:

  • Raise --diversity-penalty and/or set tighter --soft-limit/--hard-limit .

  • Manually diversify mappings for unrelated sections.

Recovery Checklist

  • Each subsection has ≥3 mapped papers (target).

  • why column contains semantic rationale (not just token overlap).

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

pdf-text-extractor

No summary provided by upstream source.

Repository SourceNeeds Review
Research

latex-compile-qa

No summary provided by upstream source.

Repository SourceNeeds Review
Research

draft-polisher

No summary provided by upstream source.

Repository SourceNeeds Review
Research

citation-verifier

No summary provided by upstream source.

Repository SourceNeeds Review