agent-survey-corpus

Agent Survey Corpus (arXiv PDFs → text extracts)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "agent-survey-corpus" with this command: npx skills add willoscar/research-units-pipeline-skills/willoscar-research-units-pipeline-skills-agent-survey-corpus

Goal: create a small, local reference library so you can learn from real agent surveys when refining:

C2 outline structure (paper-like sectioning)
C4 tables/claims organization
C5 writing style and density

This is intentionally not part of the pipeline; it is an optional, repo-level toolkit.

Inputs

ref/agent-surveys/arxiv_ids.txt
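
As a rough illustration of the input format (one arXiv id per line), the sketch below reads such a file and builds a PDF URL for each id. It is not taken from the skill's run.py: the function names `read_arxiv_ids` and `pdf_url` are hypothetical, and skipping blank lines, `#` comment lines, and duplicates is an assumption about reasonable behavior rather than something the source states.

```python
from pathlib import Path


def read_arxiv_ids(path):
    """Return the arXiv ids listed in `path`, one per line.

    Assumption: blank lines and lines starting with '#' are skipped, and
    duplicates are dropped while preserving order. The real script may
    behave differently.
    """
    ids = []
    seen = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in seen:
            seen.add(line)
            ids.append(line)
    return ids


def pdf_url(arxiv_id):
    """Build the public arXiv PDF URL for a given id string."""
    return f"https://arxiv.org/pdf/{arxiv_id}"
```

Calling `read_arxiv_ids("ref/agent-surveys/arxiv_ids.txt")` and mapping `pdf_url` over the result would give the list of PDFs a downloader could fetch.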

Outputs

ref/agent-surveys/pdfs/
ref/agent-surveys/text/
ref/agent-surveys/STYLE_REPORT.md (tracked; auto-generated summary)

Workflow

Edit ref/agent-surveys/arxiv_ids.txt (one arXiv id per line).
Run the downloader to fetch PDFs and extract the first N pages to text.
Skim the extracted text under ref/agent-surveys/text/:
Look at section counts (H2), subsection granularity (H3), and how the surveys transition between chapters.
Identify repeated rhetorical patterns you want the pipeline writer to imitate.
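
The skimming step can be partly automated. The sketch below counts candidate headings by depth in one extracted text file, as a rough proxy for section counts and subsection granularity. It rests on an assumption: that headings in the extracted text appear as numbered lines such as "3 Methods" or "3.2 Planning". Unnumbered headings are missed and stray numbered lines may be counted, so treat the output as a hint. The names `HEADING_RE` and `count_headings` are hypothetical.

```python
import re
from collections import Counter

# Heuristic: a heading line starts with a dotted number such as "3" or "3.2",
# followed by whitespace and a capitalised title. This is an assumption about
# how numbered headings appear in extracted text; it misses unnumbered
# headings and may match stray numbered lines.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z][^\n]{0,80})$")


def count_headings(text):
    """Count candidate headings by depth (1 = section, 2 = subsection, ...)."""
    counts = Counter()
    for line in text.splitlines():
        match = HEADING_RE.match(line.strip())
        if match:
            depth = match.group(1).count(".") + 1
            counts[depth] += 1
    return counts
```

Running this over each file under ref/agent-surveys/text/ gives a quick per-survey comparison of how many top-level sections and subsections each paper uses.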

Script

Quick Start

python .codex/skills/agent-survey-corpus/scripts/run.py --help
python .codex/skills/agent-survey-corpus/scripts/run.py --workspace . --max-pages 20

All Options

--workspace <dir> (use . to write into repo root)
--inputs <semicolon-separated> (default: ref/agent-surveys/arxiv_ids.txt)
--max-pages <N> (default: 20)
--sleep <seconds> (default: 1.0)
--overwrite (re-download + re-extract)
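
For readers who want to see how these options fit together, here is a minimal argparse sketch that mirrors the flags and defaults listed above. It is not the actual run.py, which is not shown on this page. Treating `--workspace` as required is an assumption, since no default is documented, and `build_parser` and `split_inputs` are hypothetical names.

```python
import argparse


def build_parser():
    """Sketch of a parser for the documented options (not the real run.py)."""
    parser = argparse.ArgumentParser(
        description="Fetch arXiv survey PDFs and extract text."
    )
    # Assumption: --workspace is required; the source does not state a default.
    parser.add_argument("--workspace", required=True,
                        help="workspace root; use . for the repo root")
    parser.add_argument("--inputs", default="ref/agent-surveys/arxiv_ids.txt",
                        help="semicolon-separated input paths")
    parser.add_argument("--max-pages", type=int, default=20)
    parser.add_argument("--sleep", type=float, default=1.0)
    parser.add_argument("--overwrite", action="store_true")
    return parser


def split_inputs(value):
    """Split a semicolon-separated --inputs value, dropping empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]
```

With this sketch, `build_parser().parse_args(["--workspace", ".", "--max-pages", "30"])` yields `max_pages == 30` while the other options keep their listed defaults.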

Examples

Download/extract into repo root ref/:
python .codex/skills/agent-survey-corpus/scripts/run.py --workspace . --max-pages 20
Download/extract into a specific folder (treated as workspace root):
python .codex/skills/agent-survey-corpus/scripts/run.py --workspace /tmp/surveys --max-pages 30

Troubleshooting

Download fails / timeout: rerun with a larger --sleep, or try fewer ids.
Text extract is empty: the PDF may be scanned; try another survey or increase --max-pages.
Files showing up in git status: PDFs and text extracts should be ignored via .gitignore (ref//pdfs/, ref//text/); check that those patterns are present.
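
To spot the "text extract is empty" case without opening every file, a small check like the one below can list extracts that look suspiciously short. It assumes the extracts are `.txt` files directly under the text directory, and the 200-character threshold is an arbitrary illustrative choice. The function name `find_empty_extracts` is hypothetical.

```python
from pathlib import Path


def find_empty_extracts(text_dir, min_chars=200):
    """Return names of text files whose stripped content is shorter than `min_chars`.

    Assumption: extracts are .txt files directly under `text_dir`.
    """
    suspicious = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        content = path.read_text(encoding="utf-8", errors="replace")
        if len(content.strip()) < min_chars:
            suspicious.append(path.name)
    return suspicious
```

Any file it reports is a candidate for a scanned PDF, so it may be worth swapping in another survey or rerunning with a larger --max-pages.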

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
