arXiv Research Skill
Systematic academic research through four capabilities: three that form a knowledge-building loop, plus TikZ figure extraction:
connect -> understand -> evidence
Find -> Comprehend -> Cite
tikz (extract figures from LaTeX sources)
Setup
Install dependencies: uv sync
Capabilities
Connect (Knowledge Navigation)
Find relevant papers across arXiv, explore citation networks, and discover related work.
# Search with optional filters
uv run python scripts/connect.py search "transformer attention mechanism" --category cs.LG --limit 20
uv run python scripts/connect.py search "LLM agents" --since 2023-01 --until 2024-06
uv run python scripts/connect.py search "topic" --with-citations --sort citations
# Discover related work
uv run python scripts/connect.py similar "2301.00001" --limit 10
uv run python scripts/connect.py recent cs.AI --days 7
uv run python scripts/connect.py by-author "Yann LeCun"
# Citation network
uv run python scripts/connect.py cited-by "2301.00001" --limit 20
uv run python scripts/connect.py coauthors "Yann LeCun" --limit 20
# Get full paper content (single or batch)
uv run python scripts/connect.py content "2301.00001"
uv run python scripts/connect.py content "2301.00001,2302.00002,2303.00003"
Understand (Meaning Extraction)
Analyze paper content with structured prompts. Pipe the output of connect.py content into understand.py analyze, followed by a prompt name.
uv run python scripts/connect.py content "2301.00001" | uv run python scripts/understand.py analyze quick
Available prompts (uv run python scripts/understand.py list):
| Prompt | Purpose |
|---|---|
| quick | Fast structured summary |
| methodology | Detailed methodology extraction |
| contribution | Identify and rank contributions |
| critical | Strengths/weaknesses analysis |
| compare | Multi-paper comparison table |
| literature | Organize for literature review |
| implementation | Extract reproduction details |
| evidence | Evaluate as evidence for a claim |
Evidence (Source Attribution)
Generate citations in multiple formats and export for reference managers.
uv run python scripts/evidence.py bibtex "2301.00001"
uv run python scripts/evidence.py apa "2301.00001"
uv run python scripts/evidence.py ris "2301.00001" # For Zotero/Mendeley/EndNote
uv run python scripts/evidence.py batch "id1,id2,id3" --format bibtex > refs.bib
uv run python scripts/evidence.py batch "id1,id2" --format ris > refs.ris
Formats: bibtex, apa, ieee, acm, chicago, ris
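To make the bibtex format concrete, here is a minimal sketch of how an entry could be assembled from paper metadata. It is not the code in evidence.py: the function name `to_bibtex`, its parameters, and the citation-key scheme are assumptions made for illustration, and the sample values in the usage note are placeholders.

```python
def to_bibtex(arxiv_id: str, title: str, authors: list[str], year: int) -> str:
    """Render a @misc entry for an arXiv preprint (hypothetical helper)."""
    # Assumed key scheme: first author's surname plus year, e.g. "example2023".
    surname = authors[0].split()[-1].lower()
    key = f"{surname}{year}"
    fields = {
        "title": title,
        "author": " and ".join(authors),  # BibTeX separates authors with "and"
        "year": str(year),
        "eprint": arxiv_id,
        "archivePrefix": "arXiv",
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"
```

Calling `to_bibtex("2301.00001", "An Example Title", ["Ada Example", "Bob Sample"], 2023)` returns an entry keyed `example2023`. The output of the real script may use different keys or fields.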
TikZ (Figure Extraction)
Extract TikZ source code from arXiv paper LaTeX sources. Supports tikzpicture, tikzcd, circuitikz, and pgfplots environments. Captures captions, labels, and library dependencies.
uv run python scripts/tikz.py extract "2301.00001" # Pure TikZ code
uv run python scripts/tikz.py extract "2301.00001" --format latex > fig.tex # Compilable LaTeX
uv run python scripts/tikz.py extract "2301.00001,2302.00002" --format json # Batch as JSON
uv run python scripts/tikz.py list "2301.00001" # Summary only
Output formats: tikz (default), latex, json, brief
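The extraction step can be pictured as a pattern match over the LaTeX source. The sketch below is an assumption about the general approach, not the implementation in tikz.py; `extract_tikz` is a hypothetical name. It does not handle nested environments of the same name or environments that are commented out, and it does not collect captions, labels, or library dependencies.

```python
import re

# Environments named in the description above. pgfplots plots are usually an
# `axis` inside a `tikzpicture`, so matching `tikzpicture` covers them here.
ENVIRONMENTS = ("tikzpicture", "tikzcd", "circuitikz")

# Non-greedy match from \begin{env} to the first matching \end{env}.
_PATTERN = re.compile(
    r"\\begin\{(%s)\}.*?\\end\{\1\}" % "|".join(ENVIRONMENTS),
    re.DOTALL,
)

def extract_tikz(latex: str) -> list[str]:
    """Return each matched environment, including its begin/end lines."""
    return [m.group(0) for m in _PATTERN.finditer(latex)]
```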
Workflow Patterns
Literature Review
# 1. Find seed papers ranked by citation impact
uv run python scripts/connect.py search "your topic" --limit 50 --with-citations --sort citations
# 2. Expand with similar papers from top results
uv run python scripts/connect.py similar "top_paper_id"
# 3. Analyze each paper for the review
uv run python scripts/connect.py content "paper_id" | uv run python scripts/understand.py analyze literature
# 4. Generate bibliography
uv run python scripts/evidence.py batch "id1,id2,id3" --format bibtex > refs.bib
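The same steps can be driven from Python when a review covers many papers. This is a sketch that only wraps the commands shown above; the helper names (`search_cmd`, `content_cmd`, `analyze_cmd`, `analyze_paper`) are hypothetical, and `analyze_paper` assumes it runs from the skill's root directory with `uv` installed.

```python
import subprocess

UV = ["uv", "run", "python"]

def search_cmd(topic: str, limit: int = 50) -> list[str]:
    """Step 1: seed papers ranked by citation count."""
    return UV + ["scripts/connect.py", "search", topic,
                 "--limit", str(limit), "--with-citations", "--sort", "citations"]

def content_cmd(paper_id: str) -> list[str]:
    return UV + ["scripts/connect.py", "content", paper_id]

def analyze_cmd(prompt: str) -> list[str]:
    return UV + ["scripts/understand.py", "analyze", prompt]

def analyze_paper(paper_id: str, prompt: str = "literature") -> str:
    """Step 3: the equivalent of `content ... | analyze ...` in the shell."""
    content = subprocess.run(content_cmd(paper_id),
                             capture_output=True, text=True, check=True)
    result = subprocess.run(analyze_cmd(prompt), input=content.stdout,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing argument lists rather than a shell string avoids quoting problems when a topic contains spaces or quotes.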
Finding Evidence for a Claim
# 1. Search for supporting research
uv run python scripts/connect.py search "your claim keywords" --with-citations
# 2. Verify the paper supports your claim
uv run python scripts/connect.py content "paper_id" | uv run python scripts/understand.py analyze evidence
# 3. Generate proper citation
uv run python scripts/evidence.py apa "paper_id"
API Dependencies
| Service | Purpose | Rate Limit | API Key |
|---|---|---|---|
| arXiv | Paper search, content | 1 req/3s | No |
| Semantic Scholar | Citations, similar papers | 100 req/5min | Optional (higher limits) |
| Jina Reader | Full text extraction | Generous | No |
Scripts include built-in rate limiting and backoff.
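For readers curious what rate limiting and backoff involve, here is a minimal sketch under stated assumptions. It is not the scripts' actual code: `RateLimiter` and `with_backoff` are hypothetical names, and the doubling delay is one common backoff policy rather than a documented one.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimiter:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # scheduled time of the previous call, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay

def with_backoff(call: Callable[[], T], retries: int = 4, base: float = 1.0,
                 sleep=time.sleep) -> T:
    """Retry `call` on any exception, waiting base, 2*base, 4*base, ... seconds."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base * 2 ** attempt)
    raise ValueError("retries must be at least 1")
```

With the arXiv limit from the table, `RateLimiter(3.0)` would be created once and `wait()` called before each request. The `clock` and `sleep` parameters exist only so the behavior can be checked without real delays.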
Error Handling
- Rate limited: Scripts retry automatically with backoff
- Paper not found: Verify arXiv ID format (YYMM.NNNNN, or YYMM.NNNN for papers submitted before 2015)
- No citations: Paper may be too new for Semantic Scholar
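A quick way to rule out the "Paper not found" case is to check IDs before calling the scripts. The sketch below is illustrative and is not taken from utils.py; `is_arxiv_id` and `split_ids` are hypothetical names. It assumes new-style identifiers only (a four-digit YYMM, a dot, four or five digits, and an optional version suffix) and does not accept old-style IDs such as `hep-th/9901001`.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 onward),
# optionally followed by a version suffix such as v2.
_ARXIV_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")

def is_arxiv_id(text: str) -> bool:
    """Check the shape of an ID and that its month part is 01-12."""
    m = _ARXIV_ID.match(text.strip())
    if m is None:
        return False
    return 1 <= int(m.group(2)) <= 12

def split_ids(batch: str) -> list[str]:
    """Split a comma-separated batch argument, dropping empty pieces."""
    return [part.strip() for part in batch.split(",") if part.strip()]
```

For a batch argument, `all(is_arxiv_id(i) for i in split_ids(batch))` checks every ID before any request is made.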
File Structure
arxiv-research-skill/
├── SKILL.md # Agent instructions
├── README.md # Installation and overview
└── scripts/
    ├── connect.py # Knowledge navigation
    ├── understand.py # Analysis prompts
    ├── evidence.py # Citation generation
    ├── tikz.py # TikZ figure extraction
    ├── cache.py # SQLite caching (~/.cache/arxiv-research/papers.db)
    └── utils.py # Shared utilities (extractPaperId, cleanText)
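The SQLite cache noted for cache.py can be understood as a key-value store keyed by arXiv ID. The sketch below shows one possible shape. The `PaperCache` class, the `papers` table, and its columns are assumptions, since the real schema is not documented here.

```python
import json
import sqlite3
from typing import Optional

class PaperCache:
    """Store one JSON payload per arXiv ID (hypothetical schema)."""

    def __init__(self, path: str = ":memory:"):
        # The skill's cache lives on disk; ":memory:" keeps this sketch side-effect free.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS papers "
            "(arxiv_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def put(self, arxiv_id: str, record: dict) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO papers (arxiv_id, payload) VALUES (?, ?)",
                (arxiv_id, json.dumps(record)),
            )

    def get(self, arxiv_id: str) -> Optional[dict]:
        row = self.db.execute(
            "SELECT payload FROM papers WHERE arxiv_id = ?", (arxiv_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Checking the cache before each network call keeps repeated lookups of the same paper from counting against the API rate limits.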