arxiv-mcp

Mode: Cognitive/Prompt-Driven — No standalone utility script; use via agent context.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "arxiv-mcp" with this command: npx skills add oimiragieo/agent-studio/oimiragieo-agent-studio-arxiv-mcp

arXiv Search Skill

✅ No Installation Required

This skill uses existing tools to access arXiv:

  • WebFetch - Direct access to arXiv API

  • Exa - Semantic search with arXiv filtering

Works immediately - no MCP server, no restart needed.

Result Limits (Memory Safeguard)

arxiv-mcp returns academic papers. To prevent memory exhaustion:

  • max_results: 20 (HARD LIMIT)

  • Each paper metadata ~300 bytes

  • 20 papers × 300 bytes = ~6 KB metadata

  • Papers can be 100+ KB each if fetched - DON'T fetch full papers

Why the limit?

  • Previous limit: 100 results → 30 KB+ metadata → context explosion

  • New limit: 20 results → 6 KB metadata → memory safe

  • 20 papers is usually enough to find your target
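The arithmetic behind the cap can be made explicit. This is a minimal sketch: `clampMaxResults`, `estimateMetadataBytes`, and both constants are hypothetical names, and the 300-byte figure is the skill's own rough per-paper estimate, not a measured value.

```javascript
// Hypothetical helpers illustrating the 20-result hard limit.
const MAX_RESULTS_HARD_LIMIT = 20;
const BYTES_PER_PAPER_ESTIMATE = 300; // the skill's rough metadata estimate

function clampMaxResults(requested) {
  // Missing or invalid values fall back to the cap instead of "unlimited".
  if (!Number.isInteger(requested) || requested < 1) {
    return MAX_RESULTS_HARD_LIMIT;
  }
  return Math.min(requested, MAX_RESULTS_HARD_LIMIT);
}

function estimateMetadataBytes(resultCount) {
  return resultCount * BYTES_PER_PAPER_ESTIMATE;
}

console.log(clampMaxResults(100));                        // 20
console.log(estimateMetadataBytes(clampMaxResults(100))); // 6000
console.log(estimateMetadataBytes(100));                  // 30000
```

Falling back to the cap rather than dropping the parameter keeps every request explicit about its size.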

Method 1: WebFetch with arXiv API (Recommended for specific queries)

The arXiv API is publicly accessible at http://export.arxiv.org/api/query .
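When a query URL is assembled programmatically rather than typed by hand, the cap can be applied at construction time. A sketch, assuming a hypothetical `buildArxivSearchUrl` helper (not part of WebFetch or the arXiv API): search terms are passed with spaces, and `URLSearchParams` encodes spaces as `+` and `:` as `%3A`, which standard query-string decoding on the server should treat the same as the hand-written form.

```javascript
// Hypothetical helper; pass search terms separated by spaces, not '+'.
const ARXIV_API = 'http://export.arxiv.org/api/query';
const HARD_LIMIT = 20; // the skill's max_results cap

function buildArxivSearchUrl({ searchQuery, maxResults = HARD_LIMIT, sortBy = 'relevance', start = 0 }) {
  const params = new URLSearchParams({
    search_query: searchQuery,
    start: String(start),
    max_results: String(Math.min(maxResults, HARD_LIMIT)), // never above 20
    sortBy,
  });
  return `${ARXIV_API}?${params.toString()}`;
}

console.log(buildArxivSearchUrl({ searchQuery: 'all:transformer attention' }));
// http://export.arxiv.org/api/query?search_query=all%3Atransformer+attention&start=0&max_results=20&sortBy=relevance
```

The resulting string can then be used as the `url` field of a WebFetch call.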

Recommended Pattern

// ✓ GOOD: Limit results to 20
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
  prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});

// ✓ GOOD: Use specific filters to reduce result set
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention+2025&max_results=20&sortBy=submittedDate',
  prompt: 'Extract recent papers on transformer attention',
});

// ✗ BAD: Old behavior - broad query with no explicit max_results
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:neural+networks', // Too broad: matches a huge result set
});

// ✗ BAD: Exceeds memory limit
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=all:deep+learning&max_results=100', // Over the 20-result limit: memory risk
});

Search by Keywords

WebFetch({ url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance', prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results', });

Search by Author

WebFetch({ url: 'http://export.arxiv.org/api/query?search_query=au:LeCun&max_results=10&sortBy=submittedDate', prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs', });

Search by Category

WebFetch({ url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=15&sortBy=submittedDate', prompt: 'Extract paper titles, authors, abstracts, categories, and arXiv IDs', });

Get Specific Paper by ID

WebFetch({ url: 'http://export.arxiv.org/api/query?id_list=2301.07041', prompt: 'Extract full details: title, all authors, abstract, categories, published date, PDF link', });
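Several known papers can be fetched in one request by joining their IDs into `id_list`. A sketch with a hypothetical `buildIdListUrl` helper; setting `max_results` to the ID count is a precaution (an assumption, not confirmed by the source) so the API's default page size of 10 does not truncate longer lists.

```javascript
// Hypothetical helper: trim, de-duplicate, and cap IDs at the skill's limit.
function buildIdListUrl(ids) {
  const unique = [...new Set(ids.map((id) => id.trim()).filter(Boolean))];
  if (unique.length === 0) throw new Error('at least one arXiv ID is required');
  if (unique.length > 20) throw new Error('id_list exceeds the 20-paper limit');
  // Precaution: request exactly as many results as IDs supplied.
  return (
    'http://export.arxiv.org/api/query?id_list=' +
    unique.join(',') +
    '&max_results=' +
    unique.length
  );
}

console.log(buildIdListUrl(['2301.07041', ' 2302.13971', '2301.07041']));
// http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971&max_results=2
```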

API Query Parameters

Parameter | Description | Example
--- | --- | ---
search_query | Search terms with field prefixes | all:transformer, au:LeCun, ti:attention
id_list | Comma-separated arXiv IDs | 2301.07041,2302.13971
max_results | Number of results (API default 10; this skill caps it at 20) | max_results=20
start | Zero-based offset for pagination | start=10
sortBy | Sort field: relevance, lastUpdatedDate, or submittedDate | sortBy=submittedDate
sortOrder | ascending or descending | sortOrder=descending
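If a second page is genuinely needed, `start` and `max_results` can be stepped so that no single request exceeds the cap. A sketch with a hypothetical `pageOffsets` helper; whether fetching more pages is worth the context cost remains a judgment call, since each page adds to the budget.

```javascript
// Hypothetical helper: split a wanted count into start/max_results pairs,
// each request staying within the 20-result cap.
function pageOffsets(totalWanted, pageSize = 20) {
  const size = Math.min(pageSize, 20);
  const pages = [];
  for (let start = 0; start < totalWanted; start += size) {
    pages.push({ start, max_results: Math.min(size, totalWanted - start) });
  }
  return pages;
}

// pageOffsets(45) returns starts 0, 20, 40 with max_results 20, 20, 5.
for (const { start, max_results } of pageOffsets(45)) {
  console.log(
    `http://export.arxiv.org/api/query?search_query=cat:cs.LG&start=${start}&max_results=${max_results}&sortBy=submittedDate&sortOrder=descending`
  );
}
```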

Field Prefixes for search_query

Prefix | Field | Example
--- | --- | ---
all: | All fields | all:machine+learning
ti: | Title | ti:transformer
au: | Author | au:Vaswani
abs: | Abstract | abs:attention+mechanism
cat: | Category | cat:cs.LG
co: | Comment | co:accepted
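A prefixed term in the `prefix:word+word` form used above can be built from a plain phrase. A sketch with a hypothetical `fieldTerm` helper, restricted to the prefixes in the table. Because the output contains a raw `+`, it should be concatenated into the URL directly; passing it through `URLSearchParams` would encode `+` as `%2B`.

```javascript
// Hypothetical helper; accepts only the prefixes listed in the table above.
const FIELD_PREFIXES = new Set(['all', 'ti', 'au', 'abs', 'cat', 'co']);

function fieldTerm(prefix, phrase) {
  if (!FIELD_PREFIXES.has(prefix)) {
    throw new Error(`unknown field prefix: ${prefix}`);
  }
  // Collapse runs of whitespace and join the words with '+'.
  const words = phrase.trim().split(/\s+/);
  return `${prefix}:${words.join('+')}`;
}

console.log(fieldTerm('abs', 'attention  mechanism')); // abs:attention+mechanism
console.log(fieldTerm('au', 'Vaswani'));               // au:Vaswani
```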

Boolean Operators

Combine terms with AND, OR, and ANDNOT:

search_query=ti:transformer+AND+abs:attention
search_query=au:LeCun+OR+au:Bengio
search_query=cat:cs.LG+ANDNOT+ti:survey
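The three queries above follow one pattern: prefixed terms joined by `+OPERATOR+`. A sketch with a hypothetical `combine` helper that applies a single operator; mixing operators would need grouping, which this sketch does not attempt.

```javascript
// Hypothetical helper: join already-prefixed terms with one boolean operator.
const OPERATORS = new Set(['AND', 'OR', 'ANDNOT']);

function combine(op, ...terms) {
  if (!OPERATORS.has(op)) throw new Error(`unsupported operator: ${op}`);
  if (terms.length < 2) throw new Error('need at least two terms');
  return terms.join(`+${op}+`);
}

console.log(combine('AND', 'ti:transformer', 'abs:attention')); // ti:transformer+AND+abs:attention
console.log(combine('OR', 'au:LeCun', 'au:Bengio'));            // au:LeCun+OR+au:Bengio
console.log(combine('ANDNOT', 'cat:cs.LG', 'ti:survey'));       // cat:cs.LG+ANDNOT+ti:survey
```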

When NOT to Use arxiv-mcp

  • General web research → Use WebSearch/WebFetch instead

  • Implementation examples → Use pnpm search:code or ripgrep skill on codebase (Grep/Glob as fallback)

  • Product research → Use WebSearch with news filter

  • Community discussions → Use WebSearch for forums/Stack Overflow

arxiv-mcp is best for:

  • Finding academic papers on specific topics

  • Understanding theoretical foundations

  • Citing research in documentation

  • Quick literature review (20 papers max)

Method 2: Exa Search (Better for semantic/natural language queries)

Use Exa for more natural language queries with arXiv filtering:

Semantic Search

mcp__Exa__web_search_exa({ query: 'site:arxiv.org transformer architecture attention mechanism deep learning', numResults: 10, });

Recent Papers in a Field

mcp__Exa__web_search_exa({ query: 'site:arxiv.org large language model scaling laws 2024', numResults: 15, });

Author-Focused Search

mcp__Exa__web_search_exa({ query: 'site:arxiv.org author:"Yann LeCun" deep learning', numResults: 10, });
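The Exa calls above share two conventions: a `site:arxiv.org` prefix and a small `numResults`. A sketch with a hypothetical `exaArxivArgs` helper that shapes the argument object; the `query` and `numResults` field names come from the examples above, and applying the 20-result cap to Exa is an assumption carried over from the arXiv API limit.

```javascript
// Hypothetical helper: build arguments for mcp__Exa__web_search_exa.
function exaArxivArgs(topic, numResults = 10) {
  const query = topic.startsWith('site:arxiv.org') ? topic : `site:arxiv.org ${topic}`;
  return { query, numResults: Math.min(numResults, 20) };
}

const args = exaArxivArgs('large language model scaling laws 2024', 30);
console.log(args.query);      // site:arxiv.org large language model scaling laws 2024
console.log(args.numResults); // 20
// In the agent context, args would be passed as mcp__Exa__web_search_exa(args).
```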

Common arXiv Categories

Category | Field
--- | ---
cs.AI | Artificial Intelligence
cs.LG | Machine Learning
cs.CL | Computation and Language (NLP)
cs.CV | Computer Vision
cs.SE | Software Engineering
cs.CR | Cryptography and Security
stat.ML | Machine Learning (Statistics)
math.* | Mathematics (all subcategories)
physics.* | Physics (all subcategories)
q-bio.* | Quantitative Biology
econ.* | Economics

Workflow: Complete Research Process

Step 1: Initial Search

// Start with broad Exa search for semantic matching
mcp__Exa__web_search_exa({
  query: 'site:arxiv.org transformer attention mechanism neural networks',
  numResults: 10,
});

Step 2: Get Specific Papers

// Get details for interesting papers by ID
WebFetch({
  url: 'http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971',
  prompt: 'Extract full metadata for each paper: title, authors, abstract, categories, PDF URL',
});
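Moving from Step 1 to Step 2 means turning Exa result URLs into IDs for `id_list`. A sketch with a hypothetical `arxivIdsFromUrls` helper; it matches only new-style IDs (YYMM.NNNN or YYMM.NNNNN, optional vN) on `abs/` or `pdf/` pages, and treats a versioned ID as distinct from its unversioned form.

```javascript
// Hypothetical helper: collect arXiv IDs from result URLs in first-seen order.
const ARXIV_URL_ID = /arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5}(?:v\d+)?)/;

function arxivIdsFromUrls(urls) {
  const ids = [];
  for (const url of urls) {
    const match = ARXIV_URL_ID.exec(url);
    if (match && !ids.includes(match[1])) ids.push(match[1]);
  }
  return ids;
}

const ids = arxivIdsFromUrls([
  'https://arxiv.org/abs/2301.07041',
  'https://arxiv.org/pdf/2302.13971v1',
  'https://example.com/unrelated',
  'https://arxiv.org/abs/2301.07041',
]);
console.log(ids.join(',')); // 2301.07041,2302.13971v1
```

The joined string can be dropped straight into the `id_list` parameter.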

Step 3: Find Related Work

// Search within the category of an interesting paper
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG+AND+ti:attention&max_results=10&sortBy=submittedDate',
  prompt: 'Find related papers, extract titles and abstracts',
});

Step 4: Get Recent Papers

// Latest papers in the field
WebFetch({
  url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
  prompt: 'Extract the 20 most recent machine learning papers',
});

<best_practices>

  • Use Exa for discovery: Natural language queries find semantically related papers

  • Use WebFetch for precision: Specific IDs, categories, or API queries

  • Combine approaches: Exa to discover, WebFetch to deep-dive

  • Use specific queries: "transformer attention mechanism" > "machine learning"

  • Check multiple categories: Papers often span cs.AI + cs.LG + cs.CL

  • Sort by date for recent work: sortBy=submittedDate&sortOrder=descending

</best_practices>

Example 1: Search by title and abstract:

WebFetch({ url: 'http://export.arxiv.org/api/query?search_query=ti:transformer+AND+abs:attention&max_results=10&sortBy=relevance', prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs', });

Example 2: Find papers by researcher:

WebFetch({ url: 'http://export.arxiv.org/api/query?search_query=au:Vaswani&max_results=15', prompt: 'List all papers by this author with titles and dates', });

Example 3: Get recent ML papers:

WebFetch({ url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending', prompt: 'Extract the 20 most recent machine learning papers with titles and abstracts', });

Example 4: Semantic search with Exa:

mcp__Exa__web_search_exa({ query: 'site:arxiv.org multimodal large language models vision 2024', numResults: 10, });

Example 5: Get specific paper details:

WebFetch({ url: 'http://export.arxiv.org/api/query?id_list=1706.03762', prompt: "Extract complete details for the 'Attention Is All You Need' paper", });

Agent Integration

This skill is automatically assigned to:

  • researcher - Academic research, literature review

  • scientific-research-expert - Deep scientific analysis

  • developer - Finding technical papers for implementation

Iron Laws

  • ALWAYS enforce max_results=20 — never allow unlimited or >20 result queries; context explosion from 100+ papers is a known failure mode that stalls agent pipelines.

  • NEVER fetch full paper PDFs during literature review — extract metadata and abstracts only; full papers are 100KB+ each and will exhaust context budget in minutes.

  • ALWAYS use Exa for semantic discovery, WebFetch for precision retrieval — Exa finds semantically related papers; WebFetch gets specific IDs or category feeds; use both in sequence, not interchangeably.

  • NEVER use broad queries without field prefixes — search_query=neural+networks returns thousands of results; always scope with ti: , au: , cat: , or abs: prefixes to target the query.

  • ALWAYS cite arXiv IDs (e.g., 2301.07041) when referencing papers — titles alone are ambiguous and change; IDs are stable, machine-readable, and enable instant retrieval.
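Before an ID is cited, its shape can be checked. A sketch with a hypothetical `isArxivId` validator covering new-style IDs (YYMM.NNNN for 2007 to 2014, YYMM.NNNNN from 2015, optional vN) and old-style IDs such as hep-th/9901001. It checks format only and cannot confirm that a paper exists.

```javascript
// Hypothetical shape check for arXiv identifiers.
const NEW_STYLE_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;
// Old-style: archive (optionally .SUBJECT-CLASS), a slash, then YYMMNNN.
const OLD_STYLE_ID = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;

function isArxivId(id) {
  return NEW_STYLE_ID.test(id) || OLD_STYLE_ID.test(id);
}

console.log(isArxivId('1706.03762'));                // true
console.log(isArxivId('2301.07041v2'));              // true
console.log(isArxivId('hep-th/9901001'));            // true
console.log(isArxivId('Attention Is All You Need')); // false
```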

Anti-Patterns

Anti-Pattern | Why It Fails | Correct Approach
--- | --- | ---
Using max_results=100 or no limit | Context explosion; 100 papers × 300 bytes = 30KB+ metadata | Always set max_results=20 (hard limit)
Fetching full paper PDFs | Single paper can be 100KB+; kills context budget | Extract abstract + metadata only via API
Broad query without field prefix | Returns irrelevant results across all fields | Use ti:, au:, cat:, or abs: prefix
Using only WebFetch for discovery | Misses semantically related papers not matching exact terms | Use Exa for semantic discovery first
Citing paper titles instead of arXiv IDs | Titles can be ambiguous or duplicated | Always include the arXiv ID (e.g., 1706.03762)
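Two of these anti-patterns can be caught mechanically before a WebFetch call. A sketch with a hypothetical `lintArxivUrl` check. It assumes, for linting purposes only, that the first term and each term after a boolean operator must carry a field prefix, while later words continue that term (as in all:transformer+attention); id_list-only lookups are skipped.

```javascript
// Hypothetical lint check mirroring the max_results and field-prefix rules.
const PREFIX = /^(all|ti|au|abs|cat|co):/;
const BOOLEAN_OPS = new Set(['AND', 'OR', 'ANDNOT']);

function lintArxivUrl(rawUrl) {
  const params = new URL(rawUrl).searchParams;
  const problems = [];
  const query = params.get('search_query'); // '+' is decoded to a space here
  if (query === null) return problems;

  const max = params.get('max_results');
  if (max === null) problems.push('max_results not set');
  else if (Number(max) > 20) problems.push(`max_results=${max} exceeds 20`);

  let expectPrefix = true;
  for (const token of query.split(/\s+/).filter(Boolean)) {
    if (BOOLEAN_OPS.has(token)) {
      expectPrefix = true; // the next term starts a new clause
      continue;
    }
    if (expectPrefix && !PREFIX.test(token)) problems.push(`"${token}" has no field prefix`);
    expectPrefix = false;
  }
  return problems;
}

console.log(lintArxivUrl('http://export.arxiv.org/api/query?search_query=neural+networks').join('; '));
// max_results not set; "neural" has no field prefix
```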

Memory Protocol (MANDATORY)

Before starting:

cat .claude/context/memory/learnings.md

After completing:

  • New pattern -> .claude/context/memory/learnings.md

  • Issue found -> .claude/context/memory/issues.md

  • Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
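The "After completing" routing can be sketched as a small Node.js function. The three file names come from the protocol above; `recordMemory`, its `root` argument, and the dated-bullet entry format are hypothetical choices, not part of the protocol.

```javascript
// Hypothetical sketch: append a note to the memory file for its kind.
const fs = require('fs');
const path = require('path');

const MEMORY_FILES = {
  pattern: 'learnings.md',
  issue: 'issues.md',
  decision: 'decisions.md',
};

function recordMemory(root, kind, text) {
  const file = MEMORY_FILES[kind];
  if (!file) throw new Error(`unknown memory kind: ${kind}`);
  const dir = path.join(root, '.claude', 'context', 'memory');
  fs.mkdirSync(dir, { recursive: true }); // create the folder if missing
  const target = path.join(dir, file);
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  fs.appendFileSync(target, `- ${date}: ${text}\n`);
  return target;
}

// Example: recordMemory(process.cwd(), 'pattern', 'arXiv: cap max_results at 20');
```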

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • filesystem (Automation)

  • slack-notifications (Automation)

  • chrome-browser (Automation)

  • context-compressor (Automation)