paper_claw

Fetch, classify, and summarize papers from multiple sources (arXiv, etc.) with AI-powered multi-language summaries and email delivery.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "paper_claw" with this command: npx skills add pigeondan1/paperclaw

Paper Claw Skill

Intelligent multi-source paper digest generator. Automatically fetch, classify, and summarize papers with AI-powered translations in 7 languages.

Features

  • 🌐 Multi-Source Support — arXiv (170+ categories), extensible for CNKI, Web of Science
  • 🗣️ Multi-Language — Chinese, English, Japanese, Korean, German, French, Spanish
  • 🤖 Multi-Provider LLM — Kimi, OpenAI, Claude, Gemini, DeepSeek with auto-fallback
  • 📧 Email Delivery — HTML digests with full Markdown attachment
  • 👥 Recipient Management — JSON-based configuration
  • ⚙️ Config-Driven — Zero-code customization
  • 🔄 State Persistence — Auto-deduplication

Setup

1. Environment Variables

Required for email delivery:

export SMTP_HOST="smtp.qq.com"
export SMTP_PORT="465"
export SMTP_USER="your-email@qq.com"
export SMTP_PASS="your-auth-code"

Optional for AI summaries (multiple providers supported):

# Primary: Kimi AI (recommended for Chinese)
export MOONSHOT_API_KEY="sk-your-kimi-key"

# Alternatives (auto-fallback)
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="sk-your-claude-key"
export GOOGLE_API_KEY="your-gemini-key"
export DEEPSEEK_API_KEY="sk-your-deepseek-key"
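
As a rough illustration of how these variables could be consumed, the sketch below reads the SMTP settings and lists which LLM providers have a key set. The variable names come from this section; the helper names `load_smtp_settings` and `configured_providers` are hypothetical and are not part of Paper Claw.

```python
import os

# Variable names are taken from the setup section; helper names are hypothetical.
REQUIRED_SMTP_VARS = ("SMTP_HOST", "SMTP_PORT", "SMTP_USER", "SMTP_PASS")
PROVIDER_KEYS = {
    "kimi": "MOONSHOT_API_KEY",
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def load_smtp_settings(env=os.environ):
    """Return the SMTP settings as a dict, or raise if any variable is missing."""
    missing = [name for name in REQUIRED_SMTP_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing SMTP variables: {', '.join(missing)}")
    return {
        "host": env["SMTP_HOST"],
        "port": int(env["SMTP_PORT"]),
        "user": env["SMTP_USER"],
        "password": env["SMTP_PASS"],
    }


def configured_providers(env=os.environ):
    """List the providers whose API key is present in the environment."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
```

Passing a plain dict instead of `os.environ` makes it easy to check a configuration before a scheduled run.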

2. Recipient Configuration

Create config/recipients.json:

{
  "recipients": [
    {"email": "prof@university.edu.cn", "name": "Professor", "enabled": true},
    {"email": "student@university.edu.cn", "name": "Student", "enabled": true}
  ]
}
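
The `enabled` flag suggests that disabled entries are skipped at send time. A minimal sketch of that filtering, assuming the file layout shown above; both function names are hypothetical:

```python
import json
from pathlib import Path


def enabled_recipients(config):
    """Return (name, email) pairs for entries whose "enabled" flag is true."""
    return [
        (entry.get("name", ""), entry["email"])
        for entry in config.get("recipients", [])
        if entry.get("enabled", False)
    ]


def load_enabled_recipients(path="config/recipients.json"):
    """Read the recipients file and keep only the enabled entries."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return enabled_recipients(config)
```

Treating a missing `enabled` key as disabled is an assumption of this sketch.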

3. Source & Category Configuration

Edit config/default.json to customize sources:

{
  "sources": {
    "arxiv": {
      "enabled": true,
      "categories": [
        {"id": "cs.CL", "name": "NLP", "url": "https://arxiv.org/list/cs.CL/recent"},
        {"id": "cs.CV", "name": "Computer Vision", "url": "https://arxiv.org/list/cs.CV/recent"}
      ]
    }
  }
}

See config/arxiv_categories.json for all 170+ available categories.
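
Both example entries use the URL pattern `https://arxiv.org/list/<id>/recent`, so category entries can be generated from an ID and a display name. The helpers below are a hypothetical sketch of that, not part of Paper Claw:

```python
def arxiv_category(cat_id, name):
    """Build one category entry using the URL pattern from the example config."""
    return {
        "id": cat_id,
        "name": name,
        "url": f"https://arxiv.org/list/{cat_id}/recent",
    }


def build_sources(categories):
    """Wrap (id, name) pairs in the "sources" structure shown above."""
    return {
        "sources": {
            "arxiv": {
                "enabled": True,
                "categories": [arxiv_category(i, n) for i, n in categories],
            }
        }
    }
```

For example, `build_sources([("cs.CL", "NLP"), ("cs.IR", "Information Retrieval")])` yields a dict that can be serialized with `json.dumps` and merged into config/default.json by hand.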

4. Language Configuration

Set the default and supported output languages in config/default.json:

{
  "language": {
    "default": "zh",
    "supported": ["zh", "en", "ja", "ko", "de", "fr", "es"]
  }
}
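
A minimal sketch of how a requested language code might be checked against this block. Falling back to the default for unsupported codes is an assumption; Paper Claw may reject them instead. `resolve_language` is a hypothetical helper.

```python
# Mirrors the "language" block above.
LANGUAGE_CONFIG = {
    "default": "zh",
    "supported": ["zh", "en", "ja", "ko", "de", "fr", "es"],
}


def resolve_language(requested, config=LANGUAGE_CONFIG):
    """Return the requested code if supported, otherwise the configured default."""
    if requested in config["supported"]:
        return requested
    return config["default"]
```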

Quick Start for Agents

The fastest way to configure Paper Claw is using Presets:

from skill.example import list_presets, preview_preset, apply_preset

# Step 1: See available presets
presets = list_presets()
# Returns: [
#   {"id": "speech_audio", "name": "Speech & Audio", ...},
#   {"id": "nlp", "name": "NLP & LLM", ...},
#   {"id": "computer_vision", "name": "Computer Vision", ...},
#   {"id": "general_ai", "name": "General AI/ML", ...}
# ]

# Step 2: Preview what will be configured
preview = preview_preset("nlp")
# Shows: arXiv categories (cs.CL, cs.LG, cs.AI) and classification categories (LLM, RAG, etc.)

# Step 3: Apply the preset
apply_preset("nlp")  # Updates config/default.json automatically

Available Presets

| Preset ID | Research Field | ArXiv Categories | Classification |
|---|---|---|---|
| speech_audio | Speech & Audio | cs.SD, eess.AS | Speech LLM, ASR, TTS, Enhancement, SLU, Paralinguistics, Audio |
| nlp | NLP & LLM | cs.CL, cs.LG, cs.AI | LLM, RAG, Agents, NLP Tasks, Evaluation |
| computer_vision | Computer Vision | cs.CV, cs.MM, cs.LG | Image Generation, Object Detection, Segmentation, Video Understanding, Multimodal, 3D Vision |
| general_ai | General AI/ML | cs.AI, cs.LG, cs.CL, cs.CV, stat.ML | Deep Learning, RL, Generative Models, Optimization, Theory, Applications |

Detailed Usage

List Presets

from skill.example import list_presets

presets = list_presets()
for p in presets:
    print(f"{p['id']}: {p['name']}")
    print(f"  {p['description']}")

Preview Before Apply

from skill.example import preview_preset

# See what will be configured
preview = preview_preset("computer_vision")
print(f"ArXiv categories: {[c['id'] for c in preview['arxiv_categories']]}")
print(f"Classifications: {[c['name'] for c in preview['classification_categories']]}")

Apply Preset

from skill.example import apply_preset

# Apply NLP configuration
result = apply_preset("nlp")
if result["success"]:
    print(f"Applied: {result['preset_name']}")
    print(f"ArXiv: {result['arxiv_categories']}")
    print(f"Categories: {result['classification_categories']}")

Fetch Papers

# Fetch today's papers (default language from config)
python scripts/main.py

# Fetch with specific language
python scripts/main.py --day 2026-03-10 --language en
python scripts/main.py --day 2026-03-10 --language ja  # Japanese

# Fetch date range
python scripts/main.py --start-date 2026-03-01 --end-date 2026-03-10
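
To illustrate the date-range form, the sketch below expands a start and end date into the individual days a run would cover. Treating the end date as inclusive is an assumption, and `days_in_range` is a hypothetical helper rather than part of scripts/main.py.

```python
from datetime import date, timedelta


def days_in_range(start, end):
    """Return every ISO date from start to end, inclusive."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if last < first:
        raise ValueError("end date must not be before start date")
    span = (last - first).days
    return [(first + timedelta(days=offset)).isoformat() for offset in range(span + 1)]
```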

Generated Outputs

  • Markdown digest: content/posts/YYYY-MM-DD-arxiv-audio-digest.md
  • JSON data: data/processed/YYYY-MM-DD.json
  • Raw data: data/raw/YYYY-MM-DD.json
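
The three output locations follow a fixed pattern per date. A small hypothetical helper, using the path patterns listed above, can locate them for a given day:

```python
from pathlib import Path


def output_paths(day, root="."):
    """Map each output kind to its expected path for one YYYY-MM-DD date."""
    base = Path(root)
    return {
        "markdown": base / "content" / "posts" / f"{day}-arxiv-audio-digest.md",
        "processed": base / "data" / "processed" / f"{day}.json",
        "raw": base / "data" / "raw" / f"{day}.json",
    }
```

Calling `.exists()` on each returned path is a quick way to confirm that a run for that date completed.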

Email Delivery

Email is automatically sent with:

  • HTML preview — Shows first 3 papers with logo and GitHub link
  • Full Markdown attachment — Complete digest with all papers
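
A message with this shape can be assembled with the standard library alone. The sketch below is an assumption about the structure, not Paper Claw's actual code: it omits the logo and GitHub link, the subject line and attachment name are illustrative, and `build_digest_email` is a hypothetical function.

```python
import html
from email.message import EmailMessage


def build_digest_email(sender, recipients, day, papers, markdown_text):
    """Build a message with an HTML preview of the first 3 papers and a Markdown attachment."""
    msg = EmailMessage()
    msg["Subject"] = f"Paper digest {day}"  # illustrative subject line
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)

    # Plain-text fallback for mail clients that do not render HTML.
    msg.set_content(f"Paper digest for {day}. The full digest is attached.")

    # HTML preview limited to the first three papers.
    items = "".join(f"<li>{html.escape(p['title'])}</li>" for p in papers[:3])
    msg.add_alternative(
        f"<html><body><h2>Paper digest {html.escape(day)}</h2><ol>{items}</ol></body></html>",
        subtype="html",
    )

    # Complete digest as a text/markdown attachment.
    msg.add_attachment(markdown_text, subtype="markdown", filename=f"{day}-digest.md")
    return msg
```

Each paper is assumed to be a dict with a `title` key; the real processed data may carry more fields.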

Schedule Daily Runs

GitHub Actions: Already configured in .github/workflows/daily_digest.yml

Linux/Mac Cron:

0 1 * * * cd /path/to/paper_claw && python scripts/main.py

Windows Task Scheduler:

# Run from the project root so the relative script path resolves
$Action = New-ScheduledTaskAction -Execute "python.exe" -Argument "scripts/main.py" -WorkingDirectory "C:\path\to\paper_claw"
$Trigger = New-ScheduledTaskTrigger -Daily -At "09:00"
Register-ScheduledTask -TaskName "PaperClaw" -Action $Action -Trigger $Trigger

AI Summary Chain

The system uses intelligent fallback across providers:

Kimi → OpenAI → Claude → DeepSeek → Gemini → Rule-based

Even without API keys, summaries are generated using rule-based methods.
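
The chain can be pictured as a loop that tries each provider in order and drops to a rule-based summary when all of them fail. This is a sketch of the pattern only: the provider callables are stand-ins, the sentence-truncation rule is an assumption, and both function names are hypothetical.

```python
def first_sentences(text, count=2):
    """Rule-based fallback: keep the first few sentences of the abstract."""
    parts = [s.strip() for s in text.split(". ") if s.strip()]
    if not parts:
        return ""
    summary = ". ".join(parts[:count])
    return summary if summary.endswith(".") else summary + "."


def summarize_with_fallback(abstract, providers, rule_based=first_sentences):
    """Try each (name, callable) pair in order; return (provider_name, summary)."""
    for name, summarize in providers:
        try:
            return name, summarize(abstract)
        except Exception:
            # Missing key, rate limit, network error: move on to the next provider.
            continue
    return "rule-based", rule_based(abstract)
```

`providers` would be built in the order shown above, for example `[("kimi", call_kimi), ("openai", call_openai), ...]`, where each callable is supplied by the caller.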

Agent Tools

fetch_papers

Fetch papers from configured sources.

Parameters:

  • day (string, optional): Date in YYYY-MM-DD format
  • start_date, end_date (string, optional): Start and end of a date range, each in YYYY-MM-DD format
  • language (string, optional): Output language (zh/en/ja/ko/de/fr/es)

Example:

from skill.example import fetch_papers
result = fetch_papers(day="2026-03-10", language="en")

configure_sources

Update data sources and categories.

Parameters:

  • sources (object): Source configuration with categories

Example:

from skill.example import configure_sources
configure_sources({
    "arxiv": {
        "enabled": True,
        "categories": [
            {"id": "cs.AI", "name": "AI"},
            {"id": "cs.LG", "name": "ML"}
        ]
    }
})

configure_language

Set output language for summaries.

Parameters:

  • language (string): One of zh/en/ja/ko/de/fr/es

Example:

from skill.example import configure_language
configure_language("ja")  # Japanese output

get_digest_content

Retrieve generated digest.

Parameters:

  • date (string): Date in YYYY-MM-DD format
  • format (string): "markdown", "json", or "summary"

Example:

from skill.example import get_digest_content
content = get_digest_content("2026-03-10", format="summary")

configure_recipients

Update email recipients.

Parameters:

  • recipients (array): List of {email, name, enabled}

Example:

from skill.example import configure_recipients
configure_recipients([
    {"email": "user@example.com", "name": "User", "enabled": True}
])

Preset Details

Speech & Audio (Default)

Best for: Speech recognition, synthesis, audio processing researchers

ArXiv Categories:

  • cs.SD - Sound (Audio processing, music computing)
  • eess.AS - Audio and Speech Processing

Classification:

| Category | Keywords |
|---|---|
| Speech LLM | speech llm, audio llm, spoken language model |
| ASR | asr, speech recognition, speech-to-text, whisper |
| TTS | tts, text-to-speech, speech synthesis, tacotron |
| Enhancement | speech enhancement, noise reduction, beamforming |
| SLU | spoken language understanding, intent recognition |
| Paralinguistics | emotion recognition, speaker verification |
| Audio | audio classification, sound event detection |
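
The keyword lists in this section imply a keyword-matching classifier. The sketch below shows one way such matching could work, using the Speech & Audio keywords. Whole-word matching, first-match-wins ordering, and the `Other` fallback label are assumptions; `classify` is a hypothetical function.

```python
import re

# Keywords copied from the Speech & Audio classification list.
SPEECH_AUDIO_KEYWORDS = {
    "Speech LLM": ["speech llm", "audio llm", "spoken language model"],
    "ASR": ["asr", "speech recognition", "speech-to-text", "whisper"],
    "TTS": ["tts", "text-to-speech", "speech synthesis", "tacotron"],
    "Enhancement": ["speech enhancement", "noise reduction", "beamforming"],
    "SLU": ["spoken language understanding", "intent recognition"],
    "Paralinguistics": ["emotion recognition", "speaker verification"],
    "Audio": ["audio classification", "sound event detection"],
}


def classify(title, abstract, rules=SPEECH_AUDIO_KEYWORDS, default="Other"):
    """Return the first category with a keyword appearing as whole words in the text."""
    text = f"{title} {abstract}".lower()
    for category, keywords in rules.items():
        for keyword in keywords:
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                return category
    return default
```

Whole-word matching keeps short keywords such as `asr` or `tts` from firing inside longer words.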

NLP & LLM

Best for: Natural language processing, large language model researchers

ArXiv Categories:

  • cs.CL - Computation and Language
  • cs.LG - Machine Learning
  • cs.AI - Artificial Intelligence

Classification:

| Category | Keywords |
|---|---|
| LLM | llm, gpt, transformer, prompt engineering, llama, bert |
| RAG | rag, retrieval-augmented, knowledge base, embedding |
| Agents | agent, multi-agent, tool use, function calling |
| NLP Tasks | ner, sentiment analysis, translation, summarization |
| Evaluation | benchmark, evaluation metrics, human evaluation |

Computer Vision

Best for: Computer vision, image processing, multimodal researchers

ArXiv Categories:

  • cs.CV - Computer Vision
  • cs.MM - Multimedia
  • cs.LG - Machine Learning

Classification:

| Category | Keywords |
|---|---|
| Image Generation | diffusion model, gan, stable diffusion, text-to-image |
| Object Detection | yolo, rcnn, ssd, bounding box |
| Segmentation | semantic segmentation, mask, sam, u-net |
| Video Understanding | action recognition, temporal, tracking |
| Multimodal | vision-language, clip, image-text, vqa |
| 3D Vision | point cloud, depth estimation, nerf |

General AI/ML

Best for: Broad AI/ML research covering multiple domains

ArXiv Categories:

  • cs.AI, cs.LG, cs.CL, cs.CV, stat.ML

Classification:

| Category | Keywords |
|---|---|
| Deep Learning | neural network, optimization, gradient descent |
| Reinforcement Learning | rl, q-learning, policy gradient, actor-critic |
| Generative Models | gan, vae, diffusion, flow-based |
| Optimization | convex optimization, learning rate, adam |
| Theory | generalization, convergence, bounds, complexity |
| Applications | healthcare, finance, robotics, real-world |

Customizing After Preset

After applying a preset, you can further customize:

from skill.example import configure_sources, configure_categories

# Add more arXiv categories
configure_sources({
    "arxiv": {
        "enabled": True,
        "categories": [
            {"id": "cs.IR", "name": "Information Retrieval", 
             "url": "https://arxiv.org/list/cs.IR/recent"}
        ]
    }
})

# Add custom classification category
configure_categories([
    {
        "name": "Your Custom Category",
        "labels": {"zh": "自定义分类", "en": "Custom"},
        "keywords": ["keyword1", "keyword2"]
    }
])

SMTP Providers

| Service | Host | Port | Note |
|---|---|---|---|
| QQ Mail | smtp.qq.com | 465 | Use authorization code |
| 163 Mail | smtp.163.com | 465 | Use authorization code |
| Gmail | smtp.gmail.com | 465 | Use app password |
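
All three providers are listed with port 465, which is served by `smtplib.SMTP_SSL` in the Python standard library. The sketch below shows a send over that kind of connection. It is not Paper Claw's own sender; `send_digest` and the `settings` dict keys are hypothetical.

```python
import smtplib
import ssl


def send_digest(message, settings, smtp_factory=smtplib.SMTP_SSL):
    """Send a prepared message over an implicit-TLS SMTP connection (port 465)."""
    context = ssl.create_default_context()
    with smtp_factory(settings["host"], settings["port"], context=context) as server:
        # For QQ Mail and 163 Mail the password is the authorization code;
        # for Gmail it is an app password.
        server.login(settings["user"], settings["password"])
        server.send_message(message)
```

The `smtp_factory` parameter exists only so the function can be exercised with a stand-in class without opening a network connection.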

Notes

  • All configurations are in config/ directory
  • .env and config/recipients.json are git-ignored for security
  • API rate limits: System auto-retries with fallback providers
  • State is tracked in data/state.json to avoid duplicate processing
  • Email includes both HTML preview and full Markdown attachment
  • The logo shown in emails is loaded from a GitHub raw URL
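
The deduplication note can be illustrated with a small sketch. The layout of data/state.json is not documented here, so the `seen_ids` key, the per-paper `id` field, and all three function names are assumptions made for illustration.

```python
import json
from pathlib import Path


def filter_new(papers, seen_ids):
    """Return (new_papers, updated_ids), skipping anything already processed."""
    seen = set(seen_ids)
    fresh = []
    for paper in papers:
        if paper["id"] not in seen:
            seen.add(paper["id"])
            fresh.append(paper)
    return fresh, seen


def load_seen(path="data/state.json"):
    """Read previously seen IDs; an absent file means nothing has been processed."""
    state_file = Path(path)
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text(encoding="utf-8")).get("seen_ids", []))


def save_seen(seen_ids, path="data/state.json"):
    """Persist the seen IDs in sorted order so the file diffs cleanly."""
    Path(path).write_text(json.dumps({"seen_ids": sorted(seen_ids)}, indent=2), encoding="utf-8")
```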

Examples

# Quick start - fetch and send email
python scripts/main.py --day 2026-03-10

# Multi-language examples
python scripts/main.py --day 2026-03-10 --language zh  # Chinese
python scripts/main.py --day 2026-03-10 --language en  # English
python scripts/main.py --day 2026-03-10 --language ja  # Japanese

# View paper count
jq '.summary.total' data/processed/2026-03-10.json

# View papers by category
jq '.grouped.ASR' data/processed/2026-03-10.json

# Reset state and re-fetch
python scripts/reset_state.py
python scripts/main.py --day 2026-03-10
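
The jq queries above have a direct Python equivalent, sketched here for agents that would rather stay in one language. The `summary.total` and `grouped.<category>` keys are taken from those queries; the function names are hypothetical.

```python
import json
from pathlib import Path


def load_digest(day, root="data/processed"):
    """Load the processed JSON for one YYYY-MM-DD date."""
    return json.loads((Path(root) / f"{day}.json").read_text(encoding="utf-8"))


def paper_count(digest):
    """Equivalent of jq '.summary.total'."""
    return digest["summary"]["total"]


def papers_in(digest, category):
    """Equivalent of jq '.grouped.<category>'; returns an empty list if absent."""
    return digest.get("grouped", {}).get(category, [])
```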

Files

  • skill/tools.json — Tool definitions for agent frameworks
  • skill/example.py — Python usage examples
  • config/default.json — Source and language configuration
  • config/arxiv_categories.json — Complete arXiv category list
  • config/recipients.example.json — Recipient template
