paperbanana

Generate publication-quality academic diagrams from paper methodology text

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "paperbanana" with this command: npx skills add dwzhu-pku/paperbanana

PaperBanana

Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.

Environment Setup

cd <repo-root>
uv pip install -r requirements.txt

Set your API key via environment variable or in configs/model_config.yaml.

Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:

export OPENROUTER_API_KEY="sk-or-v1-..."

Option 2: Google API key — direct access to Gemini API:

export GOOGLE_API_KEY="your-key-here"

If both keys are configured, OpenRouter is used by default.
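The precedence rule above can be sketched as a small lookup. This is only an illustration: `pick_provider` is a hypothetical helper, not a function from the PaperBanana repository, and it inspects environment variables only (keys set in configs/model_config.yaml are not considered here).

```python
import os
from typing import Mapping, Optional


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: report which provider would be chosen.

    Mirrors the documented rule that OpenRouter wins when both keys are set.
    Only environment variables are checked; the YAML config is ignored.
    """
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("GOOGLE_API_KEY"):
        return "google"
    return None  # no key configured in the environment
```

Calling `pick_provider()` before a long run is a cheap way to confirm that a key is visible to the process at all.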

Usage

python skill/run.py \
  --content "METHOD_TEXT" \
  --caption "FIGURE_CAPTION" \
  --task diagram \
  --output output.png

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| --content | Yes* | | Method section text to visualize |
| --content-file | Yes* | | Path to a file containing the method text (alternative to --content) |
| --caption | Yes | | Figure caption or visual intent |
| --task | No | diagram | Task type: diagram |
| --output | No | output.png | Output image file path |
| --aspect-ratio | No | 21:9 | Aspect ratio: 21:9, 16:9, or 3:2 |
| --max-critic-rounds | No | 3 | Maximum critic refinement iterations |
| --num-candidates | No | 10 | Number of parallel candidates to generate |
| --retrieval-setting | No | auto | Retrieval mode: auto, manual, random, or none |
| --main-model-name | No | gemini-3.1-pro-preview | Main model for VLM agents. Provider auto-detected from configured API key |
| --image-gen-model-name | No | gemini-3.1-flash-image-preview | Model for image generation. Also supports gemini-3-pro-image-preview |
| --exp-mode | No | demo_full | Pipeline: demo_full (with Stylist) or demo_planner_critic (without Stylist) |

*One of --content or --content-file is required.
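When the script is driven from Python rather than a shell, the content rule can be checked before anything is launched. The sketch below is a hypothetical wrapper (`build_command` is not part of the repository). It assumes the working directory is the repo root and treats passing both options as an error, which the documentation does not state explicitly.

```python
import sys
from typing import List, Optional


def build_command(
    caption: str,
    content: Optional[str] = None,
    content_file: Optional[str] = None,
    output: str = "output.png",
) -> List[str]:
    """Hypothetical helper: assemble an argv list for skill/run.py."""
    # Assumption: exactly one of the two content sources must be supplied.
    if (content is None) == (content_file is None):
        raise ValueError("pass exactly one of content or content_file")

    cmd = [sys.executable, "skill/run.py", "--caption", caption, "--output", output]
    if content is not None:
        cmd += ["--content", content]
    else:
        cmd += ["--content-file", content_file]
    return cmd
```

Returning a list instead of a single string avoids shell quoting problems when the method text contains quotes or newlines.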

When --num-candidates > 1, output files are named <stem>_0.png, <stem>_1.png, etc.
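The naming scheme can be reproduced to predict which files a run should leave behind. This is a minimal sketch under two assumptions not confirmed by the documentation: the numbered files are written next to the --output path, and a single-candidate run uses the --output path unchanged. `expected_outputs` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def expected_outputs(output: str, num_candidates: int) -> List[Path]:
    """Hypothetical helper: list the file names implied by <stem>_<i>.png."""
    path = Path(output)
    if num_candidates <= 1:
        # Assumption: with one candidate the path is used as given.
        return [path]
    # <stem>_0<suffix>, <stem>_1<suffix>, ... in the same directory.
    return [
        path.with_name(f"{path.stem}_{i}{path.suffix}")
        for i in range(num_candidates)
    ]
```

For example, `expected_outputs("architecture.png", 10)` yields `architecture_0.png` through `architecture_9.png`.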

Output

The absolute path of each saved image is printed to stdout, one per line.
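A caller that captures stdout can turn it back into a list of paths. The sketch below is illustrative: `parse_saved_paths` is a hypothetical helper, and skipping lines that are not absolute paths is a defensive assumption in case other messages share stdout.

```python
from pathlib import Path
from typing import List


def parse_saved_paths(stdout: str) -> List[Path]:
    """Hypothetical helper: collect the saved image paths from captured stdout."""
    paths: List[Path] = []
    for line in stdout.splitlines():
        line = line.strip()
        # Assumption: anything that is not an absolute path is a log line.
        if line and Path(line).is_absolute():
            paths.append(Path(line))
    return paths
```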

Examples

Diagram

python skill/run.py \
  --content "We propose a transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to encoder outputs and generates the target sequence autoregressively." \
  --caption "Figure 1: Overview of the proposed transformer architecture" \
  --task diagram \
  --output architecture.png

Important Notes

  • Runtime: A single candidate typically takes 3-10 minutes depending on model and network conditions. With the default 10 candidates running in parallel, expect ~10-30 minutes total. Plan accordingly.
  • API calls: Each candidate involves multiple LLM calls (Retriever + Planner + Stylist + Visualizer + up to 3 Critic rounds). Candidates run in parallel for efficiency.
  • Image generation: The Visualizer agent calls an image generation model (Gemini Image) to render diagrams.
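Because a default run can take tens of minutes, a wrapper script may want an explicit timeout instead of waiting indefinitely. This is a sketch only: `run_with_timeout` is a hypothetical helper, and the 45-minute default is an assumption chosen to sit above the 10-30 minute estimate, not a value from the project.

```python
import subprocess
from typing import List, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_minutes: float = 45.0) -> List[str]:
    """Hypothetical helper: run a command and return its non-empty stdout lines.

    Raises subprocess.TimeoutExpired if the time limit is exceeded and
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        list(cmd),
        capture_output=True,  # collect stdout, where the saved paths are printed
        text=True,
        check=True,
        timeout=timeout_minutes * 60,  # subprocess.run expects seconds
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

The command passed in would be the same argv shown under Usage, for example `["python", "skill/run.py", "--content-file", "method.txt", "--caption", "Figure 1"]`, where `method.txt` is a placeholder file name.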

About

PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:

PaperBanana: Automating Academic Illustration for AI Scientists
Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon
arXiv:2601.23265

The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
