Academic Figure Generation

Thin CLI wrapper around PaperBanana (a.k.a. PaperVizAgent), a multi-agent figure-generation pipeline for academic papers.

The skill provides exactly one script: scripts/generate.py. It feeds your method text + caption into PaperBanana and writes N candidate PNGs. Model selection and API keys come from PaperBanana's own configs/model_config.yaml — the wrapper does not override them.

One-time setup

Clone PaperBanana somewhere convenient:

git clone https://github.com/dwzhu-pku/PaperBanana.git ~/PaperBanana
cd ~/PaperBanana
uv venv && uv pip install -r requirements.txt

Configure configs/model_config.yaml — set the image model and the matching API key. Two common setups:

defaults:
  image_model_name: "gemini-3-pro-image-preview"   # or "openai/gpt-5.4-image-2"
  model_name: "gemini-3.1-pro-preview"             # text model for Planner/Stylist/Critic

api_keys:
  google_api_key: "..."        # required for Gemini models
  openrouter_api_key: ""       # required for openai/gpt-5.4-image-2

Use Gemini if you have a Google AI key; use GPT-Image-2 via OpenRouter if you have an OpenRouter key. Pick one — there's nothing else to wire up.

Workflow

Step 1: Gather inputs

You need:

Method text: the relevant section of the paper describing the approach (./method.md or ./method.tex).
Figure caption: the target caption, e.g. "Figure 1: Overview of our framework".

If the user only gives a vague request, ask:

What aspect of the method should the figure focus on?
Style? (block diagram, flowchart, pipeline, architecture, comparison)
Venue / column width? (ACL ≤ 7.5", NeurIPS single-column 5.5")

Step 2: Generate

~/PaperBanana/.venv/bin/python scripts/generate.py \
  --paperbanana-root ~/PaperBanana \
  --method-file ./method.md \
  --caption "Figure 1: Overview of our framework" \
  --out-dir ./figures/v1 \
  --candidates 3 \
  --aspect-ratio 16:9

Flag	Default	Notes
`--paperbanana-root`	(required)	Path to your PaperBanana checkout
`--method-file`	(required)	Method section as a text/markdown file
`--caption`	(required)	Target figure caption
`--out-dir`	(required)	Where PNGs land
`--candidates`	`3`	Independent diagram candidates
`--max-concurrent`	`2`	Cap concurrent runs (be gentle on quota)
`--exp-mode`	`demo_full`	Full pipeline (Planner+Stylist+Visualizer+Critic). Use `demo_planner_critic` to skip Stylist, or `vanilla` for single-shot.
`--aspect-ratio`	`16:9`	One of `21:9`, `16:9`, `3:2`, `1:1`
`--max-critic-rounds`	`2`	Critique → revise loops (early-exits if critic says "No changes needed")

Step 3: Present & iterate

Show all candidates to the user.
Common refinements: color scheme, layout, label text, font size.
Re-run with a tweaked caption or more candidates.

Step 4: Export

PNGs are written as candidate_0.png, candidate_1.png, … in --out-dir.
For camera-ready PDFs: magick candidate_0.png candidate_0.pdf.

Style guidelines

Color: consistent, colorblind-friendly palette
Fonts: match the paper's body font (Times for ACL/EMNLP, Helvetica/Arial for many ML venues)
Labels: concise; no full sentences inside the diagram
Arrows: solid for data flow, dashed for optional / feedback loops
Whitespace: don't overcrowd — reviewers skim figures in seconds

Common figure types

Type	When to use	Key elements
Pipeline / Flowchart	Sequential processing	Boxes + arrows, L→R or T→B
Architecture	System overview	Nested boxes, clear module boundaries
Comparison	Before/after, baseline vs proposed	Side-by-side panels
Ablation	Component contributions	Bar charts, highlighted rows
Framework	High-level conceptual overview	Abstract shapes, minimal detail

Troubleshooting

429 RESOURCE_EXHAUSTED on Gemini: monthly Google AI Studio spending cap hit. Raise it at https://ai.studio/spend or switch image_model_name to openai/gpt-5.4-image-2 and set OPENROUTER_API_KEY.
OpenRouter Client not initialized: OPENROUTER_API_KEY not in env and openrouter_api_key not in yaml.
No PNGs in output dir: check out_dir/results.json for the raw per-candidate response and any error messages.
Long latency (>5 min): most wall time is the image model. Lower --candidates or use --exp-mode vanilla for faster iteration.

academic-figure-generation

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

dev-browser

agent-browser

understand-chat

understand-dashboard

Recibe nuevas skills de Pesquisa e Web todos los lunes