Generate publication-ready academic diagrams from methodology text using AI.
Open-source reimplementation of PaperBanana (Zhu et al., 2026). Six specialized agents generate methodology diagrams and statistical plots via a two-phase pipeline: planning (Prompt Enhancer → Retriever → Planner → Stylist) and iterative refinement (Visualizer ↔ Critic). Powered by Google Gemini.
Disclaimer: Unofficial, independent reimplementation for research and education. Not affiliated with or endorsed by the original authors or Google/DeepMind.
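The two-phase design can be sketched as control flow. Everything below is an illustrative sketch: the stage names, `critique` stub, and early-stop threshold are hypothetical, not the project's actual classes or API.

```python
# Hypothetical sketch of the two-phase pipeline: a planning chain
# followed by a Visualizer <-> Critic refinement loop.

def critique(image: str) -> float:
    # Stub critic: score grows with each refinement pass (illustrative only).
    return 7.0 + image.count("refine(")

def run_pipeline(prompt: str, rounds: int = 3) -> str:
    # Phase 1: planning -- each stage transforms the textual plan.
    plan = prompt
    for stage in ("enhance", "retrieve_refs", "plan_layout", "apply_style"):
        plan = f"{plan} -> {stage}"

    # Phase 2: iterative refinement -- render, score, redraw for up to T rounds.
    image = f"render({plan})"
    for _ in range(rounds):
        if critique(image) >= 9.0:   # stop early once the draft scores high
            break
        image = f"refine({image})"   # Visualizer redraws with Critic feedback
    return image
```

The real agents call the Gemini API at each step; this sketch only shows where the loop terminates.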
Text prompt → iterative refinement → publication-ready diagram:
| VLM-as-Judge Framework | Pipeline Architecture | Segmentation | Quantum |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
| Score: 9.2/10 | Score: 9.8/10 | Comparison grid | Surface code |
Each diagram is generated from text in ~3 minutes for ~$0.50. See the example prompts for the full text behind each scored figure.
- Free and open source. No $300/year subscription. Bring your own Gemini API key (~$0.50 per diagram).
- Publication-ready output. Exports to PNG (300 DPI). Designed for NeurIPS/ICML-style figures.
```bash
git clone https://github.com/efradeca/freepaperbanana.git
cd freepaperbanana
pip install -e ".[dev]"
freepaperbanana setup
# Opens browser to get a free Gemini API key → saves to .env
```

```python
import asyncio

from freepaperbanana import FreePaperBananaPipeline, GenerationInput, DiagramType

async def main():
    pipeline = FreePaperBananaPipeline()
    result = await pipeline.generate(
        GenerationInput(
            source_context="We propose a transformer-based encoder-decoder...",
            communicative_intent="Overview of the proposed architecture.",
            diagram_type=DiagramType.METHODOLOGY,
        )
    )
    print(f"Output: {result.image_path}")

asyncio.run(main())
```

Try it online: HF Space demo (no installation needed)
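Because `generate` is async, several figures can be produced concurrently. The helper below is a sketch built on `asyncio.gather`; it works with any object exposing an async `generate`, including the `FreePaperBananaPipeline` from the quickstart.

```python
import asyncio

async def generate_all(pipeline, inputs):
    # Run all generations concurrently; with return_exceptions=True,
    # one failed diagram does not cancel the rest of the batch.
    return await asyncio.gather(
        *(pipeline.generate(inp) for inp in inputs),
        return_exceptions=True,
    )

# Usage (assuming the quickstart imports and a list of GenerationInput):
#   results = asyncio.run(generate_all(pipeline, [input_a, input_b]))
#   for r in results:
#       if not isinstance(r, Exception):
#           print(r.image_path)
```

Note that each concurrent generation still incurs its own API cost (~$0.50 per diagram).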
```python
result = await pipeline.generate(
    GenerationInput(
        source_context="""
        We propose a multi-agent framework consisting of five specialized agents.
        A Retriever selects reference examples, a Planner generates textual descriptions
        via in-context learning, and a Stylist refines them for aesthetics. In the
        refinement phase, a Visualizer renders images and a Critic evaluates them,
        iterating for T rounds.
        """,
        communicative_intent="Overview of the multi-agent illustration pipeline.",
        diagram_type=DiagramType.METHODOLOGY,
    )
)
```

```python
result = await pipeline.generate(
    GenerationInput(
        source_context="""
        Our model uses a U-Net architecture with skip connections. The encoder
        downsamples through 4 stages of conv-batchnorm-relu blocks. The decoder
        mirrors this with transposed convolutions. Skip connections concatenate
        encoder features at each scale.
        """,
        communicative_intent="U-Net encoder-decoder with skip connections.",
        diagram_type=DiagramType.METHODOLOGY,
    )
)
```

```bash
freepaperbanana plot \
  -d results.csv \
  --intent "Grouped bar chart comparing F1 scores across 4 models on 3 benchmarks"
```

| Type | Status | Description |
|---|---|---|
| Methodology diagrams | Ready | Flowcharts, pipelines, system architectures |
| Model architectures | Ready | Neural network structures, encoder-decoders |
| Process diagrams | Ready | Multi-stage workflows, data flows |
| Training pipelines | Ready | Training loops, loss flows, optimization steps |
| Comparison figures | Ready | Side-by-side method comparisons, ablation visuals |
| Statistical plots | Ready | Bar charts, line plots, scatter plots (via matplotlib) |
| Multi-panel figures | Partial | Limited by aspect ratio constraints |
| Tables / algorithms | Coming soon | Not yet supported |
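The `freepaperbanana plot` command above reads a CSV. The snippet below writes a minimal long-format file with Python's standard `csv` module; the column names (`model`, `benchmark`, `f1`) are assumptions for illustration — check the CLI help for the schema your version expects.

```python
import csv

# Hypothetical long-format results file: one row per (model, benchmark) pair.
rows = [
    {"model": "baseline", "benchmark": "SQuAD", "f1": 81.2},
    {"model": "ours",     "benchmark": "SQuAD", "f1": 88.5},
]

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["model", "benchmark", "f1"])
    writer.writeheader()
    writer.writerows(rows)
```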
| Format | Resolution | Recommended Use |
|---|---|---|
| PNG | 300 DPI | Paper submissions, presentations |
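A 300 DPI PNG drops into a NeurIPS/ICML-style paper via `graphicx`; the file path and caption below are placeholders.

```latex
% Preamble:
\usepackage{graphicx}

% In the body: scale the PNG to the column width.
\begin{figure}[t]
  \centering
  \includegraphics[width=\linewidth]{figures/method_overview.png}
  \caption{Overview of the proposed architecture.}
  \label{fig:method}
\end{figure}
```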
- Multi-provider support (OpenAI, Anthropic alongside Gemini)
- Multi-panel figure generation with sub-figure layout
- PyPI package (`pip install freepaperbanana`)
- Algorithm/pseudocode rendering
- Interactive editing — click to modify specific diagram elements
Contributions welcome! See CONTRIBUTING.md for setup, conventions, and PR guidelines.
Look for issues labeled good first issue to get started.
```bash
pytest tests/ -v         # 179 tests, all mocked (no API key needed)
ruff check src/ tests/   # Lint
ruff format --check src/ # Format check
```

- License: MIT (code) / CC BY 4.0 + Public Domain (reference images).
- Independent implementation: Inspired by the published paper (arXiv:2601.23265). No code was copied from any existing repository. The llmsresearch/paperbanana community project (MIT) was consulted as an architectural reference.
- Generated outputs: Images produced by this pipeline are generated via the Google Gemini API. Users are responsible for compliance with Google's Generative AI Terms of Service.
If you use FreePaperBanana in your research, please cite the original paper:
```bibtex
@article{zhu2026paperbanana,
  title={PaperBanana: Automating Academic Illustration for AI Scientists},
  author={Zhu, Dawei and Meng, Rui and Song, Yale and Wei, Xiyu and Li, Sujian and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2601.23265},
  year={2026}
}
```

```bibtex
@software{freepaperbanana2026,
  title={FreePaperBanana: Open-Source Multi-Agent Academic Illustration Generation},
  author={Deulofeu, Efrain},
  year={2026},
  url={https://github.com/efradeca/freepaperbanana},
  license={MIT}
}
```

- PaperBanana (Zhu et al., 2026) for the original methodology.
- 331 reference figures from 80 published papers (all CC BY 4.0 / Public Domain). See THIRD_PARTY_NOTICES.md. Authors may request removal via Issues.
- llmsresearch/paperbanana community reimplementation (MIT), consulted as an architectural reference.




