PaperBanana 🍌

Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister and Jinsung Yoon

Hi everyone! The original version of PaperBanana is already open-sourced under Google-Research as PaperVizAgent. This repository is a fork of that repo and aims to keep evolving toward better support for academic paper illustration. Although we have made solid progress, there is still a long way to go toward more reliable generation and support for more diverse, complex scenarios. PaperBanana is intended to be a fully open-source project dedicated to facilitating academic illustration for all researchers. Our goal is simply to benefit the community, so we currently have no plans to use it for commercial purposes.

PaperBanana is a reference-driven multi-agent framework for automated academic illustration generation. Acting like a creative team of specialized agents, it transforms raw scientific content into publication-quality diagrams and plots through an orchestrated pipeline of Retriever, Planner, Stylist, Visualizer, and Critic agents. The framework leverages in-context learning from reference examples and iterative refinement to produce aesthetically pleasing and semantically accurate scientific illustrations.

Here are some example diagrams and plots generated by PaperBanana:

Overview of PaperBanana

PaperBanana achieves high-quality academic illustration generation by orchestrating five specialized agents in a structured pipeline:

Retriever Agent: Identifies the most relevant reference diagrams from a curated collection to guide downstream agents
Planner Agent: Translates method content and communicative intent into comprehensive textual descriptions using in-context learning
Stylist Agent: Refines descriptions to adhere to academic aesthetic standards using automatically synthesized style guidelines
Visualizer Agent: Transforms textual descriptions into visual outputs using state-of-the-art image generation models
Critic Agent: Forms a closed-loop refinement mechanism with the Visualizer through multi-round iterative improvements
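
To make the data flow between the five agents concrete, here is a minimal sketch with stub agents. It is not the repository's implementation: `PipelineState`, `run_pipeline`, `max_critic_rounds`, and every stub body are hypothetical names chosen for illustration, and strings stand in for real model calls and images.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineState:
    """Data handed from one stage to the next (all field names are hypothetical)."""
    method_text: str
    caption: str
    references: List[str] = field(default_factory=list)
    description: str = ""
    image: str = ""  # stand-in for real image bytes
    history: List[str] = field(default_factory=list)


def retriever(state: PipelineState) -> PipelineState:
    # Stub: a real retriever would select references from a curated collection.
    state.references = ["ref_diagram_1", "ref_diagram_2"]
    return state


def planner(state: PipelineState) -> PipelineState:
    # Stub: turn content and intent into a textual figure description.
    state.description = f"Figure for '{state.caption}' using {len(state.references)} references"
    return state


def stylist(state: PipelineState) -> PipelineState:
    # Stub: adjust the description to follow style guidelines.
    state.description += " | styled"
    return state


def visualizer(state: PipelineState) -> PipelineState:
    # Stub: "render" the description and record each attempt.
    state.image = f"<image of: {state.description}>"
    state.history.append(state.image)
    return state


def critic(state: PipelineState) -> str:
    # Stub: request exactly one revision; an empty string means "accept".
    return "" if "revised" in state.description else "tighten the layout"


def run_pipeline(method_text: str, caption: str, max_critic_rounds: int = 3) -> PipelineState:
    state = PipelineState(method_text, caption)
    for stage in (retriever, planner, stylist, visualizer):
        state = stage(state)
    # Closed loop: the critic's feedback is folded back into the description
    # and the visualizer renders again, up to max_critic_rounds times.
    for _ in range(max_critic_rounds):
        feedback = critic(state)
        if not feedback:
            break
        state.description += f" | revised: {feedback}"
        state = visualizer(state)
    return state


if __name__ == "__main__":
    final = run_pipeline("Our method has three modules.", "Overview of the method")
    print(len(final.history), "renders;", final.description)
```

Keeping every intermediate render in `history` is one simple way to support an evolution view of the refinement rounds.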

Quick Start

Step1: Clone the Repo

```
git clone https://github.com/dwzhu-pku/PaperBanana.git
cd PaperBanana
```

Step2: Configuration

PaperBanana supports configuring API keys from a YAML configuration file or via environment variables.

We recommend copying configs/model_config.template.yaml to configs/model_config.yaml and keeping all user configuration there. This file is ignored by git, so your API keys and settings stay private. In model_config.yaml, fill in the two model names (defaults.model_name and defaults.image_model_name) and set at least one API key under api_keys (e.g. google_api_key for Gemini models).
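
As an illustration of how the two configuration sources might be combined, the sketch below works on an already-parsed config dictionary. It makes several assumptions that this README does not confirm: that the file value takes precedence over the environment, that the environment variable is the upper-cased key name (e.g. `GOOGLE_API_KEY`), and the helper names `resolve_api_key` and `missing_settings`, which are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def resolve_api_key(config: Mapping, key_name: str,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key from the config file if set, else from the environment.

    Assumption: file-over-environment precedence and the upper-cased
    environment variable name are illustrative choices only.
    """
    env = os.environ if env is None else env
    from_file = (config.get("api_keys") or {}).get(key_name)
    if from_file:
        return from_file
    return env.get(key_name.upper()) or None


def missing_settings(config: Mapping) -> List[str]:
    """List required entries that are still empty in the parsed config."""
    defaults = config.get("defaults") or {}
    missing = [f"defaults.{name}"
               for name in ("model_name", "image_model_name")
               if not defaults.get(name)]
    if not any((config.get("api_keys") or {}).values()):
        missing.append("api_keys (at least one)")
    return missing


if __name__ == "__main__":
    # Placeholder values; real model names and keys go in model_config.yaml.
    config = {
        "defaults": {"model_name": "my-text-model", "image_model_name": ""},
        "api_keys": {"google_api_key": ""},
    }
    print(missing_settings(config))
    print(resolve_api_key(config, "google_api_key", env={"GOOGLE_API_KEY": "from-env"}))
```

Checking for empty entries right after loading the file surfaces a forgotten model name before any API call is made.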

Note that if you need to generate many candidates simultaneously, you will require an API key that supports high concurrency.

Step3: Downloading the Dataset

First download PaperBananaBench, then place it under the data directory (e.g., data/PaperBananaBench/). The framework is designed to function gracefully without the dataset by bypassing the Retriever Agent's few-shot learning capability. If interested in the original PDFs, please download them from PaperBananaDiagramPDFs.
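
The fallback described above can be pictured as a small check before the pipeline starts. This is a sketch under assumptions, not the repository's logic: the function name `effective_retrieval_setting` is hypothetical, and it assumes that the presence of `ref.json` (the path layout is taken from the Project Structure section) is what signals that references are available.

```python
from pathlib import Path


def effective_retrieval_setting(data_root: str,
                                dataset_name: str = "PaperBananaBench",
                                task_name: str = "diagram",
                                requested: str = "auto") -> str:
    """Downgrade retrieval to "none" when the reference file is absent."""
    # Assumption: ref.json under <data_root>/<dataset>/<task>/ holds the references.
    ref_file = Path(data_root) / dataset_name / task_name / "ref.json"
    if requested != "none" and not ref_file.is_file():
        return "none"
    return requested


if __name__ == "__main__":
    print(effective_retrieval_setting("data"))
```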

Step4: Installing the Environment

We use uv to manage Python packages. Please install uv by following its official installation instructions.

Create and activate a virtual environment

```
uv venv # This will create a virtual environment in the current directory, under .venv/
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
```

Install Python 3.12
```
uv python install 3.12
```
Install required packages
```
uv pip install -r requirements.txt
```

Launch PaperBanana

Interactive Demo (Streamlit)

The easiest way to launch PaperBanana is via the interactive Streamlit demo:

```
streamlit run demo.py
```

The web interface provides two main workflows:

1. Generate Candidates Tab:

Paste your method section content (Markdown recommended) and provide the figure caption.
Configure settings (pipeline mode, retrieval setting, number of candidates, aspect ratio, critic rounds).
Click "Generate Candidates" and wait for parallel processing.
View results in a grid with evolution timelines and download individual images or batch ZIP.

2. Refine Image Tab:

Upload a generated candidate or any diagram.
Describe desired changes or request upscaling.
Select resolution (2K/4K) and aspect ratio.
Download the refined high-resolution output.

Command-Line Interface

You can also run PaperBanana from the command line:

```
# Basic usage with default settings
python main.py

# Advanced usage with custom settings
python main.py \
  --dataset_name "PaperBananaBench" \
  --task_name "diagram" \
  --split_name "test" \
  --exp_mode "dev_full" \
  --retrieval_setting "auto"
```

Available Options:

--dataset_name: Dataset to use (default: PaperBananaBench)
--task_name: Task type - diagram or plot (default: diagram)
--split_name: Dataset split (default: test)
--exp_mode: Experiment mode (see section below)
--retrieval_setting: Retrieval strategy - auto, manual, random, or none (default: auto)
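
For readers who want to see how these options fit together, the following is a hedged `argparse` sketch that mirrors the flags and defaults listed above. It is not copied from `main.py`. The `dev_full` default for `--exp_mode` is an assumption, since this README does not state one, and `build_parser` is a hypothetical helper name.

```python
import argparse

# Mode names as listed under "Experiment Modes" in this README.
EXP_MODES = [
    "vanilla", "dev_planner", "dev_planner_stylist", "dev_planner_critic",
    "dev_full", "demo_planner_critic", "demo_full",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PaperBanana CLI options (illustrative sketch)")
    parser.add_argument("--dataset_name", default="PaperBananaBench")
    parser.add_argument("--task_name", choices=["diagram", "plot"], default="diagram")
    parser.add_argument("--split_name", default="test")
    # Assumption: the real default for --exp_mode is not documented here.
    parser.add_argument("--exp_mode", choices=EXP_MODES, default="dev_full")
    parser.add_argument("--retrieval_setting",
                        choices=["auto", "manual", "random", "none"],
                        default="auto")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Restricting values with `choices` makes a mistyped mode fail at parse time instead of partway through a run.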

Experiment Modes:

vanilla: Direct generation without planning or refinement
dev_planner: Planner → Visualizer only
dev_planner_stylist: Planner → Stylist → Visualizer
dev_planner_critic: Planner → Visualizer → Critic (multi-round)
dev_full: Full pipeline with all agents
demo_planner_critic: Demo mode (Planner → Visualizer → Critic) without evaluation
demo_full: Demo mode (full pipeline) without evaluation
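
The mode list above can be read as a lookup from mode name to the agent stages it runs. The table below is a sketch built only from these descriptions: `MODE_STAGES`, `stages_for`, and `runs_evaluation` are hypothetical names. Whether the Retriever also runs in the partial modes depends on `--retrieval_setting` and is not specified here, so it appears only in the full-pipeline modes. Treating every non-demo mode as evaluated is likewise an assumption.

```python
from typing import Dict, List

FULL_PIPELINE = ["retriever", "planner", "stylist", "visualizer", "critic"]

MODE_STAGES: Dict[str, List[str]] = {
    "vanilla": ["vanilla"],  # direct generation, no planning or refinement
    "dev_planner": ["planner", "visualizer"],
    "dev_planner_stylist": ["planner", "stylist", "visualizer"],
    "dev_planner_critic": ["planner", "visualizer", "critic"],
    "dev_full": FULL_PIPELINE,
    "demo_planner_critic": ["planner", "visualizer", "critic"],
    "demo_full": FULL_PIPELINE,
}


def stages_for(mode: str) -> List[str]:
    """Return a copy of the stage list for a mode, rejecting unknown names."""
    try:
        return list(MODE_STAGES[mode])
    except KeyError:
        raise ValueError(f"unknown exp_mode: {mode!r}") from None


def runs_evaluation(mode: str) -> bool:
    """Demo modes are described as skipping evaluation; others are assumed to run it."""
    stages_for(mode)  # validates the mode name
    return not mode.startswith("demo_")


if __name__ == "__main__":
    for mode in MODE_STAGES:
        print(f"{mode}: {' -> '.join(stages_for(mode))} (evaluation: {runs_evaluation(mode)})")
```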

Visualization Tools

View pipeline evolution and intermediate results:

```
streamlit run visualize/show_pipeline_evolution.py
```

View evaluation results:

```
streamlit run visualize/show_referenced_eval.py
```

Project Structure

├── .venv/
│   └── ...
├── data/
│   └── PaperBananaBench/
│       ├── diagram/
│       │   ├── images/
│       │   ├── pdfs/
│       │   ├── test.json
│       │   └── ref.json
│       └── plot/
├── agents/
│   ├── __init__.py
│   ├── base_agent.py
│   ├── retriever_agent.py
│   ├── planner_agent.py
│   ├── stylist_agent.py
│   ├── visualizer_agent.py
│   ├── critic_agent.py
│   ├── vanilla_agent.py
│   └── polish_agent.py
├── prompts/
│   ├── __init__.py
│   ├── diagram_eval_prompts.py
│   └── plot_eval_prompts.py
├── style_guides/
│   ├── generate_category_style_guide.py
│   └── ...
├── utils/
│   ├── __init__.py
│   ├── config.py
│   ├── paperviz_processor.py
│   ├── eval_toolkits.py
│   ├── generation_utils.py
│   └── image_utils.py
├── visualize/
│   ├── show_pipeline_evolution.py
│   └── show_referenced_eval.py
├── scripts/
│   ├── run_main.sh
│   └── run_demo.sh
├── configs/
│   └── model_config.template.yaml
├── results/
│   ├── PaperBananaBench_diagram/
│   └── parallel_demo/
├── main.py
├── demo.py
└── README.md

Key Features

Multi-Agent Pipeline

Reference-Driven: Learns from curated examples through generative retrieval
Iterative Refinement: Critic-Visualizer loop for progressive quality improvement
Style-Aware: Automatically synthesized aesthetic guidelines ensure academic quality
Flexible Modes: Multiple experiment modes for different use cases

Interactive Demo

Parallel Generation: Generate up to 20 candidate diagrams simultaneously
Pipeline Visualization: Track the evolution through Planner → Stylist → Critic stages
High-Resolution Refinement: Upscale to 2K/4K using Image Generation APIs
Batch Export: Download all candidates as PNG or ZIP
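
Batch export of this kind can be done entirely in memory with the standard library. The sketch below is one possible approach, not the demo's actual code: `build_candidates_zip` and the file names are hypothetical, and short byte strings stand in for real PNG data.

```python
import io
import zipfile
from typing import Mapping


def build_candidates_zip(images: Mapping[str, bytes]) -> bytes:
    """Pack named image payloads into a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in images.items():
            archive.writestr(name, data)
    # The archive is finalized when the "with" block exits.
    return buffer.getvalue()


if __name__ == "__main__":
    payload = build_candidates_zip({
        "candidate_01.png": b"placeholder-bytes-1",
        "candidate_02.png": b"placeholder-bytes-2",
    })
    print(len(payload), "bytes in archive")
```

The resulting bytes can be passed to whatever download mechanism the web interface provides, with no temporary files on disk.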

Extensible Design

Modular Agents: Each agent is independently configurable
Task Support: Handles both conceptual diagrams and data plots
Evaluation Framework: Built-in evaluation against ground truth with multiple metrics
Async Processing: Efficient batch processing with configurable concurrency
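
A common way to get configurable concurrency in async batch processing is to cap in-flight work with a semaphore. The sketch below shows that pattern in isolation. It is an assumption about the general technique, not a description of this repository's code, and `generate_candidate`, `generate_all`, and `max_concurrency` are hypothetical names.

```python
import asyncio
from typing import List


async def generate_candidate(index: int) -> str:
    # Stub: a short sleep stands in for a slow image-generation API call.
    await asyncio.sleep(0.01)
    return f"candidate_{index}"


async def generate_all(num_candidates: int, max_concurrency: int) -> List[str]:
    """Run all candidates, with at most max_concurrency in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(index: int) -> str:
        async with semaphore:
            return await generate_candidate(index)

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(bounded(i) for i in range(num_candidates)))


if __name__ == "__main__":
    print(asyncio.run(generate_all(num_candidates=5, max_concurrency=2)))
```

Lowering `max_concurrency` is the natural knob when an API key has a tight rate limit.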

TODO List

Add support for using manually selected examples. Provide a user-friendly interface.
Upload code for generating statistical plots.
Upload code for improving existing diagrams based on the style guidelines.
Expand the reference set to support more areas beyond computer science.

Community Support

Around the release of this repo, we noticed several community efforts to reproduce this work. These efforts bring perspectives that we find incredibly valuable, and we highly recommend checking out these excellent contributions (feel free to add any we missed):

Additionally, alongside the development of this method, many other works have been exploring the same topic of automated academic illustration generation, and some even produce editable figures. Their contributions are essential to the ecosystem and well worth your attention (likewise, feel free to add more):

Overall, we are encouraged that the fundamental capabilities of current models have brought us much closer to solving the problem of automated academic illustration generation. With the community's continued efforts, we believe that in the near future we will have high-quality automated drawing tools to accelerate academic research iteration and visual communication.

We warmly welcome community contributions to make PaperBanana even better!

License

Apache-2.0

Citation

If you find this repo helpful, please cite our paper as follows:

@article{zhu2026paperbanana,
  title={PaperBanana: Automating Academic Illustration for AI Scientists},
  author={Zhu, Dawei and Meng, Rui and Song, Yale and Wei, Xiyu and Li, Sujian and Pfister, Tomas and Yoon, Jinsung},
  journal={arXiv preprint arXiv:2601.23265},
  year={2026}
}

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

Our goal is simply to benefit the community, so currently we have no plans to use it for commercial purposes. The core methodology was developed during my internship at Google, and patents have been filed for these specific workflows by Google. While this doesn't impact open-source research efforts, it restricts third-party commercial applications using similar logic.
