protoBanana

OSS chat-native image generation + editing — the open-source counterpart to Google's Nano-Banana 2 / OpenAI's GPT-Image-2, served as an OpenAI-compatible LiteLLM provider on top of ComfyUI.

The mascot above was generated by protoBanana itself — chat completion through protolabs/qwen-image-chat, prompt: "a friendly cartoon banana waving hello, simple white background".

The capability — workflow JSON as OpenAI model name

ComfyUI is the open-source standard for composable image pipelines. Every pipeline is a workflow JSON: a graph of nodes, weights, prompts, sampler settings, conditional branches. It's expressive — and it speaks only its own /prompt REST API. Nothing in the wider OpenAI client ecosystem knows what a ComfyUI workflow is.

LiteLLM is the open-source standard for unifying LLM providers behind the OpenAI spec. Every OpenAI-compatible client — Open WebUI, the Anthropic / OpenAI SDKs, a curl one-liner, your CLI tool — already speaks it. It has no native ComfyUI provider.

protoBanana is the bridge. The provider package registers a custom LiteLLM provider that does three things:

1. Maps a model name (e.g. comfyui-qwen-image/qwen_image_edit_2511) to a workflow JSON on disk.
2. Translates between specs: takes an OpenAI request shape (/v1/images/generations, /v1/images/edits, or /v1/chat/completions with image parts) and patches the right slots of the workflow JSON (prompt, init image, mask, seed, size, custom KSampler params, etc.).
3. Submits to ComfyUI, polls /history/<id>, fetches /view, and returns the OpenAI response shape (ImageResponse(b64_json) or a chat completion with markdown-embedded images).
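
The first two steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the provider's actual code: the helper names (workflow_path, patch_workflow), the "{{prompt}}" placeholder convention, and the choice of CLIPTextEncode / KSampler as the patched nodes are all hypothetical. The dict layout follows ComfyUI's API-format export (node id → class_type + inputs), and the default directory comes from the Quickstart.

```python
import copy
import os
from pathlib import Path


def workflow_path(model: str) -> Path:
    """Map 'provider/workflow_name' to '<workflows dir>/workflow_name.json'."""
    _, _, name = model.rpartition("/")
    if not name or ".." in name or "\\" in name:
        raise ValueError(f"not a usable workflow name: {model!r}")
    root = os.environ.get("PROTOBANANA_WORKFLOWS_DIR", "/app/workflows")
    return Path(root) / f"{name}.json"


def patch_workflow(workflow: dict, *, prompt: str, seed: int) -> dict:
    """Return a copy of the graph with the prompt and seed slots filled in."""
    patched = copy.deepcopy(workflow)  # never mutate the template on disk
    for node in patched.values():
        if not isinstance(node, dict):
            continue  # skip metadata entries such as a top-level "_doc" string
        inputs = node.get("inputs", {})
        kind = node.get("class_type")
        # Assumption: the positive prompt node is marked with a placeholder.
        if kind == "CLIPTextEncode" and inputs.get("text") == "{{prompt}}":
            inputs["text"] = prompt
        elif kind == "KSampler":
            inputs["seed"] = seed
    return patched


# Toy graph in ComfyUI API format; real workflows have many more nodes.
EXAMPLE = {
    "_doc": "toy graph for illustration",
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20, "positive": ["6", 0]}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "{{prompt}}", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
}

if __name__ == "__main__":
    print(workflow_path("comfyui-qwen-image/qwen_image_edit_2511"))
    print(patch_workflow(EXAMPLE, prompt="a cat in a hat", seed=42)["6"])
```

Patching a deep copy keeps the on-disk template reusable across requests; the negative-prompt node is left alone because it does not carry the placeholder.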

The net effect:

A ComfyUI workflow you authored in the web UI becomes a model name any OpenAI client can call.

That's why the same gateway alias works from Open WebUI, the Gradio app in this repo, protoCLI, a Python SDK script, or a curl /v1/images/generations. You don't write a new client per consumer; you write a workflow once and it appears in every OpenAI-compatible surface you have.

Authoring those workflows is out of scope here — that's ComfyUI's job. See protoLabsAI/comfy-workflows for the workflow library that gets bind-mounted into the gateway. The seven workflows shipped with this repo (qwen_image_2512, qwen_image_edit_2511, multiref_*, inpaint_*, outpaint_*, region_edit_*, bgremove_*) are reference implementations that prove the bridge works across every category — gen, edit, mask, multi-image, agent-routed chat — and back the model aliases below.

What it is

Three gateway aliases cover the surface; the chat alias alone drives the full conversational image experience:

protolabs/qwen-image — text-to-image (/v1/images/generations)
protolabs/qwen-image-edit — image + instruction (/v1/images/edits)
protolabs/qwen-image-chat — multi-turn "draw → now make it blue", multi-reference compose, region edit, background removal, outpaint — routed by an LLM agent that owns the chat surface

The chat path is agent-driven: an LLM (default protolabs/fast = Qwen3.6-35B-A3B-FP8) decides whether to respond conversationally, call an image tool, or chain multiple tools. Conversational replies, clarifying questions, and "remove the bg, then put a sunset behind" chains all work; see docs/agent.md. When no LLM endpoint is configured, or the configured endpoint fails, the chat path falls back to a deterministic keyword classifier.
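
To give a feel for what a deterministic keyword fallback looks like, here is a minimal sketch. The rule table, operation names, and the classify function are assumptions made for illustration; the real routing rules are described in docs/intent-router.md and may differ. Region edit is left out because it needs text grounding that a keyword match alone cannot provide.

```python
import re

# Hypothetical rule table: first match wins, so specific operations come first.
RULES = [
    ("bgremove", re.compile(
        r"\b(remove|cut out|delete) the (bg|background)\b|\bsticker\b", re.I)),
    ("outpaint", re.compile(r"\b(extend|outpaint|wider|taller|zoom out)\b", re.I)),
    ("inpaint", re.compile(r"\b(inpaint|fill in|masked area)\b", re.I)),
]


def classify(text: str, n_images: int) -> str:
    """Pick one operation for a chat turn from its text and attached image count."""
    if n_images == 0:
        return "gen"        # nothing to edit, so this must be text-to-image
    if n_images > 1:
        return "multiref"   # several references imply a compose request
    for operation, pattern in RULES:
        if pattern.search(text):
            return operation
    return "edit"           # default: single-image instruction edit


if __name__ == "__main__":
    for text, n in [("a cat in a hat, watercolor", 0),
                    ("remove the bg", 1),
                    ("make this wider", 1),
                    ("now make it blue", 1)]:
        print(f"{text!r} ({n} image) -> {classify(text, n)}")
```

A classifier like this can pick only one operation per turn and cannot answer conversationally, which is why the agent path is the default.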

Backed by Qwen-Image-2512 (gen) + Qwen-Image-Edit-2511 (edit, multi-ref, inpaint, outpaint, region edit) + BiRefNet/RMBG-2.0 (sticker) + SAM 3 (text→mask grounding for region edit). All seven phases shipped.

Why this exists

Nano-Banana 2 and GPT-Image-2 made conversational image editing mainstream. They're closed-source, hosted, and metered. For organizations that can't or won't send their data to a third party, the equivalent experience didn't exist as a single drop-in stack.

protoBanana fills that gap. It's the same call shape (/v1/chat/completions with image output), the same UX ("draw a cat" → "now make it blue"), running entirely on local GPUs through your own LiteLLM gateway.

Headline numbers

Capability	Nano-Banana 2	protoBanana
Operation auto-routing per chat turn	✓	✓
Conversational replies + clarifying questions	✓	✓ (agent path)
Chained operations in one chat turn	✓	✓ (agent calls tools in sequence)
Text-to-image	✓	✓
Single-image instruction edit	✓	✓
Multi-reference compose	up to 14 refs	up to 3 (Qwen-Image-Edit cap)
Background removal / sticker	✓	✓
Text-region edit (`"change the man's tie"`)	✓	✓ (SAM 3 text→mask)
Inpaint with provided mask	✓	✓ (`/v1/images/edits` + mask)
Outpaint	✓	✓ (`"extend left"`, `"make this wider"`)
Hosted	yes	no — all local
Cost per image	metered	electricity

See PHASES.md for the per-phase rationale.

Quickstart

# 1. Install into your LiteLLM gateway environment.
#    [tracing] pulls langfuse v2 (LiteLLM-compatible).
#    [agent] pulls openai client for the chat agent.
pip install 'protobanana[tracing,agent] @ git+https://github.com/protoLabsAI/protoBanana.git'

# 2. Add to LiteLLM config.yaml:
model_list:
  - model_name: protolabs/qwen-image
    litellm_params:
      model: protobanana/qwen_image_2512
      api_base: http://your-comfyui-host:8188
    model_info: { mode: image_generation }

  - model_name: protolabs/qwen-image-chat
    litellm_params:
      model: protobanana/chat
      api_base: http://your-comfyui-host:8188
    model_info: { mode: chat, supports_vision: true }

litellm_settings:
  custom_provider_map:
    - { provider: "protobanana", custom_handler: "protobanana.handler" }

# 3. Mount the workflows dir into the gateway container at /app/workflows
#    (or set PROTOBANANA_WORKFLOWS_DIR)

# 4. (Optional but recommended) Enable the chat agent:
#    PROTOBANANA_AGENT_BASE=http://localhost:4000/v1   # gateway calls itself
#    PROTOBANANA_AGENT_KEY=$LITELLM_MASTER_KEY
#    PROTOBANANA_AGENT_MODEL=protolabs/fast            # or protolabs/smart

# 5. Hit it like any OpenAI chat endpoint
curl -X POST http://your-gateway:4000/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"protolabs/qwen-image-chat","messages":[
    {"role":"user","content":"a cat in a hat, watercolor"}
  ]}'

# Then continue the conversation:
#   {"role":"user","content":"make it a bowling cap"}
# → agent picks region_edit("the hat" → "a bowling cap"), preserves
#   everything else pixel-perfect.

The response is an assistant message containing a markdown-embedded data:image/png;base64,... URL, which Open WebUI displays inline like a regular image attachment.
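
Clients that do not render markdown can pull the image bytes out of the message themselves. The sketch below assumes the image arrives in standard markdown image syntax, ![alt](data:image/...;base64,...); the exact shape of the message is not specified here, so treat the regular expression and the extract_images helper as hypothetical.

```python
import base64
import re

# Assumed shape: ![alt text](data:image/<type>;base64,<payload>)
DATA_URL = re.compile(
    r"!\[[^\]]*\]\(data:image/(png|jpeg|webp);base64,([A-Za-z0-9+/=]+)\)"
)


def extract_images(content: str) -> list[tuple[str, bytes]]:
    """Return (image type, decoded bytes) for each embedded image, in order."""
    return [
        (match.group(1), base64.b64decode(match.group(2)))
        for match in DATA_URL.finditer(content)
    ]


if __name__ == "__main__":
    # Stand-in payload: just the 8-byte PNG signature, not a full image.
    payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode()
    message = f"Here you go!\n\n![generated](data:image/png;base64,{payload})"
    for index, (kind, data) in enumerate(extract_images(message)):
        print(index, kind, len(data), "bytes")
```

To continue the conversation, send the assistant message back unchanged in the messages list followed by the next user turn, as in the curl example above.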

See docs/installation.md for the full setup (ComfyUI install, model downloads + symlinks, GPU planning).

Architecture

                 OpenAI client (Open WebUI / protoCLI / curl)
                          │
                          ▼
                    LiteLLM gateway
                          │
                          ▼
                  ProtoBananaProvider
                          │
                ┌─────────┴──────────┐
                ▼                    ▼
        chat agent loop      keyword classifier
        (LLM picks tool)     (fallback when no LLM)
                │                    │
                └─────────┬──────────┘
                          ▼
       ┌──────┬──────┬──────┬──────┬──────┬──────┐
       ▼      ▼      ▼      ▼      ▼      ▼      ▼
      gen   edit  region multi  bgremove inpaint outpaint
                  edit   ref
       │      │      │      │      │      │      │
       └──────┴──────┴──────┴──────┴──────┴──────┘
                          │
                          ▼
                    ComfyUIClient
                  (HTTP transport)
                          │
                          ▼
                       ComfyUI
            (Qwen-Image-2512 / Qwen-Image-Edit-2511 /
                 BiRefNet / RMBG-2.0 / SAM 3)

The chat agent is the default; the keyword classifier kicks in only when PROTOBANANA_AGENT_BASE is unset or the LLM endpoint fails. Either path calls the same seven route modules shown in the diagram.
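
The selection between the two paths can be sketched as follows. Only the PROTOBANANA_AGENT_BASE variable comes from this README; the pick_operation function and the Router type are hypothetical, and the real agent can also reply conversationally or chain several tools, which this single-operation sketch ignores.

```python
import os
from typing import Callable, Mapping

Router = Callable[[str], str]  # takes the user text, returns an operation name


def pick_operation(
    text: str,
    agent: Router,
    classifier: Router,
    env: Mapping[str, str] = os.environ,
) -> tuple[str, str]:
    """Return (path used, operation), preferring the agent when it is configured."""
    if env.get("PROTOBANANA_AGENT_BASE"):
        try:
            return "agent", agent(text)
        except Exception:
            # The LLM endpoint failed; fall through to the deterministic path.
            pass
    return "classifier", classifier(text)


if __name__ == "__main__":
    def broken_agent(text: str) -> str:
        raise ConnectionError("LLM endpoint unreachable")

    configured = {"PROTOBANANA_AGENT_BASE": "http://localhost:4000/v1"}
    print(pick_operation("extend left", broken_agent, lambda t: "outpaint", configured))
```

Catching every exception is a deliberate simplification; a production handler would likely log the failure and distinguish timeouts from malformed tool calls.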

See docs/architecture.md for the full breakdown and docs/agent.md for the agent loop in detail.
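
The ComfyUIClient box at the bottom of the diagram wraps ComfyUI's HTTP endpoints. A rough sketch of the submit, poll, and fetch cycle is below. The run_workflow function and its injected post_json / get_json callables are hypothetical stand-ins for a real HTTP client; the endpoint paths and response shape reflect ComfyUI's HTTP API as commonly documented (POST /prompt returns a prompt_id, and /history/<id> stays empty until the job finishes), so verify them against your ComfyUI version.

```python
import time
from typing import Any, Callable
from urllib.parse import urlencode

PostJson = Callable[[str, dict], dict]  # (path, body) -> parsed JSON response
GetJson = Callable[[str], dict]         # (path) -> parsed JSON response


def run_workflow(
    workflow: dict[str, Any],
    post_json: PostJson,
    get_json: GetJson,
    *,
    poll_s: float = 1.0,
    timeout_s: float = 300.0,
) -> list[str]:
    """Queue a workflow, wait for it to finish, and return /view paths for its images."""
    prompt_id = post_json("/prompt", {"prompt": workflow})["prompt_id"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        entry = get_json(f"/history/{prompt_id}").get(prompt_id)
        if entry is not None:  # the history entry appears once execution is done
            return [
                "/view?" + urlencode({
                    "filename": image["filename"],
                    "subfolder": image.get("subfolder", ""),
                    "type": image.get("type", "output"),
                })
                for output in entry.get("outputs", {}).values()
                for image in output.get("images", [])
            ]
        time.sleep(poll_s)
    raise TimeoutError(f"workflow {prompt_id} did not finish within {timeout_s}s")
```

Injecting the two callables keeps the polling logic independent of any particular HTTP library and makes it easy to exercise with fakes; each returned path can then be fetched from the ComfyUI host and base64-encoded for the OpenAI-shaped response.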

Test/eval UI

The Gradio app at app/ is a reference consumer + quick-test surface, not the product. It exists for two specific reasons:

Dogfooding the bridge. Every tab posts to the gateway exactly the way any other OpenAI client would. If a workflow regresses or a provider change breaks the request shape, Gradio surfaces it before downstream consumers do.
Fast iteration on workflows. Author a workflow in ComfyUI, drop the JSON into the gateway's mount, add a model alias to LiteLLM, hit the Gradio tab to validate end-to-end — usually 30 seconds.

The app has five tabs: Generate, Edit, Multi-ref, Sticker, and Chat (multi-turn auto-routing). It runs anywhere with Python 3.11+ and is intended for both local debugging and HuggingFace Space deployment. See app/README.md and docs/gradio-app.md.

pip install -e ".[gradio]"
GATEWAY_URL=http://your-gateway:4000/v1 GATEWAY_API_KEY=sk-... python -m app

If you only want the bridge in your gateway — e.g. you'll consume it from Open WebUI, your own CLI, or a non-Python client — skip the [gradio] extra and the app/ directory entirely. The provider package is self-contained.

Documentation


PROPOSAL.md	The strategic system design + why-this-shape
PHASES.md	The 7-phase roadmap with status, models needed, acceptance criteria
JOURNEY.md	How we got here — the full backfill (research → broken integrations → gateway → agent)
HOWTO.md	User-facing guide: prompting recipes, multi-ref tricks, intent keywords
app/README.md	Gradio test/eval UI — local + HF Space
docs/installation.md	Full setup from a clean machine
docs/operating.md	Day-2 ops: GPU planning, model swaps, troubleshooting
docs/architecture.md	Component breakdown + extension points
docs/agent.md	The tool-use chat agent — loop, tools, env, fallback, multi-step examples
docs/workflows-cookbook.md	How to add a new ComfyUI workflow
docs/intent-router.md	How the keyword fallback path routes requests
docs/gradio-app.md	Test/eval UI architecture + HF Space deploy
docs/api.md	Client-facing API reference
docs/observability.md	Langfuse tracing — what's captured, env, recommended views
docs/validating-workflows.md	Static workflow validator + e2e smoke (pre-merge gate)
docs/benchmarks.md	Quality + latency methodology
DECISIONS.md	Architectural decision records
CHANGELOG.md	Per-version log

Building on prior art

protoBanana is a synthesis, not an invention. Credit:

Component	Source
Image gen + edit	Qwen-Image, Qwen-Image-Edit-2511 (Alibaba)
Background removal	BiRefNet, RMBG-2.0 (BRIA)
Region segmentation (Phase 4)	Florence-2 (Microsoft), SAM 2.1 (Meta)
Universal inpaint (Phase 5)	LanPaint
Bundled ComfyUI nodes	ComfyUI-RMBG (1038lab) — RMBG/BiRefNet/SAM/Grounding
LLM gateway	LiteLLM (BerriAI)
Image runtime	ComfyUI
Original paradigm	Nano-Banana 2 (Google), GPT-Image-2 (OpenAI)

License

Apache-2.0. Workflows and node-pack dependencies retain their original licenses (see workflows/<name>.json's _doc field for per-workflow notes — RMBG-2.0 is CC BY-NC 4.0 / non-commercial).

Citation

See CITATION.cff.
