
feat(image_gen): multi-reference input support, starting with openai-codex #15308

Open
dejay2 wants to merge 1 commit into NousResearch:main from dejay2:feat/openai-codex-multi-reference-images

dejay2 commented Apr 24, 2026

Summary

Adds an opt-in capability contract so image_gen providers can accept reference images, and wires the openai-codex plugin up as the first provider to opt in. The image_generate tool gains an optional references field (array of local file paths); the dispatcher forwards it only when the active provider advertises supports_references=True, otherwise rejects early with references_unsupported instead of silently dropping it.
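
A rough sketch of the gating described above, not the actual code in tools/image_generation_tool.py. The class names, the `dispatch` helper, and the shape of the returned dicts are hypothetical; only `supports_references`, the `references` kwarg, and the `references_unsupported` error name come from this PR.

```python
from typing import Any, Dict, List, Optional


class ImageGenProvider:
    """Hypothetical stand-in for the provider base class."""

    name = "base"

    @property
    def supports_references(self) -> bool:
        # Providers are prompt-only unless they explicitly opt in.
        return False

    def generate(self, prompt: str, **kwargs: Any) -> Dict[str, Any]:
        # Placeholder result; a real provider would call its backend here.
        references = kwargs.get("references") or []
        return {"success": True, "provider": self.name, "references": len(references)}


class ReferenceCapableProvider(ImageGenProvider):
    """Hypothetical provider that opts in to reference images."""

    name = "reference-capable"

    @property
    def supports_references(self) -> bool:
        return True


def dispatch(
    provider: ImageGenProvider,
    prompt: str,
    references: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Forward references only to providers that advertise support."""
    if references and not provider.supports_references:
        # Reject early instead of silently dropping the references.
        return {
            "success": False,
            "error": "references_unsupported",
            "provider": provider.name,
        }
    kwargs: Dict[str, Any] = {}
    if references:
        kwargs["references"] = list(references)
    return provider.generate(prompt, **kwargs)
```

With this shape, a call that carries no references takes the same path for every provider, which is what keeps the change opt-in.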

Motivation — the Codex image_generation tool accepts input_image after all

The existing baoyu-comic skill documents image_generate as "prompt-only — it does NOT accept reference images", and #14317 explicitly scoped the openai-codex plugin to the generate endpoint. This PR's finding contradicts both: passing one or more input_image content items on the user message of the Codex responses.stream(...) call works, and the image_generation tool uses the attached images for composition, style transfer, and edits.

Empirically verified against the live Codex backend with three tests:

  1. Two-subject composition — red apple (reference 1) + blue ceramic teapot (reference 2) → single still-life on a shared wooden table with consistent lighting. Both subjects reproduced faithfully (same shape, same surface, same highlights).
  2. Cross-reference style transfer — a minimalist line-drawn mountain (reference 1) + a photographic teapot (reference 2), prompt says "reproduce reference 1 exactly, place reference 2 on a shelf, keep reference 1's line style" → teapot rendered in reference 1's ink style, mountain preserved. Style transfer and subject extraction both work.
  3. Single-reference re-lighting — apple on white (reference 1) → apple on grey gradient, preserving the exact fruit silhouette and surface. Works end-to-end through the new dispatcher path.

(Happy to share the PNGs in a follow-up comment if useful — they're local and not on a public host yet.)

Scope

In:

  • agent/image_gen_provider.py: supports_references property (defaults False) with doc for the references kwarg.
  • plugins/image_gen/openai-codex/__init__.py: opts in; adds _load_reference_images + _build_user_content; wires references through generate() with invalid_argument / invalid_reference error paths. Caps at MAX_REFERENCES = 16 (matches OpenAI's documented upstream cap).
  • tools/image_generation_tool.py: references in the agent-facing schema; dispatcher forwards only to capable providers; clear references_unsupported error for FAL and any non-capable plugin.
  • Tests: TestReferences in the openai-codex suite (7 new cases), TestReferencesDispatch in the dispatcher suite (4 new cases), schema invariant test updated.

Out (deliberately, happy to follow up in a separate PR):

  • Opting the openai (API-key) plugin in to references via the Images Edit endpoint.
  • Masking / image-variation flows.
  • Remote URL references (only local file paths for now — the plugin encodes each as a data URL).
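
For readers who want a picture of the plugin-side steps listed under "In", here is a minimal sketch under stated assumptions. The function names mirror `_load_reference_images` and `_build_user_content` but the bodies are illustrative, not the PR's implementation: the MIME-type guess, the use of `ValueError` to carry the error codes, and the exact ordering of content items are all assumptions. The `input_text` / `input_image` item shapes follow the Responses API content format, and the label text is the one quoted in the commit message.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Dict, List, Sequence

MAX_REFERENCES = 16  # cap stated in the PR description


def load_reference_images(paths: Sequence[str]) -> List[str]:
    """Encode each local file as a data URL (illustrative error handling)."""
    if len(paths) > MAX_REFERENCES:
        raise ValueError(f"invalid_argument: at most {MAX_REFERENCES} references")
    urls: List[str] = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            raise ValueError(f"invalid_reference: {raw} is not a readable file")
        # Assumption: the MIME type is guessed from the file extension.
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        urls.append(f"data:{mime};base64,{encoded}")
    return urls


def build_user_content(prompt: str, data_urls: Sequence[str]) -> List[Dict[str, str]]:
    """Build the user-message content: prompt first, then labelled images."""
    content: List[Dict[str, str]] = [{"type": "input_text", "text": prompt}]
    for index, url in enumerate(data_urls, start=1):
        # Label each image so the prompt can refer to it by number.
        label = f"Reference image {index} is provided below."
        content.append({"type": "input_text", "text": label})
        content.append({"type": "input_image", "image_url": url})
    return content
```

Numbering the labels from 1 lets a prompt say "keep reference 1's line style" and have that map unambiguously onto an attached image.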

Non-breaking

  • fal, openai, xai all inherit supports_references=False — their behaviour is byte-identical to before.
  • The dispatcher rejects reference payloads for non-capable providers before they reach generate(), so no provider is silently passed kwargs it doesn't understand.
  • The image_generate tool with no references field continues to behave exactly as before.
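
To illustrate why existing calls are unaffected, here is a hedged sketch of what the tool's parameter schema could look like with the new field. The constant name, the description strings, the `maxItems` bound, and the `missing_required` helper are assumptions for illustration; the PR only states that `references` is an optional array of local file paths.

```python
from typing import Any, Dict, List

# Hypothetical JSON-Schema-style parameters for the image_generate tool.
IMAGE_GENERATE_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "prompt": {
            "type": "string",
            "description": "Text description of the image to generate.",
        },
        "references": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": 16,  # assumption: mirrors MAX_REFERENCES
            "description": (
                "Optional local file paths of reference images; "
                "currently honoured only by openai-codex."
            ),
        },
    },
    # references is deliberately absent here, so it stays optional.
    "required": ["prompt"],
}


def missing_required(arguments: Dict[str, Any]) -> List[str]:
    """Return the required fields that a tool call failed to supply."""
    return [key for key in IMAGE_GENERATE_PARAMETERS["required"] if key not in arguments]
```

A prompt-only call such as `{"prompt": "..."}` passes this check unchanged, since `references` never appears in `required`.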

Test plan

  • pytest tests/plugins/image_gen/ tests/tools/test_image_generation.py tests/tools/test_image_generation_env.py tests/tools/test_image_generation_plugin_dispatch.py → 133 passed.
  • Live run against the Codex backend through the standalone Responses call: text-only 22s, 2-ref composition 51s, 2-ref style-transfer 73s — all produced coherent outputs matching the references.
  • Integration check via tools.image_generation_tool._handle_image_generate (same path a real tool call takes): {"prompt": "...", "references": ["/path/a.png"]} with image_gen.provider: fal → returns references_unsupported.
  • Integration check via _handle_image_generate with image_gen.provider: openai-codex: single-reference re-lighting prompt → returns success=True, provider=openai-codex, references=1, image points to a 1.9 MB PNG that preserves the reference subject. Full round-trip 30s.

Open questions

  1. Preferred name for the kwarg — I used references since it's what OpenAI's docs call them; other options include input_images or reference_images.
  2. Whether to expose this in the tool description as "currently honoured only by openai-codex" (what I did) or stay silent and let users discover it via the error. I think the explicit wording is probably clearer.
  3. If you'd rather ship this as a separate image_edit tool instead of growing image_generate, I'm happy to refactor.

…codex

Adds an opt-in capability contract so providers can accept reference images.
The agent-facing `image_generate` tool schema grows an optional `references`
field (list of local file paths); the dispatcher forwards it only when the
active provider advertises `supports_references=True`, otherwise rejects
early with `references_unsupported` instead of silently dropping it.

The openai-codex provider opts in and honours references by attaching each
file as an `input_image` content item on the user message, labelled in the
prompt ("Reference image 1 is provided below.") per OpenAI's best-practice
guidance for gpt-image-2 composition. Up to MAX_REFERENCES (16) are sent,
matching the documented upstream limit.

Empirically verified against the live Codex backend: two references combine
correctly (subject + subject → merged scene), and a style-from-one-reference
+ subject-from-another composition preserves both inputs. This contradicts
the assumption baked into the existing baoyu-comic skill (which documents
`image_generate` as prompt-only) — that guidance is now accurate only when
the configured provider doesn't support references.

- agent/image_gen_provider.py: add `supports_references` property (default
  False) and document the `references` kwarg contract.
- plugins/image_gen/openai-codex/__init__.py: set `supports_references=True`,
  add `_load_reference_images` + `_build_user_content`, wire the references
  kwarg through `generate()` with invalid-argument / invalid-reference error
  handling.
- tools/image_generation_tool.py: expose `references` in the tool schema,
  forward via the dispatcher, return `references_unsupported` for FAL and
  any plugin provider that doesn't opt in.
- tests: new `TestReferences` class (7 cases) in the openai-codex tests,
  new `TestReferencesDispatch` class (4 cases) in the dispatcher tests,
  and the schema invariant test updated to match the new contract. Full
  image_gen suite stays green (133 passed).

No breaking changes: existing providers (fal, openai, xai) unaffected —
they inherit `supports_references=False` and the dispatcher shields them
from reference payloads.

Labels

  • comp/plugins: Plugin system and bundled plugins
  • P3: Low (cosmetic, nice to have)
  • tool/vision: Vision analysis and image generation
  • type/feature: New feature or request
