feat(image_gen): multi-reference input support, starting with openai-codex by dejay2 · Pull Request #15308 · NousResearch/hermes-agent

dejay2 · 2026-04-24T19:11:44Z

Summary

Adds an opt-in capability contract so image_gen providers can accept reference images, and wires the openai-codex plugin up as the first provider to opt in. The image_generate tool gains an optional references field (array of local file paths); the dispatcher forwards it only when the active provider advertises supports_references=True, otherwise rejects early with references_unsupported instead of silently dropping it.

Motivation — the Codex `image_generation` tool accepts `input_image` after all

The existing baoyu-comic skill documents image_generate as "prompt-only — it does NOT accept reference images", and #14317 explicitly scoped the openai-codex plugin to the generate endpoint. This PR's finding contradicts both: passing one or more input_image content items on the user message of the Codex responses.stream(...) call works, and the image_generation tool uses the attached images for composition, style transfer, and edits.

Empirically verified against the live Codex backend with three tests:

Two-subject composition — red apple (reference 1) + blue ceramic teapot (reference 2) → single still-life on a shared wooden table with consistent lighting. Both subjects reproduced faithfully (same shape, same surface, same highlights).
Cross-reference style transfer — a minimalist line-drawn mountain (reference 1) + a photographic teapot (reference 2), prompt says "reproduce reference 1 exactly, place reference 2 on a shelf, keep reference 1's line style" → teapot rendered in reference 1's ink style, mountain preserved. Style transfer and subject extraction both work.
Single-reference re-lighting — apple on white (reference 1) → apple on grey gradient, preserving the exact fruit silhouette and surface. Works end-to-end through the new dispatcher path.

(Happy to share the PNGs in a follow-up comment if useful — they're local and not on a public host yet.)

Scope

In:

agent/image_gen_provider.py: supports_references property (defaults False) with doc for the references kwarg.
plugins/image_gen/openai-codex/__init__.py: opts in; adds _load_reference_images + _build_user_content; wires references through generate() with invalid_argument / invalid_reference error paths. Caps at MAX_REFERENCES = 16 (matches OpenAI's documented upstream cap).
tools/image_generation_tool.py: references in the agent-facing schema; dispatcher forwards only to capable providers; clear references_unsupported error for FAL and any non-capable plugin.
Tests: TestReferences in the openai-codex suite (7 new cases), TestReferencesDispatch in the dispatcher suite (4 new cases), schema invariant test updated.

Out (deliberately, happy to follow up in a separate PR):

Opting openai (API-key) plugin in to references via the Images Edit endpoint.
Masking / image-variation flows.
Remote URL references (only local file paths for now — the plugin encodes each as a data URL).

Non-breaking

fal, openai, xai all inherit supports_references=False — their behaviour is byte-identical to before.
The dispatcher rejects reference payloads for non-capable providers before they reach generate(), so no provider is silently passed kwargs it doesn't understand.
The image_generate tool with no references field continues to behave exactly as before.

Test plan

pytest tests/plugins/image_gen/ tests/tools/test_image_generation.py tests/tools/test_image_generation_env.py tests/tools/test_image_generation_plugin_dispatch.py → 133 passed.
Live run against the Codex backend through the standalone Responses call: text-only 22s, 2-ref composition 51s, 2-ref style-transfer 73s — all produced coherent outputs matching the references.
Integration check via tools.image_generation_tool._handle_image_generate (same path a real tool call takes): {"prompt": "...", "references": ["/path/a.png"]} with image_gen.provider: fal → returns references_unsupported.
Integration check via _handle_image_generate with image_gen.provider: openai-codex: single-reference re-lighting prompt → returns success=True, provider=openai-codex, references=1, image points to a 1.9 MB PNG that preserves the reference subject. Full round-trip 30s.

Open questions

Preferred name for the kwarg — I used references since it's what OpenAI's docs call them; other options include input_images or reference_images.
Whether to expose this in the tool description as "currently honoured only by openai-codex" (what I did) or stay silent and let users discover via the error. The current text is probably clearer.
If you'd rather ship this as a separate image_edit tool instead of growing image_generate, I'm happy to refactor.

…codex Adds an opt-in capability contract so providers can accept reference images. The agent-facing `image_generate` tool schema grows an optional `references` field (list of local file paths); the dispatcher forwards it only when the active provider advertises `supports_references=True`, otherwise rejects early with `references_unsupported` instead of silently dropping it. The openai-codex provider opts in and honours references by attaching each file as an `input_image` content item on the user message, labelled in the prompt ("Reference image 1 is provided below.") per OpenAI's best-practice guidance for gpt-image-2 composition. Up to MAX_REFERENCES (16) are sent, matching the documented upstream limit. Empirically verified against the live Codex backend: two references combine correctly (subject + subject → merged scene), and a style-from-one-reference + subject-from-another composition preserves both inputs. This contradicts the assumption baked into the existing baoyu-comic skill (which documents `image_generate` as prompt-only) — that guidance is now accurate only when the configured provider doesn't support references. - agent/image_gen_provider.py: add `supports_references` property (default False) and document the `references` kwarg contract. - plugins/image_gen/openai-codex/__init__.py: set `supports_references=True`, add `_load_reference_images` + `_build_user_content`, wire the references kwarg through `generate()` with invalid-argument / invalid-reference error handling. - tools/image_generation_tool.py: expose `references` in the tool schema, forward via the dispatcher, return `references_unsupported` for FAL and any plugin provider that doesn't opt in. - tests: new `TestReferences` class (7 cases) in the openai-codex tests, new `TestReferencesDispatch` class (4 cases) in the dispatcher tests, and the schema invariant test updated to match the new contract. Full image_gen suite stays green (133 passed). No breaking changes: existing providers (fal, openai, xai) unaffected — they inherit `supports_references=False` and the dispatcher shields them from reference payloads.

alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/plugins Plugin system and bundled plugins tool/vision Vision analysis and image generation labels Apr 24, 2026

This was referenced May 2, 2026

fix(image-gen): forward reference images to providers #18805

Closed

feat(image_gen): add reference_images for image-to-image (openai-codex) #21463

Open

fix: forward reference images through Codex and native FAL image generation #21570

Open

jplew mentioned this pull request May 8, 2026

[Bug]: native image generation (image_generate) doesn't accept images as input #21562

Open

This was referenced May 11, 2026

feat(image-gen): support reference images in image_generate #21854

Open

feat: add reference_image_path support to image_generate tool #25661

Open

feat: add reference_image_path support to image_generate tool #25677

Closed

This was referenced May 28, 2026

feat(image-gen): support Codex reference-image edits on raw SSE path #33644

Open

fix: pass reference images to Codex image generation #33969

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(image_gen): multi-reference input support, starting with openai-codex#15308

feat(image_gen): multi-reference input support, starting with openai-codex#15308
dejay2 wants to merge 1 commit into
NousResearch:mainfrom
dejay2:feat/openai-codex-multi-reference-images

dejay2 commented Apr 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dejay2 commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation — the Codex image_generation tool accepts input_image after all

Scope

Non-breaking

Test plan

Open questions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dejay2 commented Apr 24, 2026 •

edited

Loading

Motivation — the Codex `image_generation` tool accepts `input_image` after all