feat(image_gen): add reference_images for image-to-image (openai-codex) by cypres0099 · Pull Request #21463 · NousResearch/hermes-agent

cypres0099 · 2026-05-07T18:36:46Z

Summary

The Codex Responses image_generation tool already supports reference-image conditioning when the user message includes input_image content blocks alongside the text prompt — the same multimodal path Claude Code and other Codex-OAuth clients use. The bundled openai-codex provider, however, only ever sent input_text, so every call was effectively text-to-image regardless of any source image the user wanted to edit. Outputs drift from the source even when the user explicitly asks for an "edit" because the source bytes never reach the model.

This adds first-class reference-image support without changing defaults.

- Tool schema (tools/image_generation_tool.py) gains an optional reference_images: array[string]. Entries may be absolute file paths, data:image/...;base64,... URLs, or http(s):// URLs. The description tells the agent that the field is only honored by providers that opt in (currently openai-codex); others will absorb the kwarg and continue text-only.
- Dispatcher forwards the kwarg only when the list is non-empty, so providers that haven't opted in see exactly the same call shape as before this PR (no behavior change for FAL, openai-REST, xai).
- openai-codex plugin appends each entry as an input_image block in the Responses content array. Unrecognised entries (missing files, unsupported types) are skipped with a warning rather than failing the whole call, so partly-bad input still produces a usable request.
- Shared helper agent.image_gen_provider.normalize_reference_image does the path → b64 data-URL conversion and mime inference (.png/.jpg/.jpeg/.webp/.gif → correct mime, unknown → image/png). Future patches for openai REST /images/edits, xai, and a fal img2img port can reuse the same resolution rules.
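
For readers who want a concrete picture of the resolution rules above, here is a minimal sketch of what a helper like normalize_reference_image might look like. It is not the code from this PR: the function body, the `_MIME_BY_SUFFIX` name, and the handling of non-string input are assumptions reconstructed from the description (pass URLs through, expand `~`, return None for missing files, fall back to image/png for unknown suffixes).

```python
import base64
from pathlib import Path
from typing import Optional

# Suffix -> mime table as listed in the PR description.
# Unknown suffixes fall back to image/png (also per the description).
_MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def normalize_reference_image(entry: object) -> Optional[str]:
    """Return a URL string for an input_image block, or None if unusable.

    Hypothetical sketch: the real helper may differ in details.
    """
    # Assumption: non-string or empty entries count as "unsupported types".
    if not isinstance(entry, str) or not entry.strip():
        return None
    entry = entry.strip()

    # data: and http(s):// URLs are passed through verbatim.
    if entry.startswith("data:image/"):
        return entry
    if entry.startswith(("http://", "https://")):
        return entry

    # Anything else is treated as a file path; "~" is expanded first.
    path = Path(entry).expanduser()
    try:
        if not path.is_file():
            return None  # missing file: no exception, caller just skips it
        raw = path.read_bytes()
    except OSError:
        return None

    mime = _MIME_BY_SUFFIX.get(path.suffix.lower(), "image/png")
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Returning None instead of raising is what lets the provider skip a bad entry and still send the rest of the request.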

Smoke test

Tested live against the Codex Responses backend with a real photo + "add a dinosaur" prompt. The source composition (subject, clothing, lighting, background flora) is preserved and the requested element is added in-frame — true image-to-image, not text-shaped drift. Same image submitted with the unpatched provider produces a fresh-from-scratch generation with no source fidelity.

Tests

| File | What it covers |
| --- | --- |
| `tests/agent/test_image_gen_provider_helpers.py` (new) | `normalize_reference_image`: file path, `data:` URL, `http(s)://` URL, missing file (returns None, no raise), mime-by-suffix table, unsupported types, `~` expansion. |
| `tests/plugins/image_gen/test_openai_codex_provider.py` | New `TestReferenceImages` class: text-only path stays single-block; data URL appended verbatim; file path encoded to data URL; mixed list (path + URL + path) preserves order; unrecognised entries silently dropped without failing the call. |
| `tests/tools/test_image_generation_plugin_dispatch.py` | Two new tests: dispatcher forwards `reference_images` to the provider when present, and omits the kwarg entirely when the list is empty (preserves pre-PR call shape). |

pytest tests/agent/test_image_gen_provider_helpers.py tests/plugins/image_gen/test_openai_codex_provider.py tests/tools/test_image_generation_plugin_dispatch.py → 47 passed. Wider sweep including all tests/plugins/image_gen/ and tests/hermes_cli/test_image_gen_picker.py → 117 passed.

Backward compatibility

- Schema extension is additive and optional. Models that don't use the new field behave identically.
- Dispatcher only forwards the kwarg when non-empty, so the call signature into FAL, openai REST, and xai is byte-identical to before.
- Other providers absorb the kwarg via their existing **kwargs (already the convention from the base class).
- Existing tests pass without modification.
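
The forwarding rule behind these guarantees can be sketched as follows. This is an illustration only: `dispatch_to_provider`, `RecordingProvider`, and the `"landscape"` default are hypothetical stand-ins, not the actual dispatcher or any real provider class from the repository.

```python
from typing import Any, Dict, List, Optional


class RecordingProvider:
    """Hypothetical stand-in for a provider that has not opted in.

    It accepts **kwargs (the base-class convention mentioned above) and
    records exactly what it was called with.
    """

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def generate(self, prompt: str, aspect_ratio: str = "landscape",
                 **kwargs: Any) -> Dict[str, Any]:
        self.calls.append({"prompt": prompt, "aspect_ratio": aspect_ratio, **kwargs})
        return {"success": True}


def dispatch_to_provider(provider: RecordingProvider, prompt: str,
                         aspect_ratio: str,
                         reference_images: Optional[List[str]] = None) -> Dict[str, Any]:
    """Forward reference_images only when the list is non-empty."""
    kwargs: Dict[str, Any] = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    if reference_images:  # None and [] both leave the call shape untouched
        kwargs["reference_images"] = list(reference_images)
    return provider.generate(**kwargs)


provider = RecordingProvider()
dispatch_to_provider(provider, "a red barn", "landscape", [])
dispatch_to_provider(provider, "add a dinosaur", "landscape", ["~/photo.jpg"])
```

In this sketch the first call reaches the provider with only prompt and aspect_ratio, while the second also carries reference_images, which a text-only provider simply absorbs.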

Test plan

- tests/agent/test_image_gen_provider_helpers.py covers the helper
- tests/plugins/image_gen/test_openai_codex_provider.py covers the plugin path
- tests/tools/test_image_generation_plugin_dispatch.py covers the dispatcher
- Smoke-tested live against the Codex Responses backend
- Optional follow-up PR: extend openai REST plugin to use /images/edits when reference_images is set
- Optional follow-up PR: same for xai and a future fal img2img port

🤖 Generated with Claude Code

The Codex Responses ``image_generation`` tool already supports reference-image conditioning when the user message includes ``input_image`` content blocks alongside the text prompt, the same multimodal path Claude Code uses against the Codex backend. The ``openai-codex`` provider, however, only ever sent ``input_text``, so every call was effectively text-to-image regardless of any attached source image. Generated outputs drift from the source even when the user explicitly asks for an "edit" or "modification" because the source bytes never reach the model.

This adds first-class reference-image support without changing defaults. The tool schema gains an optional ``reference_images`` array (file paths, ``data:`` URLs, or ``http(s)://`` URLs); the dispatcher forwards it as a kwarg only when non-empty so providers that have not opted in see exactly the same call shape as before. The ``openai-codex`` plugin now appends each entry as an ``input_image`` block. A shared ``agent.image_gen_provider.normalize_reference_image`` helper handles the path → b64 data-URL conversion and mime inference so other providers (openai REST ``images.edits``, xai, future fal img2img) can reuse the same resolution rules when they grow reference-image support.

Smoke-tested against the live Codex Responses backend with a real photo + "add a dinosaur" prompt: the source composition (subject, clothing, lighting, background flora) is preserved and the requested element is added in-frame, i.e. true image-to-image, not text-shaped drift.

Tests:
- ``tests/agent/test_image_gen_provider_helpers.py``: covers ``normalize_reference_image`` (file path, data URL, http URL, missing file, mime-by-suffix table, unsupported types).
- ``tests/plugins/image_gen/test_openai_codex_provider.py``: ``TestReferenceImages`` asserts the patched ``_collect_image_b64`` builds a content array with one ``input_image`` block per entry and skips unrecognised entries instead of failing the call.
- ``tests/tools/test_image_generation_plugin_dispatch.py``: two new tests assert the dispatcher forwards ``reference_images`` when present and *omits* the kwarg entirely when the list is empty (preserves the pre-PR call shape for plain text-to-image).
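
As a rough illustration of the content array this commit message describes, the sketch below builds one ``input_text`` block followed by one ``input_image`` block per usable entry. The function name ``build_codex_content`` and the stand-in normalizer are hypothetical; the actual plugin code (inside ``_collect_image_b64``) is not reproduced here.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

Normalizer = Callable[[Any], Optional[str]]


def url_only_normalizer(entry: Any) -> Optional[str]:
    """Hypothetical stand-in for normalize_reference_image: URLs only."""
    if isinstance(entry, str) and entry.startswith(("data:image/", "http://", "https://")):
        return entry
    return None


def build_codex_content(prompt: str,
                        reference_images: Optional[List[Any]],
                        normalize: Normalizer = url_only_normalizer) -> List[Dict[str, Any]]:
    """Build the user-message content array: text first, then images in order."""
    content: List[Dict[str, Any]] = [{"type": "input_text", "text": prompt}]
    for entry in reference_images or []:
        url = normalize(entry)
        if url is None:
            # Skip with a warning instead of failing the whole call.
            # The entry is truncated so a long data URL does not flood the log.
            logger.warning("Skipping unrecognised reference image: %.80r", entry)
            continue
        content.append({"type": "input_image", "image_url": url})
    return content


blocks = build_codex_content(
    "add a dinosaur",
    ["data:image/png;base64,AAAA", "not-an-image", "https://example.com/b.png"],
)
```

Here ``blocks`` holds three items: the text block and two image blocks in their original order, with the unrecognised middle entry dropped.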

alt-glitch · 2026-05-07T18:43:47Z

Duplicate of #18805 — same feature (reference_images forwarding for openai-codex provider). Also overlaps with #15308 (multi-reference input support).

* Type-annotate ``kwargs`` in ``_dispatch_to_plugin_provider`` as ``Dict[str, Any]`` so ty no longer flags the ``kwargs["reference_images"] = list`` assignment as ``invalid-assignment``. Without the explicit annotation, ty narrows the dict type from the ``{prompt, aspect_ratio}`` literal to ``dict[str, str]``.
* Add the commit-author email to ``scripts/release.py`` AUTHOR_MAP so the contributor-attribution check resolves the noreply email (``cypres0099@users.noreply.github.com`` lacks the ``+id`` form, so it doesn't fall under the script's auto-resolve path for plain GitHub noreply emails).

The existing TestRegistryIntegration test asserted the schema's properties were exactly {prompt, aspect_ratio} as a guard against silently exposing user-level config knobs to the agent. Adding the reference_images arg in the prior commit broke that assertion; this updates it (and renames it from the now-misleading test_schema_exposes_only_prompt_and_aspect_ratio_to_agent) to assert the new exact set, and adds a guard that 'prompt' remains the only required arg so plain text-to-image still works with no extra args.
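
A guard of the kind described above might look roughly like this. The schema dict is a trimmed, hypothetical stand-in for the real one in tools/image_generation_tool.py, and the test name is invented for illustration, since the commit message does not state the new name.

```python
from typing import Any, Dict

# Hypothetical, trimmed-down stand-in for the agent-facing tool schema.
IMAGE_GENERATE_SCHEMA: Dict[str, Any] = {
    "name": "image_generate",
    "parameters": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string"},
            "aspect_ratio": {"type": "string"},
            "reference_images": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["prompt"],
    },
}


def test_schema_exposes_exact_agent_args() -> None:
    params = IMAGE_GENERATE_SCHEMA["parameters"]
    # Exact-set assertion: any newly exposed knob must be added here on purpose.
    assert set(params["properties"]) == {"prompt", "aspect_ratio", "reference_images"}
    # 'prompt' stays the only required arg, so plain text-to-image needs no extras.
    assert params["required"] == ["prompt"]


test_schema_exposes_exact_agent_args()
```

Asserting the exact set rather than a subset keeps the original intent of the guard: a user-level config knob cannot leak into the schema without a test failing.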

alt-glitch added labels May 7, 2026: type/feature (New feature or request), comp/plugins (Plugin system and bundled plugins), tool/vision (Vision analysis and image generation), P3 (Low: cosmetic, nice to have)

cypres0099 mentioned this pull request May 7, 2026

feat(image_gen): auto-attach session image attachments to image_generate #21472

Open

3 tasks

cypres0099 added 2 commits May 7, 2026 14:04

alt-glitch mentioned this pull request May 8, 2026

fix: forward reference images through Codex and native FAL image generation #21570

Open

12 tasks

jplew mentioned this pull request May 8, 2026

[Bug]: native image generation (image_generate) doesn't accept images as input #21562

Open

alt-glitch mentioned this pull request May 14, 2026

feat: add reference_image_path support to image_generate tool #25677

Closed

This was referenced May 25, 2026

feat: image_edit tool — prompt-guided img2img editing with OpenAI Codex #32248

Open

feat(image-gen): support Codex reference-image edits on raw SSE path #33644

Open

cypres0099 wants to merge 3 commits into NousResearch:main from cypres0099:feat/image-gen-reference-images
