Skip to content

feat(image_gen): add reference_images for image-to-image (openai-codex)#21463

Open
cypres0099 wants to merge 3 commits into
NousResearch:mainfrom
cypres0099:feat/image-gen-reference-images
Open

feat(image_gen): add reference_images for image-to-image (openai-codex)#21463
cypres0099 wants to merge 3 commits into
NousResearch:mainfrom
cypres0099:feat/image-gen-reference-images

Conversation

@cypres0099

Copy link
Copy Markdown
Contributor

Summary

The Codex Responses image_generation tool already supports reference-image conditioning when the user message includes input_image content blocks alongside the text prompt — the same multimodal path Claude Code and other Codex-OAuth clients use. The bundled openai-codex provider, however, only ever sent input_text, so every call was effectively text-to-image regardless of any source image the user wanted to edit. Outputs drift from the source even when the user explicitly asks for an "edit" because the source bytes never reach the model.

This adds first-class reference-image support without changing defaults.

  • Tool schema (tools/image_generation_tool.py) gains an optional reference_images: array[string]. Entries may be absolute file paths, data:image/...;base64,... URLs, or http(s):// URLs. Description tells the agent it's only honored by providers that opt in (currently openai-codex); others will absorb the kwarg and continue text-only.
  • Dispatcher forwards the kwarg only when the list is non-empty, so providers that haven't opted in see exactly the same call shape as before this PR (no behavior change for FAL, openai-REST, xai).
  • openai-codex plugin appends each entry as an input_image block in the Responses content array. Unrecognised entries (missing files, unsupported types) are skipped with a warning rather than failing the whole call — partly-bad input still produces a usable request.
  • Shared helper agent.image_gen_provider.normalize_reference_image does the path → b64 data-URL conversion and mime inference (.png/.jpg/.jpeg/.webp/.gif → correct mime, unknown → image/png). Future patches for openai REST /images/edits, xai, and a fal img2img port can reuse the same resolution rules.

Smoke test

Tested live against the Codex Responses backend with a real photo + "add a dinosaur" prompt. The source composition (subject, clothing, lighting, background flora) is preserved and the requested element is added in-frame — true image-to-image, not text-shaped drift. Same image submitted with the unpatched provider produces a fresh-from-scratch generation with no source fidelity.

Tests

File What it covers
tests/agent/test_image_gen_provider_helpers.py (new) normalize_reference_image: file path, data: URL, http(s):// URL, missing file (returns None, no raise), mime-by-suffix table, unsupported types, ~ expansion.
tests/plugins/image_gen/test_openai_codex_provider.py New TestReferenceImages class: text-only path stays single-block; data URL appended verbatim; file path encoded to data URL; mixed list (path + URL + path) preserves order; unrecognised entries silently dropped without failing the call.
tests/tools/test_image_generation_plugin_dispatch.py Two new tests: dispatcher forwards reference_images to the provider when present, and omits the kwarg entirely when the list is empty (preserves pre-PR call shape).

pytest tests/agent/test_image_gen_provider_helpers.py tests/plugins/image_gen/test_openai_codex_provider.py tests/tools/test_image_generation_plugin_dispatch.py → 47 passed. Wider sweep including all tests/plugins/image_gen/ and tests/hermes_cli/test_image_gen_picker.py → 117 passed.

Backward compatibility

  • Schema extension is additive and optional. Models that don't use the new field behave identically.
  • Dispatcher only forwards the kwarg when non-empty, so the call signature into FAL, openai REST, and xai is byte-identical to before.
  • Other providers absorb the kwarg via their existing **kwargs (already the convention from the base class).
  • Existing tests pass without modification.

Test plan

  • tests/agent/test_image_gen_provider_helpers.py covers the helper
  • tests/plugins/image_gen/test_openai_codex_provider.py covers the plugin path
  • tests/tools/test_image_generation_plugin_dispatch.py covers the dispatcher
  • Smoke-tested live against the Codex Responses backend
  • Optional follow-up PR: extend openai REST plugin to use /images/edits when reference_images is set
  • Optional follow-up PR: same for xai and a future fal img2img port

🤖 Generated with Claude Code

The Codex Responses ``image_generation`` tool already supports
reference-image conditioning when the user message includes
``input_image`` content blocks alongside the text prompt — the same
multimodal path Claude Code uses against the Codex backend. The
``openai-codex`` provider, however, only ever sent ``input_text``,
so every call was effectively text-to-image regardless of any
attached source image. Generated outputs drift from the source even
when the user explicitly asks for an "edit" or "modification"
because the source bytes never reach the model.

This adds first-class reference-image support without changing
defaults. The tool schema gains an optional ``reference_images``
array (file paths, ``data:`` URLs, or ``http(s)://`` URLs); the
dispatcher forwards it as a kwarg only when non-empty so providers
that have not opted in see exactly the same call shape as before.
The ``openai-codex`` plugin now appends each entry as an
``input_image`` block. A shared
``agent.image_gen_provider.normalize_reference_image`` helper
handles the path → b64 data-URL conversion and mime inference so
other providers (openai REST ``images.edits``, xai, future fal
img2img) can reuse the same resolution rules when they grow
reference-image support.

Smoke-tested against the live Codex Responses backend with a real
photo + "add a dinosaur" prompt: the source composition (subject,
clothing, lighting, background flora) is preserved and the
requested element is added in-frame — i.e. true image-to-image,
not text-shaped drift.

Tests:
- ``tests/agent/test_image_gen_provider_helpers.py`` — covers
  ``normalize_reference_image`` (file path, data URL, http URL,
  missing file, mime-by-suffix table, unsupported types).
- ``tests/plugins/image_gen/test_openai_codex_provider.py`` —
  ``TestReferenceImages`` asserts the patched ``_collect_image_b64``
  builds a content array with one ``input_image`` block per entry
  and skips unrecognised entries instead of failing the call.
- ``tests/tools/test_image_generation_plugin_dispatch.py`` — two
  new tests assert the dispatcher forwards ``reference_images``
  when present and *omits* the kwarg entirely when the list is
  empty (preserves the pre-PR call shape for plain text-to-image).
@alt-glitch alt-glitch added type/feature New feature or request comp/plugins Plugin system and bundled plugins tool/vision Vision analysis and image generation P3 Low — cosmetic, nice to have labels May 7, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #18805 — same feature (reference_images forwarding for openai-codex provider). Also overlaps with #15308 (multi-reference input support).

cypres0099 added 2 commits May 7, 2026 14:04
* Type-annotate ``kwargs`` in ``_dispatch_to_plugin_provider`` as
  ``Dict[str, Any]`` so ty no longer flags the
  ``kwargs[\"reference_images\"] = list`` assignment as
  ``invalid-assignment`` — without the explicit annotation ty narrows
  the dict type from the ``{prompt, aspect_ratio}`` literal to
  ``dict[str, str]``.
* Add the commit-author email to ``scripts/release.py`` AUTHOR_MAP so
  the contributor-attribution check resolves the noreply email
  (``cypres0099@users.noreply.github.com`` lacks the ``+id`` form, so
  it doesn't fall under the script's auto-resolve path for plain
  GitHub noreply emails).
The existing TestRegistryIntegration test asserted the schema's
properties were exactly {prompt, aspect_ratio} as a guard against
silently exposing user-level config knobs to the agent. Adding the
reference_images arg in the prior commit broke that assertion; this
updates it (and renames it from the now-misleading
test_schema_exposes_only_prompt_and_aspect_ratio_to_agent) to assert
the new exact set, and adds a guard that 'prompt' remains the only
required arg so plain text-to-image still works with no extra args.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have tool/vision Vision analysis and image generation type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants