feat(image_gen): add reference_images for image-to-image (openai-codex) #21463
Open
cypres0099 wants to merge 3 commits into
The Codex Responses ``image_generation`` tool already supports reference-image conditioning when the user message includes ``input_image`` content blocks alongside the text prompt, the same multimodal path Claude Code uses against the Codex backend. The ``openai-codex`` provider, however, only ever sent ``input_text``, so every call was effectively text-to-image regardless of any attached source image. Generated outputs drift from the source even when the user explicitly asks for an "edit" or "modification" because the source bytes never reach the model.

This adds first-class reference-image support without changing defaults. The tool schema gains an optional ``reference_images`` array (file paths, ``data:`` URLs, or ``http(s)://`` URLs); the dispatcher forwards it as a kwarg only when non-empty, so providers that have not opted in see exactly the same call shape as before. The ``openai-codex`` plugin now appends each entry as an ``input_image`` block. A shared ``agent.image_gen_provider.normalize_reference_image`` helper handles the path → b64 data-URL conversion and mime inference so other providers (openai REST ``images.edits``, xai, future fal img2img) can reuse the same resolution rules when they grow reference-image support.

Smoke-tested against the live Codex Responses backend with a real photo + "add a dinosaur" prompt: the source composition (subject, clothing, lighting, background flora) is preserved and the requested element is added in-frame, i.e. true image-to-image, not text-shaped drift.

Tests:

- ``tests/agent/test_image_gen_provider_helpers.py`` covers ``normalize_reference_image`` (file path, data URL, http URL, missing file, mime-by-suffix table, unsupported types).
- ``tests/plugins/image_gen/test_openai_codex_provider.py``: ``TestReferenceImages`` asserts the patched ``_collect_image_b64`` builds a content array with one ``input_image`` block per entry and skips unrecognised entries instead of failing the call.
- ``tests/tools/test_image_generation_plugin_dispatch.py``: two new tests assert the dispatcher forwards ``reference_images`` when present and *omits* the kwarg entirely when the list is empty (preserves the pre-PR call shape for plain text-to-image).
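The resolution rules described for ``normalize_reference_image`` can be illustrated with a minimal sketch. This is not the PR's implementation: the function name ``normalize_reference_image_sketch`` and the ``_MIME_BY_SUFFIX`` table are hypothetical, and the behaviour is assumed only from this PR's text (URLs pass through, file paths become base64 ``data:`` URLs, unknown suffixes fall back to ``image/png``, missing files and unsupported types return ``None``).

```python
import base64
from pathlib import Path
from typing import Optional

# Hypothetical suffix -> mime table; unknown suffixes fall back to image/png.
_MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def normalize_reference_image_sketch(ref: object) -> Optional[str]:
    """Return a URL usable in an input_image block, or None if the entry is unusable."""
    if not isinstance(ref, str):
        return None  # unsupported type: signal "skip" instead of raising
    if ref.startswith(("data:", "http://", "https://")):
        return ref  # already a URL: pass through verbatim
    path = Path(ref).expanduser()  # handles "~" in user-supplied paths
    if not path.is_file():
        return None  # missing file: signal "skip" instead of raising
    mime = _MIME_BY_SUFFIX.get(path.suffix.lower(), "image/png")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Returning ``None`` rather than raising lets a caller drop a bad entry and still send the remaining references.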
* Type-annotate ``kwargs`` in ``_dispatch_to_plugin_provider`` as
``Dict[str, Any]`` so ty no longer flags the
``kwargs["reference_images"] = list`` assignment as
``invalid-assignment`` — without the explicit annotation ty narrows
the dict type from the ``{prompt, aspect_ratio}`` literal to
``dict[str, str]``.
* Add the commit-author email to ``scripts/release.py`` AUTHOR_MAP so
the contributor-attribution check resolves the noreply email
(``cypres0099@users.noreply.github.com`` lacks the ``+id`` form, so
it doesn't fall under the script's auto-resolve path for plain
GitHub noreply emails).
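A minimal sketch of the ``kwargs`` annotation fix, combined with the forward-only-when-non-empty rule from the first commit. The function name ``build_provider_kwargs`` is hypothetical (the real logic lives in ``_dispatch_to_plugin_provider``, whose full signature is not shown in this PR text), so treat this as an illustration under that assumption.

```python
from typing import Any, Dict, List, Optional


def build_provider_kwargs(
    prompt: str,
    aspect_ratio: str,
    reference_images: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper showing the call shape handed to a provider."""
    # Explicit annotation: without it, a type checker may infer dict[str, str]
    # from the literal and reject the list-valued assignment below.
    kwargs: Dict[str, Any] = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    if reference_images:
        # Forward only when non-empty, so plain text-to-image calls keep
        # exactly the pre-PR call shape.
        kwargs["reference_images"] = list(reference_images)
    return kwargs
```

Omitting the key entirely, rather than passing an empty list, means providers that never opted in cannot observe any difference.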
The existing TestRegistryIntegration test asserted the schema's
properties were exactly {prompt, aspect_ratio} as a guard against
silently exposing user-level config knobs to the agent. Adding the
reference_images arg in the prior commit broke that assertion; this
updates it (and renames it from the now-misleading
test_schema_exposes_only_prompt_and_aspect_ratio_to_agent) to assert
the new exact set, and adds a guard that 'prompt' remains the only
required arg so plain text-to-image still works with no extra args.
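The updated guard can be sketched as follows. The schema dict is a hypothetical stand-in (the real schema in the tool module is not reproduced in this PR text) and the checker name is illustrative; only the asserted property set and the ``prompt``-only ``required`` list come from the commit message.

```python
from typing import Any, Dict

# Hypothetical stand-in for the tool schema exposed to the agent.
SCHEMA_SKETCH: Dict[str, Any] = {
    "parameters": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string"},
            "aspect_ratio": {"type": "string"},
            "reference_images": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["prompt"],
    }
}


def check_schema_exposes_exact_args(schema: Dict[str, Any]) -> None:
    """Fail if the agent-facing schema gains or loses an argument unnoticed."""
    params = schema["parameters"]
    # Exact-set comparison: any silently added config knob breaks the test.
    assert set(params["properties"]) == {"prompt", "aspect_ratio", "reference_images"}
    # 'prompt' stays the only required arg, so plain text-to-image needs no extras.
    assert params["required"] == ["prompt"]


check_schema_exposes_exact_args(SCHEMA_SKETCH)
```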
This was referenced May 25, 2026
Summary
The Codex Responses `image_generation` tool already supports reference-image conditioning when the user message includes `input_image` content blocks alongside the text prompt, the same multimodal path Claude Code and other Codex-OAuth clients use. The bundled `openai-codex` provider, however, only ever sent `input_text`, so every call was effectively text-to-image regardless of any source image the user wanted to edit. Outputs drift from the source even when the user explicitly asks for an "edit" because the source bytes never reach the model.

This adds first-class reference-image support without changing defaults.

- The tool schema (`tools/image_generation_tool.py`) gains an optional `reference_images: array[string]`. Entries may be absolute file paths, `data:image/...;base64,...` URLs, or `http(s)://` URLs. The description tells the agent it is only honored by providers that opt in (currently `openai-codex`); others will absorb the kwarg and continue text-only.
- The `openai-codex` plugin appends each entry as an `input_image` block in the Responses `content` array. Unrecognised entries (missing files, unsupported types) are skipped with a warning rather than failing the whole call, so partly-bad input still produces a usable request.
- `agent.image_gen_provider.normalize_reference_image` does the path → b64 data-URL conversion and mime inference (`.png`/`.jpg`/`.jpeg`/`.webp`/`.gif` → correct mime, unknown → `image/png`). Future patches for `openai` REST `/images/edits`, xai, and a fal img2img port can reuse the same resolution rules.

Smoke test
Tested live against the Codex Responses backend with a real photo + "add a dinosaur" prompt. The source composition (subject, clothing, lighting, background flora) is preserved and the requested element is added in-frame: true image-to-image, not text-shaped drift. The same image submitted with the unpatched provider produces a fresh-from-scratch generation with no source fidelity.
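The request shape that makes the difference here, one `input_text` block followed by one `input_image` block per usable reference, can be sketched as below. The function name `build_content_sketch` and the injected `normalize` callable are hypothetical; the block layout follows the Responses content-array convention described in the summary, and the provider's actual code may differ.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


def build_content_sketch(
    prompt: str,
    reference_images: Optional[List[Any]],
    normalize: Callable[[Any], Optional[str]],
) -> List[Dict[str, Any]]:
    """Assemble the user-message content array (illustrative only)."""
    # Text-only calls stay a single input_text block.
    content: List[Dict[str, Any]] = [{"type": "input_text", "text": prompt}]
    for ref in reference_images or []:
        url = normalize(ref)  # hypothetical resolver: returns a URL or None
        if url is None:
            # Skip the bad entry with a warning instead of failing the call.
            logger.warning("Skipping unusable reference image: %r", ref)
            continue
        content.append({"type": "input_image", "image_url": url})
    return content
```

Because entries are appended in list order, a mixed list of paths and URLs reaches the model in the order the caller supplied it.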
Tests
- `tests/agent/test_image_gen_provider_helpers.py` (new) covers `normalize_reference_image`: file path, `data:` URL, `http(s)://` URL, missing file (returns None, no raise), mime-by-suffix table, unsupported types, `~` expansion.
- `tests/plugins/image_gen/test_openai_codex_provider.py`, `TestReferenceImages` class: text-only path stays single-block; data URL appended verbatim; file path encoded to data URL; mixed list (path + URL + path) preserves order; unrecognised entries silently dropped without failing the call.
- `tests/tools/test_image_generation_plugin_dispatch.py` asserts the dispatcher forwards `reference_images` to the provider when present, and omits the kwarg entirely when the list is empty (preserves pre-PR call shape).

`pytest tests/agent/test_image_gen_provider_helpers.py tests/plugins/image_gen/test_openai_codex_provider.py tests/tools/test_image_generation_plugin_dispatch.py` → 47 passed. Wider sweep including all `tests/plugins/image_gen/` and `tests/hermes_cli/test_image_gen_picker.py` → 117 passed.

Backward compatibility

`**kwargs` (already the convention from the base class).

Test plan

- `tests/agent/test_image_gen_provider_helpers.py` covers the helper
- `tests/plugins/image_gen/test_openai_codex_provider.py` covers the plugin path
- `tests/tools/test_image_generation_plugin_dispatch.py` covers the dispatcher
- `openai` REST plugin to use `/images/edits` when `reference_images` is set
- `xai` and a future `fal` img2img port

🤖 Generated with Claude Code