fix(image-gen): forward reference images to providers #18805

Closed
Kavyrocom wants to merge 1 commit into NousResearch:main from Kavyrocom:fix/image-generate-reference-images

@Kavyrocom

Summary

Fixes image_generate reference image routing for plugin-backed image providers, with concrete support for the OpenAI Codex / GPT Image 2 provider.

Before this change, the tool schema could expose image references, but plugin dispatch did not reliably pass those references into the configured image provider. In practice, providers such as OpenAI Codex / GPT Image 2 received only the text prompt and aspect ratio, so multimodal image generation/editing requests could not use attached/reference images.

This patch:

  • Adds reference_images to the image_generate tool schema.
  • Normalizes reference_images in the tool handler.
  • Forwards reference_images through plugin provider dispatch.
  • Adds OpenAI Codex image provider support for:
    • HTTP(S) reference image URLs.
    • data:image/...;base64,... image URLs.
    • local absolute image paths converted to data URLs.
  • Validates local references before reading/sending them:
    • absolute paths only;
    • supported image magic bytes only (png, jpeg, webp, gif);
    • bounded count and file/data size;
    • clear invalid_argument errors for invalid inputs.
  • Adds regression tests covering dispatch forwarding, remote refs, local refs, and invalid local refs.
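
The normalization step in the tool handler could look roughly like the sketch below. This is an illustration only: the function name `normalize_reference_images`, the `MAX_REFERENCE_IMAGES` value, and the exact error wording are assumptions, not the code in this PR.

```python
from typing import Any, List

# Hypothetical limit; the PR states that the count is bounded but not the value.
MAX_REFERENCE_IMAGES = 4


def normalize_reference_images(value: Any) -> List[str]:
    """Coerce a raw tool argument into a clean list of reference strings."""
    if value is None:
        return []
    # Accept a single string as shorthand for a one-element list.
    if isinstance(value, str):
        value = [value]
    if not isinstance(value, (list, tuple)):
        raise ValueError(
            "invalid_argument: reference_images must be a string or a list of strings"
        )

    cleaned: List[str] = []
    for item in value:
        if not isinstance(item, str):
            raise ValueError("invalid_argument: each reference image must be a string")
        item = item.strip()
        if item:  # silently drop empty entries
            cleaned.append(item)

    if len(cleaned) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"invalid_argument: at most {MAX_REFERENCE_IMAGES} reference images are allowed"
        )
    return cleaned
```

Normalizing at the tool boundary means every provider receives either an empty list or a list of non-empty strings, so providers do not need to repeat these checks.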

Problem

Users can configure image generation through different backends, including FAL-backed models and OpenAI/Codex-backed GPT Image 2. Text-only image generation worked, but image input/reference workflows were incomplete for the plugin path: references could be present at the tool boundary but not delivered to the provider.

That meant requests such as “generate an image using these attached logos/references” degraded into prompt-only generation, causing models to approximate or hallucinate logos instead of using the provided image inputs.

Implementation notes

Tool layer

tools/image_generation_tool.py now exposes and normalizes reference_images, then forwards them to the selected plugin provider:

provider.generate(
    prompt=prompt,
    aspect_ratio=aspect_ratio,
    reference_images=reference_images or [],
)

The base ImageGenProvider.generate() contract already accepts **kwargs, so existing providers that ignore unknown keyword arguments keep working unchanged, while providers that understand reference_images can opt in.
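
To illustrate why the extra keyword is safe, here is a minimal sketch with stand-in classes. Only the `**kwargs` shape of `generate()` comes from the description above; the class bodies, the `aspect_ratio` default, and the return values are hypothetical.

```python
from typing import Any, Dict


class ImageGenProvider:
    """Stand-in for the real base class; only the **kwargs shape is from the PR."""

    def generate(self, prompt: str, aspect_ratio: str = "square", **kwargs: Any) -> Dict[str, Any]:
        raise NotImplementedError


class TextOnlyProvider(ImageGenProvider):
    """A provider written before reference_images existed."""

    def generate(self, prompt: str, aspect_ratio: str = "square", **kwargs: Any) -> Dict[str, Any]:
        # reference_images lands in kwargs and is simply ignored.
        return {"prompt": prompt, "aspect_ratio": aspect_ratio, "used_references": 0}


class MultimodalProvider(ImageGenProvider):
    """A provider that opts in to reference images."""

    def generate(self, prompt: str, aspect_ratio: str = "square", **kwargs: Any) -> Dict[str, Any]:
        # Treat a missing or None value the same as an empty list.
        refs = kwargs.get("reference_images") or []
        return {"prompt": prompt, "aspect_ratio": aspect_ratio, "used_references": len(refs)}
```

Both providers accept the same call from the dispatch layer; only the second one reads the new argument.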

OpenAI Codex / GPT Image 2 provider

plugins/image_gen/openai-codex/__init__.py builds multimodal Responses input content:

[
    {"type": "input_text", "text": prompt},
    {"type": "input_image", "image_url": image_url},
]

Local images are read only after validation and encoded as data:image/...;base64,....
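
A minimal sketch of how the content list and the data URL encoding fit together is shown below. The helper names `to_data_url` and `build_input_content` are hypothetical; only the `input_text` / `input_image` item shapes are taken from the description above.

```python
import base64
from typing import Dict, List


def to_data_url(raw: bytes, mime: str) -> str:
    """Encode raw image bytes as a data:<mime>;base64,<payload> URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_input_content(prompt: str, image_urls: List[str]) -> List[Dict[str, str]]:
    """Build the multimodal content list: one text item, then one item per image."""
    content: List[Dict[str, str]] = [{"type": "input_text", "text": prompt}]
    for url in image_urls:
        # url may be an http(s) URL or a data URL; both use the same item shape.
        content.append({"type": "input_image", "image_url": url})
    return content
```

Because local files are turned into data URLs first, remote URLs, data URLs, and local paths all end up as the same `input_image` item.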

Validation

Targeted test run:

./venv/bin/python -m pytest \
  tests/plugins/image_gen/test_openai_codex_provider.py \
  tests/plugins/image_gen/test_openai_provider.py \
  tests/tools/test_image_generation_plugin_dispatch.py \
  tests/hermes_cli/test_image_gen_picker.py \
  -q -o 'addopts='

Result:

64 passed

Additional checks:

./venv/bin/python -m py_compile \
  tools/image_generation_tool.py \
  plugins/image_gen/openai-codex/__init__.py

Static scan of added lines found no hardcoded credentials, shell execution, eval/exec, pickle deserialization, or SQL formatting patterns.

Privacy / safety

This PR intentionally avoids including any local deployment paths, credentials, generated user assets, or installation-specific configuration.

Local reference image handling is constrained to reduce accidental file exfiltration:

  • local refs must be absolute paths;
  • files must pass supported image magic-byte checks;
  • max reference count and file/data URL sizes are enforced;
  • missing paths are logged by basename only.
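
The constraints above could be enforced along the lines of the following sketch. The function names, the `MAX_LOCAL_IMAGE_BYTES` value, and the error wording are assumptions for illustration; the PR does not state the actual limits or helper names.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical size limit; the PR only says file size is bounded.
MAX_LOCAL_IMAGE_BYTES = 10 * 1024 * 1024


def sniff_image_mime(header: bytes) -> Optional[str]:
    """Return a MIME type based on the file's leading magic bytes, or None."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_local_reference(path: str) -> str:
    """Check a local reference and return its MIME type, or raise ValueError."""
    if not os.path.isabs(path):
        raise ValueError("invalid_argument: local reference must be an absolute path")
    name = os.path.basename(path)
    if not os.path.isfile(path):
        # Log and report the basename only, so directory layout is not leaked.
        logger.warning("reference image not found: %s", name)
        raise ValueError(f"invalid_argument: reference image not found: {name}")
    if os.path.getsize(path) > MAX_LOCAL_IMAGE_BYTES:
        raise ValueError(f"invalid_argument: reference image too large: {name}")
    with open(path, "rb") as fh:
        header = fh.read(12)  # 12 bytes is enough for all four signatures
    mime = sniff_image_mime(header)
    if mime is None:
        raise ValueError(f"invalid_argument: unsupported image type: {name}")
    return mime
```

Checking the content signature rather than the file extension means a renamed non-image file is rejected before any of its bytes are encoded or sent.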

Expose reference_images on the image_generate tool schema and pass them through plugin dispatch so multimodal image providers can receive input images.

Add OpenAI Codex image provider support for remote, data URL, and validated local image references. Local files must be absolute paths, actual supported images, and within bounded size/count limits.

Add regression coverage for dispatch forwarding, remote references, local data URL conversion, and invalid local references.

@alt-glitch added labels on May 2, 2026: type/bug (Something isn't working); P3 (Low: cosmetic, nice to have); tool/vision (Vision analysis and image generation); comp/plugins (Plugin system and bundled plugins)
@alt-glitch

Collaborator

Related to #15308 — both address reference image forwarding to plugin providers. This PR appears to be the bugfix completion of that feature.

