feat: add reference_image_path support to image_generate tool #25661

@MustBeSimo

Feature Request: Reference Image Support in image_generate Tool
Summary
The image_generate tool currently exposes only prompt and aspect_ratio. Models like GPT Image 2 (via the openai-codex backend) support reference image inputs for style transfer, subject likeness, and composition guidance, but there is no way to pass an image through the tool schema.
Current Behavior
image_generate(prompt, aspect_ratio)
The tool generates images from text only. When a user sends a reference image in the conversation and asks for a generation based on it, the agent can only describe the image in the prompt. The actual pixels are never passed to the image generation API.
Proposed Behavior
image_generate(prompt, aspect_ratio, reference_image_path?)
When reference_image_path is provided (absolute path to a local file), the backend includes it as a multimodal content part in the API call. For the openai-codex provider, this means adding an input_image part to the Codex Responses API input alongside the text prompt.
Backends that do not support reference images should silently ignore the parameter (existing behavior preserved).
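As a rough illustration of the proposed openai-codex behavior, the sketch below builds the content parts for one user message: a text part, plus an input_image part carrying the local file as a base64 data URL when a reference is given. The helper name build_input_content and the image/png fallback are hypothetical, not taken from the existing plugin code.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def build_input_content(prompt: str, reference_image_path: Optional[str] = None) -> list:
    """Return the content parts for a single user message (hypothetical helper)."""
    parts = [{"type": "input_text", "text": prompt}]
    if reference_image_path:
        path = Path(reference_image_path)
        # Guess the MIME type from the file extension; fall back to PNG (assumption).
        mime, _ = mimetypes.guess_type(path.name)
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        parts.append({
            "type": "input_image",
            "image_url": f"data:{mime or 'image/png'};base64,{encoded}",
        })
    return parts
```

Without a reference path the function returns only the text part, so text-only generation is unchanged.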
Motivation
GPT 5.5 users on the openai-codex provider already have native image generation through the Codex Responses API. The missing piece is passing reference images for:

Subject likeness preservation (e.g., generating variations of a person)
Style transfer (matching a visual style from an example)
Composition guidance (using a layout reference)

This is especially relevant for users generating character-consistent images across sessions, where trait-based prompting alone produces inconsistent results.
Implementation Scope
Three files are involved; two need changes:

tools/image_generation_tool.py

Add reference_image_path (optional string) to IMAGE_GENERATE_SCHEMA
Pass it through _handle_image_generate and _dispatch_to_plugin_provider

plugins/image_gen/openai-codex/__init__.py

Accept reference_image_path via **kwargs in generate()
In _collect_image_b64(), build multimodal input content with an input_image part when a reference is provided
Base64-encode the local file and include it as a data URL

agent/image_gen_provider.py (no change needed)

The generate() ABC already accepts **kwargs, so reference_image_path passes through without a signature change
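The pass-through described above might look roughly like the following sketch. The schema layout, the class names, the dispatch helper, and the "1:1" default are all hypothetical stand-ins; the real IMAGE_GENERATE_SCHEMA, _dispatch_to_plugin_provider, and provider ABC may be shaped differently. It only illustrates that an optional parameter forwarded through **kwargs reaches providers that want it and is ignored by those that do not.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional

# Hypothetical schema shape; reference_image_path is optional (not in "required").
IMAGE_GENERATE_SCHEMA = {
    "name": "image_generate",
    "parameters": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string"},
            "aspect_ratio": {"type": "string"},
            "reference_image_path": {
                "type": "string",
                "description": "Absolute path to a local reference image (optional).",
            },
        },
        "required": ["prompt"],
    },
}


class ImageGenProvider(ABC):
    """Stand-in for the provider ABC, which already accepts **kwargs."""

    @abstractmethod
    def generate(self, prompt: str, aspect_ratio: str = "1:1", **kwargs: Any) -> dict:
        ...


class TextOnlyProvider(ImageGenProvider):
    def generate(self, prompt: str, aspect_ratio: str = "1:1", **kwargs: Any) -> dict:
        # reference_image_path lands in kwargs and is silently ignored.
        return {"prompt": prompt, "used_reference": False}


class ReferenceAwareProvider(ImageGenProvider):
    def generate(self, prompt: str, aspect_ratio: str = "1:1", **kwargs: Any) -> dict:
        ref = kwargs.get("reference_image_path")
        return {"prompt": prompt, "used_reference": ref is not None}


def dispatch(provider: ImageGenProvider, prompt: str, aspect_ratio: str = "1:1",
             reference_image_path: Optional[str] = None) -> dict:
    # Forward the reference only when it was supplied.
    extra = {}
    if reference_image_path:
        extra["reference_image_path"] = reference_image_path
    return provider.generate(prompt, aspect_ratio, **extra)
```

Because the parameter travels in **kwargs, no provider signature has to change, which matches the "no change needed" note for agent/image_gen_provider.py.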

Notes

The openai plugin (API key variant) could also benefit from this via the images.edit endpoint, but that is a separate implementation path.
FAL models that support image-to-image (img2img) workflows could also consume this parameter in the future.
The parameter name reference_image_path was chosen over image or input_image to be explicit about its role (reference/guidance, not inpainting or editing).

Environment

Hermes Agent (latest main)
Provider: openai-codex
Model: gpt-5.5 (LLM) + gpt-image-2 (image gen)
Gateway: Telegram + WhatsApp

Metadata

Labels

P3: Low (cosmetic, nice to have)
comp/plugins: Plugin system and bundled plugins
tool/vision: Vision analysis and image generation
type/feature: New feature or request
