Skip to content

ACP image content blocks dropped before API call (persist_user_message override clobbers multimodal content) #44242

@januscx

Description

@januscx

Summary

Image content blocks sent via the ACP adapter (session/prompt with an image content block) never reach the model. The model behaves as if no image was attached. This happens for every provider/model, independent of promptCapabilities.image and _model_supports_vision — the image is dropped before the request payload is built.

Reproduced on v0.16.0 (upstream 3edd09a).

Root cause

AIAgent._apply_persist_user_message_override (run_agent.py) rewrites the current-turn user message in place:

def _apply_persist_user_message_override(self, messages):
    idx = getattr(self, "_persist_user_message_idx", None)
    override = getattr(self, "_persist_user_message_override", None)
    if override is None or idx is None:
        return
    if 0 <= idx < len(messages):
        msg = messages[idx]
        if isinstance(msg, dict) and msg.get("role") == "user":
            msg["content"] = override   # <-- clobbers multimodal content

The ACP adapter passes a text-only persist_user_message for multimodal prompts (acp_adapter/server.py):

result = agent.run_conversation(
    user_message=user_content,                         # list: [{type:text}, {type:image_url}]
    ...
    persist_user_message=user_text or "[Image attachment]",   # plain string
)

build_turn_context runs a crash-resilience persist of the inbound user turn before the first API call, which calls _apply_persist_user_message_override(messages) on the same messages list the conversation loop later reads to build api_messages. So the multimodal content list ([{type:"text"}, {type:"image_url"}]) is overwritten with the plain persist_user_message string before the request is assembled. The image_url part is gone end-to-end.

Because the strip happens this early, _model_supports_vision() / _prepare_messages_for_non_vision_model() are never even reached (the message no longer has image parts), and a fully vision-capable model still sees text only.

Trace evidence

Instrumenting the ACP prompt handler and _prepare_messages_for_non_vision_model:

  • At acp_adapter/server.py after _content_blocks_to_openai_user_content(prompt): user_content_shape=['text', 'image_url']
  • At _prepare_messages_for_non_vision_model entry (just before the API call): user message content = str(len=135) ❌ — exactly the text-only persist_user_message, image dropped.

A direct provider call with the same image_url data URL (bypassing Hermes) works fine on a vision model, confirming the loss is internal to Hermes.

Minimal repro

Drive hermes acp over stdio (line-delimited JSON-RPC): initializesession/newsession/prompt with:

{
  "sessionId": "<id>",
  "prompt": [
    { "type": "text", "text": "Reply with ONLY the dominant color word of the image. If you received no image, reply IMAGE_NOT_RECEIVED." },
    { "type": "image", "data": "<base64 of a solid-color PNG>", "mimeType": "image/png" }
  ]
}

With any vision-capable model configured (e.g. openrouter/google/gemini-2.5-flash-lite), the model replies IMAGE_NOT_RECEIVED instead of the color.

Fix

Don't clobber multimodal list content — the synthetic-prefix cleanup the override exists for only applies to text turns:

        if 0 <= idx < len(messages):
            msg = messages[idx]
            if isinstance(msg, dict) and msg.get("role") == "user":
                if isinstance(msg.get("content"), list):
                    return
                msg["content"] = override

With this, the same repro returns the correct color. Text-only turns are unaffected.

Suggested cleaner fix

The override mutating the shared messages list (used for both the API call and persistence) is the underlying smell. A more robust fix would apply the clean/redacted persist_user_message only to a copy used for transcript/DB persistence, leaving the API-bound messages untouched — and for multimodal turns, persist a redacted form (e.g. text + [image] placeholder) so base64 blobs don't bloat the session DB while the image still reaches the model.

Environment

  • Hermes Agent v0.16.0 (2026.6.5), upstream 3edd09a
  • ACP adapter path (hermes acp over stdio)

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/acpAgent Communication Protocol adaptercomp/agentCore agent loop, run_agent.py, prompt buildertool/visionVision analysis and image generationtype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions