Skip to content

fix(agent): skip persist_user_message override when content is a multimodal list#44247

Open
liuhao1024 wants to merge 1 commit into
NousResearch:mainfrom
liuhao1024:fix/acp-image-persist-override
Open

fix(agent): skip persist_user_message override when content is a multimodal list#44247
liuhao1024 wants to merge 1 commit into
NousResearch:mainfrom
liuhao1024:fix/acp-image-persist-override

Conversation

@liuhao1024

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes _apply_persist_user_message_override() clobbering multimodal user message content (image blocks, audio blocks) with a plain text string, causing vision models to never receive attached images sent via the ACP adapter.

Related Issue

Fixes #44242

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • run_agent.py: Added isinstance(msg.get("content"), list) guard in _apply_persist_user_message_override() — when the user message content is a list (multimodal: text + image_url, etc.), skip the override entirely. Text-only string content is still overridden as before.
  • tests/run_agent/test_run_agent.py: Added test_persist_session_preserves_multimodal_content to verify multimodal content lists are preserved and not clobbered by the persist override.

How to Test

  1. Configure a vision-capable model (e.g. openrouter/google/gemini-2.5-flash-lite)
  2. Send an image via ACP session/prompt with a multimodal content block
  3. Before fix: model replies "IMAGE_NOT_RECEIVED" — image was dropped
  4. After fix: model correctly identifies the image content
  5. Run pytest tests/run_agent/test_run_agent.py::TestPersistUserMessageOverride -v — both tests pass

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Code Intelligence

  • Analyzed: run_agent.py::_apply_persist_user_message_override (callers: 2 in _persist_session and run_conversation)
  • Blast radius: LOW — guard only adds an early-return for list content; text-only paths unchanged
  • Related patterns: ACP adapter passes persist_user_message as plain string; _persist_session and run_conversation both call the override

…imodal list

_apply_persist_user_message_override() unconditionally overwrote
msg["content"] with the text-only persist_user_message string. When the
user message is multimodal (list of text + image_url blocks, as in ACP
session/prompt), this clobbers the image parts before the API call is
built, causing vision models to never receive attached images.

Guard the override with an isinstance(msg.get("content"), list) check:
multimodal content is preserved as-is, text-only strings are still
overridden.

Fixes NousResearch#44242
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder tool/vision Vision analysis and image generation P2 Medium — degraded but workaround exists labels Jun 11, 2026

@draix draix left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the fix and both competing PRs (#44247 and #44248 — the latter is being closed as a duplicate per the collaborator note). Here's my analysis:

Fix correctness: ✅ Correct and minimal

The one-liner guard is the right approach:

if isinstance(msg.get("content"), list):
    return

_apply_persist_user_message_override was designed for synthetic-prefix cleanup on text turns (voice prefix in cli.py, ACP steering text). Multimodal content lists have never been an intended target of this override, so an unconditional early-return for list content is safe and correct. The blast radius is LOW — only the list-content path changes; all existing text-turn behavior is unchanged.

DB persistence: ✅ Still covered

Image redaction for the session DB ([screenshot] placeholder) already runs downstream in _flush_messages_to_session_db, so skipping the override here doesn't cause raw base64 image data to land in the session DB.

One suggestion worth considering from #44248:

The sibling PR's test (test_override_leaves_multimodal_content_intact) calls _persist_session rather than _apply_persist_user_message_override directly, and also asserts on the DB write:

first_db_write = agent._session_db.append_message.call_args_list[0].kwargs
assert first_db_write["content"] == "What color is this?\n[screenshot]"

This gives you a full end-to-end regression for the actual failure scenario (the crash-resilience persist path) and ensures the DB redaction path isn't accidentally broken. Consider adding that assertion — or a similar one — to this test for belt-and-suspenders coverage.

Otherwise LGTM. The fix is correct, the existing text-turn test is untouched, and the approach matches the description in the original issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists tool/vision Vision analysis and image generation type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ACP image content blocks dropped before API call (persist_user_message override clobbers multimodal content)

3 participants