fix(agent): skip persist_user_message override when content is a multimodal list#44247
fix(agent): skip persist_user_message override when content is a multimodal list#44247liuhao1024 wants to merge 1 commit into
Conversation
…imodal list
_apply_persist_user_message_override() unconditionally overwrote
msg["content"] with the text-only persist_user_message string. When the
user message is multimodal (list of text + image_url blocks, as in ACP
session/prompt), this clobbers the image parts before the API call is
built, causing vision models to never receive attached images.
Guard the override with an isinstance(msg.get("content"), list) check:
multimodal content is preserved as-is, text-only strings are still
overridden.
Fixes NousResearch#44242
draix
left a comment
There was a problem hiding this comment.
Reviewed the fix and both competing PRs (#44247 and #44248 — the latter is being closed as a duplicate per the collaborator note). Here's my analysis:
Fix correctness: ✅ Correct and minimal
The one-liner guard is the right approach:
if isinstance(msg.get("content"), list):
return_apply_persist_user_message_override was designed for synthetic-prefix cleanup on text turns (voice prefix in cli.py, ACP steering text). Multimodal content lists have never been an intended target of this override, so an unconditional early-return for list content is safe and correct. The blast radius is LOW — only the list-content path changes; all existing text-turn behavior is unchanged.
DB persistence: ✅ Still covered
Image redaction for the session DB ([screenshot] placeholder) already runs downstream in _flush_messages_to_session_db, so skipping the override here doesn't cause raw base64 image data to land in the session DB.
One suggestion worth considering from #44248:
The sibling PR's test (test_override_leaves_multimodal_content_intact) calls _persist_session rather than _apply_persist_user_message_override directly, and also asserts on the DB write:
first_db_write = agent._session_db.append_message.call_args_list[0].kwargs
assert first_db_write["content"] == "What color is this?\n[screenshot]"This gives you a full end-to-end regression for the actual failure scenario (the crash-resilience persist path) and ensures the DB redaction path isn't accidentally broken. Consider adding that assertion — or a similar one — to this test for belt-and-suspenders coverage.
Otherwise LGTM. The fix is correct, the existing text-turn test is untouched, and the approach matches the description in the original issue.
What does this PR do?
Fixes
_apply_persist_user_message_override()clobbering multimodal user message content (image blocks, audio blocks) with a plain text string, causing vision models to never receive attached images sent via the ACP adapter.Related Issue
Fixes #44242
Type of Change
Changes Made
run_agent.py: Addedisinstance(msg.get("content"), list)guard in_apply_persist_user_message_override()— when the user message content is a list (multimodal: text + image_url, etc.), skip the override entirely. Text-only string content is still overridden as before.tests/run_agent/test_run_agent.py: Addedtest_persist_session_preserves_multimodal_contentto verify multimodal content lists are preserved and not clobbered by the persist override.How to Test
openrouter/google/gemini-2.5-flash-lite)session/promptwith a multimodal content blockpytest tests/run_agent/test_run_agent.py::TestPersistUserMessageOverride -v— both tests passChecklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/ACode Intelligence
run_agent.py::_apply_persist_user_message_override(callers: 2 in_persist_sessionandrun_conversation)persist_user_messageas plain string;_persist_sessionandrun_conversationboth call the override