fix(sessions): stop tool-loaded images from being dropped before the next LLM call by Aaronontheweb · Pull Request #1265 · netclaw-dev/netclaw

Aaronontheweb · 2026-06-01T02:09:02Z

Summary

Multimodal file_read shipped in v0.22.0, but on the main streaming session path a tool-loaded image silently never reached the model — the agent was told "Image loaded for model-visible inspection" and then confabulated the contents (e.g. shown the akka.net logo, it described an invented "NetClaw dashboard wireframe").

Fixes #1264.

Root cause

The streaming tool-completion path hands its mutable accumulator to the media nudge and immediately clears it:

// LlmSessionActor.ApplyToolCallRecorded (same shape in CompleteToolBatch)
AddModelInputMediaNudge(_pendingModelInputMediaReferences); // stores THIS list instance
_pendingModelInputMediaReferences.Clear();                  // ...then empties it

SessionState.BuildNudgeMessage / AddUserMessage stored that list by reference into SerializableChatMessage.MediaReferences (an init property with no defensive copy). The .Clear() then emptied the nudge's media references before FireLlmCall → SessionMessageAssembler.Assemble → ChatMessageConverter.ToAiMessage hydrated them into DataContent. Net effect: text-only message, no image, hallucination.
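
The aliasing hazard can be reproduced in isolation. The sketch below is a minimal stand-in rather than the NetClaw code: `ChatMessageSketch` and `AliasingDemo` are hypothetical names, media references are modeled as plain strings, and the file path is made up. It only assumes what the description above states, namely an init property that keeps the caller's list instance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SerializableChatMessage: an init-only property
// that keeps whatever list instance the caller passes in (no defensive copy).
public sealed record ChatMessageSketch
{
    public IReadOnlyList<string> MediaReferences { get; init; } = Array.Empty<string>();
}

public static class AliasingDemo
{
    // Mirrors the shape of the streaming path: build a message from the
    // pending accumulator, clear the accumulator, then read the message.
    public static int CountMediaAfterClear()
    {
        var pending = new List<string> { "session-media/logo.png" };

        // The message stores THIS list instance, not a copy.
        var nudge = new ChatMessageSketch { MediaReferences = pending };

        // Clearing the accumulator also empties the message's view of it.
        pending.Clear();

        return nudge.MediaReferences.Count;
    }
}
```

Under these assumptions `AliasingDemo.CountMediaAfterClear()` returns 0: the message and the accumulator share one list, so whatever reads the message later sees no media.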

Why it was sneaky:

The tool genuinely succeeded (materialization copied the file to session media), so the existing handoff-failure warning never fired.
Sub-agents were unaffected because they hydrate the image to DataContent immediately at nudge-creation; the main session defers hydration to assembly time (for prefix-cache stability), leaving a window for the Clear() to corrupt the aliased list.
The non-streaming completion path passes a fresh list and is unaffected, but CLI/headless and normal chat sessions both use the streaming path, which is why the bug reproduced every time.

Fix

Defensively snapshot the caller's list ([.. mediaReferences]) at the two SessionState message constructors, so the immutable persistence message owns its own copy regardless of caller behavior. Two-line production change, with comments documenting the aliasing/clear hazard.
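
A minimal sketch of the snapshot pattern, assuming C# 12 collection expressions. `NudgeMessageSketch`, `SnapshotDemo`, and `BuildNudge` are hypothetical names for illustration only; the actual change lives in the two SessionState message constructors and may be shaped differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the persisted message type.
public sealed record NudgeMessageSketch
{
    public IReadOnlyList<string> MediaReferences { get; init; } = Array.Empty<string>();
}

public static class SnapshotDemo
{
    // Hypothetical constructor helper: the spread in the collection expression
    // copies the caller's items into a new collection, so later mutation of
    // the caller's list cannot reach the message.
    public static NudgeMessageSketch BuildNudge(IReadOnlyList<string> mediaReferences) =>
        new() { MediaReferences = [.. mediaReferences] };

    // Same caller behavior as the streaming path: hand over the accumulator,
    // then clear it before the message is read.
    public static int CountMediaAfterClear()
    {
        var pending = new List<string> { "session-media/logo.png" };
        var nudge = BuildNudge(pending);
        pending.Clear();
        return nudge.MediaReferences.Count;
    }
}
```

Here `SnapshotDemo.CountMediaAfterClear()` returns 1, because the message owns its own copy. Copying at the point where the message is constructed, rather than at each call site, protects every caller at once, including any that reuse their list later.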

Verification

Unit regression tests (SessionStateTests): build a List, hand it to AddSystemNudge/AddUserMessage, .Clear() it, assert the message still carries the media reference. Fail without the snapshot.
Actor-level integration test (LlmSessionImageDeliveryTests): drives a streaming file_read image load through LlmSessionActor and asserts image DataContent reaches the chat client on the next LLM call. Confirmed it fails without the fix (exact message: "Tool-loaded image never reached the model…") and passes with it — this was the missing coverage that let the bug ship.
End-to-end: ran a patched daemon against a real vision model — the agent now correctly describes the image, and the trace prompt dump shows DataContent mediaType=image/png on the wire.
Full Netclaw.Actors.Tests suite: 2175 passing. dotnet slopwatch analyze: 0 issues. Copyright headers verified.

Files

src/Netclaw.Actors/Sessions/SessionState.cs — the fix
src/Netclaw.Actors.Tests/Sessions/SessionStateTests.cs — unit regression tests
src/Netclaw.Actors.Tests/Sessions/LlmSessionImageDeliveryTests.cs — new integration test

Follow-up

Audio/video model input is out of scope for this fix (images only today) and is tracked separately in #1266.

…next LLM call

The streaming tool-completion path (LlmSessionActor.ApplyToolCallRecorded / CompleteToolBatch) handed its mutable _pendingModelInputMediaReferences accumulator to the model-input media nudge and then Clear()ed that same list instance. SerializableChatMessage stored the reference without copying, so the Clear() emptied the nudge's media references before the next LLM call hydrated them. The model was told "Image loaded for model-visible inspection" but never received the image bytes, and confabulated the contents.

Fix: defensively snapshot the caller's media list ([.. mediaReferences]) in SessionState.BuildNudgeMessage and AddUserMessage so the immutable message owns its own copy regardless of caller behavior.

Tests:
- unit regression tests (caller reuses/clears the list) in SessionStateTests
- an actor-level integration test that drives a streaming file_read image load and asserts image DataContent reaches the chat client on the next LLM call; it fails without the snapshot

Verified end-to-end against a vision model: with the fix the image bytes reach the model and it describes the image correctly instead of hallucinating.

Fixes netclaw-dev#1264

Aaronontheweb added labels Jun 1, 2026: bug (Something isn't working); context-pipeline (LLM context assembly: prompt layers, dynamic injection, memory recall, temporal grounding); sessions (LLM session actor, turn lifecycle, pipelines)

Aaronontheweb mentioned this pull request Jun 1, 2026

Add audio and video as native model input to file_read (currently images only) #1266

Open

Aaronontheweb merged commit b7563be into netclaw-dev:dev Jun 1, 2026
14 checks passed

Aaronontheweb deleted the claude-wt-file_attach_images_toolhints branch June 1, 2026 02:27

Aaronontheweb mentioned this pull request Jun 1, 2026

Prepare release v0.22.1 #1273

Merged

Aaronontheweb merged 1 commit into netclaw-dev:dev from Aaronontheweb:claude-wt-file_attach_images_toolhints
