fix(sessions): stop tool-loaded images from being dropped before the next LLM call #1265

Merged: Aaronontheweb merged 1 commit into netclaw-dev:dev from Aaronontheweb:claude-wt-file_attach_images_toolhints on Jun 1, 2026
Aaronontheweb (Collaborator) commented Jun 1, 2026

Summary

Multimodal file_read shipped in v0.22.0, but on the main streaming session path a tool-loaded image silently never reached the model — the agent was told "Image loaded for model-visible inspection" and then confabulated the contents (e.g. shown the akka.net logo, it described an invented "NetClaw dashboard wireframe").

Fixes #1264.

Root cause

The streaming tool-completion path hands its mutable accumulator to the media nudge and immediately clears it:

// LlmSessionActor.ApplyToolCallRecorded (same shape in CompleteToolBatch)
AddModelInputMediaNudge(_pendingModelInputMediaReferences); // stores THIS list instance
_pendingModelInputMediaReferences.Clear();                  // ...then empties it

SessionState.BuildNudgeMessage / AddUserMessage stored that list by reference into SerializableChatMessage.MediaReferences (an init property with no defensive copy). The .Clear() then emptied the nudge's media references before FireLlmCall → SessionMessageAssembler.Assemble → ChatMessageConverter.ToAiMessage hydrated them into DataContent. Net effect: text-only message, no image, hallucination.
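
The aliasing hazard can be reproduced outside the actor. The following is a minimal sketch, not the project's code: `BuildNudge`, the tuple shape, and the use of plain strings as media references are all hypothetical stand-ins for the real types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a message constructor that stores the caller's
// list by reference (no defensive copy), as described above.
static (string Text, IReadOnlyList<string> MediaReferences) BuildNudge(
    string text, List<string> mediaReferences)
{
    IReadOnlyList<string> stored = mediaReferences; // same instance, not a copy
    return (text, stored);
}

// The caller's mutable accumulator, playing the role of
// _pendingModelInputMediaReferences.
var pending = new List<string> { "media/logo.png" };

var nudge = BuildNudge("Image loaded for model-visible inspection", pending);
pending.Clear(); // empties the very list the nudge still points at

// By the time the message would be assembled, the reference is gone.
Console.WriteLine(nudge.MediaReferences.Count); // prints 0
```

The message text survives, so nothing looks wrong until the media references are read.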

Why it was sneaky:

  • The tool genuinely succeeded (materialization copied the file to session media), so the existing handoff-failure warning never fired.
  • Sub-agents were unaffected because they hydrate the image to DataContent immediately at nudge-creation; the main session defers hydration to assembly time (for prefix-cache stability), leaving a window for the Clear() to corrupt the aliased list.
  • The non-streaming completion path passes a fresh list and is unaffected, but the CLI/headless mode and normal chat both use streaming, which is why the bug reproduced every time.
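
The difference between eager and deferred hydration in the second point can be sketched as follows. This is an illustration under stated assumptions: `Hydrate` is a hypothetical placeholder for the real conversion to DataContent, and media references are modeled as plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical hydration step: turns a media reference into model-ready content.
static string Hydrate(string reference) => $"DataContent({reference})";

var pending = new List<string> { "media/logo.png" };

// Eager (sub-agent style): hydrate at nudge creation, so the result no longer
// depends on the caller's list.
string[] eager = pending.Select(r => Hydrate(r)).ToArray();

// Deferred (main-session style): keep the references and hydrate at assembly
// time. Here the stored list is the caller's own instance, as in the bug.
List<string> deferredRefs = pending;

pending.Clear(); // runs between nudge creation and assembly

string[] deferred = deferredRefs.Select(r => Hydrate(r)).ToArray();

Console.WriteLine(eager.Length);    // prints 1
Console.WriteLine(deferred.Length); // prints 0
```

Deferring hydration is not itself the problem; it only widens the window in which a shared list can be mutated.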

Fix

Defensively snapshot the caller's list ([.. mediaReferences]) at the two SessionState message constructors, so the immutable persistence message owns its own copy regardless of caller behavior. Two-line production change, with comments documenting the aliasing/clear hazard.
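
A minimal sketch of the snapshot technique, assuming C# 12 collection expressions. `BuildNudge` and the tuple shape are hypothetical stand-ins for the real SessionState constructors, and media references are modeled as plain strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a patched constructor: the spread element inside
// the collection expression copies the items into a new collection.
static (string Text, IReadOnlyList<string> MediaReferences) BuildNudge(
    string text, IReadOnlyList<string> mediaReferences)
{
    IReadOnlyList<string> snapshot = [.. mediaReferences]; // defensive copy
    return (text, snapshot);
}

var pending = new List<string> { "media/logo.png" };

var nudge = BuildNudge("Image loaded for model-visible inspection", pending);
pending.Clear(); // the caller is free to reuse its accumulator

Console.WriteLine(pending.Count);               // prints 0
Console.WriteLine(nudge.MediaReferences.Count); // prints 1
```

Copying at the constructor rather than at each call site means a future caller that reuses its list cannot reintroduce the bug.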

Verification

  • Unit regression tests (SessionStateTests): build a List, hand it to AddSystemNudge/AddUserMessage, .Clear() it, then assert the message still carries the media reference. These tests fail without the snapshot.
  • Actor-level integration test (LlmSessionImageDeliveryTests): drives a streaming file_read image load through LlmSessionActor and asserts image DataContent reaches the chat client on the next LLM call. Confirmed it fails without the fix (exact message: "Tool-loaded image never reached the model…") and passes with it — this was the missing coverage that let the bug ship.
  • End-to-end: ran a patched daemon against a real vision model — the agent now correctly describes the image, and the trace prompt dump shows DataContent mediaType=image/png on the wire.
  • Full Netclaw.Actors.Tests suite: 2175 passing. dotnet slopwatch analyze: 0 issues. Copyright headers verified.

Files

  • src/Netclaw.Actors/Sessions/SessionState.cs — the fix
  • src/Netclaw.Actors.Tests/Sessions/SessionStateTests.cs — unit regression tests
  • src/Netclaw.Actors.Tests/Sessions/LlmSessionImageDeliveryTests.cs — new integration test

Follow-up

Audio/video model input is out of scope for this fix (images only today) and is tracked separately in #1266.

…next LLM call

The streaming tool-completion path (LlmSessionActor.ApplyToolCallRecorded /
CompleteToolBatch) handed its mutable _pendingModelInputMediaReferences
accumulator to the model-input media nudge and then Clear()ed that same list
instance. SerializableChatMessage stored the reference without copying, so the
Clear() emptied the nudge's media references before the next LLM call hydrated
them. The model was told "Image loaded for model-visible inspection" but never
received the image bytes, and confabulated the contents.

Fix: defensively snapshot the caller's media list ([.. mediaReferences]) in
SessionState.BuildNudgeMessage and AddUserMessage so the immutable message
owns its own copy regardless of caller behavior.

Tests:
- unit regression tests (caller reuses/clears the list) in SessionStateTests
- an actor-level integration test that drives a streaming file_read image load
  and asserts image DataContent reaches the chat client on the next LLM call;
  it fails without the snapshot

Verified end-to-end against a vision model: with the fix the image bytes reach
the model and it describes the image correctly instead of hallucinating.

Fixes netclaw-dev#1264
Aaronontheweb added the following labels on Jun 1, 2026: bug (Something isn't working); context-pipeline (LLM context assembly: prompt layers, dynamic injection, memory recall, temporal grounding); sessions (LLM session actor, turn lifecycle, pipelines)
Aaronontheweb merged commit b7563be into netclaw-dev:dev on Jun 1, 2026
14 checks passed
Aaronontheweb deleted the claude-wt-file_attach_images_toolhints branch on June 1, 2026 02:27


Development

Successfully merging this pull request may close these issues.

Agent invents image contents instead of reading them (the picture never reaches the model)
