Agent invents image contents instead of reading them (the picture never reaches the model)

## What happens

When you ask Netclaw to look at an image file — for example, *"read this PNG and tell me what's in it"* — the agent confidently describes a picture that isn't there. It makes things up.

In one test it was shown the blue **akka.net** mountain logo, and it instead described a *"NetClaw dashboard wireframe"* — colors, shapes, buttons, all completely invented.

## Why it matters

Reading images is one of the headline features of the latest release. With a model and provider that support image input, it should just work. But in everyday use it's broken. Worse, the agent never says *"I can't see it."* It just invents an answer — which is more dangerous than plainly failing, because you can't tell the answer is wrong.

## The cause, in plain terms

The image is correctly found, prepared, and queued to be sent to the model on the very next step. But a fraction of a second later, the app throws away its only copy of that image, *before* the message actually goes out the door.

So the model gets told *"an image was loaded for you to look at,"* receives no actual image, and fills the silence by guessing.

Two things made this easy to miss:

1. **Nothing looked broken.** The agent's read-file tool genuinely worked and reported success (*"Image loaded…"*). The failure happened one step later, out of sight.
2. **The model is innocent.** When the exact same image was sent straight to the model, it described the logo correctly. So this is our own plumbing, not the model's eyesight.

## How to reproduce

1. Use a model/provider that supports image input.
2. Run a headless prompt:
   ```
   netclaw chat -p "Read the image at /path/to/some.png and describe exactly what you see."
   ```
3. Point it at an image whose **filename doesn't hint at its contents** — rename it to something random first, so the agent can't cheat by guessing from the name.
4. The agent will describe something that isn't actually in the picture.

## Notes

- This affects the normal path used by the command line and regular chat. Sub-agents happen to be unaffected, because they handle the image in a different way.
- Separately: even once the image *does* reach the model, very small text inside a picture (like a tiny wordmark) can be misread. That's a normal limitation of the model's eyesight at low resolution — **not** part of this bug.

## Status

Root cause identified. A fix has been implemented and verified end-to-end — with the fix in place, the agent correctly describes the image instead of inventing one. Unit regression tests have been added, and a broader end-to-end test is being added to keep this from coming back. A PR is pending.

---

<sub>**For maintainers:** In `LlmSessionActor`'s streaming tool-completion path (`ApplyToolCallRecorded` / `CompleteToolBatch`), the actor hands its `_pendingModelInputMediaReferences` list to the follow-up message and then `Clear()`s that *same* list instance. `SessionState.AddSystemNudge` / `AddUserMessage` stored the reference without copying, so the message was emptied before the next LLM call hydrated it. Fix: defensively snapshot (`[.. mediaReferences]`) at those two constructors. The missing coverage was an actor-level test that drives `file_read` on an image through the streaming path and asserts that image bytes reach the chat client on the next call.</sub>
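
The aliasing described in the maintainer note can be reproduced in isolation. The sketch below is not Netclaw code: `Message` and its `snapshot` flag are hypothetical stand-ins, used only to show why storing the caller's list instance loses the image reference once that list is cleared, and why copying the elements first does not. It uses `new List<string>(...)` for the copy, which serves the same purpose as the `[.. mediaReferences]` snapshot mentioned above.

```csharp
using System;
using System.Collections.Generic;

// The pending list plays the role of the actor's pending media references.
var pending = new List<string> { "logo.png" };

var aliased = new Message(pending, snapshot: false); // stores the same list instance
var copied = new Message(pending, snapshot: true);   // stores its own copy

// Mirrors the actor clearing its pending list right after the hand-off.
pending.Clear();

Console.WriteLine(aliased.MediaReferences.Count); // 0: the message was emptied too
Console.WriteLine(copied.MediaReferences.Count);  // 1: the snapshot survived

// Hypothetical stand-in for a chat message; not Netclaw's actual type.
sealed class Message
{
    public IReadOnlyList<string> MediaReferences { get; }

    public Message(IReadOnlyList<string> mediaReferences, bool snapshot)
    {
        // snapshot == false keeps a reference to the caller's list (the bug);
        // snapshot == true copies the elements into a new list (the fix).
        MediaReferences = snapshot
            ? new List<string>(mediaReferences)
            : mediaReferences;
    }
}
```

Taking the snapshot at the point where the message is constructed means the message stays intact regardless of what callers do with their list afterwards.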

