What happens
When you ask Netclaw to look at an image file — for example, "read this PNG and tell me what's in it" — the agent confidently describes a picture that isn't there. It makes things up.
In one test it was shown the blue Akka.NET mountain logo, and it instead described a "NetClaw dashboard wireframe": colors, shapes, buttons, all completely invented.
Why it matters
Reading images is one of the headline features of the latest release. With a model and provider that support image input, it should just work, but on the everyday path (the command line and regular chat) it is broken. Worse, the agent never says "I can't see it." It invents an answer instead, which is more dangerous than failing outright because nothing tells you the answer is wrong.
The cause, in plain terms
The image is correctly found, prepared, and handed off to be sent to the model on the very next step. But right after that handoff, the app empties its list of pending images, and the outgoing message had been given that very list rather than its own copy. The image therefore vanishes from the message before it actually goes out the door.
So the model gets told "an image was loaded for you to look at," receives no actual image, and fills the silence by guessing.
Two things made this easy to miss:
- Nothing looked broken. The agent's read-file tool genuinely worked and reported success ("Image loaded…"). The failure happened one step later, out of sight.
- The model is innocent. When the exact same image was sent straight to the model, it described the logo correctly. So this is our own plumbing, not the model's eyesight.
How to reproduce
- Use a model/provider that supports image input.
- Pick an image whose filename doesn't hint at its contents. Rename it to something random first, so the agent can't cheat by guessing from the name.
- Run a headless prompt against it:
netclaw chat -p "Read the image at /path/to/some.png and describe exactly what you see."
- The agent will describe something that isn't actually in the picture.
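Renaming by hand works, but a small helper makes the blind copy repeatable across runs. The sketch below is a hypothetical Python helper; the names make_blind_copy and build_repro_command are illustrative and not part of Netclaw. It copies the image to a random hex filename (keeping the extension), then builds the same netclaw chat -p command from the steps above as an argument list, which avoids shell-quoting problems with the prompt.

```python
import secrets
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def make_blind_copy(image_path: str, dest_dir: Optional[str] = None) -> Path:
    """Copy an image to a random, content-neutral filename.

    The extension is kept so the file is still treated as an image, but the
    stem is replaced with 16 random hex characters so the agent cannot infer
    the contents from the name.
    """
    src = Path(image_path)
    # Use the given directory, or a fresh temporary one if none is supplied.
    target_dir = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp())
    dest = target_dir / f"{secrets.token_hex(8)}{src.suffix}"
    shutil.copyfile(src, dest)
    return dest


def build_repro_command(image_path: Path) -> List[str]:
    """Return the headless repro prompt as an argv list (no shell quoting needed)."""
    prompt = f"Read the image at {image_path} and describe exactly what you see."
    return ["netclaw", "chat", "-p", prompt]
```

Assuming netclaw is on your PATH, you could then run subprocess.run(build_repro_command(make_blind_copy("logo.png"))) and compare the description against what you know the image contains.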
Notes
- This affects the normal path used by the command line and regular chat. Sub-agents happen to be unaffected, because they handle the image in a different way.
- Separately: even once the image does reach the model, very small text inside a picture (like a tiny wordmark) can be misread. That's a normal limitation of the model's eyesight at low resolution — not part of this bug.
Status
Root cause identified. A fix has been implemented and verified end-to-end — with the fix in place, the agent correctly describes the image instead of inventing one. Unit regression tests have been added, and a broader end-to-end test is being added to keep this from coming back. A PR is pending.
For maintainers: In LlmSessionActor's streaming tool-completion path (ApplyToolCallRecorded / CompleteToolBatch), the actor passes its _pendingModelInputMediaReferences list to the follow-up message and then calls Clear() on that same list instance. SessionState.AddSystemNudge and AddUserMessage stored the reference without copying it, so the message's media list was emptied before the next LLM call could hydrate it. Fix: defensively snapshot the list ([.. mediaReferences]) in those two methods. The missing coverage was an actor-level test that drives file_read on an image through the streaming path and asserts that the image bytes reach the chat client on the next call.
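The aliasing is easy to reproduce outside the actor. Below is a minimal Python sketch of the failure and the fix, not the actual C# code: Message, SessionState, add_user_message, complete_tool_batch, media_sent_on_next_call, regression_check and the media://image-1 reference are hypothetical stand-ins for the types and calls named above, and [*media_references] plays the role of the C# [.. mediaReferences] snapshot.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for a chat message that carries media references."""
    text: str
    media_references: List[str] = field(default_factory=list)


class SessionState:
    """Hypothetical stand-in for SessionState; `snapshot` switches the fix on."""

    def __init__(self, snapshot: bool) -> None:
        self.snapshot = snapshot
        self.messages: List[Message] = []

    def add_user_message(self, text: str, media_references: List[str]) -> Message:
        # Buggy: store the caller's list object itself, so both share one list.
        # Fixed: copy it, the Python analogue of C#'s [.. mediaReferences].
        refs = [*media_references] if self.snapshot else media_references
        message = Message(text, refs)
        self.messages.append(message)
        return message


def complete_tool_batch(state: SessionState, pending: List[str]) -> Message:
    """Mimic the actor: hand the pending list to the follow-up message, then clear it."""
    message = state.add_user_message("follow-up after file_read", pending)
    pending.clear()  # the actor empties its pending list after the handoff
    return message


def media_sent_on_next_call(state: SessionState) -> List[str]:
    """The media the next LLM call would hydrate from the latest message."""
    return state.messages[-1].media_references


def regression_check() -> None:
    """Unit-level analogue of the missing test: media must survive the clear."""
    state = SessionState(snapshot=True)
    pending = ["media://image-1"]
    complete_tool_batch(state, pending)
    assert media_sent_on_next_call(state) == ["media://image-1"]


if __name__ == "__main__":
    for snapshot in (False, True):
        state = SessionState(snapshot=snapshot)
        complete_tool_batch(state, ["media://image-1"])
        print(f"snapshot={snapshot}: {media_sent_on_next_call(state)}")
```

Run as a script, it prints snapshot=False: [] and then snapshot=True: ['media://image-1']. Copying inside the method that takes ownership of the list, rather than at each caller, means a future caller that reuses its buffer cannot reintroduce the bug; the cost is one small list copy per message.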