fix(read): persist image data and inject MEDIA directive for channel delivery#11754
QDenka wants to merge 2 commits into openclaw:main
…delivery When the read tool reads an image file, the base64 image data is returned as a content block visible to the LLM but never converted to a deliverable media URL. This means images read by agents are not sent to Telegram or other channels. Fix: after reading an image, persist the base64 data to a cache file under .openclaw/media-cache/ in the workspace and inject a MEDIA: directive into the text content block. The delivery pipeline then picks up the relative path and sends the image to the channel. Fixes openclaw#11735
): Promise<string | undefined> {
  const ext = MIME_TO_EXT[imageBlock.mimeType] ?? "png";
  const hash = createHash("sha256")
    .update(imageBlock.data.slice(0, 1024))
Broken MEDIA path
persistReadImage returns a path like ./.openclaw/media-cache/..., but later code (and the test) treat the MEDIA: payload as a relative path without the leading ./ (see splitMediaFromOutput which returns ./... and the test’s match(/MEDIA:\.\/(.+)/)). With the current implementation, splitMediaFromOutput will parse MEDIA:././.openclaw/... and the extracted token becomes ././.openclaw/..., which will not map cleanly back to the workspace when consumers do join(workspaceRoot, token).
This breaks delivery for the generated directives unless the receiver happens to normalize ././.
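One way to address this comment is to normalize the path once, at the point where the directive is built, so the payload always carries exactly one leading `./`. The sketch below assumes that single-prefix form is what the parser and the test expect; `toMediaDirective` is a hypothetical helper, not a function from this PR.

```typescript
// Hypothetical helper: build a MEDIA directive with exactly one leading "./".
function toMediaDirective(relativePath: string): string {
  // Use forward slashes so the directive looks the same on every platform.
  const unified = relativePath.replace(/\\/g, "/");
  if (unified.startsWith("/")) {
    throw new Error(`expected a workspace-relative path, got: ${relativePath}`);
  }
  // Drop any run of leading "./" segments: "./a", "././a" and "a" all become "a".
  const stripped = unified.replace(/^(?:\.\/)+/, "");
  return `MEDIA:./${stripped}`;
}
```

With this shape, `persistReadImage` could return the bare relative path and leave the prefix to a single call site, which avoids the `././` doubling described above.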
function injectMediaDirective(
  result: AgentToolResult<unknown>,
  mediaPath: string,
): AgentToolResult<unknown> {
  const content = Array.isArray(result.content) ? result.content : [];
  const nextContent = content.map((block) => {
    if (
      block &&
      typeof block === "object" &&
      (block as { type?: unknown }).type === "text" &&
      typeof (block as { text?: unknown }).text === "string"
    ) {
      const b = block as TextContentBlock & { text: string };
      return { ...b, text: `${b.text}\nMEDIA:${mediaPath}` } satisfies TextContentBlock;
    }
MEDIA injected into all text blocks
injectMediaDirective appends MEDIA: to every text block in the tool result. If read returns multiple text blocks (e.g., header + extra notes/errors), this will create multiple MEDIA: tokens; splitMediaFromOutput will then extract duplicates and also strip those lines from the user-visible text.
This should only inject into a single, intended text block (typically the first/header), or ensure only one directive is appended.
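A minimal sketch of the single-injection variant suggested here, assuming the first text block is the intended target. The block types are local stand-ins for the real tool-result types, and `injectMediaDirectiveOnce` is a hypothetical name.

```typescript
// Local stand-ins for the tool-result types; the real definitions live elsewhere.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

// Append the directive to the first text block only; other blocks pass through.
function injectMediaDirectiveOnce(
  content: ContentBlock[],
  mediaPath: string,
): ContentBlock[] {
  const directive = `MEDIA:${mediaPath}`;
  let injected = false;
  return content.map((block) => {
    if (injected || block.type !== "text") return block;
    injected = true;
    // Skip the append if this exact directive line is already present.
    if (block.text.split("\n").includes(directive)) return block;
    return { ...block, text: `${block.text}\n${directive}` };
  });
}
```

If the result contains no text block at all, this version injects nothing; whether a new text block should be created in that case is a separate decision.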
bfc1ccb to f92900f
This pull request has been automatically marked as stale due to inactivity.
Summary
When the `read` tool reads an image file, the base64 image data is returned as a content block visible to the LLM but never converted to a deliverable media URL. This means images read by agents are not sent to Telegram or other channels; the user only sees the agent's text reply without the image.

Changes
- `src/agents/pi-tools.read.ts`: After reading an image, persist the base64 data to a cache file under `.openclaw/media-cache/` in the workspace and inject a `MEDIA:./relative-path` directive into the text content block. The existing delivery pipeline then picks up the relative path via `splitMediaFromOutput` and sends the image to the channel.
- `src/agents/pi-tools.ts`: Pass `workspaceRoot` to `createOpenClawReadTool` for the non-sandboxed path.
- `src/agents/pi-tools.read.image-delivery.test.ts`: Tests verifying MEDIA injection for image reads, no injection for text reads, and no injection without `workspaceRoot`.

How it works
1. The agent calls `read` on an image file.
2. The tool returns a `{ type: 'image', data: '<base64>', mimeType: 'image/png' }` content block.
3. The image is saved to `.openclaw/media-cache/<hash>.png`.
4. A `MEDIA:./…` directive is appended to the text content block.
5. The delivery pipeline (`splitMediaFromOutput`) extracts the media URL.
6. The channel sends the image via `sendMedia`.

Fixes #11735
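The persistence step described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's `persistReadImage`: the MIME table, the `persistImageSketch` name, the synchronous file calls, and the base64 validation are all choices made for the example. Unlike the diff excerpt, which hashes only the first 1024 characters of the payload, the sketch hashes the full decoded bytes so that images differing only later in the file get different names.

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed MIME-to-extension table; the PR's MIME_TO_EXT may differ.
const MIME_TO_EXT: Record<string, string> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/gif": "gif",
  "image/webp": "webp",
};

const CACHE_DIR = ".openclaw/media-cache";

// Returns the workspace-relative cache path, or undefined if the payload is unusable.
function persistImageSketch(
  workspaceRoot: string,
  image: { data: string; mimeType: string },
): string | undefined {
  // Reject malformed base64 up front instead of writing a corrupt file.
  const wellFormed =
    image.data.length > 0 &&
    image.data.length % 4 === 0 &&
    /^[A-Za-z0-9+\/]+={0,2}$/.test(image.data);
  if (!wellFormed) return undefined;

  const bytes = Buffer.from(image.data, "base64");
  const ext = MIME_TO_EXT[image.mimeType] ?? "png";
  // Hash the full decoded payload, then shorten the digest for the file name.
  const hash = createHash("sha256").update(bytes).digest("hex").slice(0, 16);
  const relativePath = `${CACHE_DIR}/${hash}.${ext}`;

  mkdirSync(join(workspaceRoot, CACHE_DIR), { recursive: true });
  writeFileSync(join(workspaceRoot, relativePath), bytes);
  return relativePath;
}

// Usage: persist the 8-byte PNG signature into a throwaway workspace.
const workspace = mkdtempSync(join(tmpdir(), "openclaw-sketch-"));
const saved = persistImageSketch(workspace, {
  data: "iVBORw0KGgo=",
  mimeType: "image/png",
});
console.log(saved ? `MEDIA:./${saved}` : "nothing persisted");
```

Returning the path without a leading `./` keeps prefix handling in one place, at the point where the directive string is assembled.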
Greptile Overview
Greptile Summary
This PR updates the `read` tool wrapper so that when an image is read (base64 `image` content block), the image payload is persisted into a workspace-local cache directory (`.openclaw/media-cache/`) and a `MEDIA:` directive is appended to the tool's text output. The existing media parsing/delivery pipeline (`splitMediaFromOutput` → channel `sendMedia`) can then detect the directive and deliver the image to downstream channels. It also wires `workspaceRoot` through the non-sandboxed tool creation path and adds a Vitest suite covering the injection behavior.
Confidence Score: 2/5
…`splitMediaFromOutput` pipeline, but the current directive/path formatting and injection behavior are inconsistent with how MEDIA tokens are parsed/consumed, and the persistence step can produce invalid/corrupted files without detection. These are likely to cause real delivery failures or duplicated MEDIA extraction in normal operation.
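To make the duplicate-extraction concern concrete, here is a small stand-in parser. It is not the repository's `splitMediaFromOutput`, whose implementation is not shown in this PR; `extractMediaDirectives` is a hypothetical function that assumes one directive per line and paths without spaces.

```typescript
// Hypothetical stand-in for a MEDIA parser; not the repository's splitMediaFromOutput.
function extractMediaDirectives(output: string): {
  text: string;
  mediaPaths: string[];
} {
  const kept: string[] = [];
  const seen = new Set<string>();
  for (const line of output.split("\n")) {
    const path = /^MEDIA:(\S+)$/.exec(line.trim())?.[1];
    if (path !== undefined) {
      // Directive lines are removed from the visible text; the Set drops repeats.
      seen.add(path);
    } else {
      kept.push(line);
    }
  }
  return { text: kept.join("\n"), mediaPaths: Array.from(seen) };
}
```

Deduplicating on the consumer side is a defensive measure only; emitting a single directive at injection time remains the cleaner fix.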