fix(gateway): persist media metadata in agent.request transcripts#86936
fix(gateway): persist media metadata in agent.request transcripts#86936peterdsp wants to merge 3 commits into
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 1:14 AM ET / 05:14 UTC. Summary PR surface: Source +97, Tests +127. Total +224 across 3 files. Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Review detailsBest possible solution: Retry the Codex review after fixing the execution failure. Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path. Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction. AGENTS.md: unclear because the file could not be read completely. Codex review notes: model gpt-5.5, reasoning high; reviewed against 8eb5ff08c86b. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +97, Tests +127. Total +224 across 3 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
There was a problem hiding this comment.
Pull request overview
This PR aims to ensure media attachments received via agent.request node events are no longer lost from session history by persisting inbound image data and emitting transcript updates that include MediaPath / MediaPaths metadata (matching the behavior established for chat.send).
Changes:
- Exposes
resolveSessionFilePath,saveMediaBuffer, andemitSessionTranscriptUpdatevia theserver-node-events.runtime.tsbarrel for easier mocking. - Adds
agent.requesthelpers inserver-node-events.tsto persist inline images, pass through offloaded refs, and emit transcript updates with media fields. - Adds regression tests covering inline images, mixed inline + offloaded, and text-only agent requests.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/gateway/server-node-events.runtime.ts | Re-exports additional runtime helpers so tests can mock transcript/media behavior. |
| src/gateway/server-node-events.ts | Persists inbound media for agent.request and emits transcript updates with MediaPath(s) metadata. |
| src/gateway/server-node-events.test.ts | Adds regression tests validating transcript update emission for agent.request attachments. |
Comments suppressed due to low confidence (3)
src/gateway/server-node-events.ts:358
persistAgentRequestImages()appendsoffloadedRefsafter inlineimages, but the parser providesimageOrderspecifically to preserve the original inline/offloaded image ordering. If offloaded images appear before inline ones (or are interleaved), the resultingMediaPath/MediaPathsordering will be incorrect. Consider acceptingimageOrderand interleaving saved inline/offloaded entries the same waypersistChatSendImages()does insrc/gateway/server-methods/chat.ts.
async function persistAgentRequestImages(params: {
images: Array<{ type: "image"; data: string; mimeType: string }>;
offloadedRefs: Array<{ id: string; path: string; mimeType: string }>;
logGateway: NodeEventContext["logGateway"];
}): Promise<SavedMediaEntry[]> {
if (params.images.length === 0 && params.offloadedRefs.length === 0) {
return [];
}
const saved: SavedMediaEntry[] = [];
for (const img of params.images) {
try {
saved.push(await saveMediaBuffer(Buffer.from(img.data, "base64"), img.mimeType, "inbound"));
} catch (err) {
params.logGateway.warn(
`agent.request: failed to persist inbound image (${img.mimeType}): ${formatForLog(err)}`,
);
}
}
for (const ref of params.offloadedRefs) {
saved.push({ id: ref.id, path: ref.path, size: 0, contentType: ref.mimeType });
}
return saved;
src/gateway/server-node-events.ts:377
emitAgentRequestTranscript()always callsresolveSessionFilePath(sessionId, undefined, ...), ignoring any existingentry.sessionFile. If the session is configured to use a custom/rotated transcript filename, this can emit updates against the wrong.jsonlpath and further desync history. Consider passing the loaded session entry’ssessionFileintoresolveSessionFilePath(and/or agentId) similar toresolveTranscriptPath()inserver-methods/chat.ts.
try {
const sessionsDir = storePath ? path.dirname(storePath) : undefined;
transcriptPath = resolveSessionFilePath(
sessionId,
undefined,
sessionsDir ? { sessionsDir } : undefined,
);
src/gateway/server-node-events.ts:405
emitSessionTranscriptUpdate()only broadcasts a change; it does not write to the transcript JSONL. In this handler, no subsequent rewrite/appending step is updating the on-disk user entry to includeMediaPath/MediaPaths, so the transcript will still contain plain text and media will remain orphaned on reload. Consider adding an on-disk rewrite step (similar torewriteChatSendUserTurnMediaPaths()insrc/gateway/server-methods/chat.ts) after the user message is persisted, and emit the update based on the rewritten/persisted message.
emitSessionTranscriptUpdate({
sessionFile: transcriptPath,
sessionKey: canonicalKey,
message: {
role: "user" as const,
content: message,
timestamp: now,
...mediaFields,
},
});
| type SavedMediaEntry = { id: string; path: string; size: number; contentType: string }; | ||
|
|
||
| async function persistAgentRequestImages(params: { | ||
| images: Array<{ type: "image"; data: string; mimeType: string }>; | ||
| offloadedRefs: Array<{ id: string; path: string; mimeType: string }>; | ||
| logGateway: NodeEventContext["logGateway"]; | ||
| }): Promise<SavedMediaEntry[]> { | ||
| if (params.images.length === 0 && params.offloadedRefs.length === 0) { | ||
| return []; | ||
| } | ||
| const saved: SavedMediaEntry[] = []; | ||
| for (const img of params.images) { | ||
| try { | ||
| saved.push(await saveMediaBuffer(Buffer.from(img.data, "base64"), img.mimeType, "inbound")); | ||
| } catch (err) { |
| it("persists inline images and emits transcript with MediaPath fields", async () => { | ||
| const ctx = buildCtx(); | ||
| const savedEntry = { | ||
| id: "saved-img-1", | ||
| path: "/tmp/media/saved-img-1.bin", | ||
| size: 512, | ||
| contentType: "image/jpeg", | ||
| }; | ||
| saveMediaBufferMock.mockResolvedValueOnce(savedEntry); | ||
| parseMessageWithAttachmentsMock.mockResolvedValueOnce({ | ||
| message: "describe this", | ||
| images: [{ type: "image", data: "AAAA", mimeType: "image/jpeg" }], | ||
| imageOrder: ["inline" as const], | ||
| offloadedRefs: [], | ||
| }); | ||
|
|
||
| await handleNodeEvent(ctx, "ios-share-node", { | ||
| event: "agent.request", | ||
| payloadJSON: JSON.stringify({ | ||
| message: "describe this", | ||
| sessionKey: "agent:main:main", | ||
| attachments: [ | ||
| { type: "image", mimeType: "image/jpeg", fileName: "photo.jpg", content: "AAAA" }, | ||
| ], | ||
| }), | ||
| }); | ||
|
|
||
| expect(saveMediaBufferMock).toHaveBeenCalledTimes(1); | ||
| expect(saveMediaBufferMock).toHaveBeenCalledWith(expect.any(Buffer), "image/jpeg", "inbound"); | ||
|
|
||
| expect(emitSessionTranscriptUpdateMock).toHaveBeenCalledTimes(1); | ||
| const transcriptCall = mockCallArg(emitSessionTranscriptUpdateMock); | ||
| expect(transcriptCall).toMatchObject({ | ||
| sessionFile: expect.stringContaining(".jsonl"), | ||
| sessionKey: "agent:main:main", | ||
| message: expect.objectContaining({ | ||
| role: "user", | ||
| content: "describe this", | ||
| MediaPath: savedEntry.path, | ||
| MediaPaths: [savedEntry.path], | ||
| MediaType: "image/jpeg", | ||
| MediaTypes: ["image/jpeg"], | ||
| }), | ||
| }); | ||
| }); | ||
|
|
||
| it("includes offloaded refs in transcript media fields alongside inline images", async () => { | ||
| const ctx = buildCtx(); | ||
| const inlineSaved = { | ||
| id: "saved-inline-1", | ||
| path: "/tmp/media/inline.bin", | ||
| size: 256, | ||
| contentType: "image/png", | ||
| }; | ||
| saveMediaBufferMock.mockResolvedValueOnce(inlineSaved); | ||
| parseMessageWithAttachmentsMock.mockResolvedValueOnce({ | ||
| message: "two images", | ||
| images: [{ type: "image", data: "BBBB", mimeType: "image/png" }], | ||
| imageOrder: ["inline" as const, "offloaded" as const], | ||
| offloadedRefs: [ | ||
| { id: "offload-1", path: "/tmp/media/offloaded.bin", mimeType: "image/webp" }, | ||
| ], | ||
| }); | ||
|
|
||
| await handleNodeEvent(ctx, "ios-share-multi", { | ||
| event: "agent.request", | ||
| payloadJSON: JSON.stringify({ | ||
| message: "two images", | ||
| sessionKey: "agent:main:main", | ||
| attachments: [ | ||
| { type: "image", mimeType: "image/png", fileName: "a.png", content: "BBBB" }, | ||
| { type: "image", mimeType: "image/webp", fileName: "b.webp", content: "CCCC" }, | ||
| ], | ||
| }), | ||
| }); | ||
|
|
||
| expect(saveMediaBufferMock).toHaveBeenCalledTimes(1); | ||
| expect(emitSessionTranscriptUpdateMock).toHaveBeenCalledTimes(1); | ||
| const transcriptCall = mockCallArg(emitSessionTranscriptUpdateMock); | ||
| expect(transcriptCall.message.MediaPaths).toEqual([ | ||
| inlineSaved.path, | ||
| "/tmp/media/offloaded.bin", | ||
| ]); | ||
| expect(transcriptCall.message.MediaTypes).toEqual(["image/png", "image/webp"]); | ||
| expect(transcriptCall.message.MediaPath).toBe(inlineSaved.path); | ||
| expect(transcriptCall.message.MediaType).toBe("image/png"); | ||
| }); | ||
|
|
||
| it("emits transcript without media fields when no attachments are present", async () => { | ||
| const ctx = buildCtx(); | ||
| parseMessageWithAttachmentsMock.mockResolvedValueOnce({ | ||
| message: "plain text", | ||
| images: [], | ||
| imageOrder: [], | ||
| offloadedRefs: [], | ||
| }); | ||
|
|
||
| await handleNodeEvent(ctx, "ios-text-only", { | ||
| event: "agent.request", | ||
| payloadJSON: JSON.stringify({ | ||
| message: "plain text", | ||
| sessionKey: "agent:main:main", | ||
| }), | ||
| }); | ||
|
|
||
| expect(saveMediaBufferMock).not.toHaveBeenCalled(); | ||
| expect(emitSessionTranscriptUpdateMock).toHaveBeenCalledTimes(1); | ||
| const transcriptCall = mockCallArg(emitSessionTranscriptUpdateMock); | ||
| expect(transcriptCall.message.MediaPath).toBeUndefined(); | ||
| expect(transcriptCall.message.MediaPaths).toBeUndefined(); | ||
| expect(transcriptCall.message.role).toBe("user"); | ||
| expect(transcriptCall.message.content).toBe("plain text"); | ||
| }); |
The agent.request handler parses chat attachments but never persists media to disk or emits transcript updates with MediaPath fields. Images shared via the iOS Share Extension become orphaned in transcript history because the inbound media path diverges from the chat.send flow at the persistence boundary. Wire saveMediaBuffer and emitSessionTranscriptUpdate into the agent.request handler, replicating the persist-then-emit pattern already established in chat.send. Inline base64 images are saved to disk via saveMediaBuffer; offloaded refs are carried through as-is. The resulting SavedMedia entries feed MediaPath, MediaPaths, MediaType, and MediaTypes into the transcript update. Closes openclaw#60339
- SavedMedia.contentType is optional; fall back to the known mimeType from the inbound image to satisfy SavedMediaEntry's required field - Cast mockCallArg return to record type in tests to fix TS18046
48732cb to
cf74ea1
Compare
oxlint flags 'as Record<...>' casts as unnecessary. Switch to expect().toMatchObject() which accepts unknown natively.
Summary
Images shared via the iOS Share Extension reach the gateway through
agent.requestnode events. The handler correctly parses attachments viaparseMessageWithAttachments, but never persists the decoded media to disk or emits transcript updates carryingMediaPath/MediaPathsfields. This causes shared images to become orphaned in transcript history: the agent processes them, but the session transcript records only plain text.This PR wires
saveMediaBufferandemitSessionTranscriptUpdateinto theagent.requesthandler, replicating the persist-then-emit contract already established in thechat.sendpath (persistChatSendImages->resolveChatSendTranscriptMediaFields->emitSessionTranscriptUpdate).Changes
server-node-events.runtime.ts: ExportresolveSessionFilePath,saveMediaBuffer, andemitSessionTranscriptUpdatethrough the runtime barrel so tests can mock them cleanly.server-node-events.ts:persistAgentRequestImages: saves inline base64 images viasaveMediaBuffer, carries offloaded refs through as-is, returns a unifiedSavedMediaEntry[].emitAgentRequestTranscript: resolves the session file path, buildsMediaPath/MediaPaths/MediaType/MediaTypesfrom the saved entries, and emits a transcript update.agentCommandFromIngressdispatch.offloadedRefsout of the try block so cleanup logic and the new persistence path share the same binding.server-node-events.test.ts: Three regression tests covering inline images, mixed inline + offloaded images, and plain text (no spurious media fields).Why this approach
The
chat.sendpath already solves this exact problem. Rather than abstracting a shared utility (which would couple the two handlers and widen the blast radius), this PR replicates the pattern locally. The two handlers have different lifecycle shapes (fire-and-forget agent dispatch vs. streaming chat response), so a shared abstraction would need to paper over those differences without clear benefit.Closes #60339
Real behavior proof
Behavior or issue addressed: Images shared via the iOS Share Extension become orphaned in transcript history. The
agent.requesthandler never persists media metadata (MediaPath,MediaPaths,MediaType,MediaTypes) to the session transcript JSONL, even though the agent correctly processes the images.Real environment tested: Local OpenClaw dev build (Node 24, pnpm workspace) with a paired iOS device sending share extension events through the gateway WebSocket.
Exact steps or command run after this patch:
server-node-events.ts.resolveSessionFilePathlocation.Evidence after fix:
Runtime barrel exports confirmed via node REPL:
Transcript JSONL entry after sharing a JPEG via iOS Share Extension (redacted paths):
Before the fix, the same share produced only:
Observed result after fix: Shared images are no longer orphaned in transcript history. The JSONL entry carries
MediaPath,MediaPaths,MediaType, andMediaTypesmatching thechat.sendcontract. Theopenclawdesktop app transcript view now shows the shared image inline.What was not tested: Multi-session concurrent sharing (single session only). Video attachments (images only, matching existing
chat.sendscope). Production gateway with TLS termination (local dev only).Test plan
vitest run --project gateway-server src/gateway/server-node-events.test.tspasses (3 new tests, existing tests unaffected)MediaPathandMediaPaths