[Bug]: openai-responses Codex app-server drops inbound image attachments — params.images never populated from MediaPath #83466

@iannwu

Bug type

Behavior bug

Beta release blocker

No

Summary

On the Codex app-server transport (provider=openai, modelApi=openai-responses, modelId=gpt-5.5), inbound image attachments from Discord (and structurally any channel that populates ctx.MediaPath / ctx.MediaPaths) are inlined into the prompt body as a text reference but never converted to params.images, so the vision-capable model receives no pixels and imagesCount: 0 for every image-bearing turn.

Steps to reproduce

  1. Configure an agent on this transport (provider: openai, modelApi: openai-responses, modelId: gpt-5.5) connected to a Discord channel.
  2. Send a message with an image attachment in that channel.
  3. Inspect <session>.trajectory.jsonl for that turn. The prompt.submitted event contains a prompt body with a [media attached: <abs-path> (image/png) | <abs-path>] line and data.imagesCount: 0. The model.completed.usage shows input token counts consistent with no image parts in the wire payload.
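
Step 3 can be automated with a small scan over the trajectory file. The following is a minimal sketch, assuming each line of the file is one JSON event shaped like the prompt.submitted excerpt in this report; the function name countDroppedImageTurns and the returned stats shape are hypothetical and not part of OpenClaw.

```typescript
// Sketch only: field names (type, data.prompt, data.imagesCount) follow the
// trajectory excerpt in this report; everything else here is hypothetical.
interface TrajectoryEvent {
  type?: string;
  data?: { prompt?: string; imagesCount?: number };
}

interface MediaTurnStats {
  mediaTurns: number;    // prompt.submitted events whose prompt mentions attached media
  withoutImages: number; // of those, events where imagesCount is 0 or missing
}

const MEDIA_MARKER = "[media attached:";

function countDroppedImageTurns(jsonl: string): MediaTurnStats {
  const stats: MediaTurnStats = { mediaTurns: 0, withoutImages: 0 };
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let event: TrajectoryEvent | null;
    try {
      event = JSON.parse(trimmed) as TrajectoryEvent | null;
    } catch {
      continue; // skip malformed lines rather than abort the scan
    }
    if (!event || event.type !== "prompt.submitted") continue;

    const prompt = event.data?.prompt;
    if (typeof prompt !== "string" || prompt.indexOf(MEDIA_MARKER) === -1) continue;

    stats.mediaTurns += 1;
    if ((event.data?.imagesCount ?? 0) === 0) stats.withoutImages += 1;
  }
  return stats;
}
```

Passing the contents of a <session>.trajectory.jsonl file to this function gives the two counts needed to compare routes: on an affected session, withoutImages would equal mediaTurns.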

Expected behavior

The same image should reach the model as a vision input, matching the behavior of the provider=openai-codex, modelApi=openai-codex-responses route on the same host, which loads MediaPath into base64 entries via resolveAcpAttachments() (dispatch-acp-oy1KlABg.js:425-455) and detectAndLoadPromptImages() (src/agents/pi-embedded-runner/run/images.ts). On that route, imagesCount > 0 is observed when an image is attached.

Actual behavior

For every image-bearing turn on the openai/openai-responses route, the prompt body has the [media attached: ...] line, data.imagesCount is 0, and the model has no pixels available.

The model still receives a built-in image tool whose description (openclaw-tools-BfDU2PXL.js:4056) states "Only use this tool when images were NOT already provided in the user's message. Images mentioned in the prompt are automatically visible to you." On this transport that second sentence is not true, so the model frequently does not call the image tool and answers as if it had inspected the image.

In a sample of 1,123 turns across recent sessions on the same host: 284 turns referenced [media attached: in the prompt body; 246 of those were on this route, all 246 had imagesCount: 0, and none of the 246 had any image data in the wire payload.

Code path observed in the installed package:

  • buildUserInput in /opt/homebrew/lib/node_modules/openclaw/dist/thread-lifecycle-BQKXEdzO.js:1084 consumes params.images if populated:
    function buildUserInput(params, promptText = params.prompt) {
      return [{ type: "text", text: promptText, text_elements: [] },
        ...(params.images ?? []).map((image) => ({
          type: "image",
          url: `data:${image.mimeType};base64,${image.data}`
        }))];
    }
  • buildInboundMediaNote in get-reply-wxuuKnTx.js:2225 converts ctx.MediaPath / ctx.MediaPaths into the text line that ends up in the prompt body.
  • No code in the Codex app-server reply assembly reads ctx.MediaPath(s) and populates params.images with base64-encoded image bytes. Equivalent loaders exist on other routes: resolveAcpAttachments() and detectAndLoadPromptImages().
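
To make the gap concrete, here is a typed sketch of the buildUserInput logic quoted above, called once the way the transport behaves today (media note in the text, no params.images) and once with params.images populated. The interfaces PromptImage, TurnParams, and UserInputPart are assumptions inferred from the snippet, and the base64 string is placeholder data (just the 8-byte PNG signature), not a real image.

```typescript
// Types are assumptions inferred from the buildUserInput excerpt in this report.
interface PromptImage { mimeType: string; data: string } // data: base64 payload
interface TurnParams { prompt: string; images?: PromptImage[] }
type UserInputPart =
  | { type: "text"; text: string; text_elements: unknown[] }
  | { type: "image"; url: string };

// Same logic as the excerpt above, with type annotations added.
function buildUserInput(
  params: TurnParams,
  promptText: string = params.prompt,
): UserInputPart[] {
  const textPart: UserInputPart = { type: "text", text: promptText, text_elements: [] };
  const imageParts: UserInputPart[] = (params.images ?? []).map((image) => ({
    type: "image" as const,
    url: `data:${image.mimeType};base64,${image.data}`,
  }));
  return [textPart, ...imageParts];
}

// Observed today: the attachment exists only as a text note, so no image part is built.
const observed = buildUserInput({
  prompt: "[media attached: <abs-path> (image/png) | <abs-path>] what is on this label?",
});

// Expected: with params.images populated upstream, an image part follows the text part.
const expected = buildUserInput({
  prompt: "what is on this label?",
  images: [{ mimeType: "image/png", data: "iVBORw0KGgo=" }], // placeholder bytes
});

console.log(observed.length); // 1
console.log(expected.length); // 2
```

The function itself behaves correctly in both cases; the missing piece is whatever should populate params.images before it is called.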

OpenClaw version

2026.5.12

Operating system

macOS 15.4 (Darwin 25.4.0)

Install method

npm global

Model

openai/gpt-5.5

Provider / routing chain

openclaw -> openai (Codex app-server, modelApi=openai-responses) -> OpenAI Responses API

Additional provider/model setup details

  • Default agent on openai/gpt-5.5 via Codex app-server. agents.defaults.imageModel not set in ~/.openclaw/openclaw.json (the image tool falls back to the provider's default vision model).
  • Same host runs other agents/sessions on openai-codex/gpt-5.5 (Codex CLI native auth, modelApi=openai-codex-responses); on that route, imagesCount > 0 is observed in trajectory logs for image-bearing turns. This is the parity gap.

Logs, screenshots, and evidence

Trajectory event excerpt (sanitized) for a representative image-bearing turn on this transport:

{
  "type": "prompt.submitted",
  "provider": "openai",
  "modelId": "gpt-5.5",
  "modelApi": "openai-responses",
  "data": {
    "turnId": "...",
    "prompt": "... [media attached: ~/.openclaw/media/inbound/image---<uuid>.png (image/png) | ~/.openclaw/media/inbound/image---<uuid>.png] ...",
    "imagesCount": 0
  }
}

Tool call observed when the model does invoke the `image` tool (roughly 5% of image-bearing turns):

{
  "type": "tool.call",
  "data": {
    "name": "image",
    "arguments": {
      "image": "~/.openclaw/media/inbound/image---<uuid>.png",
      "prompt": "Inspect this supplement bottle label proof ..."
    }
  }
}

Aggregate over 30 days on one host:
- 1123 total turns across all main-agent sessions
- 284 turns with `[media attached:` in the prompt body
- On `provider=openai, modelApi=openai-responses` (this bug): 246 image-bearing turns, 0 had imagesCount > 0
- On `provider=openai-codex, modelApi=openai-codex-responses` (parity): 38 image-bearing turns, 17 had imagesCount > 0

Impact and severity

  • Affected: any agent on provider=openai, modelApi=openai-responses receiving images from a transport that populates ctx.MediaPath(s) (observed on Discord; structurally affects any channel using the same media pipeline).
  • Severity: High for any workflow that depends on the model actually seeing an attached image. The model is told images are automatically visible (per the image tool description), so it typically skips the fallback tool and produces text that reads as if it had inspected the image. Downstream this drives confidently wrong visual claims and decision drift.
  • Frequency: Always on this transport. 246/246 image-bearing turns observed had imagesCount: 0 in the trajectory and no image parts in the wire payload.
  • Consequence: agents return hallucinated visual observations; users acting on those observations (e.g., approving print-ready label proofs) can be misled. Indirectly worsens conversation drift because the model commits to invented details and is then asked to refine them.

Additional information

  • A clean fix likely belongs in the Codex app-server reply assembly, upstream of buildTurnStartParams, as a shared helper that reads ctx.MediaPath / ctx.MediaPaths plus MIME metadata, filters to image/*, and applies the same local-root / timeout / max-byte / sanitization rules already used by resolveAcpAttachments() (dispatch-acp-oy1KlABg.js:425-455) and the embedded runner's detectAndLoadPromptImages(). The helper would return { mimeType, data }[] to be merged into params.images, preserving ordering with any caller-supplied opts.images. Structured context should be the primary source; prompt-text parsing should remain a compatibility fallback.
  • The image tool description in openclaw-tools-BfDU2PXL.js:4056 is structurally wrong on transports that do not deliver images even though the underlying model is vision-capable. Detecting transport delivery (not just modelHasVision) and adjusting the description accordingly would defuse the confabulation. Worth folding into the same fix; if not, it is a small follow-up issue.
  • Regression markers worth covering in tests: data-URL size caps for OpenAI Responses, dedupe when both opts.images and ctx.MediaPath are present, multi-image ordering, and trajectory coverage asserting data.imagesCount > 0 for image-bearing turns on openai/openai-responses.
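
The helper shape described in the first bullet could look roughly like the sketch below. Every name in it (mergePromptImages, InboundMedia, MergeOptions, readBase64, maxBase64Chars) is hypothetical and does not exist in OpenClaw as far as this report shows. The file reader is injected and synchronous to keep the sketch testable; a real implementation would presumably be async and reuse the existing local-root, timeout, and sanitization rules, which are omitted here. Dedupe is done on the base64 payload, which is only one possible choice.

```typescript
// Hypothetical sketch of the proposed merge step; not OpenClaw's actual API.
interface PromptImage { mimeType: string; data: string } // data: base64 payload
interface InboundMedia { path: string; mimeType: string } // from ctx.MediaPath(s) plus MIME metadata

interface MergeOptions {
  maxBase64Chars: number; // assumed cap on the encoded payload size
  // Injected reader: returns base64 content, or undefined if the file is
  // unreadable or rejected by (omitted) local-root checks.
  readBase64: (path: string) => string | undefined;
}

function mergePromptImages(
  callerImages: PromptImage[],
  media: InboundMedia[],
  options: MergeOptions,
): PromptImage[] {
  // Caller-supplied images (opts.images) keep their position at the front.
  const merged: PromptImage[] = callerImages.slice();

  for (const item of media) {
    // Only image/* attachments become vision inputs.
    if (item.mimeType.indexOf("image/") !== 0) continue;

    const data = options.readBase64(item.path);
    if (data === undefined) continue;
    if (data.length > options.maxBase64Chars) continue;

    // Skip payloads already present, e.g. the same image passed via opts.images.
    if (merged.some((existing) => existing.data === data)) continue;

    merged.push({ mimeType: item.mimeType, data });
  }
  return merged;
}
```

A regression test along the lines of the last bullet could stub readBase64 with an in-memory map and assert that a duplicate of a caller-supplied image is dropped, that non-image and oversized attachments are skipped, and that the remaining images keep their inbound order.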

Metadata

Assignees

No one assigned

Labels

  • `P1`: High-priority user-facing bug, regression, or broken workflow.
  • `clawsweeper:fix-shape-clear`: ClawSweeper found a clear likely implementation shape for this issue.
  • `clawsweeper:queueable-fix`: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • `clawsweeper:source-repro`: ClawSweeper found a high-confidence source-level issue reproduction.
  • `impact:message-loss`: Channel message delivery can be lost, duplicated, or misrouted.
  • `issue-rating: 🦞 diamond lobster`: Very strong issue quality with high-confidence source-level or clear reproduction.
