Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
Images sent via the /v1/responses HTTP API are silently dropped by the embedded PI runner's sanitizeImageBlocks, even though the Anthropic model supports vision. The gateway accepts the input_image block (200 OK) but strips it before the prompt reaches the model. The model then hallucinates or says "I don't see any photo."
Steps to reproduce
- Configure OpenClaw with anthropic/claude-sonnet-4-5 (or claude-sonnet-4-6) as the default model
- Send a POST to /v1/responses with an input_image block:
  {
    "model": "openclaw",
    "input": [{
      "type": "message",
      "role": "user",
      "content": [
        {"type": "input_text", "text": "What is in this photo?"},
        {"type": "input_image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "<base64>"}}
      ]
    }],
    "user": "test"
  }
- Observe gateway log: Native image: dropped 1 image(s) after sanitization (prompt:images).
- Model responds: "I don't see any photo attached to your message."
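For anyone reproducing this from a script, the request body in the steps above can be built roughly as follows. This is a minimal sketch: `build_image_request` is a hypothetical helper name, not part of OpenClaw or wa-bridge, and the gateway URL and any auth header are left out because they are not stated in this report.

```python
import base64
import json


def build_image_request(image_bytes: bytes, question: str,
                        media_type: str = "image/jpeg") -> dict:
    """Build a /v1/responses body with one text block and one base64 image block.

    Hypothetical helper; the field layout mirrors the JSON in the steps above.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "openclaw",
        "input": [{
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": question},
                {"type": "input_image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": encoded}},
            ],
        }],
        "user": "test",
    }


if __name__ == "__main__":
    # Placeholder bytes only; substitute the contents of a real JPEG file.
    body = build_image_request(b"placeholder-image-bytes", "What is in this photo?")
    print(json.dumps(body, indent=2))
```

POSTing the resulting body to /v1/responses should return 200 OK and, on an affected install, produce the "dropped 1 image(s)" log line shown in the next step.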
Expected behavior
Images should pass through sanitization to the model when the provider catalog reports the model as vision-capable (text+image).
Actual behavior
Images are dropped and the model never sees them. If the agent is Georgia (a business agent), it fabricates a plausible description instead of saying it cannot see the image.
OpenClaw version
2026.5.18 (50a2481)
Operating system
Ubuntu 24.04 (Azure VM)
Install method
npm global
Model
anthropic/claude-sonnet-4-5
Provider / routing chain
WhatsApp → wa-bridge (FastAPI) → POST /v1/responses → anthropic
Additional provider/model setup details
openclaw models list shows the configured model as text-only:
anthropic/claude-sonnet-4-5 text 195k no no default
But openclaw models list --all (full provider catalog) correctly shows:
anthropic/claude-sonnet-4-5 text+image 195k no no default
The embedded PI runner's sanitizeImageBlocks checks the resolved/configured model capability (which says text), not the full provider catalog (which says text+image). Since the model appears text-only, all images are stripped.
This affects every Anthropic model accessed via the direct anthropic provider — claude-sonnet-4-5, claude-sonnet-4-6, etc. all show text in the configured view despite the provider catalog listing them as text+image.
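The mismatch described above can be modelled with a small toy example. This is only a sketch of the behaviour as I understand it from the logs, not OpenClaw's actual implementation: the two dictionaries, `supports_images`, and the `consult_catalog` switch are all hypothetical, and falling back to the provider catalog is just one possible way the check could be corrected.

```python
from typing import Optional

# Capability strings as printed by `openclaw models list` in this report.
CONFIGURED = {"anthropic/claude-sonnet-4-5": "text"}      # resolved/configured view
CATALOG = {"anthropic/claude-sonnet-4-5": "text+image"}   # full provider catalog view


def supports_images(capability: Optional[str]) -> bool:
    """True if a capability string such as "text+image" includes images."""
    return capability is not None and "image" in capability.split("+")


def sanitize_image_blocks(blocks, model, *, consult_catalog):
    """Toy stand-in for the sanitizer: drop image blocks for text-only models.

    With consult_catalog=False only the configured view is checked (the
    behaviour this report describes). With consult_catalog=True the provider
    catalog is used as a fallback (an assumed fix, not a confirmed one).
    """
    capability = CONFIGURED.get(model)
    if consult_catalog and not supports_images(capability):
        capability = CATALOG.get(model, capability)
    if supports_images(capability):
        return list(blocks)
    return [b for b in blocks if b.get("type") != "input_image"]


if __name__ == "__main__":
    blocks = [
        {"type": "input_text", "text": "What is in this photo?"},
        {"type": "input_image", "source": {"type": "base64", "data": "<base64>"}},
    ]
    model = "anthropic/claude-sonnet-4-5"
    print(len(sanitize_image_blocks(blocks, model, consult_catalog=False)))
    print(len(sanitize_image_blocks(blocks, model, consult_catalog=True)))
```

With only the configured view consulted, the image block is removed and just the text block survives; with the catalog fallback, both blocks pass through.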
Logs, screenshots, and evidence
Logs:
agent model: anthropic/claude-sonnet-4-5 (thinking=medium, fast=off)
Native image: dropped 1 image(s) after sanitization (prompt:images).
I don't see any photo attached to your message.
Model capability mismatch:
# Configured view (wrong):
openclaw models list
anthropic/claude-sonnet-4-5 text 195k default
# Full provider catalog (correct):
openclaw models list --all
anthropic/claude-sonnet-4-5 text+image 195k default
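To check other models for the same mismatch, the two listings can be compared mechanically. The sketch below assumes whitespace-separated columns with the model id first and the capability second, as in the excerpts pasted above; the real CLI output may include headers or extra columns, and both function names are hypothetical.

```python
def parse_capabilities(listing: str) -> dict:
    """Map model id -> capability string from pasted `models list` output."""
    caps = {}
    for line in listing.splitlines():
        parts = line.split()
        # Skip blanks, comments, and lines that do not start with provider/model.
        if len(parts) < 2 or line.lstrip().startswith("#") or "/" not in parts[0]:
            continue
        caps[parts[0]] = parts[1]
    return caps


def capability_mismatches(configured: str, catalog: str) -> dict:
    """Return {model: (configured_cap, catalog_cap)} where the two views differ."""
    conf = parse_capabilities(configured)
    cat = parse_capabilities(catalog)
    return {m: (conf[m], cat[m]) for m in conf if m in cat and conf[m] != cat[m]}


if __name__ == "__main__":
    configured = "anthropic/claude-sonnet-4-5 text 195k default"
    catalog = "anthropic/claude-sonnet-4-5 text+image 195k default"
    print(capability_mismatches(configured, catalog))
```

Feeding it the saved output of `openclaw models list` and `openclaw models list --all` should flag every model whose configured capability differs from the catalog entry.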
Impact and severity
Affected: WhatsApp users sending images via wa-bridge → /v1/responses
Severity: High (all inbound images silently dropped, model hallucinates responses)
Frequency: 100% — every image sent via /v1/responses to any direct Anthropic model
Consequence: Vision is completely broken on the /v1/responses HTTP API path for the direct Anthropic provider. The model fabricates image descriptions instead of processing the actual image content.
Additional information
No response