fix: DeepSeek Chat Completions API rejects image_url content blocks by SugerWu · Pull Request #26364 · NousResearch/hermes-agent

SugerWu · 2026-05-15T13:41:07Z

Problem

DeepSeek's OpenAI-compatible Chat Completions endpoint uses Rust-derived serde deserialization which strictly validates content type variants. Unlike OpenAI's Python-based parser which ignores unknown fields, DeepSeek's Rust schema only accepts "text" content blocks. Any message containing "image_url" is rejected with:

HTTP 400: Failed to deserialize the JSON body into the target type: messages[N]: unknown variant `image_url`, expected `text`

This affects all users of DeepSeek as their primary provider who also use vision-capable tools (computer_use screenshots, vision_analyze) because:

DeepSeek's model (e.g. deepseek-v4-flash) IS vision-capable at the model level, so the models.dev cache returns supports_vision=True.
But the Chat Completions API endpoint does not expose this capability through the standard image_url schema.
Vision requests hitting DeepSeek directly get HTTP 400 instead of being routed through the aggregator chain.

Root Cause

Three distinct issues contributed to the failure:

1. Missing provider-level vision routing

The _PROVIDERS_WITHOUT_VISION frozenset in auxiliary_client.py did not include "deepseek". Vision auto-routing incorrectly sent image_url payloads to DeepSeek's text-only endpoint.

2. Incomplete preflight image stripping

The preflight image-strip logic in run_agent.py only checked _model_supports_vision() from the models.dev cache. When the cache reported vision support (correct at the model level), images were NOT stripped before the wire call — even though the provider endpoint doesn't accept them.

Additionally, several helper functions (_preprocess_anthropic_content, _content_has_image_parts, _strip_images_from_messages) did not handle the _multimodal tool-result envelope format used by computer_use/vision tools, so they missed images embedded in tool results.

3. Brittle error detection

The reactive image-rejection handler extracted the error message using str(api_error.body or api_error.message or str(api_error)). The boolean or chain meant that if .body was an empty dict {} (falsy), it fell through to .message which had a different format ("HTTP 400: ...") that didn't match the _IMAGE_REJECTION_PHRASES keywords.

Solution

Three-layer defense to ensure images never reach DeepSeek's schema validator:

Layer 1 — Provider routing (`agent/auxiliary_client.py`)

Add "deepseek" to _PROVIDERS_WITHOUT_VISION so vision requests are always routed through the aggregator chain (OpenRouter → Nous) instead of hitting DeepSeek's text-only endpoint directly.
Skip DeepSeek in get_available_vision_backends() since its endpoint cannot accept image_url content.

Layer 2 — Preflight stripping (`run_agent.py`)

Add _provider_is_known_non_vision() method that checks the provider against the _PROVIDERS_WITHOUT_VISION set. This acts as an override signal so image stripping runs even when the models.dev cache reports vision support.
Wire the check into _prepare_messages_for_non_vision_model() so images are stripped during kwarg construction.
Add an additional proactive _strip_images_from_messages() call on api_kwargs before every wire call as a safety net.
Fix _preprocess_anthropic_content() and _content_has_image_parts() to properly unwrap the _multimodal tool-result envelope used by computer_use and other vision-capable handlers.
Fix _strip_images_from_messages() to handle the _multimodal envelope format.

Layer 3 — Robust error detection (`run_agent.py`)

Change the _err_body extraction to concatenate all three sources (.body, .message, str(error)) instead of using a boolean or chain, ensuring the match buffer always contains the full error text.
Add "unknown variant" and "failed to deserialize" to the _IMAGE_REJECTION_PHRASES tuple to specifically catch DeepSeek's Rust-serde error format.
Add IMAGE_STRIP_DEBUG logging throughout so future issues can be diagnosed without code changes.

Testing

Verified in isolation that all strip functions work correctly:

_strip_images_from_messages() correctly removes image_url parts from unwrapped tool result content lists
_prepare_messages_for_non_vision_model() correctly replaces images with text descriptions via the Anthropic fallback path
_provider_is_known_non_vision() correctly returns True for "deepseek"
_content_has_image_parts() correctly detects images in the _multimodal envelope

Also verified that computer_use screenshots can be captured without triggering DeepSeek HTTP 400 after the fix.

DeepSeek's OpenAI-compatible Chat Completions endpoint uses Rust-derived serde deserialization which strictly expects "text" content types — image_url blocks (even from valid vision models like deepseek-v4-flash) are rejected with HTTP 400 "unknown variant image_url, expected text". Three-layer fix: 1. Provider-level routing (auxiliary_client.py) - Add "deepseek" to _PROVIDERS_WITHOUT_VISION frozenset so vision requests are routed through the aggregator chain (OpenRouter → Nous) instead of hitting DeepSeek's text-only endpoint directly. - Skip deepseek in get_available_vision_backends() for the same reason. 2. Preflight image stripping (run_agent.py) - Add _provider_is_known_non_vision() method that checks the provider against the _PROVIDERS_WITHOUT_VISION set, overriding the models.dev cache when the provider's endpoint doesn't accept image_url. - Run proactive _strip_images_from_messages() on api_kwargs before every wire call when the provider is known non-vision. - Fix _prepare_messages_for_non_vision_model() to also check the provider blacklist, not just the model capability cache. - Fix _preprocess_anthropic_content() and _content_has_image_parts() to properly unwrap the _multimodal tool-result envelope used by computer_use and other vision-capable tools. 3. Robust error detection (run_agent.py) - Concatenate api_error.body, .message, and str(error) into the match buffer so the reactive image-rejection handler works regardless of which attribute carries the keyword signature. - Add "unknown variant" and "failed to deserialize" to the _IMAGE_REJECTION_PHRASES tuple to catch DeepSeek's Rust-serde error format. - Add IMAGE_STRIP_DEBUG logging throughout for future troubleshooting.

alt-glitch · 2026-05-15T14:03:05Z

Related to recently merged #25925 which added runtime image-stripping gates. This PR complements it by adding DeepSeek to _PROVIDERS_WITHOUT_VISION (static blocklist) and fixing _multimodal envelope handling in the preflight strip logic.

Copilot AI review requested due to automatic review settings May 15, 2026 13:41

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/deepseek DeepSeek API tool/vision Vision analysis and image generation labels May 15, 2026

alt-glitch mentioned this pull request May 15, 2026

fix: strip image parts for non-vision models with provider profiles #26498

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: DeepSeek Chat Completions API rejects image_url content blocks#26364

fix: DeepSeek Chat Completions API rejects image_url content blocks#26364
SugerWu wants to merge 1 commit into
NousResearch:mainfrom
SugerWu:fix/deepseek-image-rejection

SugerWu commented May 15, 2026

Uh oh!

alt-glitch commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SugerWu commented May 15, 2026

Problem

Root Cause

1. Missing provider-level vision routing

2. Incomplete preflight image stripping

3. Brittle error detection

Solution

Layer 1 — Provider routing (agent/auxiliary_client.py)

Layer 2 — Preflight stripping (run_agent.py)

Layer 3 — Robust error detection (run_agent.py)

Testing

Uh oh!

alt-glitch commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Layer 1 — Provider routing (`agent/auxiliary_client.py`)

Layer 2 — Preflight stripping (`run_agent.py`)

Layer 3 — Robust error detection (`run_agent.py`)