Skip to content

fix: DeepSeek Chat Completions API rejects image_url content blocks#26364

Open
SugerWu wants to merge 1 commit into
NousResearch:mainfrom
SugerWu:fix/deepseek-image-rejection
Open

fix: DeepSeek Chat Completions API rejects image_url content blocks#26364
SugerWu wants to merge 1 commit into
NousResearch:mainfrom
SugerWu:fix/deepseek-image-rejection

Conversation

@SugerWu

@SugerWu SugerWu commented May 15, 2026

Copy link
Copy Markdown

Problem

DeepSeek's OpenAI-compatible Chat Completions endpoint uses Rust-derived serde deserialization which strictly validates content type variants. Unlike OpenAI's Python-based parser which ignores unknown fields, DeepSeek's Rust schema only accepts "text" content blocks. Any message containing "image_url" is rejected with:

HTTP 400: Failed to deserialize the JSON body into the target type: messages[N]: unknown variant `image_url`, expected `text`

This affects all users of DeepSeek as their primary provider who also use vision-capable tools (computer_use screenshots, vision_analyze) because:

  1. DeepSeek's model (e.g. deepseek-v4-flash) IS vision-capable at the model level, so the models.dev cache returns supports_vision=True.
  2. But the Chat Completions API endpoint does not expose this capability through the standard image_url schema.
  3. Vision requests hitting DeepSeek directly get HTTP 400 instead of being routed through the aggregator chain.

Root Cause

Three distinct issues contributed to the failure:

1. Missing provider-level vision routing

The _PROVIDERS_WITHOUT_VISION frozenset in auxiliary_client.py did not include "deepseek". Vision auto-routing incorrectly sent image_url payloads to DeepSeek's text-only endpoint.

2. Incomplete preflight image stripping

The preflight image-strip logic in run_agent.py only checked _model_supports_vision() from the models.dev cache. When the cache reported vision support (correct at the model level), images were NOT stripped before the wire call — even though the provider endpoint doesn't accept them.

Additionally, several helper functions (_preprocess_anthropic_content, _content_has_image_parts, _strip_images_from_messages) did not handle the _multimodal tool-result envelope format used by computer_use/vision tools, so they missed images embedded in tool results.

3. Brittle error detection

The reactive image-rejection handler extracted the error message using str(api_error.body or api_error.message or str(api_error)). The boolean or chain meant that if .body was an empty dict {} (falsy), it fell through to .message which had a different format ("HTTP 400: ...") that didn't match the _IMAGE_REJECTION_PHRASES keywords.

Solution

Three-layer defense to ensure images never reach DeepSeek's schema validator:

Layer 1 — Provider routing (agent/auxiliary_client.py)

  • Add "deepseek" to _PROVIDERS_WITHOUT_VISION so vision requests are always routed through the aggregator chain (OpenRouter → Nous) instead of hitting DeepSeek's text-only endpoint directly.
  • Skip DeepSeek in get_available_vision_backends() since its endpoint cannot accept image_url content.

Layer 2 — Preflight stripping (run_agent.py)

  • Add _provider_is_known_non_vision() method that checks the provider against the _PROVIDERS_WITHOUT_VISION set. This acts as an override signal so image stripping runs even when the models.dev cache reports vision support.
  • Wire the check into _prepare_messages_for_non_vision_model() so images are stripped during kwarg construction.
  • Add an additional proactive _strip_images_from_messages() call on api_kwargs before every wire call as a safety net.
  • Fix _preprocess_anthropic_content() and _content_has_image_parts() to properly unwrap the _multimodal tool-result envelope used by computer_use and other vision-capable handlers.
  • Fix _strip_images_from_messages() to handle the _multimodal envelope format.

Layer 3 — Robust error detection (run_agent.py)

  • Change the _err_body extraction to concatenate all three sources (.body, .message, str(error)) instead of using a boolean or chain, ensuring the match buffer always contains the full error text.
  • Add "unknown variant" and "failed to deserialize" to the _IMAGE_REJECTION_PHRASES tuple to specifically catch DeepSeek's Rust-serde error format.
  • Add IMAGE_STRIP_DEBUG logging throughout so future issues can be diagnosed without code changes.

Testing

Verified in isolation that all strip functions work correctly:

  • _strip_images_from_messages() correctly removes image_url parts from unwrapped tool result content lists
  • _prepare_messages_for_non_vision_model() correctly replaces images with text descriptions via the Anthropic fallback path
  • _provider_is_known_non_vision() correctly returns True for "deepseek"
  • _content_has_image_parts() correctly detects images in the _multimodal envelope

Also verified that computer_use screenshots can be captured without triggering DeepSeek HTTP 400 after the fix.

DeepSeek's OpenAI-compatible Chat Completions endpoint uses Rust-derived
serde deserialization which strictly expects "text" content types —
image_url blocks (even from valid vision models like deepseek-v4-flash)
are rejected with HTTP 400 "unknown variant image_url, expected text".

Three-layer fix:

1. Provider-level routing (auxiliary_client.py)
   - Add "deepseek" to _PROVIDERS_WITHOUT_VISION frozenset so vision
     requests are routed through the aggregator chain (OpenRouter → Nous)
     instead of hitting DeepSeek's text-only endpoint directly.
   - Skip deepseek in get_available_vision_backends() for the same reason.

2. Preflight image stripping (run_agent.py)
   - Add _provider_is_known_non_vision() method that checks the provider
     against the _PROVIDERS_WITHOUT_VISION set, overriding the models.dev
     cache when the provider's endpoint doesn't accept image_url.
   - Run proactive _strip_images_from_messages() on api_kwargs before
     every wire call when the provider is known non-vision.
   - Fix _prepare_messages_for_non_vision_model() to also check the
     provider blacklist, not just the model capability cache.
   - Fix _preprocess_anthropic_content() and _content_has_image_parts()
     to properly unwrap the _multimodal tool-result envelope used by
     computer_use and other vision-capable tools.

3. Robust error detection (run_agent.py)
   - Concatenate api_error.body, .message, and str(error) into the match
     buffer so the reactive image-rejection handler works regardless of
     which attribute carries the keyword signature.
   - Add "unknown variant" and "failed to deserialize" to the
     _IMAGE_REJECTION_PHRASES tuple to catch DeepSeek's Rust-serde error
     format.
   - Add IMAGE_STRIP_DEBUG logging throughout for future troubleshooting.
Copilot AI review requested due to automatic review settings May 15, 2026 13:41
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/deepseek DeepSeek API tool/vision Vision analysis and image generation labels May 15, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related to recently merged #25925 which added runtime image-stripping gates. This PR complements it by adding DeepSeek to _PROVIDERS_WITHOUT_VISION (static blocklist) and fixing _multimodal envelope handling in the preflight strip logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/deepseek DeepSeek API tool/vision Vision analysis and image generation type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants