fix: DeepSeek Chat Completions API rejects image_url content blocks #26364
Open
SugerWu wants to merge 1 commit into
DeepSeek's OpenAI-compatible Chat Completions endpoint uses Rust-derived
serde deserialization which strictly expects "text" content types —
image_url blocks (even from valid vision models like deepseek-v4-flash)
are rejected with HTTP 400 "unknown variant image_url, expected text".
Three-layer fix:
1. Provider-level routing (auxiliary_client.py)
- Add "deepseek" to _PROVIDERS_WITHOUT_VISION frozenset so vision
requests are routed through the aggregator chain (OpenRouter → Nous)
instead of hitting DeepSeek's text-only endpoint directly.
- Skip deepseek in get_available_vision_backends() for the same reason.
2. Preflight image stripping (run_agent.py)
- Add _provider_is_known_non_vision() method that checks the provider
against the _PROVIDERS_WITHOUT_VISION set, overriding the models.dev
cache when the provider's endpoint doesn't accept image_url.
- Run proactive _strip_images_from_messages() on api_kwargs before
every wire call when the provider is known non-vision.
- Fix _prepare_messages_for_non_vision_model() to also check the
provider blacklist, not just the model capability cache.
- Fix _preprocess_anthropic_content() and _content_has_image_parts()
to properly unwrap the _multimodal tool-result envelope used by
computer_use and other vision-capable tools.
3. Robust error detection (run_agent.py)
- Concatenate api_error.body, .message, and str(error) into the match
buffer so the reactive image-rejection handler works regardless of
which attribute carries the keyword signature.
- Add "unknown variant" and "failed to deserialize" to the
_IMAGE_REJECTION_PHRASES tuple to catch DeepSeek's Rust-serde error
format.
- Add IMAGE_STRIP_DEBUG logging throughout for future troubleshooting.
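The error-detection change in item 3 can be illustrated with a minimal sketch. It is not the code from run_agent.py: FakeAPIError, build_match_buffer, and looks_like_image_rejection are hypothetical names, the attribute layout (.body, .message) is assumed from the description above, and the phrase tuple lists only the two entries this PR adds.

```python
# Hypothetical stand-in for an SDK error object. The .body and .message
# attributes follow the PR description, not a confirmed client API.
class FakeAPIError(Exception):
    def __init__(self, text, message=None, body=None):
        super().__init__(text)  # str(error) returns `text`
        self.message = message
        self.body = body


# Only the two phrases added by this PR; the real tuple is assumed to
# contain further entries that are not shown here.
IMAGE_REJECTION_PHRASES = ("unknown variant", "failed to deserialize")


def build_match_buffer(error):
    """Join every available error representation into one lowercase string."""
    parts = [
        getattr(error, "body", None),
        getattr(error, "message", None),
        str(error),
    ]
    # Falsy sources (None, "", {}) are skipped instead of ending the search.
    return " ".join(str(part) for part in parts if part).lower()


def looks_like_image_rejection(error):
    """Return True if any rejection phrase appears in any error source."""
    buffer = build_match_buffer(error)
    return any(phrase in buffer for phrase in IMAGE_REJECTION_PHRASES)
```

With an error whose .body is an empty dict and whose .message is a generic "HTTP 400: Bad Request", the previous expression str(body or message or str(error)) yields only the generic message, while the concatenated buffer still contains the serde text carried by str(error).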
Collaborator
Related to recently merged #25925 which added runtime image-stripping gates. This PR complements it by adding DeepSeek to |
Problem
DeepSeek's OpenAI-compatible Chat Completions endpoint uses Rust-derived serde deserialization, which strictly validates content type variants. Unlike OpenAI's Python-based parser, which ignores unknown fields, DeepSeek's Rust schema accepts only "text" content blocks. Any message containing "image_url" is rejected with HTTP 400 "unknown variant image_url, expected text".
This affects all users who have DeepSeek as their primary provider and also use vision-capable tools (computer_use screenshots, vision_analyze) because:
- The models.dev cache reports supports_vision=True for the model.
- DeepSeek's endpoint does not accept the image_url schema.
Root Cause
Three distinct issues contributed to the failure:
1. Missing provider-level vision routing
The _PROVIDERS_WITHOUT_VISION frozenset in auxiliary_client.py did not include "deepseek". Vision auto-routing incorrectly sent image_url payloads to DeepSeek's text-only endpoint.
2. Incomplete preflight image stripping
The preflight image-strip logic in run_agent.py only checked _model_supports_vision() from the models.dev cache. When the cache reported vision support (correct at the model level), images were NOT stripped before the wire call, even though the provider endpoint doesn't accept them.
Additionally, several helper functions (_preprocess_anthropic_content, _content_has_image_parts, _strip_images_from_messages) did not handle the _multimodal tool-result envelope format used by computer_use/vision tools, so they missed images embedded in tool results.
3. Brittle error detection
The reactive image-rejection handler extracted the error message using str(api_error.body or api_error.message or str(api_error)). The boolean or chain meant that if .body was an empty dict {} (falsy), it fell through to .message, which had a different format ("HTTP 400: ...") that didn't match the _IMAGE_REJECTION_PHRASES keywords.
Solution
Three-layer defense to ensure images never reach DeepSeek's schema validator:
Layer 1: Provider routing (agent/auxiliary_client.py)
- Added "deepseek" to _PROVIDERS_WITHOUT_VISION so vision requests are always routed through the aggregator chain (OpenRouter → Nous) instead of hitting DeepSeek's text-only endpoint directly.
- Skipped deepseek in get_available_vision_backends() since its endpoint cannot accept image_url content.
Layer 2: Preflight stripping (run_agent.py)
- Added a _provider_is_known_non_vision() method that checks the provider against the _PROVIDERS_WITHOUT_VISION set. This acts as an override signal so image stripping runs even when the models.dev cache reports vision support.
- Updated _prepare_messages_for_non_vision_model() to also check the provider blacklist, so images are stripped during kwarg construction.
- Added a proactive _strip_images_from_messages() call on api_kwargs before every wire call as a safety net.
- Fixed _preprocess_anthropic_content() and _content_has_image_parts() to properly unwrap the _multimodal tool-result envelope used by computer_use and other vision-capable handlers.
- Fixed _strip_images_from_messages() to handle the _multimodal envelope format.
Layer 3: Robust error detection (run_agent.py)
- Changed the _err_body extraction to concatenate all three sources (.body, .message, str(error)) instead of using a boolean or chain, ensuring the match buffer always contains the full error text.
- Added "unknown variant" and "failed to deserialize" to the _IMAGE_REJECTION_PHRASES tuple to specifically catch DeepSeek's Rust-serde error format.
- Added IMAGE_STRIP_DEBUG logging throughout so future issues can be diagnosed without code changes.
Testing
Verified in isolation that all strip functions work correctly:
- _strip_images_from_messages() correctly removes image_url parts from unwrapped tool result content lists
- _prepare_messages_for_non_vision_model() correctly replaces images with text descriptions via the Anthropic fallback path
- _provider_is_known_non_vision() correctly returns True for "deepseek"
- _content_has_image_parts() correctly detects images in the _multimodal envelope
Also verified that computer_use screenshots can be captured without triggering DeepSeek HTTP 400 after the fix.
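For reviewers who want to reproduce the isolated checks, the behaviour being verified can be approximated with the sketch below. It is an illustration under stated assumptions, not the code from this PR: the function names drop the leading underscore to mark them as stand-ins, the _multimodal envelope shape ({"_multimodal": True, "content": [...]}) is a guess at the format, the "[image omitted]" placeholder is invented for the example, and the set contains only the "deepseek" entry named in this PR.

```python
import copy

# Assumption: mirrors the frozenset described in this PR. The real set is
# expected to hold other providers that are not shown here.
PROVIDERS_WITHOUT_VISION = frozenset({"deepseek"})


def provider_is_known_non_vision(provider):
    """Provider-level override, independent of any model capability cache."""
    return (provider or "").strip().lower() in PROVIDERS_WITHOUT_VISION


def _unwrap(content):
    """Return the content parts, unwrapping the assumed envelope shape."""
    # Hypothetical envelope: {"_multimodal": True, "content": [...parts...]}
    if isinstance(content, dict) and content.get("_multimodal"):
        return content.get("content", [])
    return content


def _is_image_part(part):
    return isinstance(part, dict) and part.get("type") == "image_url"


def content_has_image_parts(content):
    """Detect image_url parts in a plain part list or in the envelope."""
    parts = _unwrap(content)
    return isinstance(parts, list) and any(_is_image_part(p) for p in parts)


def strip_images_from_messages(messages):
    """Return a deep copy of messages with every image_url part removed."""
    cleaned = []
    for message in messages:
        message = copy.deepcopy(message)
        parts = _unwrap(message.get("content"))
        if isinstance(parts, list):
            kept = [p for p in parts if not _is_image_part(p)]
            # Keep the message non-empty; the placeholder text is illustrative.
            message["content"] = kept or [
                {"type": "text", "text": "[image omitted]"}
            ]
        cleaned.append(message)
    return cleaned
```

A caller would gate the strip on the provider check, for example running strip_images_from_messages(messages) only when provider_is_known_non_vision(provider) is true. String content passes through unchanged, and the input list is not mutated.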