Problem or Use Case
For custom/local provider models that aren't in the models.dev catalog (local vLLM, internal proxy, OpenAI-compatible endpoint with a private fine-tune), agent.models_dev.get_model_capabilities() returns None. run_agent.AIAgent._model_supports_vision() then returns False, and _prepare_messages_for_non_vision_model() strips every image part out of the user turn before the request leaves the process.
Result: a vision-capable LLaVA / Qwen-VL / private fine-tune behind a custom provider never sees the user's image. The user gets a degraded text-only response with no warning.
There is currently no config-level way to declare "this custom model supports vision." The only options are (a) patch the source, or (b) inject a synthetic entry into the local models.dev cache file.
Proposed Solution
Accept a supports_vision: true flag in two places, both consulted before falling back to models.dev:
# Top-level shortcut (the legacy single-model config style)
model:
  provider: custom
  default: my-llava
  base_url: http://localhost:8000/v1
  supports_vision: true

# Per-model under providers (matches the schema proposed in #8731)
model:
  provider: my-vllm  # name of the custom provider entry
  default: my-llava
providers:
  my-vllm:
    base_url: http://localhost:8000/v1
    models:
      my-llava:
        supports_vision: true
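As a rough illustration of how these two declarations might be consulted before the models.dev fallback, here is a minimal Python sketch. It is not the actual code in run_agent.py. `lookup_path`, `model_supports_vision`, and the `catalog_lookup` callable are hypothetical stand-ins (for `cfg_get`, `AIAgent._model_supports_vision()`, and `models_dev.get_model_capabilities()` respectively), and the sketch assumes the config has already been parsed into nested dicts and that the top-level flag applies to whichever model is active.

```python
from typing import Any, Callable, Mapping, Optional


def lookup_path(cfg: Mapping[str, Any], *path: str) -> Any:
    """Hypothetical helper: walk nested mappings, returning None on a missing key."""
    node: Any = cfg
    for key in path:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def model_supports_vision(
    cfg: Mapping[str, Any],
    runtime_provider: str,
    model: str,
    catalog_lookup: Callable[[str, str], Optional[bool]],
) -> bool:
    """Sketch of the proposed order: config overrides first, catalog lookup last."""
    candidates = [
        # 1. Top-level shortcut: model.supports_vision
        lookup_path(cfg, "model", "supports_vision"),
        # 2. Entry keyed by the runtime provider name (may be "custom")
        lookup_path(cfg, "providers", runtime_provider, "models", model, "supports_vision"),
    ]
    # 3. Entry keyed by the user-declared provider name from the config
    declared_provider = lookup_path(cfg, "model", "provider")
    if isinstance(declared_provider, str):
        candidates.append(
            lookup_path(cfg, "providers", declared_provider, "models", model, "supports_vision")
        )
    for value in candidates:
        if isinstance(value, bool):  # an explicit true/false wins; anything else is ignored
            return value
    # 4. Fall back to the catalog; None (unknown model) is treated as "no vision"
    return bool(catalog_lookup(runtime_provider, model))


# Named custom provider: runtime provider is "custom", config still says "my-vllm".
named_cfg = {
    "model": {"provider": "my-vllm", "default": "my-llava"},
    "providers": {"my-vllm": {"models": {"my-llava": {"supports_vision": True}}}},
}
not_in_catalog = lambda provider, model: None  # model missing from the catalog
print(model_supports_vision(named_cfg, "custom", "my-llava", not_in_catalog))
```

Only explicit booleans are treated as overrides in this sketch, so a missing or malformed value falls through to the next source rather than silently disabling vision.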
Resolution order in _model_supports_vision():
1. model.supports_vision
2. providers.<self.provider>.models.<model>.supports_vision
3. providers.<cfg.model.provider>.models.<model>.supports_vision, which covers named custom providers, where runtime self.provider is rewritten to "custom" by _resolve_named_custom_runtime while the config still carries the user-declared name
4. models_dev.get_model_capabilities() lookup
No schema change to _normalize_custom_provider_entry is needed: models.<id> dicts already pass through unchanged, so the new field reaches cfg_get without warnings.
Scope caveat (please read)
This is a strict-minimum patch. It only fixes the strip path in run_agent.py. The auto-mode routing decision in agent/image_routing.py:_lookup_supports_vision is not changed, which means that with the default agent.image_input_mode: auto, images are still pre-processed through vision_analyze even after the override is set. To get the override to drive routing too, the user must also set:
agent:
  image_input_mode: native
If reviewers prefer, the same fallback can be added to _lookup_supports_vision in this PR or a follow-up. Spelled out so reviewers can decide before merge.
Alternatives Considered
[…] vision+reasoning+tools+streaming together, plus /model UI integration and a synthetic capability path in models_dev.py. PR "feat: allow custom provider capability overrides" #8942 has been waiting on review since 2026-04-13 (509 / -62 across 9 files). This issue/PR is the vision-only minimal subset that unblocks the most common pain point without making four design decisions at once. The broader change can land later as a superset.
Patching the local models.dev cache file works but doesn't survive cache refresh and isn't discoverable to other users of the same config.
Setting agent.image_input_mode: native alone is insufficient: decide_image_input_mode() honors it and attaches images natively, but _model_supports_vision() still returns False and _prepare_messages_for_non_vision_model() strips the images downstream. With this PR, declaring supports_vision: true makes the strip path agree with the routing path.
Feature Type
Configuration option
Scope
Small (single file, < 50 lines) — see #17936.
Contribution
Notes
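For reviewers weighing the scope caveat, the interaction between the routing decision and the strip path described in this issue can be summarized in a small Python sketch. This is a toy model of the described behavior, not code from the repository: `image_outcome` and its parameters are hypothetical, and the branch where the catalog already reports vision support in auto mode is an assumption, since the issue only describes the case where the model is missing from the catalog.

```python
from typing import Optional


def image_outcome(
    input_mode: str,
    override: Optional[bool],
    catalog_vision: Optional[bool],
) -> str:
    """Toy model of what happens to a user image with this patch applied.

    input_mode:     value of agent.image_input_mode ("auto" or "native")
    override:       supports_vision from config, or None if not declared
    catalog_vision: catalog answer, or None for a model missing from the catalog
    """
    if input_mode == "auto":
        # Routing is unchanged by this patch, so it ignores the config override.
        # Assumption: a catalog-known vision model gets images natively.
        return "native" if catalog_vision else "vision_analyze"
    if input_mode == "native":
        # Routing attaches images natively; the strip path then has the final say,
        # and with this patch it checks the config override before the catalog.
        supports = override if override is not None else bool(catalog_vision)
        return "native" if supports else "stripped"
    raise ValueError(f"unknown image_input_mode: {input_mode!r}")


# Custom model missing from the catalog, three configurations:
print(image_outcome("auto", True, None))     # override set, default mode
print(image_outcome("native", None, None))   # native mode alone
print(image_outcome("native", True, None))   # native mode plus override
```

In this model, only the last configuration delivers the image to the model untouched, which matches the recommendation to set both supports_vision: true and agent.image_input_mode: native.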