Skip to content

[Feature]: Allow declaring supports_vision for custom-provider models (vision-only subset of #8731) #17940

@CNSeniorious000

Description

@CNSeniorious000

Problem or Use Case

For custom/local provider models that aren't in the models.dev catalog (local vLLM, internal proxy, OpenAI-compatible endpoint with a private fine-tune), agent.models_dev.get_model_capabilities() returns None. run_agent.AIAgent._model_supports_vision() then returns False, and _prepare_messages_for_non_vision_model() strips every image part out of the user turn before the request leaves the process.

Result: a vision-capable LLaVA / Qwen-VL / private fine-tune behind a custom provider never sees the user's image. The user gets a degraded text-only response with no warning.

There is currently no config-level way to declare "this custom model supports vision." The only options are (a) patch the source, or (b) inject a synthetic entry into the local models.dev cache file.

Proposed Solution

Accept a supports_vision: true flag in two places, both consulted before falling back to models.dev:

# Top-level shortcut (the legacy single-model config style)
model:
  provider: custom
  default: my-llava
  base_url: http://localhost:8000/v1
  supports_vision: true

# Per-model under providers (matches the schema proposed in #8731)
model:
  provider: my-vllm     # name of the custom provider entry
  default: my-llava
providers:
  my-vllm:
    base_url: http://localhost:8000/v1
    models:
      my-llava:
        supports_vision: true

Resolution order in _model_supports_vision():

  1. model.supports_vision
  2. providers.<self.provider>.models.<model>.supports_vision
  3. providers.<cfg.model.provider>.models.<model>.supports_vision — covers named custom providers, where runtime self.provider is rewritten to "custom" by _resolve_named_custom_runtime while the config still carries the user-declared name
  4. existing models_dev.get_model_capabilities() lookup

No schema change to _normalize_custom_provider_entrymodels.<id> dicts already pass through unchanged, so the new field reaches cfg_get without warnings.

Scope caveat (please read)

This is a strict-minimum patch. It only fixes the strip path in run_agent.py. The auto-mode routing decision in agent/image_routing.py:_lookup_supports_vision is not changed, which means in default agent.image_input_mode: auto, images are still pre-processed through vision_analyze even after the override is set. To get the override to drive routing too, the user must also set:

agent:
  image_input_mode: native

If reviewers prefer, the same fallback can be added to _lookup_supports_vision in this PR or a follow-up. Spelled out so reviewers can decide before merge.

Alternatives Considered

  • Feature Request: Allow manual capability declaration (vision/reasoning/tools) in custom_providers #8731 / feat: allow custom provider capability overrides #8942 propose the broader feature: declare vision + reasoning + tools + streaming together, plus /model UI integration and a synthetic capability path in models_dev.py. PR feat: allow custom provider capability overrides #8942 has been waiting on review since 2026-04-13 (509 / -62 across 9 files). This issue/PR is the vision-only minimal subset that unblocks the most common pain point without making four design decisions at once. The broader change can land later as a superset.
  • Patching the local models.dev cache file works but doesn't survive cache refresh and isn't discoverable to other users of the same config.
  • Setting agent.image_input_mode: native alone is insufficient on its own — decide_image_input_mode() honors it and attaches images natively, but _model_supports_vision() still returns False and _prepare_messages_for_non_vision_model() strips the images downstream. With this PR, declaring supports_vision: true makes the strip path agree with the routing path.

Feature Type

Configuration option

Scope

Small (single file, < 50 lines) — see #17936.

Contribution

Notes

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt buildersweeper:implemented-on-mainSweeper: behavior already present on current maintool/visionVision analysis and image generationtype/featureNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions