Vision-capable model detection missing for custom providers — HTTP 400 'text is not set' #25594

@saved-j

Bug Description

Non-vision models (and custom providers whose models aren't in the models.dev registry) receive multipart tool results of the form [{type:text,...},{type:image_url,...}] from vision-capable tools (computer_use, vision_analyze). The API rejects these requests with HTTP 400:

{"code": "400", "message": "Param Incorrect", "param": "`text` is not set"}

Root Cause

In run_agent.py, both the concurrent (~line 11102) and sequential (~line 11521) tool-result paths unconditionally pass through multipart content without checking _model_supports_vision():

# Current code — no vision check
_tool_content = (
    function_result["content"]
    if _is_multimodal_tool_result(function_result)
    else function_result
)

The accompanying comment says "Text-only servers that reject images are handled by the adaptive _vision_supported recovery in the API retry loop", but that recovery does not recognize the `text` is not set error format that Xiaomi MiMo and similar OpenAI-compatible APIs return.
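
One way the retry loop could recognize this rejection is to inspect the error body in addition to whatever image-specific wording it already matches. The sketch below is illustrative only: looks_like_multipart_rejection is a hypothetical helper, not a function in run_agent.py, and the assumption that the body is a JSON object with a param field comes solely from the 400 response quoted above.

```python
import json
import re

# Matches the `param` value from the 400 response quoted in this issue.
_TEXT_NOT_SET = re.compile(r"`?text`?\s+is\s+not\s+set", re.IGNORECASE)


def looks_like_multipart_rejection(status_code: int, body: str) -> bool:
    """Return True if a 400 response looks like a rejection of multipart content.

    Hypothetical helper: only the error shape reported in this issue is covered.
    """
    if status_code != 400:
        return False
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Body is not JSON: fall back to scanning the raw text.
        return bool(_TEXT_NOT_SET.search(str(body)))
    if not isinstance(payload, dict):
        return False
    return bool(_TEXT_NOT_SET.search(str(payload.get("param", ""))))
```

A caller in the retry loop could use a True result as the signal to resend the tool result as plain text, the same way it would for an explicit "images not supported" error.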

Affected scenarios

  • Custom providers (e.g. custom:xiaomi-tp): get_model_capabilities() returns None because models.dev doesn't know the custom provider → _model_supports_vision() always returns False
  • Known non-vision models (mimo-v2-flash, mimo-v2-pro, deepseek-v4-flash): correctly identified as non-vision, but still receive multipart content

Expected Behavior

Non-vision models should receive plain-text tool results. The vision routing should convert multipart content to a text summary before sending it to the API:

if _is_multimodal_tool_result(function_result):
    if self._model_supports_vision():
        _tool_content = function_result["content"]  # multipart for vision models
    else:
        _tool_content = _multimodal_text_summary(function_result)  # text-only fallback
else:
    _tool_content = function_result

This check needs to be applied in both:

  1. _execute_tool_calls_concurrent (~line 11102)
  2. _execute_tool_calls_sequential (~line 11521)
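
Because the same branch is needed at both sites, it may be easier to keep them in sync by routing through one shared helper. The following is a minimal, self-contained sketch under stated assumptions: select_tool_content is a hypothetical name, is_multimodal and summarize_multimodal are stand-ins for _is_multimodal_tool_result and _multimodal_text_summary (whose real behavior is not shown in this issue), and the part shape is taken from the multipart example in the bug description.

```python
from typing import Any


def is_multimodal(result: Any) -> bool:
    """Stand-in for _is_multimodal_tool_result: a dict whose "content" is a list."""
    return isinstance(result, dict) and isinstance(result.get("content"), list)


def summarize_multimodal(result: dict) -> str:
    """Stand-in for _multimodal_text_summary: keep text parts, note omitted images."""
    texts = []
    images = 0
    for part in result["content"]:
        if not isinstance(part, dict):
            continue
        if part.get("type") == "text":
            texts.append(str(part.get("text", "")))
        elif part.get("type") == "image_url":
            images += 1
    if images:
        texts.append(f"[{images} image(s) omitted: model does not accept image input]")
    return "\n".join(texts)


def select_tool_content(result: Any, supports_vision: bool) -> Any:
    """Pick the payload to send: multipart for vision models, text otherwise."""
    if is_multimodal(result):
        if supports_vision:
            return result["content"]
        return summarize_multimodal(result)
    return result
```

Both the concurrent and the sequential path could then call the helper with the result of self._model_supports_vision(), so a later change to the fallback only has to be made once.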

Broader Suggestion

The _model_supports_vision() check relies on the models.dev registry, which doesn't cover custom providers. Consider one of the following:

  1. A config-level supports_vision flag per provider/model entry
  2. A runtime negotiation (send multipart, fall back to text on 400 — but handle this specific error format)
  3. Auto-detection by sending a probe image on first call
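
As an illustration of the first option, the sketch below resolves the capability from an explicit config flag and consults the registry only when no flag is set. Everything here is an assumption made for the example: the config layout (providers → models → supports_vision), the resolve_supports_vision name, and the registry_lookup callable, which returns None for unknown models in the same way get_model_capabilities() is described as doing above.

```python
from typing import Callable, Optional

# Hypothetical registry interface: True/False when known, None when unknown.
RegistryLookup = Callable[[str, str], Optional[bool]]


def resolve_supports_vision(
    provider: str,
    model: str,
    config: dict,
    registry_lookup: RegistryLookup,
) -> bool:
    """Resolve vision support: model flag, then provider flag, then registry."""
    provider_cfg = config.get("providers", {}).get(provider, {})
    model_cfg = provider_cfg.get("models", {}).get(model, {})
    for flag in (model_cfg.get("supports_vision"), provider_cfg.get("supports_vision")):
        if flag is not None:
            return bool(flag)
    known = registry_lookup(provider, model)
    # Unknown everywhere: default to text-only, the safe choice for strict APIs.
    return bool(known) if known is not None else False
```

With this precedence, a user of a custom provider could opt a model in or out explicitly, while providers that the registry already knows keep their current behavior.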

Environment

  • macOS 14.8.7, Hermes Agent (current main)
  • Providers affected: xiaomi (mimo-v2-flash), custom:xiaomi-tp (mimo-v2-omni)
  • Provider NOT affected: xiaomi-tp (mimo-v2.5-pro, supports_vision=False but no vision tools used)

Workaround

User-level patch: add a _model_supports_vision() check before both content-unpacking sites (already applied locally). The patch takes effect only after a session restart (.pyc cleanup + /resume).

Metadata

Labels

  • P2 (Medium: degraded but workaround exists)
  • comp/agent (Core agent loop, run_agent.py, prompt builder)
  • sweeper:implemented-on-main (Sweeper: behavior already present on current main)
  • tool/vision (Vision analysis and image generation)
  • type/bug (Something isn't working)
