Vision-capable model detection missing for custom providers — HTTP 400 'text is not set'

## Bug Description

Non-vision models, and models served through `custom` providers that are missing from the models.dev registry, receive multipart tool results `[{type:text,...},{type:image_url,...}]` from vision-capable tools (`computer_use`, `vision_analyze`). The API rejects these with HTTP 400:

```json
{"code": "400", "message": "Param Incorrect", "param": "`text` is not set"}
```

## Root Cause

In `run_agent.py`, both the concurrent (~line 11102) and sequential (~line 11521) tool-result paths unconditionally pass through multipart content without checking `_model_supports_vision()`:

```python
# Current code — no vision check
_tool_content = (
    function_result["content"]
    if _is_multimodal_tool_result(function_result)
    else function_result
)
```

The comment says *"Text-only servers that reject images are handled by the adaptive `_vision_supported` recovery in the API retry loop"* — but this recovery does not handle the `'text' is not set` error format that Xiaomi MiMo and similar OpenAI-compatible APIs produce.
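For illustration, a recovery path could recognize this error shape with a small predicate like the one below. This is only a sketch: the function name `is_text_not_set_error` is hypothetical and not part of `run_agent.py`, and the handling of a nested `error` object is an assumption about how other OpenAI-compatible servers might wrap the same payload.

```python
import json


def is_text_not_set_error(status_code: int, body: str) -> bool:
    """Return True if an HTTP 400 body matches the '`text` is not set' shape.

    Hypothetical helper: the name and the nested-"error" handling are
    assumptions, not existing Hermes Agent code.
    """
    if status_code != 400:
        return False
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Body is missing or not valid JSON, so it cannot match.
        return False
    if not isinstance(payload, dict):
        return False
    # Assumption: some servers nest the details under an "error" object.
    nested = payload.get("error")
    detail = nested if isinstance(nested, dict) else payload
    param = str(detail.get("param", ""))
    # Strip backticks and quotes so `text` and 'text' both match.
    normalized = param.replace("`", "").replace("'", "").replace('"', "").lower()
    return "text is not set" in normalized
```

A caller in the retry loop could use this to decide whether to resend the tool result as plain text instead of surfacing the 400 to the user.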

### Affected scenarios
- **Custom providers** (e.g. `custom:xiaomi-tp`): `get_model_capabilities()` returns `None` because models.dev doesn't know the custom provider → `_model_supports_vision()` always returns `False`
- **Known non-vision models** (mimo-v2-flash, mimo-v2-pro, deepseek-v4-flash): correctly identified as non-vision, but still receive multipart content

## Expected Behavior

Non-vision models should receive plain-text tool results. The vision routing should convert multipart content to a text summary before sending it to the API:

```python
if _is_multimodal_tool_result(function_result):
    if self._model_supports_vision():
        _tool_content = function_result["content"]  # multipart for vision models
    else:
        _tool_content = _multimodal_text_summary(function_result)  # text-only fallback
else:
    _tool_content = function_result
```

This check needs to be applied in **both**:
1. `_execute_tool_calls_concurrent` (~line 11102)
2. `_execute_tool_calls_sequential` (~line 11521)
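Since the same check is needed at two call sites, one option is to factor it into a shared helper so the paths cannot drift apart. The sketch below is self-contained and uses stand-in functions (`is_multimodal`, `text_summary`) because the real bodies of `_is_multimodal_tool_result` and `_multimodal_text_summary` are not shown here; the helper name `select_tool_content` and the summary format are assumptions.

```python
from typing import Any, Callable


def is_multimodal(result: Any) -> bool:
    # Stand-in for _is_multimodal_tool_result: a dict whose "content" is a list of parts.
    return isinstance(result, dict) and isinstance(result.get("content"), list)


def text_summary(result: dict) -> str:
    # Stand-in for _multimodal_text_summary: keep text parts, note omitted images.
    parts = result["content"]
    texts = [p.get("text", "") for p in parts if p.get("type") == "text"]
    n_images = sum(1 for p in parts if p.get("type") == "image_url")
    summary = "\n".join(t for t in texts if t)
    if n_images:
        summary += f"\n[{n_images} image(s) omitted: model does not support vision]"
    return summary.strip()


def select_tool_content(
    function_result: Any,
    supports_vision: bool,
    is_multimodal_fn: Callable[[Any], bool] = is_multimodal,
    summarize_fn: Callable[[dict], str] = text_summary,
) -> Any:
    """Pick the tool-result payload to send to the model (hypothetical helper)."""
    if not is_multimodal_fn(function_result):
        return function_result  # plain results pass through unchanged
    if supports_vision:
        return function_result["content"]  # multipart for vision models
    return summarize_fn(function_result)  # text-only fallback
```

Both `_execute_tool_calls_concurrent` and `_execute_tool_calls_sequential` could then call the helper with `self._model_supports_vision()` as the second argument.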

## Broader Suggestion

The `_model_supports_vision()` check relies on the models.dev registry, which doesn't cover custom providers. Consider one of:
1. A config-level `supports_vision` flag per provider/model entry
2. A runtime negotiation (send multipart, fall back to text on 400 — but handle this specific error format)
3. Auto-detection by sending a probe image on first call
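As a rough sketch of option 1, an explicit config flag could take precedence over the registry lookup, with the registry used only when no flag is set. Everything here is an assumption for illustration: the function name `resolve_supports_vision`, the dict-shaped provider config, and the dict-shaped registry capabilities are not taken from the Hermes Agent codebase.

```python
from typing import Any, Mapping, Optional


def resolve_supports_vision(
    provider_config: Mapping[str, Any],
    registry_capabilities: Optional[Mapping[str, Any]],
) -> bool:
    """Decide vision support from config first, then from the registry.

    Hypothetical helper: both argument shapes are assumptions.
    """
    override = provider_config.get("supports_vision")
    if isinstance(override, bool):
        # An explicit user setting wins, which covers custom providers.
        return override
    if registry_capabilities is None:
        # Unknown to the registry and no override: stay conservative.
        return False
    return bool(registry_capabilities.get("supports_vision", False))
```

With this ordering, a `custom:` provider entry that sets `supports_vision: true` would receive multipart content, while an unknown provider with no flag would keep getting the text-only fallback.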

## Environment
- macOS 14.8.7, Hermes Agent (current main)
- Providers affected: `xiaomi` (mimo-v2-flash), `custom:xiaomi-tp` (mimo-v2-omni)
- Provider NOT affected: `xiaomi-tp` (mimo-v2.5-pro, `supports_vision=False` but no vision tools used)

## Workaround

User-level patch: add a `_model_supports_vision()` check before both content-unpacking sites (already applied locally). This requires a session restart (`.pyc` cleanup + `/resume`).


Vision-capable model detection missing for custom providers — HTTP 400 'text is not set' #25594
