feat(agent): single-knob native vision for custom-provider models #29679

Merged: teknium1 merged 4 commits into main from hermes/hermes-518fd79e on May 21, 2026

@teknium1

Contributor

Salvages and completes #17936 by @CNSeniorious000.

Summary

Setting model.supports_vision: true on a custom-provider model now routes attached images natively (as image_url parts the model sees as pixels) end-to-end. Single config knob — no need to also pin agent.image_input_mode: native.

Motivating case: Qwen3.6-35B-A3B served by local llama.cpp via provider: custom. The model is image-capable but absent from models.dev, so Hermes was running every attached image through vision_analyze first and feeding the main model a lossy text description.

Changes

  • agent/image_routing.py — extract _supports_vision_override() resolver (top-level shortcut → named-provider per-model → models.dev fallback), wire it into _lookup_supports_vision() so auto-mode routing respects the override. Strict YAML bool coercion (recognises true/false/yes/no/on/off/1/0; bool("false") == True no longer leaks through). Handles named-custom-provider runtime/config disambiguation (self.provider == "custom" vs cfg.model.provider == "my-vllm").
  • run_agent.py — refactor _model_supports_vision to call the shared helper (single source of truth for the strip path and the routing path).
  • cli.py — quiet-mode -Q -q --image path now consults decide_image_input_mode() instead of unconditionally calling the text-pipeline (mirrors the interactive path).
  • scripts/release.py — AUTHOR_MAP entry for @CNSeniorious000.
  • tests/agent/test_image_routing.py — 26 new tests across TestCoerceCapabilityBool, TestSupportsVisionOverride, TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride.
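
The strict coercion described in the first bullet can be illustrated with a minimal sketch. The helper name `coerce_capability_bool`, the case-insensitive matching, and the whitespace stripping are assumptions made for illustration; the actual function in agent/image_routing.py may differ.

```python
from typing import Any, Optional

# Tokens taken from the list in this PR: true/false/yes/no/on/off/1/0.
_TRUE_TOKENS = {"true", "yes", "on", "1"}
_FALSE_TOKENS = {"false", "no", "off", "0"}


def coerce_capability_bool(value: Any) -> Optional[bool]:
    """Hypothetical helper: return True/False, or None when the value is unrecognised."""
    # Real booleans pass through (checked before int, because bool subclasses int).
    if isinstance(value, bool):
        return value
    # Only the integers 0 and 1 are accepted.
    if isinstance(value, int):
        return {0: False, 1: True}.get(value)
    if isinstance(value, str):
        token = value.strip().lower()  # assumption: matching ignores case and padding
        if token in _TRUE_TOKENS:
            return True
        if token in _FALSE_TOKENS:
            return False
    # Anything else means "no override"; the caller falls through to models.dev.
    return None
```

With plain truthiness, `bool("false")` is `True` because any non-empty string is truthy; the sketch returns `False` for the quoted string and `None` for values it does not recognise.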

Validation

| | Baseline (no override) | With supports_vision: true |
|---|---|---|
| Unit tests (image_routing + vision_aware) | 66/66 | 92/92 (+26 new) |
| CLI test subset | 749/749 | 749/749 |
| Live E2E chat -Q -q --image (haiku-4.5 via OR-as-custom, 64x64 red PNG) | vision_analyze called (8 log lines), 5.8s, text-pipeline reply | vision_analyze NOT called, 3.9s, native image_url → "red" |

Credit

@CNSeniorious000 wrote the strip-path fix (commits 1 & 2). His PR body flagged the routing-side gap as out of scope and offered to extend the fix; this PR takes him up on that offer and completes the remaining work. Authorship is preserved per commit via rebase merge.

Closes #17936.

CNSeniorious000 and others added 4 commits May 20, 2026 22:55
Custom/local provider models absent from models.dev get classified as
non-vision and have their image content stripped before reaching the
upstream API. Surface a user-facing override:

  model:
    supports_vision: true

  providers:
    my-vllm:
      models:
        my-llava:
          supports_vision: true

The override short-circuits the models.dev lookup in
_model_supports_vision(), which is the single gate guarding image-strip
preprocessing on every transport path.

Refs #8731.

Named custom providers are rewritten to provider="custom" at runtime
(hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so a
config under providers.my-vllm.models.my-llava.supports_vision was
unreachable via self.provider alone. Also try cfg.model.provider as a
candidate provider key, covering both runtime and config naming.
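
A rough sketch of the lookup order this implies, assuming the config is a plain nested dict shaped like the YAML above. The function name, signature, and the simplification of accepting only real booleans are illustrative assumptions, not the code in this PR.

```python
from typing import Any, Mapping, Optional


def supports_vision_override(
    cfg: Mapping[str, Any], runtime_provider: str, model: str
) -> Optional[bool]:
    """Hypothetical resolver: return the user's override, or None if none is set."""
    model_cfg = cfg.get("model") or {}

    # 1. Top-level shortcut: model.supports_vision
    top = model_cfg.get("supports_vision")
    if isinstance(top, bool):
        return top

    # 2. Per-model entry under a named provider. The runtime name may have been
    #    rewritten to "custom", so the configured name is tried as well.
    providers = cfg.get("providers") or {}
    for key in (runtime_provider, model_cfg.get("provider")):
        if not key:
            continue
        entry = ((providers.get(key) or {}).get("models") or {}).get(model) or {}
        flag = entry.get("supports_vision")
        if isinstance(flag, bool):
            return flag

    # 3. No override found: the caller falls back to its catalog lookup.
    return None
```

For a config with `model.provider: my-vllm` and the `providers.my-vllm.models.my-llava` entry shown earlier, calling this with `runtime_provider="custom"` still finds the flag through the second candidate key.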

Adds a regression test for the named-provider path.

The contributor PR (#17936) only patched the strip path in
`_model_supports_vision()`. The auto-mode router in
`agent/image_routing._lookup_supports_vision` still only read models.dev,
so a custom-provider model declared as vision-capable would still get its
images routed through vision_analyze in the default `agent.image_input_mode:
auto` setting. Users had to set both `supports_vision: true` AND
`image_input_mode: native` to bypass the text pipeline.

Single-knob behavior now: `supports_vision: true` alone is enough in auto
mode. The strip path and the routing path consult the same resolver.
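
As an illustration of the single-knob behavior, the auto-mode decision might look roughly like the sketch below. The function name, the `"text"` mode string, and the `catalog_lookup` callable (standing in for the models.dev lookup) are assumptions; only `auto` and `native` appear as mode names in this PR.

```python
from typing import Callable, Optional


def choose_image_input_mode(
    configured_mode: str,
    override: Optional[bool],
    catalog_lookup: Callable[[], bool],
) -> str:
    """Hypothetical router: pick "native" or "text" for attached images."""
    # An explicitly pinned mode wins over any capability detection.
    if configured_mode in ("native", "text"):
        return configured_mode
    # In auto mode the user's override, when present, beats the catalog.
    supports = override if override is not None else catalog_lookup()
    return "native" if supports else "text"
```

The point of the sketch is that the override is consulted before the catalog, so a model missing from the catalog can still be routed natively without pinning the mode.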

- Extract override resolution into `_supports_vision_override()` in
  agent/image_routing.py and wire it into `_lookup_supports_vision()`.
- Refactor `run_agent._model_supports_vision` to call the same helper
  (DRY, single source of truth for the resolution order).
- Strict YAML boolean coercion: `supports_vision: "false"` (quoted —
  a common YAML mistake) no longer coerces to True via bool() truthiness.
  Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1.
  Unrecognised values return None and fall through to models.dev.
- Add @CNSeniorious000 to AUTHOR_MAP for release attribution.

Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride,
TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing
contributor tests + image_routing + vision_native_fast_path +
native_image_buffer_isolation all green (92/92).

The interactive CLI input path consults decide_image_input_mode() to pick
between native image_url attachment and the vision_analyze text pipeline,
but the non-interactive 'hermes chat -Q -q ... --image FOO' path
unconditionally called _preprocess_images_with_vision() — so even with
`model.supports_vision: true` set, --image always went through the
text-pipeline. Symptom: vision_analyze runs 4-5s per image and the model
sees a lossy text summary instead of the actual pixels.

Mirror the interactive path: load config, call decide_image_input_mode,
branch on native vs text. Falls back to the text-pipeline on any import
or build error (Pyright-clean: _build_parts guarded with `is not None`).
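
The branch-with-fallback shape described above could be sketched as follows. All names here (`prepare_image_turn` and its callable parameters) and the `"native"` return value of the decision callable are hypothetical stand-ins; the real cli.py code is not reproduced here.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Parts = List[Dict[str, Any]]


def prepare_image_turn(
    text: str,
    image_paths: List[str],
    decide_mode: Callable[[], str],
    build_native_parts: Optional[Callable[[str, List[str]], Parts]],
    describe_images: Callable[[str, List[str]], str],
) -> Union[str, Parts]:
    """Hypothetical helper: native content parts when possible, else enriched text."""
    try:
        # The builder may be None if its import failed, hence the explicit guard.
        if decide_mode() == "native" and build_native_parts is not None:
            return build_native_parts(text, image_paths)
    except Exception:
        # Any import or build error falls through to the text pipeline.
        pass
    return describe_images(text, image_paths)
```

A helper of this shape returns either a string or a list of content parts, so anything downstream that is annotated as taking only `str` would need to accept both.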

Live E2E (provider=custom, base_url=openrouter, anthropic/claude-haiku-4.5,
red 64x64 PNG):
  baseline (no override): vision_analyze called (8 log lines), 5.8s
  with supports_vision:   vision_analyze NOT called (0 log lines),  3.9s
Same model, same image, single knob flips text→native routing.

@github-actions

Contributor

🔎 Lint report: hermes/hermes-518fd79e vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8992 on HEAD, 8984 on base (🆕 +8)

🆕 New issues (2):

| Rule | Count |
|---|---|
| invalid-argument-type | 2 |

First entries
cli.py:14490: [invalid-argument-type] invalid-argument-type: Argument to bound method `AIAgent.run_conversation` is incorrect: Expected `str`, found `Any | list[dict[str, Any]] | str`
cli.py:14474: [invalid-argument-type] invalid-argument-type: Argument to bound method `HermesCLI._resolve_turn_agent_config` is incorrect: Expected `str`, found `Any | list[dict[str, Any]] | str`

✅ Fixed issues: none

Unchanged: 4741 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.
