feat(agent): single-knob native vision for custom-provider models by teknium1 · Pull Request #29679 · NousResearch/hermes-agent

teknium1 · 2026-05-21T06:21:02Z

Salvages and completes #17936 by @CNSeniorious000.

Summary

Setting model.supports_vision: true on a custom-provider model now routes attached images natively (as image_url parts the model sees as pixels) end-to-end. Single config knob — no need to also pin agent.image_input_mode: native.

Motivating case: Qwen3.6-35B-A3B served by local llama.cpp via provider: custom. The model is image-capable but absent from models.dev, so Hermes was running every attached image through vision_analyze first and feeding the main model a lossy text description.

Changes

agent/image_routing.py — extract _supports_vision_override() resolver (top-level shortcut → named-provider per-model → models.dev fallback), wire it into _lookup_supports_vision() so auto-mode routing respects the override. Strict YAML bool coercion (recognises true/false/yes/no/on/off/1/0; bool("false") == True no longer leaks through). Handles named-custom-provider runtime/config disambiguation (self.provider == "custom" vs cfg.model.provider == "my-vllm").
run_agent.py — refactor _model_supports_vision to call the shared helper (single source of truth for the strip path and the routing path).
cli.py — quiet-mode -Q -q --image path now consults decide_image_input_mode() instead of unconditionally calling the text-pipeline (mirrors the interactive path).
scripts/release.py — AUTHOR_MAP entry for @CNSeniorious000.
tests/agent/test_image_routing.py — 26 new tests across TestCoerceCapabilityBool, TestSupportsVisionOverride, TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride.
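
The strict coercion described for agent/image_routing.py can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `coerce_capability_bool` is hypothetical, and the case-insensitive matching and whitespace stripping are assumptions. Only the token list and the "unrecognised values return None" behaviour come from the PR text.

```python
from typing import Any, Optional

# Token sets taken from the PR description; matching rules are assumed.
_TRUE_TOKENS = {"true", "yes", "on", "1"}
_FALSE_TOKENS = {"false", "no", "off", "0"}


def coerce_capability_bool(value: Any) -> Optional[bool]:
    """Hypothetical helper: map a config value to True/False, or None if unrecognised."""
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        token = value.strip().lower()
        if token in _TRUE_TOKENS:
            return True
        if token in _FALSE_TOKENS:
            return False
    # Anything else is "no opinion", so the caller can fall back to another source.
    return None
```

With a helper shaped like this, a quoted `"false"` resolves to `False` rather than being treated as truthy by `bool()`, and a value such as `"maybe"` yields `None` so the models.dev lookup can still run.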

Validation

| | Baseline (no override) | With `supports_vision: true` |
|---|---|---|
| Unit tests (image_routing + vision_aware) | 66/66 | 92/92 (+26 new) |
| CLI test subset | 749/749 | 749/749 |
| Live E2E `chat -Q -q --image` (haiku-4.5 via OR-as-custom, 64x64 red PNG) | `vision_analyze` called (8 log lines), 5.8s, text-pipeline reply | `vision_analyze` NOT called, 3.9s, native `image_url` → "red" |

Credit

@CNSeniorious000 wrote the strip-path fix (commits 1 & 2). His PR body flagged the routing-side gap as out-of-scope and offered to extend it — taking him up on the offer with the remaining work. Authorship preserved per-commit via rebase merge.

Closes #17936.

Custom/local provider models absent from models.dev get classified as non-vision and have their image content stripped before reaching the upstream API.

Surface a user-facing override:

    model:
      supports_vision: true
    providers:
      my-vllm:
        models:
          my-llava:
            supports_vision: true

The override short-circuits the models.dev lookup in _model_supports_vision(), which is the single gate guarding image-strip preprocessing on every transport path.

Refs #8731.

Named custom providers are rewritten to provider="custom" at runtime (hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so a config under providers.my-vllm.models.my-llava.supports_vision was unreachable via self.provider alone. Also try cfg.model.provider as a candidate provider key, covering both runtime and config naming. Adds a regression test for the named-provider path.
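
The lookup order described above (top-level shortcut, then per-model entry under either the runtime or the configured provider name) might look something like the sketch below. The function name, its signature, and the plain-dict config shape are assumptions made for illustration. String coercion is left out for brevity, so only real booleans are honoured here.

```python
from typing import Any, Mapping, Optional


def supports_vision_override(
    cfg: Mapping[str, Any], runtime_provider: str, model: str
) -> Optional[bool]:
    """Hypothetical resolver: return the user's override, or None if none is set."""
    model_cfg = cfg.get("model") or {}

    # 1. Top-level shortcut: model.supports_vision
    top = model_cfg.get("supports_vision")
    if isinstance(top, bool):
        return top

    # 2. Per-model entry under a named provider. The runtime name may have been
    #    rewritten to "custom", so the configured name is tried as well.
    providers = cfg.get("providers") or {}
    for key in (runtime_provider, model_cfg.get("provider")):
        if not key:
            continue
        models = (providers.get(key) or {}).get("models") or {}
        flag = (models.get(model) or {}).get("supports_vision")
        if isinstance(flag, bool):
            return flag

    # 3. No override found: the caller falls back to models.dev.
    return None
```

For example, with `model.provider: my-vllm` in the config and a runtime provider of `"custom"`, the second candidate key is what makes `providers.my-vllm.models.my-llava.supports_vision` reachable.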

@CNSeniorious000

The contributor PR (#17936) only patched the strip path in `_model_supports_vision()`. The auto-mode router in `agent/image_routing._lookup_supports_vision` still only read models.dev, so a custom-provider model declared as vision-capable would still get its images routed through vision_analyze in the default `agent.image_input_mode: auto` setting. Users had to set both `supports_vision: true` AND `image_input_mode: native` to bypass the text pipeline.

Single-knob behavior now: `supports_vision: true` alone is enough in auto mode. The strip path and the routing path consult the same resolver.

- Extract override resolution into `_supports_vision_override()` in agent/image_routing.py and wire it into `_lookup_supports_vision()`.
- Refactor `run_agent._model_supports_vision` to call the same helper (DRY, single source of truth for the resolution order).
- Strict YAML boolean coercion: `supports_vision: "false"` (quoted, a common YAML mistake) no longer coerces to True via bool() truthiness. Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1. Unrecognised values return None and fall through to models.dev.
- Add @CNSeniorious000 to AUTHOR_MAP for release attribution.

Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride, TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing contributor tests + image_routing + vision_native_fast_path + native_image_buffer_isolation all green (92/92).
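
The single-knob behaviour in auto mode can be illustrated with a small decision function. This is a sketch under assumptions, not the real `decide_image_input_mode()`: the name `choose_image_mode`, its parameters, and the `"text"` mode label are hypothetical, and the catalog lookup is passed in as a callable to stand in for the models.dev query.

```python
from typing import Callable, Optional


def choose_image_mode(
    configured_mode: str,
    override: Optional[bool],
    catalog_lookup: Callable[[], Optional[bool]],
) -> str:
    """Hypothetical router: return "native" or "text" for an attached image."""
    # An explicitly pinned mode is respected as-is.
    if configured_mode in ("native", "text"):
        return configured_mode

    # Auto mode: the user's override wins; only consult the catalog
    # (standing in for models.dev) when no override is set.
    supported = override if override is not None else catalog_lookup()

    # An unknown capability (None) is treated as "not supported".
    return "native" if supported else "text"
```

Under this shape, a model that the catalog does not know about still routes natively when the override is `True`, which is the case the PR targets.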

The interactive CLI input path consults decide_image_input_mode() to pick between native image_url attachment and the vision_analyze text pipeline, but the non-interactive 'hermes chat -Q -q ... --image FOO' path unconditionally called _preprocess_images_with_vision(), so even with `model.supports_vision: true` set, --image always went through the text-pipeline. Symptom: vision_analyze runs 4-5s per image and the model sees a lossy text summary instead of the actual pixels.

Mirror the interactive path: load config, call decide_image_input_mode, branch on native vs text. Falls back to the text-pipeline on any import or build error (Pyright-clean: _build_parts guarded with `is not None`).

Live E2E (provider=custom, base_url=openrouter, anthropic/claude-haiku-4.5, red 64x64 PNG):

    baseline (no override): vision_analyze called (8 log lines), 5.8s
    with supports_vision:   vision_analyze NOT called (0 log lines), 3.9s

Same model, same image, single knob flips text→native routing.
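
The native-versus-text branch with a fallback could be sketched as below. All names here (`build_native_parts`, `prepare_quiet_input`, the `describe_image` callable, and the bracketed description format) are hypothetical stand-ins, not the functions in cli.py. The `image_url` part follows the OpenAI-style chat content layout with a base64 data URL, which is assumed to be what the upstream endpoint accepts.

```python
import base64
from typing import Any, Callable, Dict, List, Union

Part = Dict[str, Any]


def build_native_parts(prompt: str, image_bytes: bytes, mime: str = "image/png") -> List[Part]:
    """Build a text part plus an image_url part carrying the image as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
    ]


def prepare_quiet_input(
    prompt: str,
    image_bytes: bytes,
    mode: str,
    describe_image: Callable[[bytes], str],
) -> Union[str, List[Part]]:
    """Return native content parts when possible, else a text-only prompt."""
    if mode == "native":
        try:
            return build_native_parts(prompt, image_bytes)
        except Exception:
            # Any build error falls through to the text pipeline.
            pass
    # Text pipeline: the model only sees a description, not the pixels.
    return f"{prompt}\n\n[Image description: {describe_image(image_bytes)}]"
```

Note that the return type is either a string or a list of parts. A union like this is consistent with the `Expected str, found ... list[dict[str, Any]] | str` diagnostics in the lint report, if downstream signatures are annotated as `str` only.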

github-actions · 2026-05-21T06:21:47Z

🔎 Lint report: `hermes/hermes-518fd79e` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8992 on HEAD, 8984 on base (🆕 +8)

🆕 New issues (2):

Rule	Count
`invalid-argument-type`	2

First entries

cli.py:14490: [invalid-argument-type] invalid-argument-type: Argument to bound method `AIAgent.run_conversation` is incorrect: Expected `str`, found `Any | list[dict[str, Any]] | str`
cli.py:14474: [invalid-argument-type] invalid-argument-type: Argument to bound method `HermesCLI._resolve_turn_agent_config` is incorrect: Expected `str`, found `Any | list[dict[str, Any]] | str`

✅ Fixed issues: none

Unchanged: 4741 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

CNSeniorious000 and others added 4 commits May 20, 2026 22:55

teknium1 merged commit 975e130 into main May 21, 2026
17 of 18 checks passed

teknium1 deleted the hermes/hermes-518fd79e branch May 21, 2026 06:27

teknium1 mentioned this pull request May 21, 2026

feat(agent): allow declaring supports_vision via user config #17936

Closed

tillfalko mentioned this pull request May 21, 2026

fix(vision): make vision tools honor model.supports_vision #29987

Closed

17 tasks

r266-tech mentioned this pull request May 21, 2026

docs(vision): document model.supports_vision override for custom-provider models #30015

Open

AhmetArif0 mentioned this pull request May 22, 2026

fix(computer_use): consult config.yaml supports_vision override in capture routing #30135

Open

3 tasks

alt-glitch mentioned this pull request May 25, 2026

feat(image_routing): support vision: true in custom_providers list #31912

Open

teknium1 mentioned this pull request May 28, 2026

docs: 30-day overhaul — correctness audit, PR coverage, Nous Portal weave, sidebar reorg #33782

Merged

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

1 task
