feat(image-input): native multimodal routing based on model vision capability#16506
Merged
Conversation
Contributor
🚨 CRITICAL Supply Chain Risk DetectedThis PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging. 🚨 CRITICAL: Install-hook file added or modifiedThese files can execute code during package installation or interpreter startup. Files: Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting. |
Collaborator
…pability
Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.
Routing decision (agent/image_routing.py::decide_image_input_mode):
agent.image_input_mode = auto | native | text (default: auto)
In auto mode:
- If auxiliary.vision.provider/model is explicitly configured, keep the
text pipeline (user paid for a dedicated vision backend).
- Else if models.dev reports supports_vision=True for the active
provider/model, attach natively.
- Else fall back to text (current behaviour).
Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).
run_agent.py changes:
- _prepare_anthropic_messages_for_api now passes image parts through
unchanged when the model supports vision — the Anthropic adapter
translates them to native image blocks. Previous behaviour
(vision_analyze → text) only runs for non-vision Anthropic models.
- New _prepare_messages_for_non_vision_model mirrors the same contract
for chat.completions and codex_responses paths, so non-vision models
on any provider get text-fallback instead of failing at the provider.
- New _model_supports_vision() helper reads models.dev caps.
vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.
Config default: agent.image_input_mode = auto.
Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).
…okens in compressor
Two follow-ups that make the native image routing safer for long / heavy
sessions:
1) Oversize handling in build_native_content_parts:
- 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
the most restrictive provider — Gemini inline data).
- Delegates to vision_tools._resize_image_for_vision (Pillow-based,
already battle-tested) to downscale to 5 MB first-try.
- If Pillow is missing or resize still overshoots, the image is
dropped and reported back in skipped[]; caller falls back to text
enrichment for that image.
2) Image-token accounting in context_compressor:
- New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
within the realistic range for Anthropic/GPT-4o/Gemini billing).
- _content_length_for_budget() helper: sums text-part lengths and
charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
input_image part. Base64 payload inside image_url is NOT counted
as chars — dimensions don't matter, only image-presence.
- Both tail-cut sites (_prune_old_tool_results L527 and
_find_tail_cut_by_tokens L1126) now call the helper so multi-image
conversations don't slip past compression budget.
Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).
15f0738 to
02e328c
Compare
…ied per-provider limits
The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:
* Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
'image exceeds 5 MB maximum' above that).
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
complaint (empirically verified April 2026 with progressive PNG
sizes).
New behaviour:
* _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
* Providers NOT in the table get no ceiling — images attach at native
size and we trust the provider to return its own error if it
disagrees. A provider-specific 400 message is clearer than us
guessing wrong and silently degrading image quality.
* build_native_content_parts() gains a keyword-only provider arg;
gateway/CLI/TUI pass the active provider so Anthropic users get
auto-resize protection while OpenAI users don't pay it.
* Resize target dropped from 5 MB to 4 MB to slide safely under
Anthropic's boundary with header overhead.
Empirical measurements (direct API, no Hermes in the loop):
image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5
0.19 MB ✓ ✓ ✓
12.37 MB ✗ 400 5MB ✓ ✓
23.85 MB ✗ 400 5MB ✓ ✓
49.46 MB ✗ 413 ✓ ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.
…eject
Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.
Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.
Reactive design:
- image_routing.py: _file_to_data_url encodes native size, no ceiling.
build_native_content_parts drops its provider kwarg.
- error_classifier.py: new FailoverReason.image_too_large + pattern
match ("image exceeds", "image too large", etc.) checked BEFORE
context_overflow so Anthropic's 5MB rejection lands in the right
bucket.
- run_agent.py: new _try_shrink_image_parts_in_messages walks api
messages in-place, re-encodes oversized data: URL image parts
through vision_tools._resize_image_for_vision to fit under 4MB,
handles both chat.completions (dict image_url) and Responses
(string image_url) shapes, ignores http URLs (provider-fetched).
New image_shrink_retry_attempted flag in the retry loop fires the
shrink exactly once per turn after credential-pool recovery but
before auth retries.
E2E verified live against Anthropic claude-sonnet-4-6:
- 17.9MB PNG (23.9MB b64) attached at native size
- Anthropic returns 400 "image exceeds 5 MB maximum"
- Agent logs '📐 Image(s) exceeded provider size limit — shrank and
retrying...'
- Retry succeeds, correct response delivered in 6.8s total.
Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
hermes-agent-dhabibi
pushed a commit
to hermes-agent-dhabibi/hermes-agent
that referenced
this pull request
Apr 27, 2026
…pability (NousResearch#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value. Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
This was referenced Apr 27, 2026
cluricaun28
referenced
this pull request
in cluricaun28/Logos
Apr 28, 2026
…pability (#16506)
* feat(image-input): native multimodal routing based on model vision capability
Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.
Routing decision (agent/image_routing.py::decide_image_input_mode):
agent.image_input_mode = auto | native | text (default: auto)
In auto mode:
- If auxiliary.vision.provider/model is explicitly configured, keep the
text pipeline (user paid for a dedicated vision backend).
- Else if models.dev reports supports_vision=True for the active
provider/model, attach natively.
- Else fall back to text (current behaviour).
Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).
run_agent.py changes:
- _prepare_anthropic_messages_for_api now passes image parts through
unchanged when the model supports vision — the Anthropic adapter
translates them to native image blocks. Previous behaviour
(vision_analyze → text) only runs for non-vision Anthropic models.
- New _prepare_messages_for_non_vision_model mirrors the same contract
for chat.completions and codex_responses paths, so non-vision models
on any provider get text-fallback instead of failing at the provider.
- New _model_supports_vision() helper reads models.dev caps.
vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.
Config default: agent.image_input_mode = auto.
Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor
Two follow-ups that make the native image routing safer for long / heavy
sessions:
1) Oversize handling in build_native_content_parts:
- 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
the most restrictive provider — Gemini inline data).
- Delegates to vision_tools._resize_image_for_vision (Pillow-based,
already battle-tested) to downscale to 5 MB first-try.
- If Pillow is missing or resize still overshoots, the image is
dropped and reported back in skipped[]; caller falls back to text
enrichment for that image.
2) Image-token accounting in context_compressor:
- New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
within the realistic range for Anthropic/GPT-4o/Gemini billing).
- _content_length_for_budget() helper: sums text-part lengths and
charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
input_image part. Base64 payload inside image_url is NOT counted
as chars — dimensions don't matter, only image-presence.
- Both tail-cut sites (_prune_old_tool_results L527 and
_find_tail_cut_by_tokens L1126) now call the helper so multi-image
conversations don't slip past compression budget.
Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).
* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits
The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:
* Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
'image exceeds 5 MB maximum' above that).
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
complaint (empirically verified April 2026 with progressive PNG
sizes).
New behaviour:
* _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
* Providers NOT in the table get no ceiling — images attach at native
size and we trust the provider to return its own error if it
disagrees. A provider-specific 400 message is clearer than us
guessing wrong and silently degrading image quality.
* build_native_content_parts() gains a keyword-only provider arg;
gateway/CLI/TUI pass the active provider so Anthropic users get
auto-resize protection while OpenAI users don't pay it.
* Resize target dropped from 5 MB to 4 MB to slide safely under
Anthropic's boundary with header overhead.
Empirical measurements (direct API, no Hermes in the loop):
image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5
0.19 MB ✓ ✓ ✓
12.37 MB ✗ 400 5MB ✓ ✓
23.85 MB ✗ 400 5MB ✓ ✓
49.46 MB ✗ 413 ✓ ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.
* refactor(image-input): attempt native, shrink-and-retry on provider reject
Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.
Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.
Reactive design:
- image_routing.py: _file_to_data_url encodes native size, no ceiling.
build_native_content_parts drops its provider kwarg.
- error_classifier.py: new FailoverReason.image_too_large + pattern
match ("image exceeds", "image too large", etc.) checked BEFORE
context_overflow so Anthropic's 5MB rejection lands in the right
bucket.
- run_agent.py: new _try_shrink_image_parts_in_messages walks api
messages in-place, re-encodes oversized data: URL image parts
through vision_tools._resize_image_for_vision to fit under 4MB,
handles both chat.completions (dict image_url) and Responses
(string image_url) shapes, ignores http URLs (provider-fetched).
New image_shrink_retry_attempted flag in the retry loop fires the
shrink exactly once per turn after credential-pool recovery but
before auth retries.
E2E verified live against Anthropic claude-sonnet-4-6:
- 17.9MB PNG (23.9MB b64) attached at native size
- Anthropic returns 400 "image exceeds 5 MB maximum"
- Agent logs '📐 Image(s) exceeded provider size limit — shrank and
retrying...'
- Retry succeeds, correct response delivered in 6.8s total.
Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
teknium1
added a commit
that referenced
this pull request
Apr 30, 2026
…ers (#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since #11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (#17417 #16997 #16193 #14315 #13151 #11794 #10610 #10283 #10246 #11564 #13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (#16052 #16539 #16566 #15841 #14798 #10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (#17305 #17026 #17000 #15077 #14557 #14227 #14166 #14730 #17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (#17130 #17113 #17175 #17150 #16707 #12312 #12305 #12934 #14810 #14045 #17286 #17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (#16506 #15027 #13428 #12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (#12929 #12972 #10763 #16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (#16576 #16572 #16383 #15878 #15608 #15606 #14809 #14767 #14231 #14232 #14307 #13683 #12373 #11891 #11291 #10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (#15045 #14473 #15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
This was referenced May 8, 2026
teknium1
added a commit
that referenced
this pull request
May 10, 2026
… not aux text (#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
nickdlkk
pushed a commit
to nickdlkk/hermes-agent
that referenced
this pull request
May 11, 2026
…ers (NousResearch#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since NousResearch#11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (NousResearch#17417 NousResearch#16997 NousResearch#16193 NousResearch#14315 NousResearch#13151 NousResearch#11794 NousResearch#10610 NousResearch#10283 NousResearch#10246 NousResearch#11564 NousResearch#13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (NousResearch#16052 NousResearch#16539 NousResearch#16566 NousResearch#15841 NousResearch#14798 NousResearch#10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (NousResearch#17305 NousResearch#17026 NousResearch#17000 NousResearch#15077 NousResearch#14557 NousResearch#14227 NousResearch#14166 NousResearch#14730 NousResearch#17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (NousResearch#17130 NousResearch#17113 NousResearch#17175 NousResearch#17150 NousResearch#16707 NousResearch#12312 NousResearch#12305 NousResearch#12934 NousResearch#14810 NousResearch#14045 NousResearch#17286 NousResearch#17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (NousResearch#16506 NousResearch#15027 NousResearch#13428 NousResearch#12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (NousResearch#12929 NousResearch#12972 NousResearch#10763 NousResearch#16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (NousResearch#16576 NousResearch#16572 NousResearch#16383 NousResearch#15878 NousResearch#15608 NousResearch#15606 NousResearch#14809 NousResearch#14767 NousResearch#14231 NousResearch#14232 NousResearch#14307 NousResearch#13683 NousResearch#12373 NousResearch#11891 NousResearch#11291 NousResearch#10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (NousResearch#15045 NousResearch#14473 NousResearch#15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
JZKK720
pushed a commit
to JZKK720/hermes-agent
that referenced
this pull request
May 11, 2026
… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
rmulligan
pushed a commit
to rmulligan/hermes-agent
that referenced
this pull request
May 11, 2026
… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
JinyuID
pushed a commit
to JinyuID/hermes-agent
that referenced
this pull request
May 11, 2026
… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
dusterbloom
pushed a commit
to dusterbloom/hermes-agent
that referenced
this pull request
May 12, 2026
… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
bot-ted
added a commit
to bot-ted/hermes-agent
that referenced
this pull request
May 12, 2026
* chore: add Qwinty to AUTHOR_MAP
* fix(browser_tool): do not cache transient None cloud provider resolution
Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True`
before resolution. If credentials were briefly unavailable on the first
call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned
the entire process to local mode forever, even after credentials
self-healed seconds later.
Root cause: bookkeeping was set up-front, so any code path that fell
through to `return _cached_cloud_provider` (config read failure, no
credentials yet, explicit-provider instantiation failure) committed the
transient `None` to the cache permanently.
Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now
set only when (a) the user explicitly chose `cloud_provider: local`, or
(b) a provider was successfully resolved. All transient `None` paths
return without poisoning the cache, so the next call retries. Explicit
provider instantiation failures now log at warning level with stack
trace so operators can diagnose them.
Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py
covering explicit local, successful resolution, no-credentials-yet,
config read failure, and explicit provider instantiation failure.
Stash-verify confirmed the 3 transient-None tests fail without the fix.
All 320 existing browser tests still green.
Closes #22324
* fix(browser_tool): fall through to autodetect on config read failure
* fix(email): send IMAP ID extension to support 163/NetEase mailbox
163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe
Login` unless the client first identifies itself via the RFC 2971 ID
command after LOGIN. Without this, the email gateway logs in OK but
then fails on the very first poll and the connection is torn down.
Send the ID payload best-effort after both `imap.login()` sites
(`EmailAdapter.connect` and `_fetch_new_messages`). Failures are
swallowed at debug level so non-supporting IMAP servers (Gmail,
Outlook, Fastmail, Yahoo, etc.) keep working unchanged.
Closes #22271
* fix(email): use real hermes version in IMAP ID command
* fix(deps): declare youtube-transcript-api in pyproject.toml [youtube] extra
skills/media/youtube-content/scripts/fetch_transcript.py and
optional-skills/productivity/memento-flashcards/scripts/youtube_quiz.py
both import youtube-transcript-api at runtime, but the package was not
listed in pyproject.toml. A fresh `uv sync` therefore omits it, and
both skills fail on first invocation with:
ModuleNotFoundError: No module named 'youtube_transcript_api'
Add a new [youtube] optional-dependency group with
youtube-transcript-api>=1.2.0 (the v1.x API surface the scripts already
use) and include it in [all] so standard installs pick it up.
Regression tests: TestPyprojectDeclaresYoutubeExtra verifies the extra
is present in pyproject.toml and included in [all].
Closes #22243
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro
DeepSeek V4 Pro returns thinking content as typed blocks inside the
content array rather than as a top-level reasoning_content field:
[{"type": "thinking", "thinking": "..."}, {"type": "output", ...}]
_extract_reasoning only handled content as a plain string, so the
thinking text was silently dropped. On the next turn the session was
replayed without the thinking block, causing:
HTTP 400: The content[].thinking in the thinking mode must be
passed back to the API.
Fix: when content is a list and no structured reasoning field was
found, scan for items with type=='thinking' and accumulate their
'thinking' (or 'text') value into reasoning_parts. Structured fields
(reasoning, reasoning_content, reasoning_details) still take priority
so existing provider behaviour is unchanged.
Closes #21944
* fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open
ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a
snapshot of PRAGMA table_info taken at function entry. When the gateway
dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board
and once via init_db's discard-and-reconnect path), a second connection can
run the same migration before the first one commits, causing:
sqlite3.OperationalError: duplicate column name: consecutive_failures
This crashed the dispatcher on every first tick after a gateway restart
(subsequent ticks succeeded because the columns were then present).
Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a
try/except that swallows OperationalError whose message contains
'duplicate column name'. All ALTER TABLE calls in
_migrate_add_optional_columns are routed through this helper.
Closes #21708
* fix(doctor): skip pluggable provider profiles when a dedicated check exists (#22346)
Problem
-------
`hermes doctor` ran two health checks for Anthropic: a dedicated one
with the correct `x-api-key` + `anthropic-version` headers, and a
generic Bearer-auth one driven by the pluggable `ProviderProfile` for
"anthropic". The generic check called `https://api.anthropic.com/v1/models`
with `Authorization: Bearer ...`, which Anthropic answers with HTTP 404,
producing a noisy duplicate warning even when the dedicated check passed.
Root cause
----------
`hermes_cli/doctor.py:_build_apikey_providers_list` deduplicated profiles
against a `_known_canonical` set built from the static list (Z.AI/GLM,
Kimi, DeepSeek, …). Providers with their own dedicated check above the
generic loop (Anthropic, OpenRouter, Bedrock) were not in that set, so
their profiles were appended and ran a second, broken check.
Fix
---
Add `{"anthropic", "openrouter", "bedrock"}` to the skip set, and
also skip profiles whose aliases match any of those names (e.g.
`claude`, `claude-oauth` → anthropic).
Tests
-----
tests/hermes_cli/test_doctor_dedicated_provider_skip.py:
- test_build_apikey_providers_list_skips_dedicated_check_providers:
asserts the assembled list does not contain anthropic, openrouter,
or bedrock entries.
- test_build_apikey_providers_list_includes_non_dedicated_providers:
sanity guard that legitimate providers (DeepSeek, Z.AI/GLM) survive.
Both confirmed via stash-verify (fail pre-fix with anthropic/openrouter
leaking, pass post-fix).
Fixes #22346
* fix(doctor): normalize provider name and aliases before dedicated-skip check
* fix(completion): use valid zsh _arguments exclusion-group syntax
The generated zsh completion script used `(-h --help)` as the exclusion
group for `_arguments`, which zsh rejects with:
_arguments:comparguments: invalid argument: (-h --help){-h,--help}[...]
Exclusion groups in `_arguments` cannot contain long options. Use the
canonical `(-)` form (exclude all other options) which correctly
handles flag pairs like `-h`/`--help`.
Fixes NousResearch/hermes-agent#22686
* fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805)
Two co-located fixes:
1. agent/model_metadata.py: bump hy3-preview static fallback from
256000 to 262144 (256 * 1024) to match OpenRouter live metadata
so cache and offline both agree (issue #22268).
2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the
exact-value change-detector (assert ctx == 256000) with an
invariant assertion (registered + >= 4096). Per AGENTS.md
'Don't write change-detector tests': pinning the upstream-controlled
context length is exactly the test class the rule forbids — it
breaks every time the provider bumps the published value, with
zero behavioral coverage gained.
Salvage of #22574 with a redirect on the test approach. The
contributor's diff bumped the integer and added a SECOND
change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144,
which would re-break on the next published bump. We instead delete
the change-detector entirely and assert the relationship.
Closes #22268.
* fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680)
acp_command / acp_args descriptions previously primed the model to
populate them — "Per-task ACP command override (e.g. 'copilot')" —
even when no ACP CLI was installed. Models with weaker schema-following
discipline would set them and the spawn would fail.
Add explicit "Do NOT set unless the user has explicitly told you"
guidance at both the top-level acp_command and the per-task override.
Strengthen acp_args to mention it's empty unless acp_command is set.
Adds 2 tests pinning the descriptions.
Note: this is a cosmetic prompt-engineering fix — the params remain
exposed in the schema. The fully-correct fix is to gate them behind
a config flag or runtime ACP-CLI detection so the schema only emits
them when an ACP harness is available. Tracked as a follow-up; this
PR ships the low-cost stopgap.
Salvage of #22680 (delegate schema only). The original PR also
bundled unrelated fixes for #22548, #21944, #22150 — those
need separate PRs since #22548 and #21944 are already addressed
on main (#22780 + #22798 in flight) and #22150 deserves its own
review.
Closes #22013.
* feat(gateway): add Telegram notification mode to suppress intermediate push notifications
Add a configurable notifications mode for the Telegram platform adapter
that controls which messages trigger push notifications.
- display.platforms.telegram.notifications: "all" (default) | "important"
- HERMES_TELEGRAM_NOTIFICATIONS env var override
- In "important" mode, all sends use disable_notification=True except:
- Approvals (send_exec_approval) and slash confirmations
- Final response messages (metadata["notify"]=True)
- Zero overhead in default "all" mode
- Zero impact on non-Telegram platforms
Closes #22771
* chore: add CalmProton to AUTHOR_MAP
* fix(telegram): default notifications to 'important' (silence intermediate)
Per-tool-call push notifications on Telegram are noisy enough that
'all' is the wrong default — long agent runs spam the user's notification
shade with status messages they didn't ask to be pinged about. Final
responses, approval prompts, and slash confirmations still notify;
intermediate progress, streaming, and tool-progress messages now
deliver silently via disable_notification.
Users who want the legacy behavior can opt back in with:
display:
platforms:
telegram:
notifications: all
or HERMES_TELEGRAM_NOTIFICATIONS=all.
* fix(gateway): adopt unit's HERMES_HOME for --system CLI ops
When systemd_restart / systemd_status / systemd_stop run under sudo,
HERMES_HOME is stripped and HOME=/root, so get_hermes_home() resolves
to /root/.hermes instead of the unit's pinned home. read_runtime_status
and get_running_pid then look at the wrong gateway_state.json — the
60s status poll never sees "running", times out, and forces another
systemctl restart that SIGTERMs the in-progress new gateway.
Read the unit's pinned HERMES_HOME from `systemctl show -p Environment`
and mirror it into os.environ before any HERMES_HOME-derived read.
Early-out when system=False (user-scope inherits naturally). Errors
swallowed so a transient systemctl failure doesn't break unrelated
CLI ops.
Closes #22035.
* chore: add mbac to AUTHOR_MAP
* fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708)
Pass session_id through to provider profile build_api_kwargs_extras so
the OpenRouter profile can attach an xAI cache-affinity header
(x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt
cache requires server affinity via this header — without it the cache
is poisoned and Grok prompt-cache hit rates drop dramatically on
multi-turn sessions.
Carve-out of #22708 by Ninso112. The original PR bundled a /diff
slash command, a zsh completion fix (already on main via #22802),
and holographic memory null-guards. This salvage keeps just the
Grok header work — small, targeted, and well-tested. Other
contributors and changes preserved for separate review.
Closes #22705.
* chore: add Ninso112 to AUTHOR_MAP
* fix(gateway): preserve Ctrl+C for Windows foreground runs
* fix: make session search initialize session db
* perf(teams): defer httpx import to first webhook call (#22831)
Same pattern as the google_chat lazy-load (PR #22681), applied to the
Teams plugin. The bundled `plugins/platforms/teams/adapter.py` did
`import httpx` at module top, which dragged the entire httpx +
httpcore stack into every process that triggered plugin discovery —
including `hermes` invocations that never instantiate the Teams
adapter.
`httpx` is only needed inside one method
(`TeamsMeetingPipeline._write_summary_via_incoming_webhook`), and the
`httpx.AsyncBaseTransport` parameter annotation is already string-only
thanks to the existing `from __future__ import annotations`. Move the
runtime import inside the method.
Measured impact (7-run medians, 9950X3D):
teams plugin alone: 118 → 89 ms (-25%)
46 → 38 MB (-17%)
import cli (full): unchanged
import model_tools: unchanged
The full-CLI numbers are flat because httpx is loaded transitively
from many other modules on that path. The microbench win is the real
signal: 29 ms / 8 MB shaved off any process that touches the teams
plugin without otherwise pulling httpx — primarily future workflows
where the gateway is enabled but Teams is not configured.
Tests: 44/44 `tests/gateway/test_teams.py` pass; 345 across all
plugin-platform suites (teams + qqbot + google_chat). The test file
imports `httpx` itself for the `MockTransport` fixture, which is
correct — tests legitimately use httpx, only the plugin's module-level
import was the issue.
* fix(acp): honor task cwd for foreground terminal commands
* feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838)
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.
- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
[{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
[0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
tui_gateway/server.py: propagate the kwarg (mirrors providers_order
plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md
Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
* fix(gateway): preserve reasoning_content, codex_message_items, finish_reason on transcript replay (#22839)
PR #2974 whitelisted three reasoning fields (reasoning, reasoning_details,
codex_reasoning_items) for the gateway's simple-text replay branch. Three
more fields were added to the DB later but the whitelist was never updated:
- reasoning_content: provider-facing thinking text. _copy_reasoning_content_for_api
promotes 'reasoning' -> 'reasoning_content' at send time only when the
strings happen to match. Carrying the original verbatim avoids loss
for providers that return them as distinct fields (DeepSeek/Kimi/
Moonshot thinking modes), and preserves the empty-string sentinel
that DeepSeek V4 Pro requires for thinking-mode replay.
- codex_message_items: exact assistant message items with 'phase'.
OpenAI docs: 'preserve and resend phase on all assistant messages —
dropping it can degrade performance.' Required for prefix cache hits.
No recovery path exists — once dropped, gone.
- finish_reason: informational; cheap to keep so transcripts replay
identically across CLI and gateway.
The CLI is unaffected because cli.py keeps the live in-memory message list
across turns (cli.py:10046 'self.conversation_history = result["messages"]').
The gateway rebuilds agent_history from the SQLite transcript on every turn,
so any field stripped during replay is silently lost.
Refactors the inline whitelist into a module-level _build_replay_entry()
helper so the contract can be unit-tested. 16 new tests pin the field set
and falsy-value handling.
Verified end-to-end: DB stores all 8 fields, replay now preserves all 8
(was preserving only 5 for assistant text turns).
* docs(openrouter): document auxiliary.<task>.extra_body for OR routing and Pareto (#22844)
The plumbing for setting OpenRouter provider preferences and the Pareto Code
router on auxiliary tasks already exists — auxiliary.<task>.extra_body is
forwarded verbatim by call_llm() / async_call_llm(). It just wasn't documented,
so users who wanted (e.g.) Pareto Code routing for compression but the strongest
coder for the main agent had no way to discover the escape hatch.
- hermes_cli/config.py: expand the auxiliary section header with a YAML
example showing provider routing plus plugins under extra_body, and an
explicit note that main-agent provider_routing / openrouter.min_coding_score
do NOT propagate to aux calls (each task is independent by design)
- website/docs/user-guide/configuration.md: new 'OpenRouter routing and
Pareto Code for auxiliary tasks' subsection with worked example
- website/docs/integrations/providers.md: cross-link from the Pareto Code
Router section to the aux-side doc
E2E verified that auxiliary.<task>.extra_body reaches the OpenRouter API with
the configured provider routing and plugins blocks intact.
* docs: round 2 audit — messaging, developer-guide, guides, integrations (#22858)
Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/,
guides/, and integrations/ against the live registries and gateway code.
messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)');
Google Chat slug is hermes-google_chat (underscore — plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such
extra); list the actual deps (google-cloud-pubsub, google-api-python-client,
google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is
silently ignored by the adapter); QQ_STT_BASE_URL is not read directly —
baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline
plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default
SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to
per-profile .env, not 'hermes config set' (same pattern fixed in
api-server.md last round). Also bumped example ports to 8650+ to dodge the
default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646)
collision.
developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for
run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py,
gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway
platform tree updated (qqbot is a sub-package, not qqbot.py; added
yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/
(always active)' was wrong — it's an empty extension point and
_register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-
callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry,
NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama
Cloud, LM Studio, Tencent TokenHub. Fallback section described only the
legacy single-pair model — corrected to the canonical list-form
fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31
alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py
which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule — note that
'git submodule update --init' is required if cloning without
--recurse-submodules.
guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is
positional (not --schedule), the script-only flag is --no-agent (not
--script-only), and there's no --command flag. Replaced with a real example
that creates the script under ~/.hermes/scripts/ and uses the actual flags.
Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work —
the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST
rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently
fails because --region isn't registered on the auth-add argparse spec.
Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for
China-region access.
- cron-script-only.md: 'hermes send' is fictional — replaced the comparison-
table mention with a webhook-subscription pointer; also fixed the dead link
to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed
at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right
knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response
and messages — task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in
one quoted string, so cmd.exe gets a single arg instead of the multi-token
command line it needs. Removed the surrounding quotes — argparse nargs='*'
collects each token correctly.
integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist);
actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG
and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 ->
api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai)
refreshed. Fallback section rewritten to lead with the canonical
fallback_providers list form (was leading with the legacy fallback_model
single dict); supported-providers list extended to include azure-foundry,
alibaba-coding-plan, lmstudio.
index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with
integrations/index.md ('19+') and undercounted — bumped to 20+ and added
Weixin/QQ Bot/Yuanbao/Google Chat to the list.
Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.
* perf(image_gen): defer fal_client import to first generation request (#22859)
`tools/image_generation_tool.py` did `import fal_client` at module
top, which pulled the entire fal_client + httpx + rich stack on every
process that ran `discover_builtin_tools()` — every `hermes` cold
start, even ones that never touch image generation.
Make the import lazy: replace the eager import with a placeholder
(`fal_client: Any = None`) and add an idempotent `_load_fal_client()`
that rebinds the module global on first use. Call it from the two
runtime entry points (`_ManagedFalSyncClient.__init__` and
`_submit_fal_request`) and from the SDK-presence check in
`check_image_generation_requirements`.
The loader short-circuits if the global is already truthy, which
preserves the test pattern of monkeypatching `fal_client` to install
a mock — the `monkeypatch.setattr(image_tool, "fal_client", ...)`
calls in test_image_generation.py keep working unchanged.
Measured impact (15-run min times, 9950X3D):
tools.image_generation_tool alone: 77 → 20 ms (-74%)
36 → 20 MB (-44%)
import cli (full): 734 → 720 ms (-2%)
import model_tools: 372 → 366 ms (-2%)
The microbench is dramatic but the full-CLI win is small — fal_client
shares its httpx + rich dependencies with the rest of the agent, so
on a real cold start most of the 16 MB / 64 ms is already paid by
other imports. The win matters mostly for processes that touch this
tool without otherwise loading httpx (rare) and for architectural
consistency with the previous lazy-load PRs (#22681 google_chat,
#22831 teams).
Tests: 55/55 `tests/tools/test_image_generation.py` pass, including
the cases that monkeypatch the module global to install a mock
fal_client. End-to-end verification confirms `import model_tools`
no longer pulls `fal_client` into `sys.modules`.
* fix(gateway): finalize final stream edit on done
* chore: add kidonng to AUTHOR_MAP
* fix(skills-hub): cover remaining SSRF fetch paths after #10029
* fix(context_compressor): treat streaming premature-close as transient error
Problem:
When a provider or proxy drops a streaming response mid-flight (httpcore
raises RemoteProtocolError: "incomplete chunked read", "peer closed
connection", "response ended prematurely", etc.), _generate_summary
would not classify it as a transient error. Instead of retrying on the
main model, it entered the generic 60-second cooldown, leaving context
growing unbounded until the cooldown expired. Issue #18458.
Root cause:
_is_connection_error in auxiliary_client.py did not match httpcore's
streaming premature-close error substrings. context_compressor.py's
_generate_summary except block never called _is_connection_error, so
those errors fell through to the 60-second generic cooldown rather than
triggering the retry-on-main fallback path used for timeouts.
Fix:
1. auxiliary_client.py — extend _is_connection_error keyword list with:
"incomplete chunked read", "peer closed connection",
"response ended prematurely", "unexpected eof",
"remoteprotocolerror", "localprotocolerror".
Also guard the `from openai import ...` with try/except ImportError
so the function works in environments without the openai package.
2. context_compressor.py — import _is_connection_error and call it in
_generate_summary's except block as _is_streaming_closed. Include
_is_streaming_closed in the fallback-to-main condition (alongside
_is_model_not_found, _is_timeout, _is_json_decode) and use the
shorter 30s transient cooldown for streaming-closed errors.
Tests:
4 new regression tests in TestStreamingClosedFallback:
- test_incomplete_chunked_read_falls_back_to_main
- test_peer_closed_connection_falls_back_to_main
- test_streaming_closed_on_main_uses_short_cooldown (stash-verified)
- test_non_streaming_unknown_error_still_uses_long_cooldown
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(session): route OR-combined short CJK tokens to LIKE fallback (#20494)
The FTS5 trigram tokenizer requires >=3 CJK characters per individual
token to produce matchable trigrams. A query like "广西 OR 桂林 OR 漓江"
has cjk_count=6 (passes the existing >=3 guard) but each token is only
2 CJK chars, so the trigram index returns 0 results.
Fix:
- Add per-token check: if any non-operator CJK token has <3 CJK chars,
force the LIKE fallback path regardless of total cjk_count.
- Expand the LIKE fallback to build one LIKE condition per non-operator
token joined with OR, so each term is matched independently.
Regression tests added in TestCJKSearchFallback:
- test_cjk_or_combined_short_tokens_returns_results
- test_cjk_short_token_or_query_preserves_filters
* fix(checkpoint): guard _touch_project against non-dict project metadata
Problem
=======
`tools.checkpoint_manager._touch_project` reads the project metadata
file with `json.loads(meta_path.read_text(...))`, then immediately does:
meta["workdir"] = str(_normalize_path(working_dir))
The `except` block only catches `(OSError, ValueError)`. When the file
parses successfully but returns a non-dict value (a list `[]`, `null`,
or a scalar from a corrupted or hand-truncated write), `json.loads`
succeeds without error and `meta` is set to, e.g., `[]`. The subsequent
subscript assignment then raises `TypeError: list indices must be
integers or slices, not str`, which is NOT caught by the narrow except
clause.
This TypeError propagates up through `_take` to `ensure_checkpoint`,
where the broad `except Exception` safety net swallows it. The effect
is that `ensure_checkpoint` silently returns False for the entire
session — all checkpoints are skipped for the affected working directory
without any user-visible error.
Root cause
==========
Missing `isinstance(meta, dict)` guard after `json.loads`, identical in
pattern to bugs fixed in `cron/jobs.py` (#22569) and
`tools/process_registry.py` (#22544). The same guard is already
present one function below in `_list_projects` (line 506), but was
inadvertently omitted in `_touch_project`.
Fix
===
Add two lines after the try/except:
```python
if not isinstance(meta, dict):
meta = {}
```
This matches the existing guard in `_list_projects` and ensures a fresh
empty dict is used whenever the persisted value is not a mapping —
preserving the `created_at` semantics via `setdefault` on the next line.
Tests
=====
`TestTouchProjectMalformedMeta` covers four non-dict root values
(`[]`, `null`, `42`, `"oops"`). Each writes a corrupted metadata file,
calls `_touch_project`, and asserts: (a) no exception raised, (b) the
metadata file is rewritten as a valid dict containing `last_touch` and
`workdir`. All four fail on main with `TypeError`, pass with fix.
Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed.
* fix(update): prebuild psutil on Termux Android via Linux path shim
* fix(update): use termux-all uv fallback path on Termux
* fix(install): also patch psutil on Termux fresh-install path
The Termux update path (PR #22814) prebuilds psutil from a marker-patched
sdist so 'platform android is not supported' doesn't kill it. The same
psutil setup.py error blocks fresh installs via scripts/install.sh — only
the update path was wired up. Without this, a brand-new Termux user can't
get past the very first 'pip install -e .[termux-all]' call.
- New scripts/install_psutil_android.py — standalone version of the same
patcher hermes_cli/main.py uses, callable from bash.
- scripts/install.sh detects sys.platform == 'android' and runs the
patcher before pip install.
- TODO note added to both copies pointing at upstream
https://github.com/giampaolo/psutil/pull/2762; remove both when that
ships.
Note: we keep psutil as a base dep on Android (do not adopt the proposed
sys_platform != 'android' marker in pyproject). Removing it would crash
five unguarded 'import psutil' sites at runtime
(tools/code_execution_tool.py, tools/tts_tool.py, tools/process_registry.py
(2x), gateway/platforms/whatsapp.py).
* fix(process_registry): kill orphaned Popen on post-spawn setup failure
After Popen succeeds with os.setsid (detached process group), 5 things
happen with no try/except: Thread construction, reader.start(), lock
acquisition, prune+register, checkpoint write. If any raises, the
Popen object goes unregistered and the detached process group leaks
indefinitely.
Wrap the post-spawn setup in try/except. On failure:
- os.killpg(getpgid(pid), SIGKILL) takes down the entire process
group (not just the shell - important because of detached PG +
-lic shell wrapper that may have spawned children)
- proc.kill() fallback for ProcessLookupError/PermissionError/OSError
- proc.wait(timeout=5) reaps with a bound
- re-raise to preserve original traceback
Nested try/except around cleanup so a secondary failure can't mask the
original.
Closes #2749.
* fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV
Problem: terminal.docker_env set in config.yaml was silently ignored.
Docker containers never received the user-specified env vars.
Root cause: docker_env was missing from all three config→env bridging
maps (cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync) and from the terminal_tool
_get_env_config() reader. _create_environment() consumed the key from
container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV
was never set.
Also extend the list-serialisation branches in cli.py and gateway/run.py
to handle dict values via json.dumps (lists already used json.dumps;
plain str() on a dict produces undecodable output).
Fix:
- cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings;
serialise dict values with json.dumps alongside existing list path
- gateway/run.py: same additions to _terminal_env_map and serialisation
- hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV"
to _config_to_env_sync so `hermes config set terminal.docker_env …`
persists to .env correctly
- tools/terminal_tool.py: add docker_env key to _get_env_config() reading
TERMINAL_DOCKER_ENV via _parse_env_var with default "{}"
Tests: add test_docker_env_is_bridged_everywhere to
tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on
origin/main, passes with fix.
Fixes #20537
* fix(gateway): degrade gracefully when all platform adapters are missing
When connected_count == 0 AND enabled_platform_count > 0, the gateway
treated 'all adapters returned None' identically to 'all adapters
failed to connect' — both as fatal startup errors. The 'returned None'
case happens when imports fail silently or when adapters are present
in config but their dependencies aren't installed (e.g. discord.py
missing). Cron jobs and other gateway-runtime work would unnecessarily
fail to start.
Split: only return False when startup_retryable_errors is non-empty
(real connection attempt failed). When the list is empty AND enabled
> 0, log a warning and continue running, matching the 'no platforms
enabled' cron path.
Salvage of #22642's gateway slice. Drops the bundled run_agent.py
memory-nudge counter hydration block (issue #22357 territory) which
wasn't mentioned in the PR description.
Closes #5196.
* fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665)
Fallback chain entries with 'api_key_env: ENV_VAR_NAME' weren't being
resolved by either the init-time fallback path (line ~1660) or the
runtime _try_activate_fallback path (line ~8045). Only literal
'api_key' was honored; the snake_case 'api_key_env' alias documented
elsewhere in the config was silently dropped, so a 'provider: custom'
fallback with base_url + api_key_env worked as primary but failed as
fallback with 'no endpoint credentials found' / 401.
Adds 'or fb.get("api_key_env")' to the existing 'key_env' lookup in
both call sites, with empty-string-to-None coercion so unset env vars
don't poison the resolver.
Salvage of #22665's fallback portion. The original PR also bundled
gateway-degrade-on-no-adapters changes (those land via the carve-out
in #22853 which is the same code) and run_agent.py memory-nudge
counter hydration (issue #22357 territory, not mentioned in the
title). Drops both bundled pieces; keeps just the api_key_env fix.
Closes #5392.
* fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664)
RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible
shim was falling through to FailoverReason.unknown, surfacing as 'Empty
response from model' and burning 3 retry slots on the same failing
endpoint. _classify_by_message had no timeout-message branch — only
billing/rate_limit/auth/context_overflow/model_not_found patterns. The
type-based check at line 565 also requires isinstance(error, (TimeoutError,
ConnectionError, OSError)) — a plain RuntimeError doesn't match.
Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded',
'request timed out', 'operation timed out', 'upstream timed out', 'turn
timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True)
when any pattern matches.
Salvage of #22664's classifier portion. The original PR also bundled a
fallback self-selection guard which is now redundant (already on main
via #22780) plus DeepSeek thinking and session_search fixes that are
their own separate concerns.
Follow-up to #22780 — fixes the still-broken classification of
generic-typed provider-shim timeouts that #22780's dedup didn't cover.
* fix(test_gateway): stop run_gateway() tests from rewriting the dev's installed systemd unit (#22900)
run_gateway() calls refresh_systemd_unit_if_needed() on every invocation
so restart settings stay current after exit-code-75 respawns. The
user-scope unit path resolves under Path.home() (NOT sandboxed by
conftest, only HERMES_HOME is), and generate_systemd_unit() bakes the
current HERMES_HOME into the unit's Environment= line.
Result: any test that exercises run_gateway() end-to-end on a real
Linux dev box silently rewrites the developer's installed
~/.config/systemd/user/hermes-gateway.service with a polluted
HERMES_HOME pointing at /tmp/pytest-of-<user>/.../hermes_test. On the
next reboot, systemd loads that unit, the gateway starts looking at an
empty tmp dir, and Telegram/Discord/etc. all show as 'No messaging
platforms enabled' even though the user's real config is fine. Three
tests in tests/hermes_cli/test_gateway.py hit this path:
test_run_gateway_exits_cleanly_on_keyboard_interrupt,
test_run_gateway_exits_nonzero_when_start_gateway_reports_failure, and
test_run_gateway_root_guard_has_escape_hatch.
Two-layer fix:
1. _install_fake_gateway_run helper (covers all four run_gateway() call
sites in test_gateway.py and any future ones) now also stubs
supports_systemd_services and refresh_systemd_unit_if_needed.
2. refresh_systemd_unit_if_needed() itself sniffs the generated unit
body for /pytest-of- and /hermes_test markers and refuses to write
when present. Defense in depth so a future test that bypasses the
helper still can't corrupt the dev's gateway. Tests that legitimately
exercise the refresh flow (test_run_gateway_refreshes_outdated_unit_on_boot)
patch generate_systemd_unit to return synthetic content that doesn't
carry those markers, so they keep working.
Adds test_refresh_refuses_to_bake_pytest_tmpdir_into_real_user_unit as a
regression test for the source-side guard.
* fix(gateway): detect gateway process via /proc in Docker without procps
Salvage of NousResearch/hermes-agent#7622.
Docker images often lack procps so `ps` is unavailable. Try reading
/proc/*/cmdline first (works in any Linux container) and fall back to
`ps -A eww` only when /proc is not present. PermissionError on
individual PIDs is silently skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(gateway): stub /proc unavailability in find_gateway_pids fallback test
Follow-up test fix for #22693 — the existing test for ps-failure +
pid-file fallback needed the /proc walk path stubbed too since /proc
is now consulted first.
* fix(gateway): pass max_total_size_mb and max_file_size_mb to CheckpointManager
The /rollback command handler in gateway/run.py was constructing
CheckpointManager with only enabled and max_snapshots, omitting
max_total_size_mb and max_file_size_mb that the __init__ expects.
This caused a TypeError on every /rollback invocation when checkpoints
were enabled.
Fixes: NousResearch/hermes-agent#18841
* chore: add DanielLSM to AUTHOR_MAP
* fix: use credential_pool for custom endpoint model listing probes
Same-provider /model switches on a 'custom' endpoint kept stale credentials
because (a) _resolve_named_custom_runtime's bare-custom + explicit_base_url
path went straight to OPENAI_API_KEY/OPENROUTER_API_KEY env fallbacks
without consulting the credential pool, and (b) switch_model() guarded
against custom-provider re-resolution to preserve base_url, locking in
the prior api_key.
Now the bare-custom path queries the credential pool first (mirroring
the named-custom-provider branch behavior), and the same-provider switch
guard is removed since resolve_runtime_provider has since grown a robust
custom-resolution path that preserves base_url from model_cfg.
Refs #18681 (the gateway-side api_key wiring is still separate),
#16254, #12919.
* chore: add v1b3coder to AUTHOR_MAP
* fix(cli): preserve config comments on setting writes
* chore: add ming1523 to AUTHOR_MAP
* feat(docs): richer info panels on the Skills Hub for built-in + optional skills (#22905)
The Skills Hub at /skills had cards that, when expanded, showed only the
one-line description, tags, author, version, and an install command. For
the 163 bundled and optional skills shipped with the repo, this was thinner
than the data we already have on disk.
Three changes, all under website/:
1. extract-skills.py now pulls four extra fields per local skill:
- 'overview' — first non-heading body paragraph from SKILL.md (stripped
of admonitions/code fences, capped at ~500 chars at a sentence boundary)
- 'envVars' / 'commands' — from the prerequisites: block in frontmatter
- 'license' — from the top-level frontmatter
- 'docsPath' — slug to the per-skill /docs/user-guide/skills/.../* page,
computed with the same logic as generate-skill-docs.py
162 of 163 local skills get a non-empty overview automatically. The
remaining one (media/heartmula) has only headings/code in its body and
falls through to the description.
2. Skill TS interface + SkillCard expanded-panel render the new fields:
- Overview paragraph at the top of the panel
- Prerequisites box (env vars + required commands) when frontmatter
declares them
- License row alongside author/version
- 'View full documentation →' link to the per-skill docs page
Search now covers the overview text too, so users can find skills by
matching content from inside SKILL.md, not just the one-line description.
3. styles.module.css gains six new classes (overviewBlock, detailLabel,
overviewText, prereqBlock/Row/Kind/List/Item, docsLink) styled to match
the existing dark panel aesthetic.
External / community skills (Anthropic, LobeHub, Claude Marketplace cached
indexes) keep the old behavior — overview is empty, no prereqs, no docsPath.
Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 baseline; all 163 generated docsPath values resolve to existing pages
under website/docs/user-guide/skills/.
* perf(cli): skip welcome banner on `chat -q` single-query mode (#22904)
`hermes chat -q "..."` printed the full welcome banner before
running the query — kawaii ASCII logo, available toolsets list,
available skills list, model name, session ID, working directory,
update-available notice. Building it took ~420 ms on cold start
(~200 ms version-update probe, the rest is toolset / skill enumeration
plus Rich panel rendering).
For a one-shot `-q` query the banner is noise: the user already
picked the prompt, doesn't need a toolset reference, and gets the
session ID + resume hint from `_print_exit_summary()` after the
response prints.
The fully-quiet `-Q` / `--quiet` machine-readable path was already
banner-free; this brings the human-facing single-query path in line
so all non-interactive invocations are fast.
Measured impact (`hermes chat -q "ok" --max-turns 1`, 10-run
percentiles, 9950X3D):
median: 1.90 → 1.75 s (-150 ms)
min: 1.80 → 1.73 s ( -70 ms)
P25: 1.82 → 1.74 s ( -80 ms)
Wider variance than expected; the banner cost overlaps with API
latency on real `chat -q` runs. Min-time delta of 70 ms is the
cleanest signal — that's the deterministic banner-build cost gone.
The 150 ms median delta picks up cases where the version-update
probe also finishes during the wait.
Interactive mode (`hermes` with no `-q`) and the `--list-tools` /
`--list-toolsets` one-shot listing commands still show the banner —
those are the contexts where it's actually wanted.
Tests: 656/656 `tests/cli/` pass on top of latest main (modulo 5 pre-
existing flakes in `test_cli_save_config_value.py` that fail with
`No module named 'ruamel'` both with and without this change).
* feat(curator): show rename map in user-visible summary (#22910)
* feat(curator): show rename map (where skills went) in user-visible summary
The full data has always been on disk in REPORT.md, but the user-visible
curator summary (gateway 💾 line, CLI session-start panel,
`hermes curator status`) was counts-only — "consolidated 4 into 2
umbrellas" with no names. Users only discovered renames when something
they expected was gone.
New `_build_rename_summary()` formats the rename map and appends it to
`final_summary`:
auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale-thing — pruned (stale)
full report: hermes curator status
Empty on no-op ticks (no archives), so most ticks add zero log noise.
Cap of 10 entries keeps agent.log readable when a 50-skill
consolidation lands; the full list is always in REPORT.md.
`hermes curator status` indents continuation lines so the multi-line
summary reads as one logical field.
5 new tests in tests/agent/test_curator_classification.py covering
empty / consolidation / pruning / cap / mixed cases.
* feat(curator): show recent run summary once on `hermes update`
The rename map is now visible from where users actually look — the
update flow they explicitly run, instead of just the live gateway log
or transient CLI session-start panel.
Behavior:
- After `hermes update`, if the most recent curator run produced a
rename map (multi-line summary) that the user hasn't seen yet, print
it once with a 'last run Xh ago' header and a one-time-message
footer.
- Stamp `last_run_summary_shown_at = last_run_at` after printing so
subsequent `hermes update` invocations are silent until a newer
curator run lands.
- Silent on no-op runs (single-line summary like 'auto: no changes;
llm: no change'). Still stamps shown so we don't reconsider on
every update.
- Silent when the curator has never run (the existing first-run
notice handles that case).
Output:
ℹ Skill curator — last run 4h ago
auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale-thing — pruned (stale)
full report: hermes curator status
(This message shows once per curator run. View anytime: hermes curator status)
State migration:
- `_default_state()` gains `last_run_summary_shown_at: None`. Existing
state files lack the field; `.get()` returns None; the comparison
treats any prior run as 'not yet shown' and prints once on next
update. Self-healing.
Wiring:
- Both `hermes update` paths in main.py call the new
`_print_curator_recent_run_notice()` right after the existing
first-run notice. Best-effort try/except so a state-load bug
never breaks the update flow.
6 tests in tests/hermes_cli/test_curator_recent_run_notice.py:
no-run / single-line / multi-line / show-once / new-run-resets /
time-formatter buckets.
* chore(skills): move heavy training skills + outlines to optional-skills (#22912)
These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't
be active by default. Moved to optional-skills/ where users opt-in via
`hermes skills install official/...`.
Moved:
- mlops/training/axolotl
- mlops/training/trl-fine-tuning
- mlops/training/unsloth
- mlops/inference/outlines
Counts: 91 -> 87 built-in, 72 -> 76 optional.
Auto-regenerated docs (per-skill pages + catalogs) reflect the move.
* fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913)
Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB
(32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach
in _write_to_sandbox put the entire tool result inside the 'bash -c'
arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument
list too long' before the heredoc ever ran. The caller logged a
warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the
warning never reached agent.log either, and the agent saw a 1.5 KB
preview tagged 'Full output could not be saved to sandbox'. Hits
delegate_task with 3+ subagent outputs routinely now.
Switch to passing content via env.execute(stdin_data=...). cmd is
now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight
payload travels through stdin where there is no argv-element limit.
E2E reproduced the user's exact 144,778-char delegate_task envelope:
old code OSError'd, new code round-trips cleanly to disk with all
three task summaries intact.
* docs(skills): clarify kanban fan-out decomposition
* chore: AUTHOR_MAP entry for eloklam (#22898)
* fix(kanban): request default board explicitly (#21819)
* test(kanban): assert re-block notification is delivered after unblock cycle
Adds test_notifier_second_blocked_delivers to cover the case where a
task is blocked, unblocked, then blocked again — the second blocked
event must still deliver a gateway notification.
Currently fails because blocked is treated as a terminal event kind,
causing the subscription to be dropped after the first block.
* fix(kanban): remove blocked kind from unsub
* chore(test): comment of test case rewrite to english
* docs(user-stories): add 18 verified social entries (99 → 117) (#22920)
Found 18 real Hermes-Agent stories from HN, X, and Reddit not yet
captured on the page. All URLs HTTP-verified to return 200 with
matching titles.
Reddit (15): r/hermesagent (Obsidian-as-memory writeup at 794 upvotes,
LLM cheatsheet at 635 upvotes, Kanban game-changer post, OpenRouter #1
ranking, AMA from the Nous team, etc.); r/LocalLLaMA, r/Rag,
r/openclaw, r/SideProject, r/LocalLLM threads where users describe
their actual setups (Qwen3.5-9b on 16gb VRAM, 5060Ti + Telegram, smart
routing tiers).
X (3): @vmiss33's 'what I use Hermes for' guide, @HeyYanvi's
X-to-NotebookLM podcast workflow, @ExileAI_0's spare-laptop Iris
running RenPy + ComfyUI, @brucexu_eth's Hermes Inc. Telegram startup
sim from the hackathon, Hype's deep-dive blog.
HN (1): 'I'm using Hermes — sandbox it like any agent.'
No component changes — all new entries fit the existing schema
(real URL, real author, real date).
* feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)
When the active main model has native vision and the provider supports
multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini
3, OpenRouter, Nous), vision_analyze loads the image bytes and returns
them to the model as a multimodal tool-result envelope. The model then
sees the pixels directly on its next turn instead of receiving a lossy
text description from an auxiliary LLM.
Falls back to the legacy aux-LLM text path for non-vision models and
unverified providers.
Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and
Cline. All four converge on the same pattern: tool results carry image
content blocks for vision-capable provider/model combinations.
Changes
- tools/vision_tools.py: _vision_analyze_native fast path + provider
capability table (_supports_media_in_tool_results). Schema description
updated to reflect new behaviour.
- agent/codex_responses_adapter.py: function_call_output.output now
accepts the array form for multimodal tool results (was string-only).
Preflight validates input_text/input_image parts.
- agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so
tools see the live CLI/gateway override, not the stale config.yaml
default. set_runtime_main()/clear_runtime_main() helpers.
- run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn
start so vision_analyze's fast-path check sees the actual runtime.
- tests/conftest.py: clear runtime-main override between tests.
Tests
- tests/tools/test_vision_native_fast_path.py: provider capability
table, envelope shape, fast-path gating (vision-capable model uses
fast path; non-vision model falls through to aux).
- tests/run_agent/test_codex_multimodal_tool_result.py: list tool
content becomes function_call_output.output array; preflight
preserves arrays and drops unknown part types.
Live verified
- Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a
typed filepath, gets pixels back, reads exact text from images that
no aux description could capture (font color irony, multi-line
fruit-count list, etc.).
PR replaces the closed prior efforts (#16506 shipped the inbound user-
attached path; this PR closes the gap for tool-discovered images).
* fix(stream-retry): collapse two-line drop status, name provider, and let agent.log capture diagnostics (#22993)
Subagent stream drops were spamming the parent terminal with two lines
per blip ('Connection dropped...' + 'Reconnected...') while leaving zero
breadcrumb in agent.log to debug them.
Two underlying bugs, fixed together:
1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which
filters records before root-logger file handlers see them. The comment
claimed 'File handlers still capture everything' — that was wrong.
Removed in both run_agent.py and cli.py; console quietness already
comes from hermes_logging not installing a console StreamHandler in
non-verbose mode.
2. The stream-retry blocks emitted two _emit_status calls per drop
('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected —
resuming…') with no provider name, so multi-provider sessions had to
dig through agent.log to attribute a drop. Replaced both call sites
with a single _emit_stream_drop helper that emits ONE line naming the
provider and error class, and always writes a structured WARNING to
agent.log with subagent_id, depth, provider, base_url, error_type.
Net UX change: 6 lines per triple-subagent drop → 3 lines, each
naming the provider. agent.log now has a structured breadcrumb per
retry that didn't exist before.
Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py
covering the logger-level guard, structured WARNING content, single
status line per drop (no Reconnected follow-up), and provider naming.
* fix(kanban): drop redundant init_db() in gateway watchers (#21378)
Both `_kanban_notifier_watcher` and `_kanban_dispatcher_watcher`'s
`_tick_once_for_board` called `_kb.connect(board=slug)` immediately
followed by `_kb.init_db(board=slug)`. Since `connect()` already runs
the schema + idempotent migration on first open per process, the
explicit `init_db()` was redundant — and worse, `init_db()` deliberately
busts the per-process `_INITIALIZED_PATHS` cache and re-runs the migration
on a *second* connection that races the first.
On every cold gateway start against a legacy DB this surfaced as either
`sqlite3.OperationalError: duplicate column name: <col>` or intermittent
`database is locked` errors logged at the first tick. The duplicate-column
case is now tolerated by `_add_column_if_missing` (commit 78698381a), but
the wasted second migration plus the database-is-locked race remain
fixable by skipping the redundant call entirely.
Drops `_kb.init_db(board=slug)` at both call sites and adds a regression
test in `tests/hermes_cli/test_kanban_notify.py` that pins the absence
via source inspection plus a runtime spy.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* chore: AUTHOR_MAP entry for li0near gmail (#21378)
* chore(models): refresh OpenRouter + Nous fallback lists (#23001)
Reorder Anthropic Opus 4.7/4.6 + Sonnet 4.6 to the top, cluster free
models at the bottom of the OpenRouter list, and mirror the same
ordering into the Nous portal list (paid models only).
- Add inclusionai/ring-2.6-1t:free
- Drop minimax-m2.5, minimax-m2.5:free, sonnet-4.5, mimo-v2.5,
glm-5v-turbo, glm-5-turbo, trinity-large-preview:free,
trinity-large-thinking, qwen3.5-plus-02-15
- Replace qwen3.5-35b-a3b with qwen3.6-35b-a3b
- Drop x-ai/grok-4.20-beta from the Nous list
* fix(kanban): /kanban slash command emits argparse garbage instead of help
Closes #21794.
`/kanban`, `/kanban help`, `/kanban --help`, and `/kanban <sub> -h`
all returned broken output to the gateway and interactive CLI. Three
underlying bugs in `hermes_cli.kanban.run_slash`:
1. argparse writes help to **stdout** but `run_slash` only captured
stderr at parse time, so `-h` text was silently swallowed and
replaced with the `(usage error: 0)` sentinel.
2. The wrapping parser used `prog="/"` and routed via a synthetic
"_top → kanban" subparser, producing `usage: / kanban …` (stray
space) and `usage: /kanban kanban …` (doubled token) in error text.
3. Bare `/kanban` and `/kanban help` dumped argparse's full ~3KB
usage tree, which reads as visual garbage in a chat bubble.
Fix: drive the kanban_parser directly (no double-wrap), rewrite prog
strings on every leaf subparser, capture stdout AND stderr around
parse_args, distinguish SystemExit(0) (help — return captured stdout)
from SystemExit(2) (error — return single-line ⚠-prefixed message),
and add an explicit chat-friendly short-help block returned for bare
invocation and the help aliases (`help`, `--help`, `-h`, `?`).
Added 5 regression tests covering bare invocation, every help alias,
subcommand help, unknown action, and missing required arg.
Affects every chat platform via gateway/run.py::_handle_kanban_command
and the interactive CLI via cli.py::_handle_kanban_command.
Co-Authored-By: Nagatha (Claude Opus 4.7) <noreply@anthropic.com>
* chore: AUTHOR_MAP entry for tymrtn (#21794)
* feat(stream-retry): add upstream + timing diagnostics to drop log (#23005)
The previous PR (#22993) gave us a structured WARNING per stream drop
but the only diagnostic was 'error_type=APIError error=Network
connection lost.' — same nothing the user started with. To actually
diagnose why subagents drop streams disproportionately we need to know
WHERE the drop happened.
Adds three breadcrumbs to the agent.log WARNING:
1. Inner exception chain. openai SDK wraps httpx errors as
APIConnectionError / APIError so the catch site only sees the
wrapper. _flatten_exception_chain walks __cause__/__context__ up to
4 levels deep and renders 'Outer(msg) <- Inner(msg)' so we can
tell ConnectError vs RemoteProtocolError vs ReadError vs
ProxyError without enabling verbose mode.
2. Upstream HTTP headers. Snapshots cf-ray, x-openrouter-provider,
x-openrouter-model, x-openrouter-id, x-request-id, server, via,
etc. from stream.response immediately after open (so they survive
even when the stream dies before the first chunk). These answer
'is one CF edge / one downstream provider responsible, or random?'
3. Per-attempt counters. bytes streamed, chunk count, elapsed time on
the dying attempt, and time-to-first-byte. Distinguishes 'couldn't
connect at all' (0s, 0 bytes) from 'died after 30s mid-stream'
(very different root causes — first is auth/routing, second is
upstream idle-kill or proxy timeout).
Plumbing:
- _stream_diag_init / _stream_diag_capture_response live on AIAgent
and produce a per-attempt dict held on request_client_holder['diag']
for closure access from the retry block.
- _call_chat_completions and _call_anthropic both initialize the diag
and increment counters per chunk/event (best-effort, never raises in
the streaming hot path).
- _log_stream_retry / _emit_stream_drop accept an optional diag and
render the new fields. Final-exhaustion log goes through the same
helper so it gets the same diagnostic dump.
- UI status line gains a brief 'after Xs' suffix when timing is
available — distinguishes 'connect failed' from 'died mid-stream'
at a glance without grepping logs.
Sample WARNING after this change:
Stream drop mid tool-call on attempt 2/3 — retrying.
subagent_id=sa-2-cafef00d depth=1 provider=openrouter
base_url=https://openrouter.ai/api/v1
error_type=APIError error=Connection error.
chain=APIError(Connection error.) <- RemoteProtocolError(peer
closed connection without sending complete message body)
http_status=200 bytes=12400 chunks=47 elapsed=12.00s ttfb=0.83s
upstream=[cf-ray=8f1a2b3c4d5e6f7g-LAX
x-openrouter-provider=Anthropic
x-openrouter-id=gen-abc123 server=cloudflare]
Tests: 10 covering diag init, header capture (whitelist enforced for
PII), exception-chain walking + depth cap, log content with full diag,
log content without diag (placeholders), UI elapsed-suffix on/off.
* fix(review): tell background reviewer not to capture transient env failures as skills (#23004)
Closes #6051.
Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").
Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.
Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
• Environment-dependent failures (missing binaries, fresh-install
errors, post-migration path mismatches, 'command not found',
unconfigured credentials, uninstalled packages)
• Negative claims about tools or features ("X does not work")
that harden into self-cited refusals
• Session-specific transient errors that resolved before the
conversation ended
• One-off task narratives ("summarize today's market", "analyze
this PR") — also addresses the #12812 / #4538 family
Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.
Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.
* feat(codex): add gpt-5.3-codex-spark model
* fix(model-metadata): restore gpt-5.3-codex-spark fallback context
* fix(model-metadata): set codex-spark fallback context to 128k
* fix: surface Codex CLI-only models
* chore: add codex-spark salvage contributors to AUTHOR_MAP
Maps olegwn@gmail.com → nederev (PR #18286) and vesper@askclaw.dev →
askclaw-vesper (PR #19530) so the contributor attribution check passes
when their commits land via this salvage.
* docs(codex-spark): document ChatGPT Pro entitlement gating
PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was
unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed
via the Codex OAuth backend at chatgpt.com/backend-api/codex/models —
not via the public OpenAI API.
Add explanatory comments in:
- DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py)
- _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py)
- list_authenticated_providers' live-discovery branch (model_switch.py)
so future maintainers don't strip the entry again. Also documents the
intentional asymmetry that Spark stays out of the "openai" provider
catalog (it isn't on the public API) and why the supported_in_api
filter is *not* applied for the openai-codex route.
* test(codex-spark): add live-API regression and make picker test deterministic
Two follow-ups from self-review:
1. Add unit test for _fetch_models_from_api covering the live HTTP path.
The salvaged PR #19530 dropped the supported_in_api:false filter in
both _fetch_models_from_api and _read_cache_models, but only the
cache path had a regression test. This adds the symmetric live-fetch
test (mocked httpx) so a future drive-by cha…
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…pability (NousResearch#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…ers (NousResearch#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since NousResearch#11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (NousResearch#17417 NousResearch#16997 NousResearch#16193 NousResearch#14315 NousResearch#13151 NousResearch#11794 NousResearch#10610 NousResearch#10283 NousResearch#10246 NousResearch#11564 NousResearch#13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (NousResearch#16052 NousResearch#16539 NousResearch#16566 NousResearch#15841 NousResearch#14798 NousResearch#10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (NousResearch#17305 NousResearch#17026 NousResearch#17000 NousResearch#15077 NousResearch#14557 NousResearch#14227 NousResearch#14166 NousResearch#14730 NousResearch#17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (NousResearch#17130 NousResearch#17113 NousResearch#17175 NousResearch#17150 NousResearch#16707 NousResearch#12312 NousResearch#12305 NousResearch#12934 NousResearch#14810 NousResearch#14045 NousResearch#17286 NousResearch#17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (NousResearch#16506 NousResearch#15027 NousResearch#13428 NousResearch#12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (NousResearch#12929 NousResearch#12972 NousResearch#10763 NousResearch#16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (NousResearch#16576 NousResearch#16572 NousResearch#16383 NousResearch#15878 NousResearch#15608 NousResearch#15606 NousResearch#14809 NousResearch#14767 NousResearch#14231 NousResearch#14232 NousResearch#14307 NousResearch#13683 NousResearch#12373 NousResearch#11891 NousResearch#11291 NousResearch#10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (NousResearch#15045 NousResearch#14473 NousResearch#15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
jsboige
pushed a commit
to jsboige/hermes-agent
that referenced
this pull request
May 14, 2026
…ers (NousResearch#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since NousResearch#11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (NousResearch#17417 NousResearch#16997 NousResearch#16193 NousResearch#14315 NousResearch#13151 NousResearch#11794 NousResearch#10610 NousResearch#10283 NousResearch#10246 NousResearch#11564 NousResearch#13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (NousResearch#16052 NousResearch#16539 NousResearch#16566 NousResearch#15841 NousResearch#14798 NousResearch#10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (NousResearch#17305 NousResearch#17026 NousResearch#17000 NousResearch#15077 NousResearch#14557 NousResearch#14227 NousResearch#14166 NousResearch#14730 NousResearch#17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (NousResearch#17130 NousResearch#17113 NousResearch#17175 NousResearch#17150 NousResearch#16707 NousResearch#12312 NousResearch#12305 NousResearch#12934 NousResearch#14810 NousResearch#14045 NousResearch#17286 NousResearch#17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (NousResearch#16506 NousResearch#15027 NousResearch#13428 NousResearch#12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (NousResearch#12929 NousResearch#12972 NousResearch#10763 NousResearch#16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (NousResearch#16576 NousResearch#16572 NousResearch#16383 NousResearch#15878 NousResearch#15608 NousResearch#15606 NousResearch#14809 NousResearch#14767 NousResearch#14231 NousResearch#14232 NousResearch#14307 NousResearch#13683 NousResearch#12373 NousResearch#11891 NousResearch#11291 NousResearch#10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (NousResearch#15045 NousResearch#14473 NousResearch#15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
jsboige
pushed a commit
to jsboige/hermes-agent
that referenced
this pull request
May 14, 2026
… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
…pability (NousResearch#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
…ers (NousResearch#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since NousResearch#11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (NousResearch#17417 NousResearch#16997 NousResearch#16193 NousResearch#14315 NousResearch#13151 NousResearch#11794 NousResearch#10610 NousResearch#10283 NousResearch#10246 NousResearch#11564 NousResearch#13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (NousResearch#16052 NousResearch#16539 NousResearch#16566 NousResearch#15841 NousResearch#14798 NousResearch#10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (NousResearch#17305 NousResearch#17026 NousResearch#17000 NousResearch#15077 NousResearch#14557 NousResearch#14227 NousResearch#14166 NousResearch#14730 NousResearch#17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (NousResearch#17130 NousResearch#17113 NousResearch#17175 NousResearch#17150 NousResearch#16707 NousResearch#12312 NousResearch#12305 NousResearch#12934 NousResearch#14810 NousResearch#14045 NousResearch#17286 NousResearch#17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (NousResearch#16506 NousResearch#15027 NousResearch#13428 NousResearch#12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (NousResearch#12929 NousResearch#12972 NousResearch#10763 NousResearch#16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (NousResearch#16576 NousResearch#16572 NousResearch#16383 NousResearch#15878 NousResearch#15608 NousResearch#15606 NousResearch#14809 NousResearch#14767 NousResearch#14231 NousResearch#14232 NousResearch#14307 NousResearch#13683 NousResearch#12373 NousResearch#11891 NousResearch#11291 NousResearch#10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (NousResearch#15045 NousResearch#14473 NousResearch#15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
AlexFoxD
pushed a commit
to AlexFoxD/hermes-agent
that referenced
this pull request
May 21, 2026
… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…pability (NousResearch#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…ers (NousResearch#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since NousResearch#11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (NousResearch#17417 NousResearch#16997 NousResearch#16193 NousResearch#14315 NousResearch#13151 NousResearch#11794 NousResearch#10610 NousResearch#10283 NousResearch#10246 NousResearch#11564 NousResearch#13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (NousResearch#16052 NousResearch#16539 NousResearch#16566 NousResearch#15841 NousResearch#14798 NousResearch#10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (NousResearch#17305 NousResearch#17026 NousResearch#17000 NousResearch#15077 NousResearch#14557 NousResearch#14227 NousResearch#14166 NousResearch#14730 NousResearch#17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (NousResearch#17130 NousResearch#17113 NousResearch#17175 NousResearch#17150 NousResearch#16707 NousResearch#12312 NousResearch#12305 NousResearch#12934 NousResearch#14810 NousResearch#14045 NousResearch#17286 NousResearch#17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (NousResearch#16506 NousResearch#15027 NousResearch#13428 NousResearch#12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (NousResearch#12929 NousResearch#12972 NousResearch#10763 NousResearch#16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (NousResearch#16576 NousResearch#16572 NousResearch#16383 NousResearch#15878 NousResearch#15608 NousResearch#15606 NousResearch#14809 NousResearch#14767 NousResearch#14231 NousResearch#14232 NousResearch#14307 NousResearch#13683 NousResearch#12373 NousResearch#11891 NousResearch#11291 NousResearch#10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (NousResearch#15045 NousResearch#14473 NousResearch#15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…pability (NousResearch#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…ers (NousResearch#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since NousResearch#11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (NousResearch#17417 NousResearch#16997 NousResearch#16193 NousResearch#14315 NousResearch#13151 NousResearch#11794 NousResearch#10610 NousResearch#10283 NousResearch#10246 NousResearch#11564 NousResearch#13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (NousResearch#16052 NousResearch#16539 NousResearch#16566 NousResearch#15841 NousResearch#14798 NousResearch#10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (NousResearch#17305 NousResearch#17026 NousResearch#17000 NousResearch#15077 NousResearch#14557 NousResearch#14227 NousResearch#14166 NousResearch#14730 NousResearch#17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (NousResearch#17130 NousResearch#17113 NousResearch#17175 NousResearch#17150 NousResearch#16707 NousResearch#12312 NousResearch#12305 NousResearch#12934 NousResearch#14810 NousResearch#14045 NousResearch#17286 NousResearch#17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (NousResearch#16506 NousResearch#15027 NousResearch#13428 NousResearch#12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (NousResearch#12929 NousResearch#12972 NousResearch#10763 NousResearch#16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (NousResearch#16576 NousResearch#16572 NousResearch#16383 NousResearch#15878 NousResearch#15608 NousResearch#15606 NousResearch#14809 NousResearch#14767 NousResearch#14231 NousResearch#14232 NousResearch#14307 NousResearch#13683 NousResearch#12373 NousResearch#11891 NousResearch#11291 NousResearch#10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (NousResearch#15045 NousResearch#14473 NousResearch#15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
19 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
User-attached images reach vision-capable models as real pixels instead of a lossy
vision_analyzetext summary. Non-vision models keep the existing text pipeline — no regression, no loss of model flexibility.Routing decision (
agent/image_routing.py::decide_image_input_mode):agent.image_input_mode)auto(default)supports_vision=TrueAND no explicitauxiliary.vision.provider. Text otherwise.native_prepare_messages_for_non_vision_model.textvision_analyze(pre-PR behaviour).vision_analyzestays surfaced as a tool regardless — skills that chain it (browser screenshots, baoyu-comic gating, URL-referenced images) keep working. Description rewritten to position it as inspection-for-images-not-already-visible so vision-capable models don't redundantly call it.Changes
agent/image_routing.py(NEW, 206 LOC):decide_image_input_mode()+build_native_content_parts(). Provider/model caps read via existingget_model_capabilities().gateway/run.py: routing in_prepare_inbound_message_text+ native-parts assembly at therun_conversationcall site. Reads current provider/model from config.yaml so/modelswitches track automatically next turn.tui_gateway/server.py: same routing in the dashboard/Inkprompt.submitpath.cli.py: same routing inHermesCLI.chat()for interactive/attach+ drag-drop.run_agent.py:_prepare_anthropic_messages_for_apipasses image parts through whensupports_vision=True— the Anthropic adapter translates them to native{type:image,source:...}blocks. Legacy vision_analyze→text behaviour only runs for non-vision Anthropic models (virtually none today)._prepare_messages_for_non_vision_modelmirrors the same contract forchat.completionsandcodex_responsespaths. Non-vision models get image parts replaced with cached vision_analyze descriptions instead of failing at the provider._model_supports_vision()helper.tools/vision_tools.py: description rewrite.hermes_cli/config.py:agent.image_input_mode: autodefault.Validation
Unit tests
35 new across 2 files, all pass:
tests/agent/test_image_routing.py(22 tests):_coerce_mode,_explicit_aux_vision_override,decide_image_input_mode(all 3 modes × vision/non-vision/unknown),build_native_content_parts(text+image shape, empty-text default prompt, missing-file skip, multi-image, MIME inference).tests/run_agent/test_vision_aware_preprocessing.py(13 tests):_prepare_anthropic_messages_for_api,_prepare_messages_for_non_vision_model,_model_supports_vision— covers pass-through, strip-and-replace, mixed-content multi-message, and the caps-lookup fallback paths.198 targeted tests in related areas (
test_ctx_halving_fix,test_models_dev,test_reply_to_injection,test_shared_group_sender_prefix,test_stt_config,test_anthropic_adapter) still pass.Live E2E (this PR's code)
Two models, two test images.
Test 1 — subtle-detail image. Red background with "HERMES" in big white text, a small blue downward triangle in the top-left corner, and a tiny "42" (10px font) in the bottom-right corner.
claude-sonnet-4-6(native Anthropic)HERMES/42/downward-pointing triangle, blue✔ all correct including the tiny 42claude-sonnet-4-6(native Anthropic)qwen/qwen3-235b-a22b-thinking-2507(OpenRouter, no vision)Solid crimson background. Upper-left: bright blue downward-pointing triangle logo. Centered: large white "HERMES" text. Bottom-right: small white "42" page number.✔ — the agent stripped image parts and replaced with vision_analyze textTest 2 — spatial layout. 2×2 grid (red/blue/green/yellow), black circle in green quadrant only.
claude-sonnet-4-6claude-sonnet-4-6Timings: native ~2.5s, text ~2.5s + one vision_analyze call (~11s). Native is faster because the pixels go straight to the main model without a pre-analysis hop.
Cost / caching trade-off
Native image parts break the prompt cache on turns that carry images. Text stubs cache cleanly. That's why
automode still defers to the text pipeline when the user explicitly configuredauxiliary.vision.provider— they've opted into that trade-off on purpose.