feat(image-input): native multimodal routing based on model vision capability by teknium1 · Pull Request #16506 · NousResearch/hermes-agent

teknium1 · 2026-04-27T10:57:17Z

Summary

User-attached images reach vision-capable models as real pixels instead of a lossy vision_analyze text summary. Non-vision models keep the existing text pipeline — no regression, no loss of model flexibility.

Routing decision (agent/image_routing.py::decide_image_input_mode):

Config (`agent.image_input_mode`)	Effect
`auto` (default)	Native when `supports_vision=True` AND no explicit `auxiliary.vision.provider`. Text otherwise.
`native`	Always native; non-vision models auto-fall-back via `_prepare_messages_for_non_vision_model`.
`text`	Always pre-analyze via `vision_analyze` (pre-PR behaviour).

vision_analyze stays surfaced as a tool regardless — skills that chain it (browser screenshots, baoyu-comic gating, URL-referenced images) keep working. Description rewritten to position it as inspection-for-images-not-already-visible so vision-capable models don't redundantly call it.

Changes

agent/image_routing.py (NEW, 206 LOC): decide_image_input_mode() + build_native_content_parts(). Provider/model caps read via existing get_model_capabilities().
gateway/run.py: routing in _prepare_inbound_message_text + native-parts assembly at the run_conversation call site. Reads current provider/model from config.yaml so /model switches track automatically next turn.
tui_gateway/server.py: same routing in the dashboard/Ink prompt.submit path.
cli.py: same routing in HermesCLI.chat() for interactive /attach + drag-drop.
run_agent.py:
- _prepare_anthropic_messages_for_api passes image parts through when supports_vision=True — the Anthropic adapter translates them to native {type:image,source:...} blocks. Legacy vision_analyze→text behaviour only runs for non-vision Anthropic models (virtually none today).
- New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths. Non-vision models get image parts replaced with cached vision_analyze descriptions instead of failing at the provider.
- New _model_supports_vision() helper.
tools/vision_tools.py: description rewrite.
hermes_cli/config.py: agent.image_input_mode: auto default.

Validation

Unit tests

35 new across 2 files, all pass:

tests/agent/test_image_routing.py (22 tests): _coerce_mode, _explicit_aux_vision_override, decide_image_input_mode (all 3 modes × vision/non-vision/unknown), build_native_content_parts (text+image shape, empty-text default prompt, missing-file skip, multi-image, MIME inference).
tests/run_agent/test_vision_aware_preprocessing.py (13 tests): _prepare_anthropic_messages_for_api, _prepare_messages_for_non_vision_model, _model_supports_vision — covers pass-through, strip-and-replace, mixed-content multi-message, and the caps-lookup fallback paths.

198 targeted tests in related areas (test_ctx_halving_fix, test_models_dev, test_reply_to_injection, test_shared_group_sender_prefix, test_stt_config, test_anthropic_adapter) still pass.

Live E2E (this PR's code)

Two models, two test images.

Test 1 — subtle-detail image. Red background with "HERMES" in big white text, a small blue downward triangle in the top-left corner, and a tiny "42" (10px font) in the bottom-right corner.

Model	Mode	Response
`claude-sonnet-4-6` (native Anthropic)	native	`HERMES` / `42` / `downward-pointing triangle, blue` ✔ all correct including the tiny 42
`claude-sonnet-4-6` (native Anthropic)	text (prod path)	same answers — but the aux vision model (Gemini 3 Flash) happened to catch the 42
`qwen/qwen3-235b-a22b-thinking-2507` (OpenRouter, no vision)	native-mis-route → auto-strip-fallback	`Solid crimson background. Upper-left: bright blue downward-pointing triangle logo. Centered: large white "HERMES" text. Bottom-right: small white "42" page number.` ✔ — the agent stripped image parts and replaced with vision_analyze text

Test 2 — spatial layout. 2×2 grid (red/blue/green/yellow), black circle in green quadrant only.

Model	Mode	Response
`claude-sonnet-4-6`	native	Correct all 4 quadrants + "circle in bottom-left (green)" ✔
`claude-sonnet-4-6`	text	Same ✔

Timings: native ~2.5s, text ~2.5s + one vision_analyze call (~11s). Native is faster because the pixels go straight to the main model without a pre-analysis hop.

Cost / caching trade-off

Native image parts break the prompt cache on turns that carry images. Text stubs cache cleanly. That's why auto mode still defers to the text pipeline when the user explicitly configured auxiliary.vision.provider — they've opted into that trade-off on purpose.

github-actions · 2026-04-27T10:57:34Z

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py
skills/productivity/google-workspace/scripts/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

alt-glitch · 2026-04-27T11:01:06Z

Likely duplicate of #8610 — same feature (native multimodal vision routing based on model capability). Also related to #4535 (native image content flow across gateway/agent). Please coordinate to avoid redundant work.

…pability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green).

…okens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed).

…ied per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass.

…eject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.

…pability (NousResearch#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value. Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

…pability (#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.

…ers (#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since #11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (#17417 #16997 #16193 #14315 #13151 #11794 #10610 #10283 #10246 #11564 #13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (#16052 #16539 #16566 #15841 #14798 #10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (#17305 #17026 #17000 #15077 #14557 #14227 #14166 #14730 #17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (#17130 #17113 #17175 #17150 #16707 #12312 #12305 #12934 #14810 #14045 #17286 #17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (#16506 #15027 #13428 #12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (#12929 #12972 #10763 #16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (#16576 #16572 #16383 #15878 #15608 #15606 #14809 #14767 #14231 #14232 #14307 #13683 #12373 #11891 #11291 #10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (#15045 #14473 #15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.

… not aux text (#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).

…ers (NousResearch#17727) Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights: Stale / incorrect: - configuration.md: auxiliary auto-routing line was wrong since NousResearch#11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern. - integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim. - website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs). Platform messaging (NousResearch#17417 NousResearch#16997 NousResearch#16193 NousResearch#14315 NousResearch#13151 NousResearch#11794 NousResearch#10610 NousResearch#10283 NousResearch#10246 NousResearch#11564 NousResearch#13178): - Signal: native formatting (bodyRanges), reply quotes, reactions. - Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats. - Slack: strict_mention config. - Discord: slash_commands disable, send_animation GIF, send_message native media attachments. - DingTalk: require_mention + allowed_users. CLI (NousResearch#16052 NousResearch#16539 NousResearch#16566 NousResearch#15841 NousResearch#14798 NousResearch#10043): - New 'hermes fallback' interactive manager. - New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior. - 'hermes gateway start/restart --all' multi-profile flag. - cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining. Config keys / env vars (NousResearch#17305 NousResearch#17026 NousResearch#17000 NousResearch#15077 NousResearch#14557 NousResearch#14227 NousResearch#14166 NousResearch#14730 NousResearch#17008): - terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits. TUI / CLI UX (NousResearch#17130 NousResearch#17113 NousResearch#17175 NousResearch#17150 NousResearch#16707 NousResearch#12312 NousResearch#12305 NousResearch#12934 NousResearch#14810 NousResearch#14045 NousResearch#17286 NousResearch#17126): - HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per- prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse. Gateway features (NousResearch#16506 NousResearch#15027 NousResearch#13428 NousResearch#12116): - Native multimodal image routing based on vision capability. - /usage account-limits section. - /steer slash command (added to reference + explanation in CLI). Plugins / hooks (NousResearch#12929 NousResearch#12972 NousResearch#10763 NousResearch#16364): - transform_tool_result, transform_terminal_output plugin hooks. - PluginContext.dispatch_tool() documented with slash-command example. - google_meet bundled plugin entry under built-in-plugins.md. Other (NousResearch#16576 NousResearch#16572 NousResearch#16383 NousResearch#15878 NousResearch#15608 NousResearch#15606 NousResearch#14809 NousResearch#14767 NousResearch#14231 NousResearch#14232 NousResearch#14307 NousResearch#13683 NousResearch#12373 NousResearch#11891 NousResearch#11291 NousResearch#10066): - hermes backup exclusions (WAL/SHM/journal + checkpoints/). - security.md hardline blocklist (floor below --yolo). - FHS install layout for root installs. - openssh-client + docker-cli baked into the Docker image. - MEDIA: tag supported extensions table (docs/office/archives/pdf). - Remote-to-host file sync on SSH/Modal/Daytona teardown. - 'hermes model' -> Configure Auxiliary Models interactive picker. - Podman support via HERMES_DOCKER_BINARY. Providers / STT / one-shot (NousResearch#15045 NousResearch#14473 NousResearch#15704): - alibaba-coding-plan first-class provider entry. - xAI Grok STT as a 6th transcription option. - 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL. Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.

… not aux text (NousResearch#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (NousResearch#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).

* chore: add Qwinty to AUTHOR_MAP * fix(browser_tool): do not cache transient None cloud provider resolution Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True` before resolution. If credentials were briefly unavailable on the first call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned the entire process to local mode forever, even after credentials self-healed seconds later. Root cause: bookkeeping was set up-front, so any code path that fell through to `return _cached_cloud_provider` (config read failure, no credentials yet, explicit-provider instantiation failure) committed the transient `None` to the cache permanently. Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now set only when (a) the user explicitly chose `cloud_provider: local`, or (b) a provider was successfully resolved. All transient `None` paths return without poisoning the cache, so the next call retries. Explicit provider instantiation failures now log at warning level with stack trace so operators can diagnose them. Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py covering explicit local, successful resolution, no-credentials-yet, config read failure, and explicit provider instantiation failure. Stash-verify confirmed the 3 transient-None tests fail without the fix. All 320 existing browser tests still green. Closes #22324 * fix(browser_tool): fall through to autodetect on config read failure * fix(email): send IMAP ID extension to support 163/NetEase mailbox 163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe Login` unless the client first identifies itself via the RFC 2971 ID command after LOGIN. Without this, the email gateway logs in OK but then fails on the very first poll and the connection is torn down. Send the ID payload best-effort after both `imap.login()` sites (`EmailAdapter.connect` and `_fetch_new_messages`). Failures are swallowed at debug level so non-supporting IMAP servers (Gmail, Outlook, Fastmail, Yahoo, etc.) keep working unchanged. Closes #22271 * fix(email): use real hermes version in IMAP ID command * fix(deps): declare youtube-transcript-api in pyproject.toml [youtube] extra skills/media/youtube-content/scripts/fetch_transcript.py and optional-skills/productivity/memento-flashcards/scripts/youtube_quiz.py both import youtube-transcript-api at runtime, but the package was not listed in pyproject.toml. A fresh `uv sync` therefore omits it, and both skills fail on first invocation with: ModuleNotFoundError: No module named 'youtube_transcript_api' Add a new [youtube] optional-dependency group with youtube-transcript-api>=1.2.0 (the v1.x API surface the scripts already use) and include it in [all] so standard installs pick it up. Regression tests: TestPyprojectDeclaresYoutubeExtra verifies the extra is present in pyproject.toml and included in [all]. Closes #22243 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro DeepSeek V4 Pro returns thinking content as typed blocks inside the content array rather than as a top-level reasoning_content field: [{"type": "thinking", "thinking": "..."}, {"type": "output", ...}] _extract_reasoning only handled content as a plain string, so the thinking text was silently dropped. On the next turn the session was replayed without the thinking block, causing: HTTP 400: The content[].thinking in the thinking mode must be passed back to the API. Fix: when content is a list and no structured reasoning field was found, scan for items with type=='thinking' and accumulate their 'thinking' (or 'text') value into reasoning_parts. Structured fields (reasoning, reasoning_content, reasoning_details) still take priority so existing provider behaviour is unchanged. Closes #21944 * fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a snapshot of PRAGMA table_info taken at function entry. When the gateway dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board and once via init_db's discard-and-reconnect path), a second connection can run the same migration before the first one commits, causing: sqlite3.OperationalError: duplicate column name: consecutive_failures This crashed the dispatcher on every first tick after a gateway restart (subsequent ticks succeeded because the columns were then present). Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a try/except that swallows OperationalError whose message contains 'duplicate column name'. All ALTER TABLE calls in _migrate_add_optional_columns are routed through this helper. Closes #21708 * fix(doctor): skip pluggable provider profiles when a dedicated check exists (#22346) Problem ------- `hermes doctor` ran two health checks for Anthropic: a dedicated one with the correct `x-api-key` + `anthropic-version` headers, and a generic Bearer-auth one driven by the pluggable `ProviderProfile` for "anthropic". The generic check called `https://api.anthropic.com/v1/models` with `Authorization: Bearer ...`, which Anthropic answers with HTTP 404, producing a noisy duplicate warning even when the dedicated check passed. Root cause ---------- `hermes_cli/doctor.py:_build_apikey_providers_list` deduplicated profiles against a `_known_canonical` set built from the static list (Z.AI/GLM, Kimi, DeepSeek, …). Providers with their own dedicated check above the generic loop (Anthropic, OpenRouter, Bedrock) were not in that set, so their profiles were appended and ran a second, broken check. Fix --- Add `{"anthropic", "openrouter", "bedrock"}` to the skip set, and also skip profiles whose aliases match any of those names (e.g. `claude`, `claude-oauth` → anthropic). Tests ----- tests/hermes_cli/test_doctor_dedicated_provider_skip.py: - test_build_apikey_providers_list_skips_dedicated_check_providers: asserts the assembled list does not contain anthropic, openrouter, or bedrock entries. - test_build_apikey_providers_list_includes_non_dedicated_providers: sanity guard that legitimate providers (DeepSeek, Z.AI/GLM) survive. Both confirmed via stash-verify (fail pre-fix with anthropic/openrouter leaking, pass post-fix). Fixes #22346 * fix(doctor): normalize provider name and aliases before dedicated-skip check * fix(completion): use valid zsh _arguments exclusion-group syntax The generated zsh completion script used `(-h --help)` as the exclusion group for `_arguments`, which zsh rejects with: _arguments:comparguments: invalid argument: (-h --help){-h,--help}[...] Exclusion groups in `_arguments` cannot contain long options. Use the canonical `(-)` form (exclude all other options) which correctly handles flag pairs like `-h`/`--help`. Fixes NousResearch/hermes-agent#22686 * fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805) Two co-located fixes: 1. agent/model_metadata.py: bump hy3-preview static fallback from 256000 to 262144 (256 * 1024) to match OpenRouter live metadata so cache and offline both agree (issue #22268). 2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the exact-value change-detector (assert ctx == 256000) with an invariant assertion (registered + >= 4096). Per AGENTS.md 'Don't write change-detector tests': pinning the upstream-controlled context length is exactly the test class the rule forbids — it breaks every time the provider bumps the published value, with zero behavioral coverage gained. Salvage of #22574 with a redirect on the test approach. The contributor's diff bumped the integer and added a SECOND change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144, which would re-break on the next published bump. We instead delete the change-detector entirely and assert the relationship. Closes #22268. * fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680) acp_command / acp_args descriptions previously primed the model to populate them — "Per-task ACP command override (e.g. 'copilot')" — even when no ACP CLI was installed. Models with weaker schema-following discipline would set them and the spawn would fail. Add explicit "Do NOT set unless the user has explicitly told you" guidance at both the top-level acp_command and the per-task override. Strengthen acp_args to mention it's empty unless acp_command is set. Adds 2 tests pinning the descriptions. Note: this is a cosmetic prompt-engineering fix — the params remain exposed in the schema. The fully-correct fix is to gate them behind a config flag or runtime ACP-CLI detection so the schema only emits them when an ACP harness is available. Tracked as a follow-up; this PR ships the low-cost stopgap. Salvage of #22680 (delegate schema only). The original PR also bundled unrelated fixes for #22548, #21944, #22150 — those need separate PRs since #22548 and #21944 are already addressed on main (#22780 + #22798 in flight) and #22150 deserves its own review. Closes #22013. * feat(gateway): add Telegram notification mode to suppress intermediate push notifications Add a configurable notifications mode for the Telegram platform adapter that controls which messages trigger push notifications. - display.platforms.telegram.notifications: "all" (default) | "important" - HERMES_TELEGRAM_NOTIFICATIONS env var override - In "important" mode, all sends use disable_notification=True except: - Approvals (send_exec_approval) and slash confirmations - Final response messages (metadata["notify"]=True) - Zero overhead in default "all" mode - Zero impact on non-Telegram platforms Closes #22771 * chore: add CalmProton to AUTHOR_MAP * fix(telegram): default notifications to 'important' (silence intermediate) Per-tool-call push notifications on Telegram are noisy enough that 'all' is the wrong default — long agent runs spam the user's notification shade with status messages they didn't ask to be pinged about. Final responses, approval prompts, and slash confirmations still notify; intermediate progress, streaming, and tool-progress messages now deliver silently via disable_notification. Users who want the legacy behavior can opt back in with: display: platforms: telegram: notifications: all or HERMES_TELEGRAM_NOTIFICATIONS=all. * fix(gateway): adopt unit's HERMES_HOME for --system CLI ops When systemd_restart / systemd_status / systemd_stop run under sudo, HERMES_HOME is stripped and HOME=/root, so get_hermes_home() resolves to /root/.hermes instead of the unit's pinned home. read_runtime_status and get_running_pid then look at the wrong gateway_state.json — the 60s status poll never sees "running", times out, and forces another systemctl restart that SIGTERMs the in-progress new gateway. Read the unit's pinned HERMES_HOME from `systemctl show -p Environment` and mirror it into os.environ before any HERMES_HOME-derived read. Early-out when system=False (user-scope inherits naturally). Errors swallowed so a transient systemctl failure doesn't break unrelated CLI ops. Closes #22035. * chore: add mbac to AUTHOR_MAP * fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708) Pass session_id through to provider profile build_api_kwargs_extras so the OpenRouter profile can attach an xAI cache-affinity header (x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt cache requires server affinity via this header — without it the cache is poisoned and Grok prompt-cache hit rates drop dramatically on multi-turn sessions. Carve-out of #22708 by Ninso112. The original PR bundled a /diff slash command, a zsh completion fix (already on main via #22802), and holographic memory null-guards. This salvage keeps just the Grok header work — small, targeted, and well-tested. Other contributors and changes preserved for separate review. Closes #22705. * chore: add Ninso112 to AUTHOR_MAP * fix(gateway): preserve Ctrl+C for Windows foreground runs * fix: make session search initialize session db * perf(teams): defer httpx import to first webhook call (#22831) Same pattern as the google_chat lazy-load (PR #22681), applied to the Teams plugin. The bundled `plugins/platforms/teams/adapter.py` did `import httpx` at module top, which dragged the entire httpx + httpcore stack into every process that triggered plugin discovery — including `hermes` invocations that never instantiate the Teams adapter. `httpx` is only needed inside one method (`TeamsMeetingPipeline._write_summary_via_incoming_webhook`), and the `httpx.AsyncBaseTransport` parameter annotation is already string-only thanks to the existing `from __future__ import annotations`. Move the runtime import inside the method. Measured impact (7-run medians, 9950X3D): teams plugin alone: 118 → 89 ms (-25%) 46 → 38 MB (-17%) import cli (full): unchanged import model_tools: unchanged The full-CLI numbers are flat because httpx is loaded transitively from many other modules on that path. The microbench win is the real signal: 29 ms / 8 MB shaved off any process that touches the teams plugin without otherwise pulling httpx — primarily future workflows where the gateway is enabled but Teams is not configured. Tests: 44/44 `tests/gateway/test_teams.py` pass; 345 across all plugin-platform suites (teams + qqbot + google_chat). The test file imports `httpx` itself for the `MockTransport` fixture, which is correct — tests legitimately use httpx, only the plugin's module-level import was the issue. * fix(acp): honor task cwd for foreground terminal commands * feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838) Pick openrouter/pareto-code as your model and OpenRouter auto-routes each request to the cheapest model meeting your coding-quality bar (ranked by Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0, default 0.65) tunes the floor. - hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so it shows up in the picker with a description - hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands on a mid-tier coder on the current Pareto frontier) - plugins/model-providers/openrouter: emit extra_body.plugins = [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code AND the score is a valid float in [0.0, 1.0] - agent/transports/chat_completions.py: same emission on the legacy flag path (when no provider profile is loaded) - run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into both build_kwargs() invocations and the context-summary extra_body path - cli.py: read openrouter.min_coding_score once at init, validate float in [0,1], pass to AIAgent constructions (CLI + background-task paths) - cron/scheduler.py, batch_runner.py, tools/delegate_tool.py, tui_gateway/server.py: propagate the kwarg (mirrors providers_order plumbing — subagents inherit, cron/batch read from config) - tests: profile-level + transport-level coverage of the model gating, unset/empty/out-of-range handling, and the legacy flag path - docs: new 'OpenRouter Pareto Code Router' section in providers.md Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a mid-tier coder, at omission we get the strongest. Score is silently dropped on any model other than openrouter/pareto-code, so it's safe to leave set. * fix(gateway): preserve reasoning_content, codex_message_items, finish_reason on transcript replay (#22839) PR #2974 whitelisted three reasoning fields (reasoning, reasoning_details, codex_reasoning_items) for the gateway's simple-text replay branch. Three more fields were added to the DB later but the whitelist was never updated: - reasoning_content: provider-facing thinking text. _copy_reasoning_content_for_api promotes 'reasoning' -> 'reasoning_content' at send time only when the strings happen to match. Carrying the original verbatim avoids loss for providers that return them as distinct fields (DeepSeek/Kimi/ Moonshot thinking modes), and preserves the empty-string sentinel that DeepSeek V4 Pro requires for thinking-mode replay. - codex_message_items: exact assistant message items with 'phase'. OpenAI docs: 'preserve and resend phase on all assistant messages — dropping it can degrade performance.' Required for prefix cache hits. No recovery path exists — once dropped, gone. - finish_reason: informational; cheap to keep so transcripts replay identically across CLI and gateway. The CLI is unaffected because cli.py keeps the live in-memory message list across turns (cli.py:10046 'self.conversation_history = result["messages"]'). The gateway rebuilds agent_history from the SQLite transcript on every turn, so any field stripped during replay is silently lost. Refactors the inline whitelist into a module-level _build_replay_entry() helper so the contract can be unit-tested. 16 new tests pin the field set and falsy-value handling. Verified end-to-end: DB stores all 8 fields, replay now preserves all 8 (was preserving only 5 for assistant text turns). * docs(openrouter): document auxiliary.<task>.extra_body for OR routing and Pareto (#22844) The plumbing for setting OpenRouter provider preferences and the Pareto Code router on auxiliary tasks already exists — auxiliary.<task>.extra_body is forwarded verbatim by call_llm() / async_call_llm(). It just wasn't documented, so users who wanted (e.g.) Pareto Code routing for compression but the strongest coder for the main agent had no way to discover the escape hatch. - hermes_cli/config.py: expand the auxiliary section header with a YAML example showing provider routing plus plugins under extra_body, and an explicit note that main-agent provider_routing / openrouter.min_coding_score do NOT propagate to aux calls (each task is independent by design) - website/docs/user-guide/configuration.md: new 'OpenRouter routing and Pareto Code for auxiliary tasks' subsection with worked example - website/docs/integrations/providers.md: cross-link from the Pareto Code Router section to the aux-side doc E2E verified that auxiliary.<task>.extra_body reaches the OpenRouter API with the configured provider routing and plugins blocks intact. * docs: round 2 audit — messaging, developer-guide, guides, integrations (#22858) Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/, guides/, and integrations/ against the live registries and gateway code. messaging/ - index.md: API Server toolset is hermes-api-server (was 'hermes (default)'); Google Chat slug is hermes-google_chat (underscore — plugin name uses _). - google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such extra); list the actual deps (google-cloud-pubsub, google-api-python-client, google-auth, google-auth-oauthlib). - qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is silently ignored by the adapter); QQ_STT_BASE_URL is not read directly — baseUrl lives under platforms.qqbot.extra.stt. - teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline plugin must be enabled), not a built-in subcommand. - sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default SMS_WEBHOOK_HOST). - open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to per-profile .env, not 'hermes config set' (same pattern fixed in api-server.md last round). Also bumped example ports to 8650+ to dodge the default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646) collision. developer-guide/ - architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py, gateway/run.py replaced with 'large file' to stop drifting. - agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)'). - gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway platform tree updated (qqbot is a sub-package, not qqbot.py; added yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/ (always active)' was wrong — it's an empty extension point and _register_builtin_hooks() is a no-op stub. - acp-internals.md: drop fictional 'message_callback' from the bridged- callbacks list; clarify thinking_callback is currently set to None. - provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry, NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama Cloud, LM Studio, Tencent TokenHub. Fallback section described only the legacy single-pair model — corrected to the canonical list-form fallback_providers chain. - environments.md: parsers list missing llama4_json and the deepseek_v31 alias; both register via @register_parser. - browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py which doesn't exist in-repo. - contributing.md: tinker-atropos is a git submodule — note that 'git submodule update --init' is required if cloning without --recurse-submodules. guides/ - operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is positional (not --schedule), the script-only flag is --no-agent (not --script-only), and there's no --command flag. Replaced with a real example that creates the script under ~/.hermes/scripts/ and uses the actual flags. Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'. - automation-templates.md: 'cron create --skills "a,b"' doesn't work — the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST rewrite. - minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently fails because --region isn't registered on the auth-add argparse spec. Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for China-region access. - cron-script-only.md: 'hermes send' is fictional — replaced the comparison- table mention with a webhook-subscription pointer; also fixed the dead link to /guides/pipe-script-output (page doesn't exist). - cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed at 'hermes gateway' (foreground) / 'hermes gateway start' (service). - local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right knob is the HERMES_API_TIMEOUT env var. - python-library.md: run_conversation() return dict has only final_response and messages — task_id is stored on the agent instance, not echoed back. - use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in one quoted string, so cmd.exe gets a single arg instead of the multi-token command line it needs. Removed the surrounding quotes — argparse nargs='*' collects each token correctly. integrations/ - providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist); actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 -> api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai) refreshed. Fallback section rewritten to lead with the canonical fallback_providers list form (was leading with the legacy fallback_model single dict); supported-providers list extended to include azure-foundry, alibaba-coding-plan, lmstudio. index.md - '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with integrations/index.md ('19+') and undercounted — bumped to 20+ and added Weixin/QQ Bot/Yuanbao/Google Chat to the list. Validation: 'npm run build' clean (exit 0); broken-link count unchanged at 155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89. * perf(image_gen): defer fal_client import to first generation request (#22859) `tools/image_generation_tool.py` did `import fal_client` at module top, which pulled the entire fal_client + httpx + rich stack on every process that ran `discover_builtin_tools()` — every `hermes` cold start, even ones that never touch image generation. Make the import lazy: replace the eager import with a placeholder (`fal_client: Any = None`) and add an idempotent `_load_fal_client()` that rebinds the module global on first use. Call it from the two runtime entry points (`_ManagedFalSyncClient.__init__` and `_submit_fal_request`) and from the SDK-presence check in `check_image_generation_requirements`. The loader short-circuits if the global is already truthy, which preserves the test pattern of monkeypatching `fal_client` to install a mock — the `monkeypatch.setattr(image_tool, "fal_client", ...)` calls in test_image_generation.py keep working unchanged. Measured impact (15-run min times, 9950X3D): tools.image_generation_tool alone: 77 → 20 ms (-74%) 36 → 20 MB (-44%) import cli (full): 734 → 720 ms (-2%) import model_tools: 372 → 366 ms (-2%) The microbench is dramatic but the full-CLI win is small — fal_client shares its httpx + rich dependencies with the rest of the agent, so on a real cold start most of the 16 MB / 64 ms is already paid by other imports. The win matters mostly for processes that touch this tool without otherwise loading httpx (rare) and for architectural consistency with the previous lazy-load PRs (#22681 google_chat, #22831 teams). Tests: 55/55 `tests/tools/test_image_generation.py` pass, including the cases that monkeypatch the module global to install a mock fal_client. End-to-end verification confirms `import model_tools` no longer pulls `fal_client` into `sys.modules`. * fix(gateway): finalize final stream edit on done * chore: add kidonng to AUTHOR_MAP * fix(skills-hub): cover remaining SSRF fetch paths after #10029 * fix(context_compressor): treat streaming premature-close as transient error Problem: When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue #18458. Root cause: _is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. context_compressor.py's _generate_summary except block never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts. Fix: 1. auxiliary_client.py — extend _is_connection_error keyword list with: "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package. 2. context_compressor.py — import _is_connection_error and call it in _generate_summary's except block as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors. Tests: 4 new regression tests in TestStreamingClosedFallback: - test_incomplete_chunked_read_falls_back_to_main - test_peer_closed_connection_falls_back_to_main - test_streaming_closed_on_main_uses_short_cooldown (stash-verified) - test_non_streaming_unknown_error_still_uses_long_cooldown Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(session): route OR-combined short CJK tokens to LIKE fallback (#20494) The FTS5 trigram tokenizer requires >=3 CJK characters per individual token to produce matchable trigrams. A query like "广西 OR 桂林 OR 漓江" has cjk_count=6 (passes the existing >=3 guard) but each token is only 2 CJK chars, so the trigram index returns 0 results. Fix: - Add per-token check: if any non-operator CJK token has <3 CJK chars, force the LIKE fallback path regardless of total cjk_count. - Expand the LIKE fallback to build one LIKE condition per non-operator token joined with OR, so each term is matched independently. Regression tests added in TestCJKSearchFallback: - test_cjk_or_combined_short_tokens_returns_results - test_cjk_short_token_or_query_preserves_filters * fix(checkpoint): guard _touch_project against non-dict project metadata Problem ======= `tools.checkpoint_manager._touch_project` reads the project metadata file with `json.loads(meta_path.read_text(...))`, then immediately does: meta["workdir"] = str(_normalize_path(working_dir)) The `except` block only catches `(OSError, ValueError)`. When the file parses successfully but returns a non-dict value (a list `[]`, `null`, or a scalar from a corrupted or hand-truncated write), `json.loads` succeeds without error and `meta` is set to, e.g., `[]`. The subsequent subscript assignment then raises `TypeError: list indices must be integers or slices, not str`, which is NOT caught by the narrow except clause. This TypeError propagates up through `_take` to `ensure_checkpoint`, where the broad `except Exception` safety net swallows it. The effect is that `ensure_checkpoint` silently returns False for the entire session — all checkpoints are skipped for the affected working directory without any user-visible error. Root cause ========== Missing `isinstance(meta, dict)` guard after `json.loads`, identical in pattern to bugs fixed in `cron/jobs.py` (#22569) and `tools/process_registry.py` (#22544). The same guard is already present one function below in `_list_projects` (line 506), but was inadvertently omitted in `_touch_project`. Fix === Add two lines after the try/except: ```python if not isinstance(meta, dict): meta = {} ``` This matches the existing guard in `_list_projects` and ensures a fresh empty dict is used whenever the persisted value is not a mapping — preserving the `created_at` semantics via `setdefault` on the next line. Tests ===== `TestTouchProjectMalformedMeta` covers four non-dict root values (`[]`, `null`, `42`, `"oops"`). Each writes a corrupted metadata file, calls `_touch_project`, and asserts: (a) no exception raised, (b) the metadata file is rewritten as a valid dict containing `last_touch` and `workdir`. All four fail on main with `TypeError`, pass with fix. Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed. * fix(update): prebuild psutil on Termux Android via Linux path shim * fix(update): use termux-all uv fallback path on Termux * fix(install): also patch psutil on Termux fresh-install path The Termux update path (PR #22814) prebuilds psutil from a marker-patched sdist so 'platform android is not supported' doesn't kill it. The same psutil setup.py error blocks fresh installs via scripts/install.sh — only the update path was wired up. Without this, a brand-new Termux user can't get past the very first 'pip install -e .[termux-all]' call. - New scripts/install_psutil_android.py — standalone version of the same patcher hermes_cli/main.py uses, callable from bash. - scripts/install.sh detects sys.platform == 'android' and runs the patcher before pip install. - TODO note added to both copies pointing at upstream https://github.com/giampaolo/psutil/pull/2762; remove both when that ships. Note: we keep psutil as a base dep on Android (do not adopt the proposed sys_platform != 'android' marker in pyproject). Removing it would crash five unguarded 'import psutil' sites at runtime (tools/code_execution_tool.py, tools/tts_tool.py, tools/process_registry.py (2x), gateway/platforms/whatsapp.py). * fix(process_registry): kill orphaned Popen on post-spawn setup failure After Popen succeeds with os.setsid (detached process group), 5 things happen with no try/except: Thread construction, reader.start(), lock acquisition, prune+register, checkpoint write. If any raises, the Popen object goes unregistered and the detached process group leaks indefinitely. Wrap the post-spawn setup in try/except. On failure: - os.killpg(getpgid(pid), SIGKILL) takes down the entire process group (not just the shell - important because of detached PG + -lic shell wrapper that may have spawned children) - proc.kill() fallback for ProcessLookupError/PermissionError/OSError - proc.wait(timeout=5) reaps with a bound - re-raise to preserve original traceback Nested try/except around cleanup so a secondary failure can't mask the original. Closes #2749. * fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV Problem: terminal.docker_env set in config.yaml was silently ignored. Docker containers never received the user-specified env vars. Root cause: docker_env was missing from all three config→env bridging maps (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync) and from the terminal_tool _get_env_config() reader. _create_environment() consumed the key from container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV was never set. Also extend the list-serialisation branches in cli.py and gateway/run.py to handle dict values via json.dumps (lists already used json.dumps; plain str() on a dict produces undecodable output). Fix: - cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings; serialise dict values with json.dumps alongside existing list path - gateway/run.py: same additions to _terminal_env_map and serialisation - hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV" to _config_to_env_sync so `hermes config set terminal.docker_env …` persists to .env correctly - tools/terminal_tool.py: add docker_env key to _get_env_config() reading TERMINAL_DOCKER_ENV via _parse_env_var with default "{}" Tests: add test_docker_env_is_bridged_everywhere to tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on origin/main, passes with fix. Fixes #20537 * fix(gateway): degrade gracefully when all platform adapters are missing When connected_count == 0 AND enabled_platform_count > 0, the gateway treated 'all adapters returned None' identically to 'all adapters failed to connect' — both as fatal startup errors. The 'returned None' case happens when imports fail silently or when adapters are present in config but their dependencies aren't installed (e.g. discord.py missing). Cron jobs and other gateway-runtime work would unnecessarily fail to start. Split: only return False when startup_retryable_errors is non-empty (real connection attempt failed). When the list is empty AND enabled > 0, log a warning and continue running, matching the 'no platforms enabled' cron path. Salvage of #22642's gateway slice. Drops the bundled run_agent.py memory-nudge counter hydration block (issue #22357 territory) which wasn't mentioned in the PR description. Closes #5196. * fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665) Fallback chain entries with 'api_key_env: ENV_VAR_NAME' weren't being resolved by either the init-time fallback path (line ~1660) or the runtime _try_activate_fallback path (line ~8045). Only literal 'api_key' was honored; the snake_case 'api_key_env' alias documented elsewhere in the config was silently dropped, so a 'provider: custom' fallback with base_url + api_key_env worked as primary but failed as fallback with 'no endpoint credentials found' / 401. Adds 'or fb.get("api_key_env")' to the existing 'key_env' lookup in both call sites, with empty-string-to-None coercion so unset env vars don't poison the resolver. Salvage of #22665's fallback portion. The original PR also bundled gateway-degrade-on-no-adapters changes (those land via the carve-out in #22853 which is the same code) and run_agent.py memory-nudge counter hydration (issue #22357 territory, not mentioned in the title). Drops both bundled pieces; keeps just the api_key_env fix. Closes #5392. * fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664) RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible shim was falling through to FailoverReason.unknown, surfacing as 'Empty response from model' and burning 3 retry slots on the same failing endpoint. _classify_by_message had no timeout-message branch — only billing/rate_limit/auth/context_overflow/model_not_found patterns. The type-based check at line 565 also requires isinstance(error, (TimeoutError, ConnectionError, OSError)) — a plain RuntimeError doesn't match. Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded', 'request timed out', 'operation timed out', 'upstream timed out', 'turn timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True) when any pattern matches. Salvage of #22664's classifier portion. The original PR also bundled a fallback self-selection guard which is now redundant (already on main via #22780) plus DeepSeek thinking and session_search fixes that are their own separate concerns. Follow-up to #22780 — fixes the still-broken classification of generic-typed provider-shim timeouts that #22780's dedup didn't cover. * fix(test_gateway): stop run_gateway() tests from rewriting the dev's installed systemd unit (#22900) run_gateway() calls refresh_systemd_unit_if_needed() on every invocation so restart settings stay current after exit-code-75 respawns. The user-scope unit path resolves under Path.home() (NOT sandboxed by conftest, only HERMES_HOME is), and generate_systemd_unit() bakes the current HERMES_HOME into the unit's Environment= line. Result: any test that exercises run_gateway() end-to-end on a real Linux dev box silently rewrites the developer's installed ~/.config/systemd/user/hermes-gateway.service with a polluted HERMES_HOME pointing at /tmp/pytest-of-<user>/.../hermes_test. On the next reboot, systemd loads that unit, the gateway starts looking at an empty tmp dir, and Telegram/Discord/etc. all show as 'No messaging platforms enabled' even though the user's real config is fine. Three tests in tests/hermes_cli/test_gateway.py hit this path: test_run_gateway_exits_cleanly_on_keyboard_interrupt, test_run_gateway_exits_nonzero_when_start_gateway_reports_failure, and test_run_gateway_root_guard_has_escape_hatch. Two-layer fix: 1. _install_fake_gateway_run helper (covers all four run_gateway() call sites in test_gateway.py and any future ones) now also stubs supports_systemd_services and refresh_systemd_unit_if_needed. 2. refresh_systemd_unit_if_needed() itself sniffs the generated unit body for /pytest-of- and /hermes_test markers and refuses to write when present. Defense in depth so a future test that bypasses the helper still can't corrupt the dev's gateway. Tests that legitimately exercise the refresh flow (test_run_gateway_refreshes_outdated_unit_on_boot) patch generate_systemd_unit to return synthetic content that doesn't carry those markers, so they keep working. Adds test_refresh_refuses_to_bake_pytest_tmpdir_into_real_user_unit as a regression test for the source-side guard. * fix(gateway): detect gateway process via /proc in Docker without procps Salvage of NousResearch/hermes-agent#7622. Docker images often lack procps so `ps` is unavailable. Try reading /proc/*/cmdline first (works in any Linux container) and fall back to `ps -A eww` only when /proc is not present. PermissionError on individual PIDs is silently skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(gateway): stub /proc unavailability in find_gateway_pids fallback test Follow-up test fix for #22693 — the existing test for ps-failure + pid-file fallback needed the /proc walk path stubbed too since /proc is now consulted first. * fix(gateway): pass max_total_size_mb and max_file_size_mb to CheckpointManager The /rollback command handler in gateway/run.py was constructing CheckpointManager with only enabled and max_snapshots, omitting max_total_size_mb and max_file_size_mb that the __init__ expects. This caused a TypeError on every /rollback invocation when checkpoints were enabled. Fixes: NousResearch/hermes-agent#18841 * chore: add DanielLSM to AUTHOR_MAP * fix: use credential_pool for custom endpoint model listing probes Same-provider /model switches on a 'custom' endpoint kept stale credentials because (a) _resolve_named_custom_runtime's bare-custom + explicit_base_url path went straight to OPENAI_API_KEY/OPENROUTER_API_KEY env fallbacks without consulting the credential pool, and (b) switch_model() guarded against custom-provider re-resolution to preserve base_url, locking in the prior api_key. Now the bare-custom path queries the credential pool first (mirroring the named-custom-provider branch behavior), and the same-provider switch guard is removed since resolve_runtime_provider has since grown a robust custom-resolution path that preserves base_url from model_cfg. Refs #18681 (the gateway-side api_key wiring is still separate), #16254, #12919. * chore: add v1b3coder to AUTHOR_MAP * fix(cli): preserve config comments on setting writes * chore: add ming1523 to AUTHOR_MAP * feat(docs): richer info panels on the Skills Hub for built-in + optional skills (#22905) The Skills Hub at /skills had cards that, when expanded, showed only the one-line description, tags, author, version, and an install command. For the 163 bundled and optional skills shipped with the repo, this was thinner than the data we already have on disk. Three changes, all under website/: 1. extract-skills.py now pulls four extra fields per local skill: - 'overview' — first non-heading body paragraph from SKILL.md (stripped of admonitions/code fences, capped at ~500 chars at a sentence boundary) - 'envVars' / 'commands' — from the prerequisites: block in frontmatter - 'license' — from the top-level frontmatter - 'docsPath' — slug to the per-skill /docs/user-guide/skills/.../* page, computed with the same logic as generate-skill-docs.py 162 of 163 local skills get a non-empty overview automatically. The remaining one (media/heartmula) has only headings/code in its body and falls through to the description. 2. Skill TS interface + SkillCard expanded-panel render the new fields: - Overview paragraph at the top of the panel - Prerequisites box (env vars + required commands) when frontmatter declares them - License row alongside author/version - 'View full documentation →' link to the per-skill docs page Search now covers the overview text too, so users can find skills by matching content from inside SKILL.md, not just the one-line description. 3. styles.module.css gains six new classes (overviewBlock, detailLabel, overviewText, prereqBlock/Row/Kind/List/Item, docsLink) styled to match the existing dark panel aesthetic. External / community skills (Anthropic, LobeHub, Claude Marketplace cached indexes) keep the old behavior — overview is empty, no prereqs, no docsPath. Validation: 'npm run build' clean (exit 0); broken-link count unchanged at 155 baseline; all 163 generated docsPath values resolve to existing pages under website/docs/user-guide/skills/. * perf(cli): skip welcome banner on `chat -q` single-query mode (#22904) `hermes chat -q "..."` printed the full welcome banner before running the query — kawaii ASCII logo, available toolsets list, available skills list, model name, session ID, working directory, update-available notice. Building it took ~420 ms on cold start (~200 ms version-update probe, the rest is toolset / skill enumeration plus Rich panel rendering). For a one-shot `-q` query the banner is noise: the user already picked the prompt, doesn't need a toolset reference, and gets the session ID + resume hint from `_print_exit_summary()` after the response prints. The fully-quiet `-Q` / `--quiet` machine-readable path was already banner-free; this brings the human-facing single-query path in line so all non-interactive invocations are fast. Measured impact (`hermes chat -q "ok" --max-turns 1`, 10-run percentiles, 9950X3D): median: 1.90 → 1.75 s (-150 ms) min: 1.80 → 1.73 s ( -70 ms) P25: 1.82 → 1.74 s ( -80 ms) Wider variance than expected; the banner cost overlaps with API latency on real `chat -q` runs. Min-time delta of 70 ms is the cleanest signal — that's the deterministic banner-build cost gone. The 150 ms median delta picks up cases where the version-update probe also finishes during the wait. Interactive mode (`hermes` with no `-q`) and the `--list-tools` / `--list-toolsets` one-shot listing commands still show the banner — those are the contexts where it's actually wanted. Tests: 656/656 `tests/cli/` pass on top of latest main (modulo 5 pre- existing flakes in `test_cli_save_config_value.py` that fail with `No module named 'ruamel'` both with and without this change). * feat(curator): show rename map in user-visible summary (#22910) * feat(curator): show rename map (where skills went) in user-visible summary The full data has always been on disk in REPORT.md, but the user-visible curator summary (gateway 💾 line, CLI session-start panel, `hermes curator status`) was counts-only — "consolidated 4 into 2 umbrellas" with no names. Users only discovered renames when something they expected was gone. New `_build_rename_summary()` formats the rename map and appends it to `final_summary`: auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status Empty on no-op ticks (no archives), so most ticks add zero log noise. Cap of 10 entries keeps agent.log readable when a 50-skill consolidation lands; the full list is always in REPORT.md. `hermes curator status` indents continuation lines so the multi-line summary reads as one logical field. 5 new tests in tests/agent/test_curator_classification.py covering empty / consolidation / pruning / cap / mixed cases. * feat(curator): show recent run summary once on `hermes update` The rename map is now visible from where users actually look — the update flow they explicitly run, instead of just the live gateway log or transient CLI session-start panel. Behavior: - After `hermes update`, if the most recent curator run produced a rename map (multi-line summary) that the user hasn't seen yet, print it once with a 'last run Xh ago' header and a one-time-message footer. - Stamp `last_run_summary_shown_at = last_run_at` after printing so subsequent `hermes update` invocations are silent until a newer curator run lands. - Silent on no-op runs (single-line summary like 'auto: no changes; llm: no change'). Still stamps shown so we don't reconsider on every update. - Silent when the curator has never run (the existing first-run notice handles that case). Output: ℹ Skill curator — last run 4h ago auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status (This message shows once per curator run. View anytime: hermes curator status) State migration: - `_default_state()` gains `last_run_summary_shown_at: None`. Existing state files lack the field; `.get()` returns None; the comparison treats any prior run as 'not yet shown' and prints once on next update. Self-healing. Wiring: - Both `hermes update` paths in main.py call the new `_print_curator_recent_run_notice()` right after the existing first-run notice. Best-effort try/except so a state-load bug never breaks the update flow. 6 tests in tests/hermes_cli/test_curator_recent_run_notice.py: no-run / single-line / multi-line / show-once / new-run-resets / time-formatter buckets. * chore(skills): move heavy training skills + outlines to optional-skills (#22912) These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't be active by default. Moved to optional-skills/ where users opt-in via `hermes skills install official/...`. Moved: - mlops/training/axolotl - mlops/training/trl-fine-tuning - mlops/training/unsloth - mlops/inference/outlines Counts: 91 -> 87 built-in, 72 -> 76 optional. Auto-regenerated docs (per-skill pages + catalogs) reflect the move. * fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913) Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB (32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach in _write_to_sandbox put the entire tool result inside the 'bash -c' arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument list too long' before the heredoc ever ran. The caller logged a warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the warning never reached agent.log either, and the agent saw a 1.5 KB preview tagged 'Full output could not be saved to sandbox'. Hits delegate_task with 3+ subagent outputs routinely now. Switch to passing content via env.execute(stdin_data=...). cmd is now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight payload travels through stdin where there is no argv-element limit. E2E reproduced the user's exact 144,778-char delegate_task envelope: old code OSError'd, new code round-trips cleanly to disk with all three task summaries intact. * docs(skills): clarify kanban fan-out decomposition * chore: AUTHOR_MAP entry for eloklam (#22898) * fix(kanban): request default board explicitly (#21819) * test(kanban): assert re-block notification is delivered after unblock cycle Adds test_notifier_second_blocked_delivers to cover the case where a task is blocked, unblocked, then blocked again — the second blocked event must still deliver a gateway notification. Currently fails because blocked is treated as a terminal event kind, causing the subscription to be dropped after the first block. * fix(kanban): remove blocked kind from unsub * chore(test): comment of test case rewrite to english * docs(user-stories): add 18 verified social entries (99 → 117) (#22920) Found 18 real Hermes-Agent stories from HN, X, and Reddit not yet captured on the page. All URLs HTTP-verified to return 200 with matching titles. Reddit (15): r/hermesagent (Obsidian-as-memory writeup at 794 upvotes, LLM cheatsheet at 635 upvotes, Kanban game-changer post, OpenRouter #1 ranking, AMA from the Nous team, etc.); r/LocalLLaMA, r/Rag, r/openclaw, r/SideProject, r/LocalLLM threads where users describe their actual setups (Qwen3.5-9b on 16gb VRAM, 5060Ti + Telegram, smart routing tiers). X (3): @vmiss33's 'what I use Hermes for' guide, @HeyYanvi's X-to-NotebookLM podcast workflow, @ExileAI_0's spare-laptop Iris running RenPy + ComfyUI, @brucexu_eth's Hermes Inc. Telegram startup sim from the hackathon, Hype's deep-dive blog. HN (1): 'I'm using Hermes — sandbox it like any agent.' No component changes — all new entries fit the existing schema (real URL, real author, real date). * feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images). * fix(stream-retry): collapse two-line drop status, name provider, and let agent.log capture diagnostics (#22993) Subagent stream drops were spamming the parent terminal with two lines per blip ('Connection dropped...' + 'Reconnected...') while leaving zero breadcrumb in agent.log to debug them. Two underlying bugs, fixed together: 1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which filters records before root-logger file handlers see them. The comment claimed 'File handlers still capture everything' — that was wrong. Removed in both run_agent.py and cli.py; console quietness already comes from hermes_logging not installing a console StreamHandler in non-verbose mode. 2. The stream-retry blocks emitted two _emit_status calls per drop ('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected — resuming…') with no provider name, so multi-provider sessions had to dig through agent.log to attribute a drop. Replaced both call sites with a single _emit_stream_drop helper that emits ONE line naming the provider and error class, and always writes a structured WARNING to agent.log with subagent_id, depth, provider, base_url, error_type. Net UX change: 6 lines per triple-subagent drop → 3 lines, each naming the provider. agent.log now has a structured breadcrumb per retry that didn't exist before. Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py covering the logger-level guard, structured WARNING content, single status line per drop (no Reconnected follow-up), and provider naming. * fix(kanban): drop redundant init_db() in gateway watchers (#21378) Both `_kanban_notifier_watcher` and `_kanban_dispatcher_watcher`'s `_tick_once_for_board` called `_kb.connect(board=slug)` immediately followed by `_kb.init_db(board=slug)`. Since `connect()` already runs the schema + idempotent migration on first open per process, the explicit `init_db()` was redundant — and worse, `init_db()` deliberately busts the per-process `_INITIALIZED_PATHS` cache and re-runs the migration on a *second* connection that races the first. On every cold gateway start against a legacy DB this surfaced as either `sqlite3.OperationalError: duplicate column name: <col>` or intermittent `database is locked` errors logged at the first tick. The duplicate-column case is now tolerated by `_add_column_if_missing` (commit 78698381a), but the wasted second migration plus the database-is-locked race remain fixable by skipping the redundant call entirely. Drops `_kb.init_db(board=slug)` at both call sites and adds a regression test in `tests/hermes_cli/test_kanban_notify.py` that pins the absence via source inspection plus a runtime spy. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com> * chore: AUTHOR_MAP entry for li0near gmail (#21378) * chore(models): refresh OpenRouter + Nous fallback lists (#23001) Reorder Anthropic Opus 4.7/4.6 + Sonnet 4.6 to the top, cluster free models at the bottom of the OpenRouter list, and mirror the same ordering into the Nous portal list (paid models only). - Add inclusionai/ring-2.6-1t:free - Drop minimax-m2.5, minimax-m2.5:free, sonnet-4.5, mimo-v2.5, glm-5v-turbo, glm-5-turbo, trinity-large-preview:free, trinity-large-thinking, qwen3.5-plus-02-15 - Replace qwen3.5-35b-a3b with qwen3.6-35b-a3b - Drop x-ai/grok-4.20-beta from the Nous list * fix(kanban): /kanban slash command emits argparse garbage instead of help Closes #21794. `/kanban`, `/kanban help`, `/kanban --help`, and `/kanban <sub> -h` all returned broken output to the gateway and interactive CLI. Three underlying bugs in `hermes_cli.kanban.run_slash`: 1. argparse writes help to **stdout** but `run_slash` only captured stderr at parse time, so `-h` text was silently swallowed and replaced with the `(usage error: 0)` sentinel. 2. The wrapping parser used `prog="/"` and routed via a synthetic "_top → kanban" subparser, producing `usage: / kanban …` (stray space) and `usage: /kanban kanban …` (doubled token) in error text. 3. Bare `/kanban` and `/kanban help` dumped argparse's full ~3KB usage tree, which reads as visual garbage in a chat bubble. Fix: drive the kanban_parser directly (no double-wrap), rewrite prog strings on every leaf subparser, capture stdout AND stderr around parse_args, distinguish SystemExit(0) (help — return captured stdout) from SystemExit(2) (error — return single-line ⚠-prefixed message), and add an explicit chat-friendly short-help block returned for bare invocation and the help aliases (`help`, `--help`, `-h`, `?`). Added 5 regression tests covering bare invocation, every help alias, subcommand help, unknown action, and missing required arg. Affects every chat platform via gateway/run.py::_handle_kanban_command and the interactive CLI via cli.py::_handle_kanban_command. Co-Authored-By: Nagatha (Claude Opus 4.7) <noreply@anthropic.com> * chore: AUTHOR_MAP entry for tymrtn (#21794) * feat(stream-retry): add upstream + timing diagnostics to drop log (#23005) The previous PR (#22993) gave us a structured WARNING per stream drop but the only diagnostic was 'error_type=APIError error=Network connection lost.' — same nothing the user started with. To actually diagnose why subagents drop streams disproportionately we need to know WHERE the drop happened. Adds three breadcrumbs to the agent.log WARNING: 1. Inner exception chain. openai SDK wraps httpx errors as APIConnectionError / APIError so the catch site only sees the wrapper. _flatten_exception_chain walks __cause__/__context__ up to 4 levels deep and renders 'Outer(msg) <- Inner(msg)' so we can tell ConnectError vs RemoteProtocolError vs ReadError vs ProxyError without enabling verbose mode. 2. Upstream HTTP headers. Snapshots cf-ray, x-openrouter-provider, x-openrouter-model, x-openrouter-id, x-request-id, server, via, etc. from stream.response immediately after open (so they survive even when the stream dies before the first chunk). These answer 'is one CF edge / one downstream provider responsible, or random?' 3. Per-attempt counters. bytes streamed, chunk count, elapsed time on the dying attempt, and time-to-first-byte. Distinguishes 'couldn't connect at all' (0s, 0 bytes) from 'died after 30s mid-stream' (very different root causes — first is auth/routing, second is upstream idle-kill or proxy timeout). Plumbing: - _stream_diag_init / _stream_diag_capture_response live on AIAgent and produce a per-attempt dict held on request_client_holder['diag'] for closure access from the retry block. - _call_chat_completions and _call_anthropic both initialize the diag and increment counters per chunk/event (best-effort, never raises in the streaming hot path). - _log_stream_retry / _emit_stream_drop accept an optional diag and render the new fields. Final-exhaustion log goes through the same helper so it gets the same diagnostic dump. - UI status line gains a brief 'after Xs' suffix when timing is available — distinguishes 'connect failed' from 'died mid-stream' at a glance without grepping logs. Sample WARNING after this change: Stream drop mid tool-call on attempt 2/3 — retrying. subagent_id=sa-2-cafef00d depth=1 provider=openrouter base_url=https://openrouter.ai/api/v1 error_type=APIError error=Connection error. chain=APIError(Connection error.) <- RemoteProtocolError(peer closed connection without sending complete message body) http_status=200 bytes=12400 chunks=47 elapsed=12.00s ttfb=0.83s upstream=[cf-ray=8f1a2b3c4d5e6f7g-LAX x-openrouter-provider=Anthropic x-openrouter-id=gen-abc123 server=cloudflare] Tests: 10 covering diag init, header capture (whitelist enforced for PII), exception-chain walking + depth cap, log content with full diag, log content without diag (placeholders), UI elapsed-suffix on/off. * fix(review): tell background reviewer not to capture transient env failures as skills (#23004) Closes #6051. Reported failure mode: agent migrated to WSL2, browser launch failed because Playwright wasn't installed yet. Background reviewer captured the failure as a durable skill (`browser-tool-launch-issue`) and the agent kept refusing the browser tool for weeks after Playwright was installed and verified working. Negative claims also propagated into unrelated skills ("browser tools do not work", "cannot use Y from execute_code"). Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both lean hard on "be active, save things, a pass that does nothing is a missed learning opportunity." Neither distinguished durable knowledge from transient environment state. The reviewer was doing what it was told. Fix at the write site — both prompts now carry a "Do NOT capture" section calling out: • Environment-dependent failures (missing binaries, fresh-install errors, post-migration path mismatches, 'command not found', unconfigured credentials, uninstalled packages) • Negative claims about tools or features ("X does not work") that harden into self-cited refusals • Session-specific transient errors that resolved before the conversation ended • One-off task narratives ("summarize today's market", "analyze this PR") — also addresses the #12812 / #4538 family Plus a positive-reframing line: when a tool fails because of setup state, capture the FIX (install command, config step, env var) under an existing setup/troubleshooting skill — never "this tool doesn't work" as a standalone constraint. Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py (2 new + all existing review-prompt assertions). Substring-based checks so future prompt edits don't false-fail. * feat(codex): add gpt-5.3-codex-spark model * fix(model-metadata): restore gpt-5.3-codex-spark fallback context * fix(model-metadata): set codex-spark fallback context to 128k * fix: surface Codex CLI-only models * chore: add codex-spark salvage contributors to AUTHOR_MAP Maps olegwn@gmail.com → nederev (PR #18286) and vesper@askclaw.dev → askclaw-vesper (PR #19530) so the contributor attribution check passes when their commits land via this salvage. * docs(codex-spark): document ChatGPT Pro entitlement gating PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed via the Codex OAuth backend at chatgpt.com/backend-api/codex/models — not via the public OpenAI API. Add explanatory comments in: - DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py) - _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py) - list_authenticated_providers' live-discovery branch (model_switch.py) so future maintainers don't strip the entry again. Also documents the intentional asymmetry that Spark stays out of the "openai" provider catalog (it isn't on the public API) and why the supported_in_api filter is *not* applied for the openai-codex route. * test(codex-spark): add live-API regression and make picker test deterministic Two follow-ups from self-review: 1. Add unit test for _fetch_models_from_api covering the live HTTP path. The salvaged PR #19530 dropped the supported_in_api:false filter in both _fetch_models_from_api and _read_cache_models, but only the cache path had a regression test. This adds the symmetric live-fetch test (mocked httpx) so a future drive-by cha…

…pability (NousResearch#16506) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.