fix(gateway): /usage now shows rate limits, cost, and token details between turns by teknium1 · Pull Request #7038 · NousResearch/hermes-agent

teknium1 · 2026-04-10T08:22:14Z

Summary

The gateway /usage command only checked _running_agents for the agent object, and that dict is populated only while the agent is actively processing. Between turns, which is when users actually type /usage, the dict was empty and the handler fell back to a rough message-count estimate with no rate limits, no cost, and no token breakdown.

The agent object actually lives in _agent_cache between turns (kept for prompt caching). This fix checks both dicts.

What changed

Agent lookup fix (gateway/run.py):

Check _running_agents first (mid-turn), then fall back to _agent_cache (between turns)
Skip the _AGENT_PENDING_SENTINEL properly

Output parity with CLI (gateway/run.py):

Model name
Detailed token breakdown: input, output, cache read, cache write
Cost estimation (estimated $amount or 'included' for subscriptions)
Cache token lines hidden when zero
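
As a rough sketch of the output rules listed above (cache lines hidden when zero, cost shown as an estimate or as 'included'), assuming a hypothetical format_usage helper and assuming the total is the sum of all four token counters, which is consistent with the sample output below but not confirmed by the PR:

```python
def format_usage(model, input_tokens, output_tokens,
                 cache_read=0, cache_write=0,
                 cost_usd=None, included=False):
    """Build the token-usage block as text (hypothetical helper)."""
    lines = [
        "📊 Session Token Usage",
        f"Model: {model}",
        f"Input tokens: {input_tokens:,}",
    ]
    # Cache lines are only emitted when the counter is non-zero.
    if cache_read:
        lines.append(f"Cache read tokens: {cache_read:,}")
    if cache_write:
        lines.append(f"Cache write tokens: {cache_write:,}")
    lines.append(f"Output tokens: {output_tokens:,}")

    # Assumption: total is the plain sum of every counter.
    total = input_tokens + output_tokens + cache_read + cache_write
    lines.append(f"Total: {total:,}")

    # Subscriptions report 'included'; otherwise show an estimate if known.
    if included:
        lines.append("Cost: included")
    elif cost_usd is not None:
        lines.append(f"Cost: ~${cost_usd:.4f}")
    return "\n".join(lines)
```

Calling format_usage("anthropic/claude-sonnet-4.6", 35000, 10000, cache_read=5000, cost_usd=0.1234) yields the model, input, cache-read, output, total, and cost lines, with no cache-write line.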

Tests (tests/gateway/test_usage_command.py):

6 tests covering: cached agent lookup, running agent priority, sentinel bypass, history fallback, zero-cache hiding, included-cost status

Before/After

Before (between turns):

📊 Session Info
Messages: 5
Estimated context: ~12,000 tokens
(Detailed usage available during active conversations)

After (between turns):

⏱️ Rate Limits: RPM: 50/60 | TPM: 800K/1.0M

📊 Session Token Usage
Model: anthropic/claude-sonnet-4.6
Input tokens: 35,000
Cache read tokens: 5,000
Output tokens: 10,000
Total: 50,000
API calls: 5
Cost: ~$0.1234
Context: 30,000 / 200,000 (15%)

Test plan

python3 -m pytest tests/gateway/test_usage_command.py -o 'addopts=' -q — 6 passed
python3 -m pytest tests/gateway/ -o 'addopts=' -q — 2359 passed (14 pre-existing failures unrelated)

…etween turns

The gateway /usage handler only looked in _running_agents for the agent object, which is only populated while the agent is actively processing a message. Between turns (when users actually type /usage), the dict is empty and the handler fell through to a rough message-count estimate.

The agent object actually lives in _agent_cache between turns (kept for prompt caching). This fix checks both dicts, with _running_agents taking priority (mid-turn) and _agent_cache as the between-turns fallback.

Also brings the gateway output to parity with the CLI /usage:
- Model name
- Detailed token breakdown (input, output, cache read, cache write)
- Cost estimation (estimated amount or 'included' for subscriptions)
- Cache token lines hidden when zero (cleaner output)

This fixes Nous Portal rate limit headers not showing up for gateway users: the data was being captured correctly but the handler could never see it.

…le boundary (#1) * fix(anthropic): omit tool-streaming beta on MiniMax endpoints MiniMax's Anthropic-compatible endpoints reject requests that include the fine-grained-tool-streaming beta header — every tool-use message triggers a connection error (~18s timeout). Regular chat works fine. Add _common_betas_for_base_url() that filters out the tool-streaming beta for Bearer-auth (MiniMax) endpoints while keeping all other betas. All four client-construction branches now use the filtered list. Based on #6528 by @HiddenPuppy. Original cherry-picked from PR #6688 by kshitijk4poor. Fixes #6510, fixes #6555. * fix: add actionable hint for OpenRouter 'no tool endpoints' error When OpenRouter returns 'No endpoints found that support tool use' (HTTP 404), display a hint explaining that provider routing restrictions may be filtering out tool-capable providers. Links the user directly to the model's OpenRouter page to check which providers support tools. The hint fires in the error display block that runs regardless of whether fallback succeeds — so the user always understands WHY the model failed, not just that it fell back. Reported via Discord: GLM-5.1 on OpenRouter with US-based provider restrictions eliminated all 4 tool-supporting endpoints (DeepInfra, Z.AI, Friendli, Venice), leaving only 7 non-tool providers. * fix: skip stale Nous pool entry when agent_key is expired * fix: sync refreshed OAuth tokens from pool back to auth.json providers * fix: proactive Codex CLI sync before refresh + retry on failure * fix: add auth.json write-back for Codex retry and valid-token early-return paths The Codex retry block and valid-token short-circuit in _refresh_entry() both return early, bypassing the auth.json sync at the end of the method. This adds _sync_device_code_entry_to_auth_store() calls on both paths so refreshed/synced tokens are written back to auth.json regardless of which code path succeeds. 
* feat: add Codex fast mode toggle (/fast command) Add /fast slash command to toggle OpenAI Codex service_tier between normal and priority ('fast') inference. Only exposed for models registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4). - Registry-based backend config for extensibility - Dynamic command visibility (hidden from help/autocomplete for non-supported models) via command_filter on SlashCommandCompleter - service_tier flows through request_overrides from route resolution - Omit max_output_tokens for Codex backend (rejects it) - Persists to config.yaml under agent.service_tier Salvage cleanup: removed simple_term_menu/input() menu (banned), bare /fast now shows status like /reasoning. Removed redundant override resolution in _build_api_kwargs — single source of truth via request_overrides from route. Co-authored-by: Hermes Agent <hermes@nousresearch.com> * feat: expand /fast to all OpenAI Priority Processing models (#6960) Previously /fast only supported gpt-5.4 and forced a provider switch to openai-codex. Now supports all 13 models from OpenAI's Priority Processing pricing table (gpt-5.4, gpt-5.4-mini, gpt-5.2, gpt-5.1, gpt-5, gpt-5-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o4-mini). Key changes: - Replaced _FAST_MODE_BACKEND_CONFIG with _PRIORITY_PROCESSING_MODELS frozenset - Removed provider-forcing logic — service_tier is now injected into whatever API path the user is already on (Codex Responses, Chat Completions, or OpenRouter passthrough) - Added request_overrides support to chat_completions path in run_agent.py - Updated messaging from 'Codex inference tier' to 'Priority Processing' - Expanded test coverage for all supported models * fix(streaming): prevent <think> in prose from suppressing response output When the model mentions <think> as literal text in its response (e.g. "(/think not producing <think> tags)"), the streaming display treated it as a reasoning block opener and suppressed everything after it. 
The response box would close with truncated content and no error — the API response was complete but the display ate it. Root cause: _stream_delta() matched <think> anywhere in the text stream regardless of position. Real reasoning blocks always start at the beginning of a line; mentions in prose appear mid-sentence. Fix: track line position across streaming deltas with a _stream_last_was_newline flag. Only enter reasoning suppression when the tag appears at a block boundary (start of stream, after a newline, or after only whitespace on the current line). Add a _flush_stream() safety net that recovers buffered content if no closing tag is found by end-of-stream. Also fixes three related issues discovered during investigation: - anthropic_adapter: _get_anthropic_max_output() now normalizes dots to hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key (was returning 32K instead of 128K) - run_agent: send explicit max_tokens for Claude models on Nous Portal, same as OpenRouter — both proxy to Anthropic's API which requires it. Without it the backend defaults to a low limit that truncates responses. - run_agent: reset truncated_tool_call_retries after successful tool execution so a single truncation doesn't poison the entire conversation. * fix: increase stream read timeout default to 120s, auto-raise for local LLMs (#6967) Raise the default httpx stream read timeout from 60s to 120s for all providers. Additionally, auto-detect local LLM endpoints (Ollama, llama.cpp, vLLM) and raise the read timeout to HERMES_API_TIMEOUT (1800s) since local models can take minutes for prefill on large contexts before producing the first token. The stale stream timeout already had this local auto-detection pattern; the httpx read timeout was missing it — causing a hard 60s wall that users couldn't find (HERMES_STREAM_READ_TIMEOUT was undocumented). 
Changes: - Default HERMES_STREAM_READ_TIMEOUT: 60s -> 120s - Auto-detect local endpoints -> raise to 1800s (user override respected) - Document HERMES_STREAM_READ_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT - Add 10 parametrized tests Reported-by: Pavan Srinivas (@pavanandums) * fix(telegram): adaptive batch delay for split long messages Cherry-picked from PR #6891 by SHL0MS. When a chunk is near the 4096-char split point, wait 2.0s instead of 0.6s since a continuation is almost certain. * fix(discord): add text batching to merge split long messages Cherry-picked from PR #6894 by SHL0MS with fixes: - Only batch TEXT messages; commands/media dispatch immediately - Use build_session_key() for proper session-scoped batch keys - Consistent naming (_text_batch_delay_seconds) - Proper Dict[str, MessageEvent] typing Discord splits at 2000 chars (lowest of all platforms). Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. * fix(matrix): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. Matrix clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands dispatch immediately. Ref #6892 * fix(wecom): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. WeCom clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands/media dispatch immediately. Ref #6892 * fix(feishu): add adaptive batch delay for split long messages Feishu already had text batching with a static 0.6s delay. This adds adaptive delay: waits 2.0s when a chunk is near the ~4096-char split point since a continuation is almost certain. Tracks _last_chunk_len on each queued event to determine the delay. Configurable via HERMES_FEISHU_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 2.0). 
Ref #6892 * test: add text batching tests for Discord, Matrix, WeCom, Telegram, Feishu 22 tests covering: - Single message dispatch after delay - Split message aggregation (2-way and 3-way) - Different chats/rooms not merged - Adaptive delay for near-limit chunks - State cleanup after flush - Split continuation merging All 5 platform adapters tested. * test: disable text batching in existing adapter tests Set _text_batch_delay_seconds = 0 on test adapter fixtures so messages dispatch immediately (bypassing async batching). This preserves the existing synchronous assertion patterns while the batching logic is tested separately in test_text_batching.py. * fix(docker): use uv for dependency resolution to fix resolution-too-deep error * docs: document streaming timeout auto-detection for local LLMs (#6990) Add streaming timeout documentation to three pages: - guides/local-llm-on-mac.md: New 'Timeouts' section with table of all three timeouts, their defaults, local auto-adjustments, and env var overrides - reference/faq.md: Tip box in the local models FAQ section - user-guide/configuration.md: 'Streaming Timeouts' subsection under the agent config section Follow-up to #6967. * fix(gateway): bypass text batching when delay is 0 (#6996) The text batching feature routes TEXT messages through asyncio.create_task() + asyncio.sleep(delay). Even with delay=0, the task fires asynchronously and won't complete before synchronous test assertions. This broke 33 tests across Discord, Matrix, and WeCom adapters. When _text_batch_delay_seconds is 0 (the test fixture setting), dispatch directly to handle_message() instead of going through the async batching path. This preserves the pre-batching behavior for tests while keeping batching active in production (default delay 0.6s). 
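
The delay selection described in the batching commits above (2.0s when a chunk is near the platform split limit, 0.6s otherwise, and no batching at all when the delay is 0) might look roughly like this. The function name pick_batch_delay and the near_margin threshold are assumptions; the commits do not say how close to the limit counts as "near".

```python
def pick_batch_delay(chunk_len, split_limit,
                     base_delay=0.6, split_delay=2.0, near_margin=96):
    """Choose how long to wait for a continuation chunk (hypothetical helper).

    near_margin is an assumed threshold, not a value taken from the commits.
    """
    # A base delay of 0 means batching is disabled (the test fixture
    # setting): the caller should dispatch directly instead of sleeping.
    if base_delay == 0:
        return 0.0
    # A chunk that almost fills the platform limit was very likely split
    # by the client, so wait longer for the rest to arrive.
    if chunk_len >= split_limit - near_margin:
        return split_delay
    return base_delay
```

With the limits quoted above, a 4,090-character Telegram chunk (limit 4096) or a 1,990-character Discord chunk (limit 2000) would get the longer delay, while a short message would get the 0.6s default.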
* fix: clear conversation_history after mid-loop compression to prevent empty sessions (#7001) After mid-loop compression (triggered by 413, context_overflow, or Anthropic long-context tier errors), _compress_context() creates a new session in SQLite and resets _last_flushed_db_idx=0. However, conversation_history was not cleared, so _flush_messages_to_session_db() computed: flush_from = max(len(conversation_history=200), _last_flushed_db_idx=0) = 200 messages[200:] → empty (compressed messages < 200) This resulted in zero messages being written to the new session's SQLite store. On resume, the user would see 'Session found but has no messages.' The preflight compression path (line 7311) already had the fix: conversation_history = None This commit adds the same clearing to the three mid-loop compression sites: - Anthropic long-context tier overflow - HTTP 413 payload too large - Generic context_overflow error Reported by Aaryan (Nous community). * fix(update): always reset on stash conflict — never leave conflict markers (#7010) When `hermes update` stashes local changes and the restore hits merge conflicts, the old code prompted the user to reset or keep conflict markers. If the user declined the reset, git conflict markers (<<<<<<< Updated upstream) were left in source files, making hermes completely unrunnable with a SyntaxError on the next invocation. Additionally, the interactive path called sys.exit(1), which killed the entire update process before pip dependency install, skill sync, and gateway restart could finish — even though the code pull itself had succeeded. Changes: - Always auto-reset to clean state when stash restore conflicts - Remove the "Reset working tree?" 
prompt (footgun) - Remove sys.exit(1) — return False so cmd_update continues normally - User's changes remain safely in the stash for manual recovery Also fixes a secondary bug where the conflict handling prompt used bare input() instead of the input_fn parameter, which would hang in gateway mode. Tests updated: replaced prompt/sys.exit assertions with auto-reset behavior checks; removed the "user declines reset" test (path no longer exists). * feat: add Anthropic Fast Mode support to /fast command (#7037) Extends the /fast command to support Anthropic's Fast Mode beta in addition to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for ~2.5x faster output token throughput. Changes: - hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry, model_supports_fast_mode() now recognizes Claude Opus 4.6, resolve_fast_mode_overrides() returns {speed: fast} for Anthropic vs {service_tier: priority} for OpenAI - agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant, build_anthropic_kwargs() accepts fast_mode=True which injects speed:fast + beta header via extra_headers (skipped for third-party Anthropic-compatible endpoints like MiniMax) - run_agent.py: Pass fast_mode to build_anthropic_kwargs in the anthropic_messages path of _build_api_kwargs() - cli.py: Update _handle_fast_command with provider-aware messaging (shows 'Anthropic Fast Mode' vs 'Priority Processing') - hermes_cli/commands.py: Update /fast description to mention both providers - tests: 13 new tests covering Anthropic model detection, override resolution, CLI availability, routing, adapter kwargs, and third-party endpoint safety * fix(gateway): /usage now shows rate limits, cost, and token details between turns (#7038) The gateway /usage handler only looked in _running_agents for the agent object, which is only populated while the agent is actively processing a message. 
Between turns (when users actually type /usage), the dict is empty and the handler fell through to a rough message-count estimate. The agent object actually lives in _agent_cache between turns (kept for prompt caching). This fix checks both dicts, with _running_agents taking priority (mid-turn) and _agent_cache as the between-turns fallback. Also brings the gateway output to parity with the CLI /usage: - Model name - Detailed token breakdown (input, output, cache read, cache write) - Cost estimation (estimated amount or 'included' for subscriptions) - Cache token lines hidden when zero (cleaner output) This fixes Nous Portal rate limit headers not showing up for gateway users — the data was being captured correctly but the handler could never see it. * fix: update Kimi Coding User-Agent to KimiCLI/1.30.0 The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at v1.30.0. The stale version string causes intermittent 403 errors from Kimi's coding endpoint ('only available for Coding Agents'). Update all 8 occurrences across run_agent.py, auxiliary_client.py, and doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI. * fix(cli): add missing os and platform imports in uninstall.py (#7034) Fixes #6983. Contributed by @JiayuuWang. * fix: set retryable=False for message-based auth errors in _classify_by_message() (#7027) Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes #7026. Contributed by @kuishou68. 
* Harden setup provider flows Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refresh OpenRouter model catalog Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: harden cron script timeout and provider recovery * docs: add cron script timeout and provider recovery documentation - Add HERMES_CRON_TIMEOUT and HERMES_CRON_SCRIPT_TIMEOUT to env vars reference - Add script timeout and provider recovery sections to cron features page - Add timeout resolution chain and credential pool details to cron internals * fix(cli): prevent stale image attachment on text paste and voice input Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(security): require auth for session continuation and warn on missing API key Two security hardening changes for the API server: 1. **Startup warning when no API key is configured.** When `API_SERVER_KEY` is not set, all endpoints accept unauthenticated requests. This is the default configuration, but operators may not realize the security implications. A prominent warning at startup makes the risk visible. 2. **Require authentication for session continuation.** The `X-Hermes-Session-Id` header allows callers to load and continue any session stored in state.db. Without authentication, an attacker who can reach the API server (e.g. via CORS from a malicious page, or on a shared host) could enumerate session IDs and read conversation history — which may contain API keys, passwords, code, or other sensitive data shared with the agent. Session continuation now returns 403 when no API key is configured, with a clear error message explaining how to enable the feature. When a key IS configured, the existing Bearer token check already gates access. This is defense-in-depth: the API server is intended for local use, but defense against cross-origin and shared-host attacks is important since the default binding is 127.0.0.1 which is reachable from browsers via DNS rebinding or localhost CORS. 
* fix(gateway): apply /model session overrides so switch persists across messages The gateway /model command stored session overrides in _session_model_overrides but run_sync() never consulted them when resolving the model and runtime for the next message. It always read from config.yaml, so the switch was lost as soon as a new agent was created. Two fixes: 1. In run_sync(), apply _session_model_overrides after resolving from config.yaml/env — the override takes precedence for model, provider, api_key, base_url, and api_mode. 2. In post-run fallback detection, check whether the model mismatch (agent.model != config_model) is due to an intentional /model switch before evicting the cached agent. Without this, the first message after /model would work (cached agent reused) but the fallback detector would evict it, causing the next message to revert. Affects all gateway platforms (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, BlueBubbles, HomeAssistant) since they all share GatewayRunner._run_agent(). Fixes #6213 * fix(terminal): cap foreground timeout to prevent session deadlocks When the model calls terminal() in foreground mode without background=true (e.g. to start a server), the tool call blocks until the command exits or the timeout expires. Without an upper bound the model can request arbitrarily high timeouts (the schema had minimum=1 but no maximum), blocking the entire agent session for hours until the gateway idle watchdog kills it. 
Changes: - Add FOREGROUND_MAX_TIMEOUT (600s, configurable via TERMINAL_MAX_FOREGROUND_TIMEOUT env var) that caps foreground timeout - Clamp effective_timeout to the cap when background=false and timeout exceeds the limit - Include a timeout_note in the tool result when clamped, nudging the model to use background=true for long-running processes - Update schema description to show the max timeout value - Remove dead clamping code in the background branch that could never fire (max_timeout was set to effective_timeout, so timeout > max_timeout was always false) - Add 7 tests covering clamping, no-clamping, config-default-exceeds-cap edge case, background bypass, default timeout, constant value, and schema content Self-review fixes: - Fixed bug where timeout_note said 'Requested timeout Nones' when clamping fired from config default exceeding cap (timeout param is None). Now uses unclamped_timeout instead of the raw timeout param. - Removed unused pytest import from test file - Extracted test config dict into _make_env_config() helper - Fixed tautological test_default_value assertion - Added missing test for config default > cap with no model timeout * fix: reject foreground timeout above cap instead of clamping Change behavior from silent clamping to returning an error when the model requests a foreground timeout exceeding FOREGROUND_MAX_TIMEOUT. This forces the model to use background=true for long-running commands rather than silently changing its intent. - Config default timeouts above the cap are NOT rejected (user's choice) - Only explicit model-requested timeouts trigger rejection - Added boundary test for timeout exactly at the limit * fix(copilot): add missing Copilot-Integration-Id header The GitHub Copilot API now requires a Copilot-Integration-Id header on all requests. Without it, every API call fails with HTTP 400: "missing required Copilot-Integration-Id header". 
Uses vscode-chat as the integration ID, matching opencode which shares the same OAuth client ID (Ov23li8tweQw6odWQebz). Fixes: Copilot provider fails with "missing required Copilot-Integration-Id header" (HTTP 400) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(acp): populate usage from top-level result fields * fix(acp): remove dead nested usage dict path run_conversation() never returns a result["usage"] nested dict — token counters are always at the top level. The nested path used the wrong key name ("cached_tokens" vs "cache_read_tokens") and was never reachable. Remove it. * fix(config): allow HERMES_HOME_MODE env var to override _secure_dir() permissions (#6993) Operators running a web server (nginx, caddy) that needs to traverse ~/.hermes/ can now set HERMES_HOME_MODE=0701 (or any octal mode) instead of having _secure_dir() revert their manual chmod on every gateway restart. Default behavior (0o700) is unchanged. Fixes #6991. Contributed by @ygd58. * feat(environments): unified file sync with change tracking and deletion Replace per-backend ad-hoc file sync with a shared FileSyncManager that handles mtime-based change detection, remote deletion of locally-removed files, and transactional state updates. - New FileSyncManager class (tools/environments/file_sync.py) with callbacks for upload/delete, rate limiting, and rollback - Shared iter_sync_files() eliminates 3 duplicate implementations - SSH: replace unconditional rsync with scp + mtime skip - Modal/Daytona: replace inline _synced_files dict with manager - All 3 backends now sync credentials + skills + cache uniformly - Remote deletion: files removed locally are cleaned from remote - HERMES_FORCE_FILE_SYNC=1 env var for debugging - Base class _before_execute() simplified to empty hook - 12 unit tests covering mtime skip, deletion, rollback, rate limiting * test: add reproducible perf benchmark for file sync overhead Direct env.execute() timing — no LLM in the loop. 
Measures per-command wall-clock including sync check. Results on SSH: - echo median: 617ms (pure SSH round-trip + spawn overhead) - sync-triggered after 6s wait: 621ms (mtime skip adds ~0ms) - within-interval (no sync): 618ms Confirms mtime skip makes sync overhead unmeasurable. * fix(tests): update mocks for file sync changes - Modal snapshot tests: accept **kw in iter_skills_files/iter_cache_files mock lambdas to match new container_base kwarg - SSH preflight test: mock _detect_remote_home, _ensure_remote_dirs, init_session, and FileSyncManager added in file sync PR * fix(gateway): remove DM thread session seeding to prevent cross-thread contamination (#7084) The session store was copying the ENTIRE parent DM transcript into new thread sessions. This caused unrelated conversations to bleed across threads in Slack DMs. The Slack adapter already handles thread context correctly via _fetch_thread_context() (conversations.replies API), which fetches only the actual thread messages. The session-level seeding was both redundant and harmful. No other platform (Telegram, Discord) uses DM threads, so the seeding code path was only triggered by Slack — where it conflicted with the adapter-level context. Tests updated to assert thread isolation: all thread sessions start empty, platform adapters are responsible for injecting thread context. Salvage of PR #5868 (jarvisxyz). Reported by norbert on Discord. * feat(discord): add allowed_channels whitelist config Add DISCORD_ALLOWED_CHANNELS (env var) / discord.allowed_channels (config.yaml) support to restrict the bot to only respond in specified channels. When set, messages from any channel NOT in the allowed list are silently ignored — even if the bot is @mentioned. This provides a secure default- deny posture vs the existing ignored_channels which is default-allow. 
This is especially useful when bots in other channels may create new channels dynamically (e.g., project bots) — a blacklist requires constant maintenance while a whitelist is set-and-forget. Follows the same config pattern as ignored_channels and free_response_channels: - Env var: DISCORD_ALLOWED_CHANNELS (comma-separated channel IDs) - Config: discord.allowed_channels (string or list of channel IDs) - Env var takes precedence over config.yaml - Empty/unset = no restriction (backward compatible) Files changed: - gateway/platforms/discord.py: check allowed_channels before ignored_channels - gateway/config.py: map discord.allowed_channels → DISCORD_ALLOWED_CHANNELS - hermes_cli/config.py: add allowed_channels to DEFAULT_CONFIG * fix(model_metadata): add xAI Grok context length fallbacks xAI /v1/models does not return context_length metadata, so Hermes probes down to the 128k default whenever a user configures a custom provider pointing at https://api.x.ai/v1. This forces every xAI user to manually override model.context_length in config.yaml (2M for Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context window. Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the fallback lookup returns the correct value via substring matching. Values sourced from models.dev (2026-04) and cross-checked against the xAI /v1/models listing: - grok-4.20-* 2,000,000 (reasoning, non-reasoning, multi-agent) - grok-4-1-fast-* 2,000,000 - grok-4-fast-* 2,000,000 - grok-4 / grok-4-0709 256,000 - grok-code-fast-1 256,000 - grok-3* 131,072 - grok-2 / latest 131,072 - grok-2-vision* 8,192 - grok (catch-all) 131,072 Keys are ordered longest-first so that specific variants match before the catch-all, consistent with the existing Claude/Gemma/MiniMax entries. Add TestDefaultContextLengths.test_grok_models_context_lengths and test_grok_substring_matching to pin the values and verify the full lookup path. All 77 tests in test_model_metadata.py pass. 
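
The longest-first substring lookup described for the Grok entries can be sketched as below. The table values are the ones listed in the commit message; the names GROK_CONTEXT_LENGTHS and lookup_context_length, the exact key spellings, the 128,000 default, and sorting by key length at lookup time (rather than relying on dict order) are assumptions made for illustration.

```python
# Values from the commit message; key spellings are assumed.
GROK_CONTEXT_LENGTHS = {
    "grok-4.20": 2_000_000,
    "grok-4-1-fast": 2_000_000,
    "grok-4-fast": 2_000_000,
    "grok-4": 256_000,
    "grok-code-fast-1": 256_000,
    "grok-3": 131_072,
    "grok-2-vision": 8_192,
    "grok-2": 131_072,
    "grok": 131_072,  # catch-all
}


def lookup_context_length(model, table=GROK_CONTEXT_LENGTHS, default=128_000):
    """Return the context length for the most specific matching key."""
    name = model.lower()
    # Try longer keys first so "grok-4-fast" wins over "grok-4" and "grok".
    for key in sorted(table, key=len, reverse=True):
        if key in name:
            return table[key]
    return default
```

Without the longest-first order, a model such as grok-2-vision-1212 could match the shorter "grok-2" or "grok" key first and report 131,072 instead of 8,192.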
* fix(dingtalk,api): validate session webhook URL origin, cap webhook cache, reject header injection dingtalk.py: The session_webhook URL from incoming DingTalk messages is POSTed to without any origin validation (line 290), enabling SSRF attacks via crafted webhook URLs (e.g. http://169.254.169.254/ to reach cloud metadata). Add a regex check that only accepts the official DingTalk API origin (https://api.dingtalk.com/). Also cap _session_webhooks dict at 500 entries with FIFO eviction to prevent unbounded memory growth from long-running gateway instances. api_server.py: The X-Hermes-Session-Id request header is accepted and echoed back into response headers (lines 675, 697) without sanitization. A session ID containing \r\n enables HTTP response splitting / header injection. Add a check that rejects session IDs containing control characters (\r, \n, \x00). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: align auth-by-message classification with status-code path, decode URLs before secret check error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized", etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401 path (line 432) which correctly uses retryable=False + should_fallback=True. The mismatch causes 3 wasted retries with the same broken credential before fallback, while 401 errors immediately attempt fallback. Align the message-based path to match: retryable=False, should_fallback=True. web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... (which decodes to sk-1234...) bypass the filter because the regex expects literal ASCII. Add urllib.parse.unquote() before the check so percent-encoded variants are also caught. 
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix(approval,mcp): log silent exception handlers, narrow OAuth catches, close server on error Three silent `except Exception` blocks in approval.py (lines 345, 387, 469) return fallback values with zero logging — making it impossible to debug callback failures, allowlist load errors, or config read issues. Add logger.warning/error calls that match the pattern already used by save_permanent_allowlist() and _smart_approve() in the same file. In mcp_oauth.py, narrow the overly-broad `except Exception` in get_tokens() and get_client_info() to the specific exceptions Pydantic's model_validate() can raise (ValueError, TypeError, KeyError), and include the exception message in the warning. Also wrap the _wait_for_callback() polling loop in try/finally so the HTTPServer is always closed — previously an asyncio.CancelledError or any exception in the loop would leak the server socket. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: hidden_div regex bypass with newlines, credential config silent failure, webhook route error severity prompt_builder.py: The `hidden_div` detection pattern uses `.*` which does not match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass detection by splitting the style attribute across lines: `<div style="color:red;\ndisplay: none">injected content</div>` Replace `.*` with `[\s\S]*?` to match across line boundaries. credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level (line 171), making YAML parse failures invisible in production logs. Users whose credential files silently fail to mount into sandboxes have no diagnostic clue. Promote to WARNING to match the severity pattern used by the path validation warnings at lines 150 and 158 in the same function. 
webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line 265) but the impact — stale/corrupted dynamic routes persisting silently — warrants ERROR level to ensure operator visibility in alerting pipelines. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: apply hidden_div regex newline bypass fix to skills_guard.py The same .* pattern vulnerable to newline bypass that was fixed in prompt_builder.py (PR #6925) also existed in skills_guard.py. Changed to [\s\S]*? to match across newlines. * test: update session ID tests to require auth (follow-up to #6930) Session continuation now requires API_SERVER_KEY to be configured. Update TestSessionIdHeader tests to use auth_adapter with Bearer token. * fix: include custom_providers in /model command listings and resolution Custom providers defined in config.yaml under custom_providers were completely invisible to the /model command in both gateway (Telegram, Discord, etc.) and CLI. The provider listing skipped them and explicit switching via --provider failed with "Unknown provider". Root cause: gateway/run.py, cli.py, and model_switch.py only read the providers dict from config, ignoring custom_providers entirely. Changes: - providers.py: add resolve_custom_provider() and extend resolve_provider_full() to check custom_providers after user_providers - model_switch.py: propagate custom_providers through switch_model(), list_authenticated_providers(), and get_authenticated_provider_slugs(); add custom provider section to provider listings - gateway/run.py: read custom_providers from config, pass to all model-switch calls - cli.py: hoist config loading, pass custom_providers to listing and switch calls Tests: 4 new regression tests covering listing, resolution, and gateway command handler. All 71 tests pass. * fix: extract custom_provider_slug() helper, harden gateway test - Add custom_provider_slug() to hermes_cli/providers.py as the single source of truth for building 'custom:<name>' slugs.
- Use it in resolve_custom_provider() and list_authenticated_providers() instead of duplicated inline slug construction. - Add _session_model_overrides and _voice_mode to gateway test runner for object.__new__() safety. * mattermost added as deliver to webhook gateway * fix: add all platforms to webhook cross-platform delivery The delivery tuple in webhook.py only had 5 of 14 platforms with gateway adapters. Adds whatsapp, matrix, mattermost, homeassistant, email, dingtalk, feishu, wecom, and bluebubbles so webhooks can deliver to any connected platform. Updates docs delivery options table to list all platforms. Follow-up to cherry-picked fix from olafthiele (PR #7035). * feat(cron): support Discord thread_id in deliver targets Add Discord thread support to cron delivery and send_message_tool. - _parse_target_ref: handle discord platform with chat_id:thread_id format - _send_discord: add thread_id param, route to /channels/{thread_id}/messages - _send_to_platform: pass thread_id through for Discord - Discord adapter send(): read thread_id from metadata for gateway path - Update tool schema description to document Discord thread targets Cherry-picked from PR #7046 by pandacooming (maxyangcn). 
Follow-up fixes: - Restore proxy support (resolve_proxy_url/proxy_kwargs_for_aiohttp) that was accidentally deleted — would have caused NameError at runtime - Remove duplicate _DISCORD_TARGET_RE regex; reuse existing _TELEGRAM_TOPIC_TARGET_RE via _NUMERIC_TOPIC_RE alias (identical pattern) - Fix misleading test comments about Discord negative snowflake IDs (Discord uses positive snowflakes; negative IDs are a Telegram convention) - Rewrite misleading scheduler test that claimed to exercise home channel fallback but actually tested the explicit platform:chat_id parsing path * fix(run_agent): recover primary client on openai transport errors * fix(bluebubbles): auto-register webhook with BlueBubbles server on connect **Problem:** The BlueBubbles iMessage gateway was not receiving incoming messages even though: 1. BlueBubbles Server was properly configured and running 2. Hermes gateway started without errors 3. Webhook listener was started on the configured port The root cause was that the BlueBubbles adapter only started a local webhook listener but never registered the webhook URL with the BlueBubbles server via the API. Without registration, the server doesn't know where to send events. **Fix:** 1. Added _register_webhook() method that POSTs to /api/v1/webhook with the listener URL and event types (new-message, updated-message, message) 2. Added _unregister_webhook() method for clean shutdown 3. Both methods handle the case where webhook listens on 0.0.0.0/127.0.0.1 by using 'localhost' as the external hostname 4. 
Fixed documentation: 'hermes gateway logs' → 'hermes logs gateway' **API Reference:** https://docs.bluebubbles.app/server/developer-guides/rest-api-and-webhooks **Testing:** - Webhook registration is now automatic when gateway starts - Failed registration logs a warning but doesn't prevent startup - Clean shutdown unregisters the webhook Closes: iMessage gateway not working issue * fix: improve bluebubbles webhook registration resilience Follow-up to cherry-picked PR #6592: - Extract _webhook_url property to deduplicate URL construction - Add _find_registered_webhooks() helper for reuse - Crash resilience: check for existing registration before POSTing (handles restart after unclean shutdown without creating duplicates) - Accept 200-299 status range (not just 200) for webhook creation - Unregister removes ALL matching registrations (cleans up orphaned dupes) - Add 17 tests covering register/unregister/find/edge cases * fix(run-agent): rotate credential pool on billing-classified 400s * fix: STT provider-model mismatch — whisper-1 fed to faster-whisper (#7113) Legacy flat stt.model config key (from cli-config.yaml.example and older versions) was passed as a model override to transcribe_audio() by the gateway, bypassing provider-specific model resolution. When the provider was 'local' (faster-whisper), this caused: ValueError: Invalid model size 'whisper-1' Changes: - gateway/run.py, discord.py: stop passing model override — let transcribe_audio() handle provider-specific model resolution internally - get_stt_model_from_config(): now provider-aware, reads from the correct nested section (stt.local.model, stt.openai.model, etc.); ignores legacy flat key for local provider to prevent model name mismatch - cli-config.yaml.example: updated STT section to show nested provider config structure instead of legacy flat key - config migration v13→v14: moves legacy stt.model to the correct provider section and removes the flat key Reported by community user on Discord. 
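A minimal sketch of the provider-aware lookup described in the STT fix above. `resolve_stt_model` is an illustrative name (the commit's function is `get_stt_model_from_config()`, whose body is not shown here), and falling back to the legacy flat key for non-local providers is an assumption.

```python
from typing import Optional


def resolve_stt_model(config: dict, provider: str) -> Optional[str]:
    """Pick the STT model for `provider` from a nested config section."""
    stt = config.get("stt", {})
    nested = stt.get(provider, {})
    # Prefer the provider-specific section, e.g. stt.local.model.
    if isinstance(nested, dict) and nested.get("model"):
        return nested["model"]
    # The legacy flat stt.model key is ignored for the local provider so a
    # name like 'whisper-1' is never handed to faster-whisper.
    if provider != "local":
        return stt.get("model")  # assumption: flat key still honored elsewhere
    return None


legacy = {"stt": {"model": "whisper-1"}}
migrated = {"stt": {"local": {"model": "base"}, "openai": {"model": "whisper-1"}}}
```

Returning `None` for the local provider leaves the choice of default model to the transcription code, which matches the commit's intent of letting `transcribe_audio()` resolve the model internally.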
* fix(gateway): remap all paths in system service unit to target user's home When installing a system service via sudo, ExecStart, WorkingDirectory, VIRTUAL_ENV, and PATH entries were not remapped to the target user's home — only HERMES_HOME was. This caused the service to fail with status=200/CHDIR because the target user cannot access /root/. Adds _remap_path_for_user() helper and applies it to all path variables in the system branch of generate_systemd_unit(). Closes #6989 * fix(streaming): update stale-stream timer during Anthropic native streaming (#7117) The _call_anthropic() streaming path never updated last_chunk_time during the event loop — only once at stream start. The stale stream detector in the outer poll loop uses this timer, so any Anthropic stream longer than 180s was killed even when events were actively arriving. This self-inflicted a RemoteProtocolError that users saw as: '⚠️ Connection to provider dropped (RemoteProtocolError). Reconnecting…' The _call_chat_completions() path already updates last_chunk_time on every chunk (line 4475). This brings _call_anthropic() to parity. Also adds deltas_were_sent tracking to the Anthropic text_delta path so the retry loop knows not to retry after partial delivery (prevents duplicated output on connection drops mid-stream). Reported-by: Discord users (Castellani, Codename_11) * fix(gateway): scope /yolo to the active session * fix(mcp): combine content and structuredContent when both present (#7118) When an MCP server returns both content (model-oriented text) and structuredContent (machine-oriented JSON), the client now combines them instead of discarding content. The text content becomes the primary result (what the agent reads), and structuredContent is included as supplementary metadata. Previously, structuredContent took full precedence — causing data loss for servers like Desktop Commander that put the actual file text in content and metadata in structuredContent. 
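The combination rule for MCP results can be sketched roughly as follows. The function name and the shape of the returned dict are assumptions for illustration; the commit message only states that text content becomes the primary result and structuredContent is kept as supplementary metadata.

```python
from typing import Any, Optional


def combine_mcp_result(text: Optional[str], structured: Optional[dict]) -> Any:
    """Merge MCP `content` text with `structuredContent` without dropping either."""
    if text and structured is not None:
        # Both present: text is what the agent reads, structured data rides along.
        return {"result": text, "structuredContent": structured}
    if text:
        return text
    # Only structured data (or nothing) was returned by the server.
    return structured


both = combine_mcp_result("file body", {"path": "/tmp/a.txt", "lines": 3})
text_only = combine_mcp_result("file body", None)
structured_only = combine_mcp_result(None, {"ok": True})
```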
MCP spec guidance: for conversational/agent UX, prefer content. * fix: remove 115 verified dead code symbols across 46 production files Automated dead code audit using vulture + coverage.py + ast-grep intersection, confirmed by Opus deep verification pass. Every symbol verified to have zero production callers (test imports excluded from reachability analysis). Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py, hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py). Co-authored-by: alt-glitch <balyan.sid@gmail.com> * fix: restore 6 tests that tested live code but used deleted helpers * chore: remove spec-dead-code.md from tracked files * fix: clean up stale test references to removed attributes * fix: update 6 test files broken by dead code removal - test_percentage_clamp.py: remove TestContextCompressorUsagePercent class and test_context_compressor_clamped (tested removed get_status() method) - test_credential_pool.py: remove test_mark_used_increments_request_count (tested removed mark_used()), replace active_lease_count() calls with direct _active_leases dict access, remove mark_used from thread test - test_session.py: replace SessionSource.local_cli() factory calls with direct SessionSource construction (local_cli classmethod removed) - test_error_classifier.py: remove test_is_transient_property (tested removed is_transient property on ClassifiedError) - test_delivery.py: remove TestDeliveryRouter class (tested removed resolve_targets method), clean up unused imports - test_skills_hub.py: remove test_is_hub_installed (tested removed is_hub_installed method on HubLockFile) * fix(gateway): launchd_stop uses bootout so KeepAlive doesn't respawn (#7119) launchd_stop() previously used `launchctl kill SIGTERM` which only signals the process. 
Because the plist has KeepAlive.SuccessfulExit=false, launchd immediately respawns the gateway — making `hermes gateway stop` a no-op that prints '✓ Service stopped' while the service keeps running. Switch to `launchctl bootout` which unloads the service definition so KeepAlive can't trigger. The process exits and stays stopped until `hermes gateway start` (which already handles re-bootstrapping unloaded jobs via error codes 3/113). Also adds _wait_for_gateway_exit() after bootout to ensure the process is fully gone before returning, and tolerates 'already unloaded' errors. Fixes: .env changes not taking effect after gateway stop+restart on macOS. The root cause was that stop didn't actually stop — the respawned process loaded the old env before the user's restart command ran. * fix(acp): declare session load and resume capabilities in initialize response (#6985) The resume_session and load_session handlers were implemented but undiscoverable by ACP clients because the capabilities weren't declared in the initialize response. Adds load_session=True and resume=SessionResumeCapabilities() plus wire-format tests. Fixes #6633. Contributed by @luyao618. 
* docs: add cron troubleshooting guide Adds a troubleshooting guide for Hermes cron jobs covering: - Jobs not firing (schedule, gateway, timezone checks) - Delivery failures (platform tokens, [SILENT], permissions) - Skill loading failures (installed, ordering, interactive tools) - Job errors (script paths, lock contention, permissions) - Performance issues and diagnostic commands Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct inaccuracies and add sidebar entry for cron troubleshooting guide - Fix job state display: [active] not scheduled - Fix CLI mode claim: only gateway fires cron, not CLI sessions - Expand delivery targets table (5 → 10+ platforms with platform:chat_id syntax) - Fix disabled toolsets: cronjob, messaging, and clarify (not just cronjob) - Remove nonexistent 'hermes skills sync' command reference - Fix log file path: agent.log/errors.log, not scheduler.log - Fix execution model: sequential, not thread pool concurrent - Fix 'hermes cron run' description: next tick, not immediate - Add inactivity-based timeout details (HERMES_CRON_TIMEOUT) - Add sidebar entry in sidebars.ts under Guides & Tutorials * fix(gateway): avoid false failure reactions on restart cancellation * fix(gateway): route /background through active-session bypass When /background was sent during an active run, it was not in the platform adapter's bypass list and fell through to the interrupt path instead of spawning a parallel background task. Add "background" to the active-session command bypass in the platform adapter, and add an early return in the gateway runner's running-agent guard to route /background to _handle_background_command() before it reaches the default interrupt logic. Fixes #6827 * test(gateway): add /background to active-session bypass tests Adds a regression test verifying that /background bypasses the active-session guard in the platform adapter, matching the existing test pattern for /stop, /new, /approve, /deny, and /status. 
* fix(gateway): replace assertions with proper error handling in Telegram and Feishu Python assertions are stripped when running with `python -O` (optimized mode), making them unsuitable for runtime error handling. 1. `telegram_network.py:113` — After exhausting all fallback IPs, the code uses `assert last_error is not None` before `raise last_error`. In optimized mode, the assert is skipped; if `last_error` is unexpectedly None, `raise None` produces a confusing `TypeError` instead of a meaningful error. Replace with an explicit `if` check that raises `RuntimeError` with a descriptive message. 2. `feishu.py:975` — The `_configure_with_overrides` closure uses `assert original_configure is not None` as a guard. While the outer scope only installs this closure when `original_configure` is not None, the assert would silently disappear in optimized mode. Replace with an explicit `if` check for defensive safety. * fix(telegram): harden HTTPX request pools during reconnect - configure Telegram HTTPXRequest pool/timeouts with env-overridable defaults - use separate request/get_updates request objects to reduce pool contention - skip fallback-IP transport when proxy is configured (or explicitly disabled) This mitigates recurrent pool-timeout failures during polling reconnect/bootstrap (delete_webhook). * fix(gateway): prevent duplicate messages on no-message-id platforms Platforms that don't return a message_id after the first send (Signal, GitHub webhooks) were causing GatewayStreamConsumer to re-enter the "first send" path on every tool boundary, posting one platform message per tool call (observed as 155 PR comments on a single response). Fix: treat _message_id == "__no_edit__" as a sentinel meaning "platform accepted the send but cannot be edited". When a tool boundary arrives in that state, skip the message_id/accumulated/last_sent_text reset so all continuation text is delivered once via _send_fallback_final rather than re-posted per segment.
Also make prompt_toolkit imports in hermes_cli/commands.py optional so gateway and test environments that lack the package can still import resolve_command, gateway_help_lines, and COMMAND_REGISTRY. * fix(tests): repair three pre-existing gateway test failures - test_background_autocompletes: pytest.importorskip("prompt_toolkit") so the test skips gracefully where the CLI dep is absent - test_run_agent_progress_stays_in_originating_topic: update stale emoji 💻 → ⚙️ to match get_tool_emoji("terminal", default="⚙️") in run.py - test_internal_event_bypass{_authorization,_pairing}: mock _handle_message_with_agent to raise immediately; avoids the 300s run_in_executor hang that caused the tests to time out * fix(gateway): implement platform-aware PID termination * fix: prevent duplicate completion notifications on process kill (#7124) When kill_process() sends SIGTERM, both it and the reader thread race to call _move_to_finished() — kill_process sets exit_code=-15 and enqueues a notification, then the reader thread's process.wait() returns with exit_code=143 (128+SIGTERM) and enqueues a second one. Fix: make _move_to_finished() idempotent by tracking whether the session was actually removed from _running. The second call sees it was already moved and skips the completion_queue.put(). Adds regression test: test_move_to_finished_idempotent_no_duplicate * fix(gateway): validate Slack image downloads before caching Slack may return an HTML sign-in/redirect page instead of actual media bytes (e.g. expired token, restricted file access). This adds two layers of defense: 1. Content-Type check in slack.py rejects text/html responses early 2. Magic-byte validation in base.py's cache_image_from_bytes() rejects non-image data regardless of source platform Also adds ValueError guards in wecom.py and email.py so the new validation doesn't crash those adapters. 
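A minimal sketch of the magic-byte layer described in the Slack image fix above. The signature list and the names `looks_like_image` and `validate_image_bytes` are illustrative assumptions; the formats actually accepted by `cache_image_from_bytes()` are not listed in the commit message.

```python
# Well-known file signatures; which formats the real helper accepts is an assumption.
_IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 spec)
    b"GIF89a",             # GIF (1989 spec)
)


def looks_like_image(data: bytes) -> bool:
    """Return True when the payload starts with a known image signature."""
    if data.startswith(_IMAGE_SIGNATURES):
        return True
    # WebP is a RIFF container: bytes 0-3 are 'RIFF', bytes 8-11 are 'WEBP'.
    return data[:4] == b"RIFF" and data[8:12] == b"WEBP"


def validate_image_bytes(data: bytes) -> bytes:
    """Raise ValueError for non-image payloads such as an HTML sign-in page."""
    if not looks_like_image(data):
        raise ValueError("payload does not look like an image")
    return data
```

Raising `ValueError` here mirrors the commit's note that callers in other adapters needed `ValueError` guards once the validation was added.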
Closes #6829 * fix(api-server): share one Docker container across all API conversations (#7127) The API server's _run_agent() was not passing task_id to run_conversation(), causing a fresh random UUID per request. This meant every Open WebUI message spun up a new Docker container and tore it down afterward — making persistent filesystem state impossible. Two fixes: 1. Pass task_id="default" so all API server conversations share the same Docker container (matching the design intent: one configured Docker environment, always the same container). 2. Derive a stable session_id from the system prompt + first user message hash instead of uuid4(). This stops hermes sessions list from being polluted with single-message throwaway sessions. Fixes #3438. * fix(security): prevent SSRF redirect bypass in Slack adapter * fix: make safe_url_for_log public, add SSRF redirect guards to base.py cache helpers Follow-up to Dusk1e's PR #7120 (Slack send_image redirect guard): - Rename _safe_url_for_log -> safe_url_for_log (drop underscore) since it is now imported cross-module by the Slack adapter - Add _ssrf_redirect_guard httpx event hook to cache_image_from_url() and cache_audio_from_url() in base.py — same pattern as vision_tools and the Slack adapter fix - Update url_safety.py docstring to reflect broader coverage - Add regression tests for image/audio redirect blocking + safe passthrough * fix(security): enforce path boundary checks in skill manager operations * feat(auth): add is_provider_explicitly_configured() helper Gate function for checking whether a user has explicitly selected a provider via hermes model/setup, auth.json active_provider, or env vars. Used in subsequent commits to prevent unauthorized credential auto-discovery. Follows the pattern from PR #4210. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(auth): gate Claude Code credential seeding behind explicit provider config _seed_from_singletons('anthropic') now checks is_provider_explicitly_configured('anthropic') before reading ~/.claude/.credentials.json. Without this, the auxiliary client fallback chain silently discovers and uses Claude Code tokens when the user's primary provider key is invalid — consuming their Claude Max subscription quota without consent. Follows the same gating pattern as PR #4210 (setup wizard gate) but applied to the credential pool seeding path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(auth): make 'auth remove' for claude_code prevent re-seeding Previously, removing a claude_code credential from the anthropic pool only printed a note — the next load_pool() re-seeded it from ~/.claude/.credentials.json. Now writes a 'suppressed_sources' flag to auth.json that _seed_from_singletons checks before seeding. Follows the pattern of env: source removal (clears .env var) and device_code removal (clears auth store state). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(auxiliary): skip anthropic in fallback chain when not explicitly configured _resolve_api_key_provider() now checks is_provider_explicitly_configured before calling _try_anthropic(). Previously, any auxiliary fallback (e.g. when kimi-coding key was invalid) would silently discover and use Claude Code OAuth tokens — consuming the user's Claude Max subscription without their knowledge. This is the auxiliary-client counterpart of the setup-wizard gate in PR #4210. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * security(approval): close 4 pattern gaps found by source-grounded audit Four gaps in DANGEROUS_PATTERNS found by running 10 targeted tests that each mapped to a specific pattern in approval.py and checked whether the documented defense actually held. 1. 
**Heredoc script injection** — `python3 << 'EOF'` bypasses the existing `-e`/`-c` flag pattern. Adds pattern for interpreter + `<<` covering python{2,3}, perl, ruby, node. 2. **PID expansion self-termination** — `kill -9 $(pgrep hermes)` is opaque to the existing `pkill|killall` + name pattern because command substitution is not expanded at detection time. Adds structural patterns matching `kill` + `$(pgrep` and backtick variants. 3. **Git destructive operations** — `git reset --hard`, `push --force`, `push -f`, `clean -f*`, and `branch -D` were entirely absent. Note: `branch -d` also triggers because IGNORECASE is global — acceptable since -d is still a delete, just a safe one, and the prompt is only a confirmation, not a hard block. 4. **chmod +x then execute** — two-step social engineering where a script containing dangerous commands is first written to disk (not checked by write_file), then made executable and run as `./script`. Pattern catches `chmod +x ... [;&|]+ ./` combos. Does not solve the deeper architectural issue (write_file not checking content) — that is called out in the PR description as a known limitation. Tests: 23 new cases across 4 test classes, all in test_approval.py: - TestHeredocScriptExecution (7 cases, incl. regressions for -c) - TestPgrepKillExpansion (5 cases, incl. safe kill PID negative) - TestGitDestructiveOps (8 cases, incl. safe git status/push negatives) - TestChmodExecuteCombo (3 cases, incl. safe chmod-only negative) Full suite: 146 passed, 0 failed. * fix(feishu): wrap image bytes in BytesIO before uploading to lark SDK * feat(telegram): support custom base_url for credential proxy When extra.base_url is set in the Telegram platform config, use it as the base URL for all Telegram API requests instead of api.telegram.org. This allows agents to route Telegram traffic through the credential proxy, which injects the real bot token — the VM never sees it. 
Also supports extra.base_file_url for file downloads (defaults to base_url if not set separately). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(cli): make /status show gateway-style session status * fix(matrix): remove eyes reaction on processing complete The on_processing_complete handler was never removing the eyes reaction because _send_reaction didn't return the reaction event_id. Fix: - _send_reaction returns Optional[str] event_id - on_processing_start stores it in _pending_reactions dict - on_processing_complete redacts the eyes reaction before adding completion emoji * test: update Matrix reaction tests for new _send_reaction return type _send_reaction now returns Optional[str] (event_id) instead of bool. Tests updated: - test_send_reaction: assert result == event_id string - test_send_reaction_no_client: assert result is None - test_on_processing_start_sends_eyes: _send_reaction returns event_id, now also asserts _pending_reactions is populated - test_on_processing_complete_sends_check: set up _pending_reactions and mock _redact_reaction, assert eyes reaction is redacted before sending check * fix(matrix): log redact failures and add missing reaction test cases Add debug logging when eyes reaction redaction fails, and add tests for the success=False path and the no-pending-reaction edge case. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(discord): add channel_skill_bindings for auto-loading skills per channel Simplified implementation of the feature from PR #6842 (RunzhouLi). Allows Discord channels/forum threads to auto-bind skills via config: discord: channel_skill_bindings: - id: "123456" skills: ["skill-a", "skill-b"] The run.py auto-skill loader now handles both str and list[str], loading multiple skills in order and concatenating their payloads. Forum threads inherit their parent channel's bindings. 
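The binding lookup for `channel_skill_bindings` can be sketched as below. `resolve_channel_skills` is a hypothetical helper, not the function in run.py, and letting a thread's own binding take precedence over its parent channel's binding is an assumption; the commit only states that forum threads inherit the parent's bindings.

```python
from typing import Dict, List, Optional


def resolve_channel_skills(
    bindings: List[Dict],
    channel_id: str,
    parent_id: Optional[str] = None,
) -> List[str]:
    """Return the skills bound to a channel, falling back to its parent channel."""
    for candidate in (channel_id, parent_id):
        if candidate is None:
            continue
        for binding in bindings:
            if str(binding.get("id")) == str(candidate):
                skills = binding.get("skills", [])
                # Accept both a single skill name and a list of names.
                return [skills] if isinstance(skills, str) else list(skills)
    return []


bindings = [
    {"id": "123456", "skills": ["skill-a", "skill-b"]},
    {"id": "789", "skills": "solo-skill"},
]
thread_skills = resolve_channel_skills(bindings, "555", parent_id="123456")
```

Normalizing to a list keeps the loader's "load in order and concatenate payloads" step uniform regardless of how the config entry was written.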
Co-authored-by: RunzhouLi <RunzhouLi@users.noreply.github.com> * test(discord): add tests for channel_skill_bindings resolution * fix: flush stdin after curses/terminal menus to prevent escape sequence leakage (#7167) After curses.wrapper() or simple_term_menu exits, endwin() restores the terminal but does NOT drain the OS input buffer. Leftover escape-sequence bytes from arrow key navigation remain buffered and get silently consumed by the next input()/getpass.getpass() call. This caused a user-reported bug where selecting Z.AI/GLM as provider wrote ^[^[ (two ESC chars) into .env as the API key, because the buffered escape bytes were consumed by getpass before the user could type anything. Fix: add flush_stdin() helper using termios.tcflush(TCIFLUSH) and call it after every curses.wrapper() and simple_term_menu .show() return across all interactive menu sites: - hermes_cli/curses_ui.py (curses_checklist) - hermes_cli/setup.py (_curses_prompt_choice) - hermes_cli/tools_config.py (_prompt_choice) - hermes_cli/auth.py (_prompt_model_selection) - hermes_cli/main.py (3 simple_term_menu usages) * fix: UTF-8 config encoding, pairing hint, credential_pool key, header normalization (#7174) Four small fixes: (1) UTF-8 encoding for config open (@zhangchn #7063), (2) pairing hint placeholders (@konsisumer #7057), (3) missing credential_pool in cheap route (@kuishou68 #7025), (4) case-insensitive rate limit headers (@kuishou68 #7019). * fix(telegram): use valid reaction emojis for processing completion (#7175) Telegram's Bot API only allows a specific set of emoji for bot reactions (the ReactionEmoji enum). ✅ (U+2705) and ❌ (U+274C) are not in that set, causing on_processing_complete reactions to silently fail with REACTION_INVALID (caught at debug log level). Replace with 👍 (U+1F44D) / 👎 (U+1F44E) which are always available in Telegram's allowed reaction list. The 👀 (eyes) reaction used by on_processing_start was already valid. Based on the fix by @ppdng in PR #6685. 
Fixes #6068 * fix: add Alibaba/DashScope rate-limit pattern to error classifier Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a unique throttling message ('Request rate increased too quickly...') that doesn't match standard rate-limit patterns ('rate limit', 'too many requests'). This caused Alibaba errors to fall through to the 'unknown' category rather than being properly classified as rate_limit with appropriate backoff/rotation. Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with the exact error message observed from the Alibaba provider. * fix: pass config_context_length to switch_model context compressor When switching models at runtime, the config_context_length override was not being passed to the new context compressor instance. This meant the user-specified context length from config.yaml was lost after a model switch. - Store _config_context_length on AIAgent instance during __init__ - Pass _config_context_length when creating new ContextCompressor in switch_model - Add test to verify config_context_length is preserved across model switches Fixes: when we change the model, the context size is not being changed * fix: opencode-go missing from /model list and improve HERMES_OVERLAYS credential check When opencode-go API key is set, it should appear in the /model list. The provider was already in PROVIDER_TO_MODELS_DEV and PROVIDER_REGISTRY, so it appears via Part 1 (built-in source).
Also fixes a potential issue in Part 2 (HERMES_OVERLAYS) where providers with auth_type=api_key but no extra_env_vars would not be detected: - Now also checks api_key_env_vars from PROVIDER_REGISTRY for api_key auth_type - Add test verifying opencode-go appears when OPENCODE_GO_API_KEY is set * fix: always show model selection menu for custom providers Previously, _model_flow_named_custom() returned immediately when a saved model existed, making it impossible to switch models on multi-model endpoints (OpenRouter, vLLM clusters, etc.). Now the function always probes the endpoint and shows the selection menu with the current model pre-selected and marked '(current)'. Falls back to the saved model if endpoint probing fails. Fixes #6862 * test: add regression tests for custom provider model switching Covers: probe always called, model switch works, probe failure fallback, first-time flow unchanged. * fix(test): correct mock target for fetch_api_models in custom provider tests fetch_api_models is imported locally inside _model_flow_named_custom from hermes_cli.models, not defined as a module-level attribute of hermes_cli.main. Patch the source module so the local import picks up the mock. Also force simple_term_menu ImportError so tests reliably use the input() fallback path regardless of environment. Co-Authored-By: Claude <noreply@anthropic.com> * fix(model): normalize native provider-prefixed model ids * fix(model): normalize direct provider ids in auxiliary routing * fix(model): tighten direct-provider fallback normalization * fix: profile paths broken in Docker — profiles go to /root/.hermes instead of mounted volume (#7170) In Docker, HERMES_HOME=/opt/data (set in Dockerfile) and users mount their .hermes directory to /opt/data. However, profile operations used Path.home() / '.hermes' which resolves to /root/.hermes in Docker — an ephemeral container path, not the mounted volume. 
This caused: - Profiles created at /root/.hermes/profiles/ (lost on container recreate) - active_profile sticky file written to wrong location - profile list looking at wrong directory Fix: Add get_default_hermes_root() to hermes_constants.py that detects Docker/custom deployments (HERMES_HOME outside ~/.hermes) and returns HERMES_HOME as the root. Also handles Docker profiles correctly (<root>/profiles/<name> → root is grandparent). Files changed: - hermes_constants.py: new get_default_hermes_root() - hermes_cli/profiles.py: _get_default_hermes_home() delegates to shared fn - hermes_cli/main.py: _apply_profile_override() + _invalidate_update_cache() - hermes_cli/gateway.py: _profile_suffix() + _profile_arg() - Tests: 12 new tests covering Docker scenarios * feat(gateway): add native Weixin/WeChat support via iLink Bot API Add first-class Weixin platform adapter for personal WeChat accounts: - Long-poll inbound delivery via iLink getupdates - AES-128-ECB encrypted CDN media upload/download - QR-code login flow for gateway setup wizard - context_token persistence for reply continuity - DM/group access policies with allowlists - Native text, image, video, file, voice handling - Markdown formatting with header rewriting and table-to-list conversion - Block-aware message chunking (preserves fenced code blocks) - Typing indicators via getconfig/sendtyping - SSRF protection on remote media downloads - Message deduplication with TTL Integration across all gateway touchpoints: - Platform enum, config, env overrides, connected platforms check - Adapter creation in gateway runner - Authorization maps (allowed users, allow all) - Cron delivery routing - send_message tool with native media support - Toolset definition (hermes-weixin) - Channel directory (session-based) - Platform hint in prompt builder - CLI status display - hermes tools default toolset mapping Co-authored-by: Zihan Huang <bravohenry@users.noreply.github.com> * fix: salvage follow-ups for Weixin adapter 
(#6747) - Remove sys.path.insert hack (leftover from standalone dev) - Add token lock (acquire_scoped_lock/release_scoped_lock) in connect()/disconnect() to prevent duplicate pollers across profiles - Fix get_connected_platforms: WEIXIN check must precede generic token/api_key check (requires both token AND account_id) - Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS - Add gateway setup wizard with QR login flow - Add platform status check for partially configured state - Add weixin.md docs page with full adapter documentation - Update environment-variables.md reference with all 11 env vars - Update sidebars.ts to include weixin docs page - Wire all gateway integration points onto current main Salvaged from PR #6747 by Zihan Huang. * fix: complete Weixin platform parity audit — 16 missing integration points Systematic audit found Weixin missing from: Code: - gateway/run.py: early WEIXIN_ALLOW_ALL_USERS env check - gateway/platforms/webhook.py: cross-platform delivery routing - hermes_cli/dump.py: platform detection for config export - hermes_cli/setup.py: hermes setup wizard platform list + _setup_weixin - hermes_cli/skills_config.py: platform labels for skills config UI Docs (11 pages): - developer-guide/architecture.md: platform adapter listing - developer-guide/cron-internals.md: delivery target table - developer-guide/gateway-internals.md: file tree - guides/cron-troubleshooting.md: supported platforms list - integrations/index.md: platform links - reference/toolsets-reference.md: toolset table - user-guide/configuration.md: platform keys for tool_progress - user-guide/features/cron.md: delivery target table - user-guide/messaging/index.md: intro text, feature table, mermaid diagram, toolset table, setup links - user-guide/messaging/webhooks.md: deliver field + routing table - user-guide/sessions.md: platform identifiers table * fix(gateway): handle provider command without config * feat(gateway): add fast mode support to gateway chats * fix: add 
_session_model_overrides to test runner fixture Follow-up for cherry-pick — _session_model_overrides was added to GatewayRunner.__init__ after the fast mode PR was written. * fix: fall back to default certs when CA bundle path doesn't exist (#7352) _resolve_verify() returned stale CA bundle paths fr…

…etween turns (NousResearch#7038)

The gateway /usage handler only looked in _running_agents for the agent object, which is only populated while the agent is actively processing a message. Between turns (when users actually type /usage), the dict is empty and the handler fell through to a rough message-count estimate.

The agent object actually lives in _agent_cache between turns (kept for prompt caching). This fix checks both dicts, with _running_agents taking priority (mid-turn) and _agent_cache as the between-turns fallback.

Also brings the gateway output to parity with the CLI /usage:

- Model name
- Detailed token breakdown (input, output, cache read, cache write)
- Cost estimation (estimated amount or 'included' for subscriptions)
- Cache token lines hidden when zero (cleaner output)

This fixes Nous Portal rate limit headers not showing up for gateway users: the data was being captured correctly, but the handler could never see it.

…le boundary (#1) * fix(anthropic): omit tool-streaming beta on MiniMax endpoints MiniMax's Anthropic-compatible endpoints reject requests that include the fine-grained-tool-streaming beta header — every tool-use message triggers a connection error (~18s timeout). Regular chat works fine. Add _common_betas_for_base_url() that filters out the tool-streaming beta for Bearer-auth (MiniMax) endpoints while keeping all other betas. All four client-construction branches now use the filtered list. Based on #6528 by @HiddenPuppy. Original cherry-picked from PR #6688 by kshitijk4poor. Fixes #6510, fixes #6555. * fix: add actionable hint for OpenRouter 'no tool endpoints' error When OpenRouter returns 'No endpoints found that support tool use' (HTTP 404), display a hint explaining that provider routing restrictions may be filtering out tool-capable providers. Links the user directly to the model's OpenRouter page to check which providers support tools. The hint fires in the error display block that runs regardless of whether fallback succeeds — so the user always understands WHY the model failed, not just that it fell back. Reported via Discord: GLM-5.1 on OpenRouter with US-based provider restrictions eliminated all 4 tool-supporting endpoints (DeepInfra, Z.AI, Friendli, Venice), leaving only 7 non-tool providers. * fix: skip stale Nous pool entry when agent_key is expired * fix: sync refreshed OAuth tokens from pool back to auth.json providers * fix: proactive Codex CLI sync before refresh + retry on failure * fix: add auth.json write-back for Codex retry and valid-token early-return paths The Codex retry block and valid-token short-circuit in _refresh_entry() both return early, bypassing the auth.json sync at the end of the method. This adds _sync_device_code_entry_to_auth_store() calls on both paths so refreshed/synced tokens are written back to auth.json regardless of which code path succeeds. 
* feat: add Codex fast mode toggle (/fast command) Add /fast slash command to toggle OpenAI Codex service_tier between normal and priority ('fast') inference. Only exposed for models registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4). - Registry-based backend config for extensibility - Dynamic command visibility (hidden from help/autocomplete for non-supported models) via command_filter on SlashCommandCompleter - service_tier flows through request_overrides from route resolution - Omit max_output_tokens for Codex backend (rejects it) - Persists to config.yaml under agent.service_tier Salvage cleanup: removed simple_term_menu/input() menu (banned), bare /fast now shows status like /reasoning. Removed redundant override resolution in _build_api_kwargs — single source of truth via request_overrides from route. Co-authored-by: Hermes Agent <hermes@nousresearch.com> * feat: expand /fast to all OpenAI Priority Processing models (#6960) Previously /fast only supported gpt-5.4 and forced a provider switch to openai-codex. Now supports all 13 models from OpenAI's Priority Processing pricing table (gpt-5.4, gpt-5.4-mini, gpt-5.2, gpt-5.1, gpt-5, gpt-5-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o4-mini). Key changes: - Replaced _FAST_MODE_BACKEND_CONFIG with _PRIORITY_PROCESSING_MODELS frozenset - Removed provider-forcing logic — service_tier is now injected into whatever API path the user is already on (Codex Responses, Chat Completions, or OpenRouter passthrough) - Added request_overrides support to chat_completions path in run_agent.py - Updated messaging from 'Codex inference tier' to 'Priority Processing' - Expanded test coverage for all supported models * fix(streaming): prevent <think> in prose from suppressing response output When the model mentions <think> as literal text in its response (e.g. "(/think not producing <think> tags)"), the streaming display treated it as a reasoning block opener and suppressed everything after it. 
The response box would close with truncated content and no error — the API response was complete but the display ate it. Root cause: _stream_delta() matched <think> anywhere in the text stream regardless of position. Real reasoning blocks always start at the beginning of a line; mentions in prose appear mid-sentence. Fix: track line position across streaming deltas with a _stream_last_was_newline flag. Only enter reasoning suppression when the tag appears at a block boundary (start of stream, after a newline, or after only whitespace on the current line). Add a _flush_stream() safety net that recovers buffered content if no closing tag is found by end-of-stream. Also fixes three related issues discovered during investigation: - anthropic_adapter: _get_anthropic_max_output() now normalizes dots to hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key (was returning 32K instead of 128K) - run_agent: send explicit max_tokens for Claude models on Nous Portal, same as OpenRouter — both proxy to Anthropic's API which requires it. Without it the backend defaults to a low limit that truncates responses. - run_agent: reset truncated_tool_call_retries after successful tool execution so a single truncation doesn't poison the entire conversation. * fix: increase stream read timeout default to 120s, auto-raise for local LLMs (#6967) Raise the default httpx stream read timeout from 60s to 120s for all providers. Additionally, auto-detect local LLM endpoints (Ollama, llama.cpp, vLLM) and raise the read timeout to HERMES_API_TIMEOUT (1800s) since local models can take minutes for prefill on large contexts before producing the first token. The stale stream timeout already had this local auto-detection pattern; the httpx read timeout was missing it — causing a hard 60s wall that users couldn't find (HERMES_STREAM_READ_TIMEOUT was undocumented). 
Changes: - Default HERMES_STREAM_READ_TIMEOUT: 60s -> 120s - Auto-detect local endpoints -> raise to 1800s (user override respected) - Document HERMES_STREAM_READ_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT - Add 10 parametrized tests Reported-by: Pavan Srinivas (@pavanandums) * fix(telegram): adaptive batch delay for split long messages Cherry-picked from PR #6891 by SHL0MS. When a chunk is near the 4096-char split point, wait 2.0s instead of 0.6s since a continuation is almost certain. * fix(discord): add text batching to merge split long messages Cherry-picked from PR #6894 by SHL0MS with fixes: - Only batch TEXT messages; commands/media dispatch immediately - Use build_session_key() for proper session-scoped batch keys - Consistent naming (_text_batch_delay_seconds) - Proper Dict[str, MessageEvent] typing Discord splits at 2000 chars (lowest of all platforms). Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. * fix(matrix): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. Matrix clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands dispatch immediately. Ref #6892 * fix(wecom): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. WeCom clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands/media dispatch immediately. Ref #6892 * fix(feishu): add adaptive batch delay for split long messages Feishu already had text batching with a static 0.6s delay. This adds adaptive delay: waits 2.0s when a chunk is near the ~4096-char split point since a continuation is almost certain. Tracks _last_chunk_len on each queued event to determine the delay. Configurable via HERMES_FEISHU_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 2.0). 
Ref #6892 * test: add text batching tests for Discord, Matrix, WeCom, Telegram, Feishu 22 tests covering: - Single message dispatch after delay - Split message aggregation (2-way and 3-way) - Different chats/rooms not merged - Adaptive delay for near-limit chunks - State cleanup after flush - Split continuation merging All 5 platform adapters tested. * test: disable text batching in existing adapter tests Set _text_batch_delay_seconds = 0 on test adapter fixtures so messages dispatch immediately (bypassing async batching). This preserves the existing synchronous assertion patterns while the batching logic is tested separately in test_text_batching.py. * fix(docker): use uv for dependency resolution to fix resolution-too-deep error * docs: document streaming timeout auto-detection for local LLMs (#6990) Add streaming timeout documentation to three pages: - guides/local-llm-on-mac.md: New 'Timeouts' section with table of all three timeouts, their defaults, local auto-adjustments, and env var overrides - reference/faq.md: Tip box in the local models FAQ section - user-guide/configuration.md: 'Streaming Timeouts' subsection under the agent config section Follow-up to #6967. * fix(gateway): bypass text batching when delay is 0 (#6996) The text batching feature routes TEXT messages through asyncio.create_task() + asyncio.sleep(delay). Even with delay=0, the task fires asynchronously and won't complete before synchronous test assertions. This broke 33 tests across Discord, Matrix, and WeCom adapters. When _text_batch_delay_seconds is 0 (the test fixture setting), dispatch directly to handle_message() instead of going through the async batching path. This preserves the pre-batching behavior for tests while keeping batching active in production (default delay 0.6s). 
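The adaptive delay and the zero-delay bypass described in the batching commits above could look roughly like this. It is a sketch under stated assumptions: `pick_batch_delay` and the `near_ratio` threshold are hypothetical, since the commit messages only say a chunk is "near" the split limit without giving the exact margin.

```python
def pick_batch_delay(
    chunk_len: int,
    split_limit: int,
    base_delay: float = 0.6,
    split_delay: float = 2.0,
    near_ratio: float = 0.95,  # hypothetical definition of "near the limit"
) -> float:
    """Choose how long to wait for a continuation chunk before flushing."""
    # A base delay of 0 means batching is off (the test-fixture setting);
    # the caller should dispatch the message directly instead of scheduling.
    if base_delay <= 0:
        return 0.0

    # A chunk close to the platform split point almost certainly has a
    # continuation on the way, so wait longer before flushing.
    if chunk_len >= split_limit * near_ratio:
        return split_delay

    return base_delay
```

For a Discord-sized limit of 2000 characters, a 1990-character chunk would wait 2.0s, while a short message would wait 0.6s.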
* fix: clear conversation_history after mid-loop compression to prevent empty sessions (#7001) After mid-loop compression (triggered by 413, context_overflow, or Anthropic long-context tier errors), _compress_context() creates a new session in SQLite and resets _last_flushed_db_idx=0. However, conversation_history was not cleared, so _flush_messages_to_session_db() computed: flush_from = max(len(conversation_history=200), _last_flushed_db_idx=0) = 200 messages[200:] → empty (compressed messages < 200) This resulted in zero messages being written to the new session's SQLite store. On resume, the user would see 'Session found but has no messages.' The preflight compression path (line 7311) already had the fix: conversation_history = None This commit adds the same clearing to the three mid-loop compression sites: - Anthropic long-context tier overflow - HTTP 413 payload too large - Generic context_overflow error Reported by Aaryan (Nous community). * fix(update): always reset on stash conflict — never leave conflict markers (#7010) When `hermes update` stashes local changes and the restore hits merge conflicts, the old code prompted the user to reset or keep conflict markers. If the user declined the reset, git conflict markers (<<<<<<< Updated upstream) were left in source files, making hermes completely unrunnable with a SyntaxError on the next invocation. Additionally, the interactive path called sys.exit(1), which killed the entire update process before pip dependency install, skill sync, and gateway restart could finish — even though the code pull itself had succeeded. Changes: - Always auto-reset to clean state when stash restore conflicts - Remove the "Reset working tree?" 
prompt (footgun) - Remove sys.exit(1) — return False so cmd_update continues normally - User's changes remain safely in the stash for manual recovery Also fixes a secondary bug where the conflict handling prompt used bare input() instead of the input_fn parameter, which would hang in gateway mode. Tests updated: replaced prompt/sys.exit assertions with auto-reset behavior checks; removed the "user declines reset" test (path no longer exists). * feat: add Anthropic Fast Mode support to /fast command (#7037) Extends the /fast command to support Anthropic's Fast Mode beta in addition to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for ~2.5x faster output token throughput. Changes: - hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry, model_supports_fast_mode() now recognizes Claude Opus 4.6, resolve_fast_mode_overrides() returns {speed: fast} for Anthropic vs {service_tier: priority} for OpenAI - agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant, build_anthropic_kwargs() accepts fast_mode=True which injects speed:fast + beta header via extra_headers (skipped for third-party Anthropic-compatible endpoints like MiniMax) - run_agent.py: Pass fast_mode to build_anthropic_kwargs in the anthropic_messages path of _build_api_kwargs() - cli.py: Update _handle_fast_command with provider-aware messaging (shows 'Anthropic Fast Mode' vs 'Priority Processing') - hermes_cli/commands.py: Update /fast description to mention both providers - tests: 13 new tests covering Anthropic model detection, override resolution, CLI availability, routing, adapter kwargs, and third-party endpoint safety * fix(gateway): /usage now shows rate limits, cost, and token details between turns (#7038) The gateway /usage handler only looked in _running_agents for the agent object, which is only populated while the agent is actively processing a message. 
Between turns (when users actually type /usage), the dict is empty and the handler fell through to a rough message-count estimate. The agent object actually lives in _agent_cache between turns (kept for prompt caching). This fix checks both dicts, with _running_agents taking priority (mid-turn) and _agent_cache as the between-turns fallback. Also brings the gateway output to parity with the CLI /usage: - Model name - Detailed token breakdown (input, output, cache read, cache write) - Cost estimation (estimated amount or 'included' for subscriptions) - Cache token lines hidden when zero (cleaner output) This fixes Nous Portal rate limit headers not showing up for gateway users — the data was being captured correctly but the handler could never see it. * fix: update Kimi Coding User-Agent to KimiCLI/1.30.0 The hardcoded User-Agent 'KimiCLI/1.3' is outdated — Kimi CLI is now at v1.30.0. The stale version string causes intermittent 403 errors from Kimi's coding endpoint ('only available for Coding Agents'). Update all 8 occurrences across run_agent.py, auxiliary_client.py, and doctor.py to 'KimiCLI/1.30.0' to match the current official Kimi CLI. * fix(cli): add missing os and platform imports in uninstall.py (#7034) Fixes #6983. Contributed by @JiayuuWang. * fix: set retryable=False for message-based auth errors in _classify_by_message() (#7027) Auth errors matched by message pattern were incorrectly marked retryable=True, causing futile retry loops. Aligns with _classify_by_status() which already sets retryable=False for 401/403. Fixes #7026. Contributed by @kuishou68. 
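To illustrate the retryable fix in the last commit above, here is a small sketch in which the message-based path returns the same result as the status-code path. The `Classification` type, the function names, and the pattern list are illustrative assumptions; the real `_classify_by_message()` and `_classify_by_status()` are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    reason: str
    retryable: bool


# Illustrative subset of auth-related message fragments (assumed).
_AUTH_MESSAGE_PATTERNS = ("invalid api key", "unauthorized")


def classify_by_status(status_code: int) -> Optional[Classification]:
    # 401/403 are auth failures; retrying with the same credential is futile.
    if status_code in (401, 403):
        return Classification(reason="auth", retryable=False)
    return None


def classify_by_message(message: str) -> Optional[Classification]:
    lowered = message.lower()
    if any(pattern in lowered for pattern in _AUTH_MESSAGE_PATTERNS):
        # Same outcome as the status-code path, so no retry loop starts.
        return Classification(reason="auth", retryable=False)
    return None
```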
* Harden setup provider flows Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refresh OpenRouter model catalog Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: harden cron script timeout and provider recovery * docs: add cron script timeout and provider recovery documentation - Add HERMES_CRON_TIMEOUT and HERMES_CRON_SCRIPT_TIMEOUT to env vars reference - Add script timeout and provider recovery sections to cron features page - Add timeout resolution chain and credential pool details to cron internals * fix(cli): prevent stale image attachment on text paste and voice input Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(security): require auth for session continuation and warn on missing API key Two security hardening changes for the API server: 1. **Startup warning when no API key is configured.** When `API_SERVER_KEY` is not set, all endpoints accept unauthenticated requests. This is the default configuration, but operators may not realize the security implications. A prominent warning at startup makes the risk visible. 2. **Require authentication for session continuation.** The `X-Hermes-Session-Id` header allows callers to load and continue any session stored in state.db. Without authentication, an attacker who can reach the API server (e.g. via CORS from a malicious page, or on a shared host) could enumerate session IDs and read conversation history — which may contain API keys, passwords, code, or other sensitive data shared with the agent. Session continuation now returns 403 when no API key is configured, with a clear error message explaining how to enable the feature. When a key IS configured, the existing Bearer token check already gates access. This is defense-in-depth: the API server is intended for local use, but defense against cross-origin and shared-host attacks is important since the default binding is 127.0.0.1 which is reachable from browsers via DNS rebinding or localhost CORS. 
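The session-continuation rule from the security commit above can be sketched as a small decision function. This is only an illustration: the function name, the return shape, and the 401 status for a bad token are assumptions. The commit message itself confirms only the 403 when no key is configured and the existing Bearer check when one is.

```python
import hmac
from typing import Optional, Tuple


def authorize_session_continuation(
    session_id: Optional[str],
    api_key: Optional[str],
    authorization: Optional[str],
) -> Tuple[int, str]:
    """Decide whether a request may continue a stored session."""
    # This sketch only covers the continuation decision; requests without
    # the session header are outside its scope and pass through.
    if not session_id:
        return 200, "no session continuation requested"

    # No key configured: refuse continuation outright (403 per the commit).
    if not api_key:
        return 403, "session continuation requires API_SERVER_KEY to be set"

    # Key configured: require a matching Bearer token. The comparison is
    # constant-time; the 401 status is an assumption for illustration.
    expected = f"Bearer {api_key}".encode()
    supplied = (authorization or "").encode()
    if not hmac.compare_digest(supplied, expected):
        return 401, "invalid or missing bearer token"

    return 200, "ok"
```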
* fix(gateway): apply /model session overrides so switch persists across messages The gateway /model command stored session overrides in _session_model_overrides but run_sync() never consulted them when resolving the model and runtime for the next message. It always read from config.yaml, so the switch was lost as soon as a new agent was created. Two fixes: 1. In run_sync(), apply _session_model_overrides after resolving from config.yaml/env — the override takes precedence for model, provider, api_key, base_url, and api_mode. 2. In post-run fallback detection, check whether the model mismatch (agent.model != config_model) is due to an intentional /model switch before evicting the cached agent. Without this, the first message after /model would work (cached agent reused) but the fallback detector would evict it, causing the next message to revert. Affects all gateway platforms (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, BlueBubbles, HomeAssistant) since they all share GatewayRunner._run_agent(). Fixes #6213 * fix(terminal): cap foreground timeout to prevent session deadlocks When the model calls terminal() in foreground mode without background=true (e.g. to start a server), the tool call blocks until the command exits or the timeout expires. Without an upper bound the model can request arbitrarily high timeouts (the schema had minimum=1 but no maximum), blocking the entire agent session for hours until the gateway idle watchdog kills it. 
Changes: - Add FOREGROUND_MAX_TIMEOUT (600s, configurable via TERMINAL_MAX_FOREGROUND_TIMEOUT env var) that caps foreground timeout - Clamp effective_timeout to the cap when background=false and timeout exceeds the limit - Include a timeout_note in the tool result when clamped, nudging the model to use background=true for long-running processes - Update schema description to show the max timeout value - Remove dead clamping code in the background branch that could never fire (max_timeout was set to effective_timeout, so timeout > max_timeout was always false) - Add 7 tests covering clamping, no-clamping, config-default-exceeds-cap edge case, background bypass, default timeout, constant value, and schema content Self-review fixes: - Fixed bug where timeout_note said 'Requested timeout Nones' when clamping fired from config default exceeding cap (timeout param is None). Now uses unclamped_timeout instead of the raw timeout param. - Removed unused pytest import from test file - Extracted test config dict into _make_env_config() helper - Fixed tautological test_default_value assertion - Added missing test for config default > cap with no model timeout * fix: reject foreground timeout above cap instead of clamping Change behavior from silent clamping to returning an error when the model requests a foreground timeout exceeding FOREGROUND_MAX_TIMEOUT. This forces the model to use background=true for long-running commands rather than silently changing its intent. - Config default timeouts above the cap are NOT rejected (user's choice) - Only explicit model-requested timeouts trigger rejection - Added boundary test for timeout exactly at the limit * fix(copilot): add missing Copilot-Integration-Id header The GitHub Copilot API now requires a Copilot-Integration-Id header on all requests. Without it, every API call fails with HTTP 400: "missing required Copilot-Integration-Id header". 
Uses vscode-chat as the integration ID, matching opencode which shares the same OAuth client ID (Ov23li8tweQw6odWQebz). Fixes: Copilot provider fails with "missing required Copilot-Integration-Id header" (HTTP 400) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(acp): populate usage from top-level result fields * fix(acp): remove dead nested usage dict path run_conversation() never returns a result["usage"] nested dict — token counters are always at the top level. The nested path used the wrong key name ("cached_tokens" vs "cache_read_tokens") and was never reachable. Remove it. * fix(config): allow HERMES_HOME_MODE env var to override _secure_dir() permissions (#6993) Operators running a web server (nginx, caddy) that needs to traverse ~/.hermes/ can now set HERMES_HOME_MODE=0701 (or any octal mode) instead of having _secure_dir() revert their manual chmod on every gateway restart. Default behavior (0o700) is unchanged. Fixes #6991. Contributed by @ygd58. * feat(environments): unified file sync with change tracking and deletion Replace per-backend ad-hoc file sync with a shared FileSyncManager that handles mtime-based change detection, remote deletion of locally-removed files, and transactional state updates. - New FileSyncManager class (tools/environments/file_sync.py) with callbacks for upload/delete, rate limiting, and rollback - Shared iter_sync_files() eliminates 3 duplicate implementations - SSH: replace unconditional rsync with scp + mtime skip - Modal/Daytona: replace inline _synced_files dict with manager - All 3 backends now sync credentials + skills + cache uniformly - Remote deletion: files removed locally are cleaned from remote - HERMES_FORCE_FILE_SYNC=1 env var for debugging - Base class _before_execute() simplified to empty hook - 12 unit tests covering mtime skip, deletion, rollback, rate limiting * test: add reproducible perf benchmark for file sync overhead Direct env.execute() timing — no LLM in the loop. 
Measures per-command wall-clock including sync check.

Results on SSH:
- echo median: 617ms (pure SSH round-trip + spawn overhead)
- sync-triggered after 6s wait: 621ms (mtime skip adds ~0ms)
- within-interval (no sync): 618ms

Confirms mtime skip makes sync overhead unmeasurable.

* fix(tests): update mocks for file sync changes

- Modal snapshot tests: accept **kw in iter_skills_files/iter_cache_files mock lambdas to match new container_base kwarg
- SSH preflight test: mock _detect_remote_home, _ensure_remote_dirs, init_session, and FileSyncManager added in file sync PR

* fix(gateway): remove DM thread session seeding to prevent cross-thread contamination (#7084)

The session store was copying the ENTIRE parent DM transcript into new thread sessions. This caused unrelated conversations to bleed across threads in Slack DMs.

The Slack adapter already handles thread context correctly via _fetch_thread_context() (conversations.replies API), which fetches only the actual thread messages. The session-level seeding was both redundant and harmful.

No other platform (Telegram, Discord) uses DM threads, so the seeding code path was only triggered by Slack, where it conflicted with the adapter-level context.

Tests updated to assert thread isolation: all thread sessions start empty, platform adapters are responsible for injecting thread context.

Salvage of PR #5868 (jarvisxyz). Reported by norbert on Discord.

* feat(discord): add allowed_channels whitelist config

Add DISCORD_ALLOWED_CHANNELS (env var) / discord.allowed_channels (config.yaml) support to restrict the bot to only respond in specified channels.

When set, messages from any channel NOT in the allowed list are silently ignored, even if the bot is @mentioned. This provides a secure default-deny posture vs the existing ignored_channels which is default-allow.
This is especially useful when bots in other channels may create new channels dynamically (e.g., project bots): a blacklist requires constant maintenance while a whitelist is set-and-forget.

Follows the same config pattern as ignored_channels and free_response_channels:
- Env var: DISCORD_ALLOWED_CHANNELS (comma-separated channel IDs)
- Config: discord.allowed_channels (string or list of channel IDs)
- Env var takes precedence over config.yaml
- Empty/unset = no restriction (backward compatible)

Files changed:
- gateway/platforms/discord.py: check allowed_channels before ignored_channels
- gateway/config.py: map discord.allowed_channels → DISCORD_ALLOWED_CHANNELS
- hermes_cli/config.py: add allowed_channels to DEFAULT_CONFIG

* fix(model_metadata): add xAI Grok context length fallbacks

xAI /v1/models does not return context_length metadata, so Hermes probes down to the 128k default whenever a user configures a custom provider pointing at https://api.x.ai/v1. This forces every xAI user to manually override model.context_length in config.yaml (2M for Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context window.

Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the fallback lookup returns the correct value via substring matching. Values sourced from models.dev (2026-04) and cross-checked against the xAI /v1/models listing:

- grok-4.20-*        2,000,000 (reasoning, non-reasoning, multi-agent)
- grok-4-1-fast-*    2,000,000
- grok-4-fast-*      2,000,000
- grok-4 / grok-4-0709 256,000
- grok-code-fast-1   256,000
- grok-3*            131,072
- grok-2 / latest    131,072
- grok-2-vision*     8,192
- grok (catch-all)   131,072

Keys are ordered longest-first so that specific variants match before the catch-all, consistent with the existing Claude/Gemma/MiniMax entries.

Add TestDefaultContextLengths.test_grok_models_context_lengths and test_grok_substring_matching to pin the values and verify the full lookup path. All 77 tests in test_model_metadata.py pass.
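The longest-first substring lookup described in the Grok commit above can be sketched as follows. This is an illustration only: the table is a subset of the values listed in the commit message, `lookup_context_length` is a hypothetical name, the 128,000 default is an assumed reading of "128k", and the sketch sorts keys by length at lookup time instead of relying on insertion order as the real table is said to.

```python
from typing import Dict

# Subset of the Grok entries from the commit message, keyed by substring.
GROK_CONTEXT_LENGTHS: Dict[str, int] = {
    "grok-4.20": 2_000_000,
    "grok-4-1-fast": 2_000_000,
    "grok-4-fast": 2_000_000,
    "grok-code-fast-1": 256_000,
    "grok-2-vision": 8_192,
    "grok-4": 256_000,
    "grok-3": 131_072,
    "grok-2": 131_072,
    "grok": 131_072,  # catch-all
}


def lookup_context_length(
    model_id: str,
    table: Dict[str, int] = GROK_CONTEXT_LENGTHS,
    default: int = 128_000,  # assumed value of the "128k default"
) -> int:
    """Return the context length for the most specific matching key."""
    lowered = model_id.lower()
    # Try longer keys first so "grok-4-fast" wins over "grok-4" and "grok".
    for key in sorted(table, key=len, reverse=True):
        if key in lowered:
            return table[key]
    return default
```

Without the longest-first ordering, a model such as grok-4-fast-reasoning would match the shorter "grok-4" key and report 256,000 instead of 2,000,000.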
* fix(dingtalk,api): validate session webhook URL origin, cap webhook cache, reject header injection

dingtalk.py: The session_webhook URL from incoming DingTalk messages is POSTed to without any origin validation (line 290), enabling SSRF attacks via crafted webhook URLs (e.g. http://169.254.169.254/ to reach cloud metadata). Add a regex check that only accepts the official DingTalk API origin (https://api.dingtalk.com/). Also cap _session_webhooks dict at 500 entries with FIFO eviction to prevent unbounded memory growth from long-running gateway instances.

api_server.py: The X-Hermes-Session-Id request header is accepted and echoed back into response headers (lines 675, 697) without sanitization. A session ID containing \r\n enables HTTP response splitting / header injection. Add a check that rejects session IDs containing control characters (\r, \n, \x00).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix: align auth-by-message classification with status-code path, decode URLs before secret check

error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized", etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401 path (line 432) which correctly uses retryable=False + should_fallback=True. The mismatch causes 3 wasted retries with the same broken credential before fallback, while 401 errors immediately attempt fallback. Align the message-based path to match: retryable=False, should_fallback=True.

web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... (sk-1234...) bypass the filter because the regex expects literal ASCII. Add urllib.parse.unquote() before the check so percent-encoded variants are also caught.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix(approval,mcp): log silent exception handlers, narrow OAuth catches, close server on error Three silent `except Exception` blocks in approval.py (lines 345, 387, 469) return fallback values with zero logging — making it impossible to debug callback failures, allowlist load errors, or config read issues. Add logger.warning/error calls that match the pattern already used by save_permanent_allowlist() and _smart_approve() in the same file. In mcp_oauth.py, narrow the overly-broad `except Exception` in get_tokens() and get_client_info() to the specific exceptions Pydantic's model_validate() can raise (ValueError, TypeError, KeyError), and include the exception message in the warning. Also wrap the _wait_for_callback() polling loop in try/finally so the HTTPServer is always closed — previously an asyncio.CancelledError or any exception in the loop would leak the server socket. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: hidden_div regex bypass with newlines, credential config silent failure, webhook route error severity prompt_builder.py: The `hidden_div` detection pattern uses `.*` which does not match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass detection by splitting the style attribute across lines: `<div style="color:red;\ndisplay: none">injected content</div>` Replace `.*` with `[\s\S]*?` to match across line boundaries. credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level (line 171), making YAML parse failures invisible in production logs. Users whose credential files silently fail to mount into sandboxes have no diagnostic clue. Promote to WARNING to match the severity pattern used by the path validation warnings at lines 150 and 158 in the same function. 
webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line 265) but the impact (stale or corrupted dynamic routes persisting silently) warrants ERROR level to ensure operator visibility in alerting pipelines. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: apply hidden_div regex newline bypass fix to skills_guard.py The same .* pattern vulnerable to newline bypass that was fixed in prompt_builder.py (PR #6925) also existed in skills_guard.py. Changed to [\s\S]*? to match across newlines. * test: update session ID tests to require auth (follow-up to #6930) Session continuation now requires API_SERVER_KEY to be configured. Update TestSessionIdHeader tests to use auth_adapter with Bearer token. * fix: include custom_providers in /model command listings and resolution Custom providers defined in config.yaml under custom_providers were completely invisible to the /model command in both gateway (Telegram, Discord, etc.) and CLI. The provider listing skipped them and explicit switching via --provider failed with "Unknown provider". Root cause: gateway/run.py, cli.py, and model_switch.py only read the providers dict from config, ignoring custom_providers entirely. Changes: - providers.py: add resolve_custom_provider() and extend resolve_provider_full() to check custom_providers after user_providers - model_switch.py: propagate custom_providers through switch_model(), list_authenticated_providers(), and get_authenticated_provider_slugs(); add custom provider section to provider listings - gateway/run.py: read custom_providers from config, pass to all model-switch calls - cli.py: hoist config loading, pass custom_providers to listing and switch calls Tests: 4 new regression tests covering listing, resolution, and gateway command handler. All 71 tests pass. * fix: extract custom_provider_slug() helper, harden gateway test - Add custom_provider_slug() to hermes_cli/providers.py as the single source of truth for building 'custom:<name>' slugs.
- Use it in resolve_custom_provider() and list_authenticated_providers() instead of duplicated inline slug construction. - Add _session_model_overrides and _voice_mode to gateway test runner for object.__new__() safety. * mattermost added as deliver to webhook gateway * fix: add all platforms to webhook cross-platform delivery The delivery tuple in webhook.py only had 5 of 14 platforms with gateway adapters. Adds whatsapp, matrix, mattermost, homeassistant, email, dingtalk, feishu, wecom, and bluebubbles so webhooks can deliver to any connected platform. Updates docs delivery options table to list all platforms. Follow-up to cherry-picked fix from olafthiele (PR #7035). * feat(cron): support Discord thread_id in deliver targets Add Discord thread support to cron delivery and send_message_tool. - _parse_target_ref: handle discord platform with chat_id:thread_id format - _send_discord: add thread_id param, route to /channels/{thread_id}/messages - _send_to_platform: pass thread_id through for Discord - Discord adapter send(): read thread_id from metadata for gateway path - Update tool schema description to document Discord thread targets Cherry-picked from PR #7046 by pandacooming (maxyangcn). 
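A rough sketch of the chat_id:thread_id handling listed above. The function names and the regex are hypothetical, not the actual _parse_target_ref or _send_discord code; the only details taken from the commit message are the `chat_id:thread_id` format and the `/channels/{thread_id}/messages` route. Falling back to the chat ID when no thread is given is an assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a numeric chat ID, optionally followed by ":" and a numeric thread ID.
_DISCORD_REF_RE = re.compile(r"^(\d+)(?::(\d+))?$")


def parse_discord_target(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'chat_id' or 'chat_id:thread_id' into its parts."""
    match = _DISCORD_REF_RE.match(ref.strip())
    if match is None:
        raise ValueError(f"not a Discord target reference: {ref!r}")
    return match.group(1), match.group(2)


def discord_messages_path(chat_id: str, thread_id: Optional[str]) -> str:
    """Route to the thread when one is given, otherwise to the channel (assumed fallback)."""
    return f"/channels/{thread_id or chat_id}/messages"


chat_id, thread_id = parse_discord_target("1111:2222")
path = discord_messages_path(chat_id, thread_id)
```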
Follow-up fixes: - Restore proxy support (resolve_proxy_url/proxy_kwargs_for_aiohttp) that was accidentally deleted — would have caused NameError at runtime - Remove duplicate _DISCORD_TARGET_RE regex; reuse existing _TELEGRAM_TOPIC_TARGET_RE via _NUMERIC_TOPIC_RE alias (identical pattern) - Fix misleading test comments about Discord negative snowflake IDs (Discord uses positive snowflakes; negative IDs are a Telegram convention) - Rewrite misleading scheduler test that claimed to exercise home channel fallback but actually tested the explicit platform:chat_id parsing path * fix(run_agent): recover primary client on openai transport errors * fix(bluebubbles): auto-register webhook with BlueBubbles server on connect **Problem:** The BlueBubbles iMessage gateway was not receiving incoming messages even though: 1. BlueBubbles Server was properly configured and running 2. Hermes gateway started without errors 3. Webhook listener was started on the configured port The root cause was that the BlueBubbles adapter only started a local webhook listener but never registered the webhook URL with the BlueBubbles server via the API. Without registration, the server doesn't know where to send events. **Fix:** 1. Added _register_webhook() method that POSTs to /api/v1/webhook with the listener URL and event types (new-message, updated-message, message) 2. Added _unregister_webhook() method for clean shutdown 3. Both methods handle the case where webhook listens on 0.0.0.0/127.0.0.1 by using 'localhost' as the external hostname 4. 
Fixed documentation: 'hermes gateway logs' → 'hermes logs gateway' **API Reference:** https://docs.bluebubbles.app/server/developer-guides/rest-api-and-webhooks **Testing:** - Webhook registration is now automatic when gateway starts - Failed registration logs a warning but doesn't prevent startup - Clean shutdown unregisters the webhook Closes: iMessage gateway not working issue * fix: improve bluebubbles webhook registration resilience Follow-up to cherry-picked PR #6592: - Extract _webhook_url property to deduplicate URL construction - Add _find_registered_webhooks() helper for reuse - Crash resilience: check for existing registration before POSTing (handles restart after unclean shutdown without creating duplicates) - Accept 200-299 status range (not just 200) for webhook creation - Unregister removes ALL matching registrations (cleans up orphaned dupes) - Add 17 tests covering register/unregister/find/edge cases * fix(run-agent): rotate credential pool on billing-classified 400s * fix: STT provider-model mismatch — whisper-1 fed to faster-whisper (#7113) Legacy flat stt.model config key (from cli-config.yaml.example and older versions) was passed as a model override to transcribe_audio() by the gateway, bypassing provider-specific model resolution. When the provider was 'local' (faster-whisper), this caused: ValueError: Invalid model size 'whisper-1' Changes: - gateway/run.py, discord.py: stop passing model override — let transcribe_audio() handle provider-specific model resolution internally - get_stt_model_from_config(): now provider-aware, reads from the correct nested section (stt.local.model, stt.openai.model, etc.); ignores legacy flat key for local provider to prevent model name mismatch - cli-config.yaml.example: updated STT section to show nested provider config structure instead of legacy flat key - config migration v13→v14: moves legacy stt.model to the correct provider section and removes the flat key Reported by community user on Discord. 
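The provider-aware lookup described in the STT fix could look roughly like the following. This is a sketch under assumptions: `resolve_stt_model()` is a hypothetical name rather than the real get_stt_model_from_config(), and the `stt.provider` key and its `"local"` default are guesses; only the nested `stt.<provider>.model` layout and the rule of ignoring the legacy flat key for the local provider come from the commit message.

```python
from typing import Any, Dict, Optional


def resolve_stt_model(config: Dict[str, Any]) -> Optional[str]:
    """Pick the STT model for the active provider; None means 'use the provider default'."""
    stt = config.get("stt") or {}
    provider = stt.get("provider", "local")  # assumed key and default
    section = stt.get(provider)
    if isinstance(section, dict) and section.get("model"):
        return section["model"]  # nested, provider-specific setting wins
    if provider == "local":
        return None  # never feed a legacy flat value such as 'whisper-1' to faster-whisper
    return stt.get("model")  # legacy flat key, still honoured for non-local providers


legacy = {"stt": {"provider": "local", "model": "whisper-1"}}
nested = {"stt": {"provider": "local", "local": {"model": "base"}, "model": "whisper-1"}}
```

With the legacy config the sketch returns None, so the local provider falls back to its own default instead of raising on an OpenAI model name.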
* fix(gateway): remap all paths in system service unit to target user's home When installing a system service via sudo, ExecStart, WorkingDirectory, VIRTUAL_ENV, and PATH entries were not remapped to the target user's home — only HERMES_HOME was. This caused the service to fail with status=200/CHDIR because the target user cannot access /root/. Adds _remap_path_for_user() helper and applies it to all path variables in the system branch of generate_systemd_unit(). Closes #6989 * fix(streaming): update stale-stream timer during Anthropic native streaming (#7117) The _call_anthropic() streaming path never updated last_chunk_time during the event loop — only once at stream start. The stale stream detector in the outer poll loop uses this timer, so any Anthropic stream longer than 180s was killed even when events were actively arriving. This self-inflicted a RemoteProtocolError that users saw as: '⚠️ Connection to provider dropped (RemoteProtocolError). Reconnecting…' The _call_chat_completions() path already updates last_chunk_time on every chunk (line 4475). This brings _call_anthropic() to parity. Also adds deltas_were_sent tracking to the Anthropic text_delta path so the retry loop knows not to retry after partial delivery (prevents duplicated output on connection drops mid-stream). Reported-by: Discord users (Castellani, Codename_11) * fix(gateway): scope /yolo to the active session * fix(mcp): combine content and structuredContent when both present (#7118) When an MCP server returns both content (model-oriented text) and structuredContent (machine-oriented JSON), the client now combines them instead of discarding content. The text content becomes the primary result (what the agent reads), and structuredContent is included as supplementary metadata. Previously, structuredContent took full precedence — causing data loss for servers like Desktop Commander that put the actual file text in content and metadata in structuredContent. 
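A small sketch of the combining behaviour described above, using plain Python values instead of MCP SDK types. The function name and the shape of the returned dict are hypothetical; the commit message only states that text content is primary and structuredContent is kept as supplementary metadata.

```python
from typing import Any, Dict, Optional


def combine_mcp_result(content_text: str, structured: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Keep text as the primary result and attach structured data alongside it."""
    if content_text and structured is not None:
        return {"result": content_text, "structuredContent": structured}
    if structured is not None:
        return {"result": structured}  # only structured data came back
    return {"result": content_text}    # only text came back


combined = combine_mcp_result("file body here", {"lines": 3, "path": "notes.txt"})
```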
MCP spec guidance: for conversational/agent UX, prefer content. * fix: remove 115 verified dead code symbols across 46 production files Automated dead code audit using vulture + coverage.py + ast-grep intersection, confirmed by Opus deep verification pass. Every symbol verified to have zero production callers (test imports excluded from reachability analysis). Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py, hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py). Co-authored-by: alt-glitch <balyan.sid@gmail.com> * fix: restore 6 tests that tested live code but used deleted helpers * chore: remove spec-dead-code.md from tracked files * fix: clean up stale test references to removed attributes * fix: update 6 test files broken by dead code removal - test_percentage_clamp.py: remove TestContextCompressorUsagePercent class and test_context_compressor_clamped (tested removed get_status() method) - test_credential_pool.py: remove test_mark_used_increments_request_count (tested removed mark_used()), replace active_lease_count() calls with direct _active_leases dict access, remove mark_used from thread test - test_session.py: replace SessionSource.local_cli() factory calls with direct SessionSource construction (local_cli classmethod removed) - test_error_classifier.py: remove test_is_transient_property (tested removed is_transient property on ClassifiedError) - test_delivery.py: remove TestDeliveryRouter class (tested removed resolve_targets method), clean up unused imports - test_skills_hub.py: remove test_is_hub_installed (tested removed is_hub_installed method on HubLockFile) * fix(gateway): launchd_stop uses bootout so KeepAlive doesn't respawn (#7119) launchd_stop() previously used `launchctl kill SIGTERM` which only signals the process. 
Because the plist has KeepAlive.SuccessfulExit=false, launchd immediately respawns the gateway — making `hermes gateway stop` a no-op that prints '✓ Service stopped' while the service keeps running. Switch to `launchctl bootout` which unloads the service definition so KeepAlive can't trigger. The process exits and stays stopped until `hermes gateway start` (which already handles re-bootstrapping unloaded jobs via error codes 3/113). Also adds _wait_for_gateway_exit() after bootout to ensure the process is fully gone before returning, and tolerates 'already unloaded' errors. Fixes: .env changes not taking effect after gateway stop+restart on macOS. The root cause was that stop didn't actually stop — the respawned process loaded the old env before the user's restart command ran. * fix(acp): declare session load and resume capabilities in initialize response (#6985) The resume_session and load_session handlers were implemented but undiscoverable by ACP clients because the capabilities weren't declared in the initialize response. Adds load_session=True and resume=SessionResumeCapabilities() plus wire-format tests. Fixes #6633. Contributed by @luyao618. 
* docs: add cron troubleshooting guide Adds a troubleshooting guide for Hermes cron jobs covering: - Jobs not firing (schedule, gateway, timezone checks) - Delivery failures (platform tokens, [SILENT], permissions) - Skill loading failures (installed, ordering, interactive tools) - Job errors (script paths, lock contention, permissions) - Performance issues and diagnostic commands Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct inaccuracies and add sidebar entry for cron troubleshooting guide - Fix job state display: [active] not scheduled - Fix CLI mode claim: only gateway fires cron, not CLI sessions - Expand delivery targets table (5 → 10+ platforms with platform:chat_id syntax) - Fix disabled toolsets: cronjob, messaging, and clarify (not just cronjob) - Remove nonexistent 'hermes skills sync' command reference - Fix log file path: agent.log/errors.log, not scheduler.log - Fix execution model: sequential, not thread pool concurrent - Fix 'hermes cron run' description: next tick, not immediate - Add inactivity-based timeout details (HERMES_CRON_TIMEOUT) - Add sidebar entry in sidebars.ts under Guides & Tutorials * fix(gateway): avoid false failure reactions on restart cancellation * fix(gateway): route /background through active-session bypass When /background was sent during an active run, it was not in the platform adapter's bypass list and fell through to the interrupt path instead of spawning a parallel background task. Add "background" to the active-session command bypass in the platform adapter, and add an early return in the gateway runner's running-agent guard to route /background to _handle_background_command() before it reaches the default interrupt logic. Fixes #6827 * test(gateway): add /background to active-session bypass tests Adds a regression test verifying that /background bypasses the active-session guard in the platform adapter, matching the existing test pattern for /stop, /new, /approve, /deny, and /status. 
* fix(gateway): replace assertions with proper error handling in Telegram and Feishu Python assertions are stripped when running with `python -O` (optimized mode), making them unsuitable for runtime error handling. 1. `telegram_network.py:113`: After exhausting all fallback IPs, the code uses `assert last_error is not None` before `raise last_error`. In optimized mode, the assert is skipped; if `last_error` is unexpectedly None, `raise None` produces a confusing `TypeError` instead of a meaningful error. Replace with an explicit `if` check that raises `RuntimeError` with a descriptive message. 2. `feishu.py:975`: The `_configure_with_overrides` closure uses `assert original_configure is not None` as a guard. While the outer scope only installs this closure when `original_configure` is not None, the assert would silently disappear in optimized mode. Replace with an explicit `if` check for defensive safety. * fix(telegram): harden HTTPX request pools during reconnect - configure Telegram HTTPXRequest pool/timeouts with env-overridable defaults - use separate request/get_updates request objects to reduce pool contention - skip fallback-IP transport when proxy is configured (or explicitly disabled) This mitigates recurrent pool-timeout failures during polling reconnect/bootstrap (delete_webhook). * fix(gateway): prevent duplicate messages on no-message-id platforms Platforms that don't return a message_id after the first send (Signal, GitHub webhooks) were causing GatewayStreamConsumer to re-enter the "first send" path on every tool boundary, posting one platform message per tool call (observed as 155 PR comments on a single response). Fix: treat _message_id == "__no_edit__" as a sentinel meaning "platform accepted the send but cannot be edited". When a tool boundary arrives in that state, skip the message_id/accumulated/last_sent_text reset so all continuation text is delivered once via _send_fallback_final rather than re-posted per segment.
Also make prompt_toolkit imports in hermes_cli/commands.py optional so gateway and test environments that lack the package can still import resolve_command, gateway_help_lines, and COMMAND_REGISTRY. * fix(tests): repair three pre-existing gateway test failures - test_background_autocompletes: pytest.importorskip("prompt_toolkit") so the test skips gracefully where the CLI dep is absent - test_run_agent_progress_stays_in_originating_topic: update stale emoji 💻 → ⚙️ to match get_tool_emoji("terminal", default="⚙️") in run.py - test_internal_event_bypass{_authorization,_pairing}: mock _handle_message_with_agent to raise immediately; avoids the 300s run_in_executor hang that caused the tests to time out * fix(gateway): implement platform-aware PID termination * fix: prevent duplicate completion notifications on process kill (#7124) When kill_process() sends SIGTERM, both it and the reader thread race to call _move_to_finished() — kill_process sets exit_code=-15 and enqueues a notification, then the reader thread's process.wait() returns with exit_code=143 (128+SIGTERM) and enqueues a second one. Fix: make _move_to_finished() idempotent by tracking whether the session was actually removed from _running. The second call sees it was already moved and skips the completion_queue.put(). Adds regression test: test_move_to_finished_idempotent_no_duplicate * fix(gateway): validate Slack image downloads before caching Slack may return an HTML sign-in/redirect page instead of actual media bytes (e.g. expired token, restricted file access). This adds two layers of defense: 1. Content-Type check in slack.py rejects text/html responses early 2. Magic-byte validation in base.py's cache_image_from_bytes() rejects non-image data regardless of source platform Also adds ValueError guards in wecom.py and email.py so the new validation doesn't crash those adapters. 
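The magic-byte layer described in the Slack fix can be sketched as follows. The helper names and the set of accepted formats are assumptions; the real check in cache_image_from_bytes() is not shown in this PR text. Raising ValueError mirrors the guards the commit message says were added in wecom.py and email.py.

```python
# Well-known file signatures; which formats the real code accepts is an assumption.
_IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF (1987 variant)
    b"GIF89a",             # GIF (1989 variant)
)


def looks_like_image(data: bytes) -> bool:
    """Return True if the payload starts with a known image signature."""
    if data.startswith(_IMAGE_SIGNATURES):
        return True
    # WebP is a RIFF container: bytes 0-3 are "RIFF", bytes 8-11 are "WEBP".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP"


def validate_image_bytes(data: bytes) -> bytes:
    """Reject HTML sign-in pages and other non-image payloads before caching."""
    if not looks_like_image(data):
        raise ValueError("payload does not look like an image")
    return data


html_page = b"<!DOCTYPE html><html><body>Sign in</body></html>"
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
```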
Closes #6829 * fix(api-server): share one Docker container across all API conversations (#7127) The API server's _run_agent() was not passing task_id to run_conversation(), causing a fresh random UUID per request. This meant every Open WebUI message spun up a new Docker container and tore it down afterward — making persistent filesystem state impossible. Two fixes: 1. Pass task_id="default" so all API server conversations share the same Docker container (matching the design intent: one configured Docker environment, always the same container). 2. Derive a stable session_id from the system prompt + first user message hash instead of uuid4(). This stops hermes sessions list from being polluted with single-message throwaway sessions. Fixes #3438. * fix(security): prevent SSRF redirect bypass in Slack adapter * fix: make safe_url_for_log public, add SSRF redirect guards to base.py cache helpers Follow-up to Dusk1e's PR #7120 (Slack send_image redirect guard): - Rename _safe_url_for_log -> safe_url_for_log (drop underscore) since it is now imported cross-module by the Slack adapter - Add _ssrf_redirect_guard httpx event hook to cache_image_from_url() and cache_audio_from_url() in base.py — same pattern as vision_tools and the Slack adapter fix - Update url_safety.py docstring to reflect broader coverage - Add regression tests for image/audio redirect blocking + safe passthrough * fix(security): enforce path boundary checks in skill manager operations * feat(auth): add is_provider_explicitly_configured() helper Gate function for checking whether a user has explicitly selected a provider via hermes model/setup, auth.json active_provider, or env vars. Used in subsequent commits to prevent unauthorized credential auto-discovery. Follows the pattern from PR #4210. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(auth): gate Claude Code credential seeding behind explicit provider config _seed_from_singletons('anthropic') now checks is_provider_explicitly_configured('anthropic') before reading ~/.claude/.credentials.json. Without this, the auxiliary client fallback chain silently discovers and uses Claude Code tokens when the user's primary provider key is invalid — consuming their Claude Max subscription quota without consent. Follows the same gating pattern as PR #4210 (setup wizard gate) but applied to the credential pool seeding path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(auth): make 'auth remove' for claude_code prevent re-seeding Previously, removing a claude_code credential from the anthropic pool only printed a note — the next load_pool() re-seeded it from ~/.claude/.credentials.json. Now writes a 'suppressed_sources' flag to auth.json that _seed_from_singletons checks before seeding. Follows the pattern of env: source removal (clears .env var) and device_code removal (clears auth store state). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(auxiliary): skip anthropic in fallback chain when not explicitly configured _resolve_api_key_provider() now checks is_provider_explicitly_configured before calling _try_anthropic(). Previously, any auxiliary fallback (e.g. when kimi-coding key was invalid) would silently discover and use Claude Code OAuth tokens — consuming the user's Claude Max subscription without their knowledge. This is the auxiliary-client counterpart of the setup-wizard gate in PR #4210. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * security(approval): close 4 pattern gaps found by source-grounded audit Four gaps in DANGEROUS_PATTERNS found by running 10 targeted tests that each mapped to a specific pattern in approval.py and checked whether the documented defense actually held. 1. 
**Heredoc script injection** — `python3 << 'EOF'` bypasses the existing `-e`/`-c` flag pattern. Adds pattern for interpreter + `<<` covering python{2,3}, perl, ruby, node. 2. **PID expansion self-termination** — `kill -9 $(pgrep hermes)` is opaque to the existing `pkill|killall` + name pattern because command substitution is not expanded at detection time. Adds structural patterns matching `kill` + `$(pgrep` and backtick variants. 3. **Git destructive operations** — `git reset --hard`, `push --force`, `push -f`, `clean -f*`, and `branch -D` were entirely absent. Note: `branch -d` also triggers because IGNORECASE is global — acceptable since -d is still a delete, just a safe one, and the prompt is only a confirmation, not a hard block. 4. **chmod +x then execute** — two-step social engineering where a script containing dangerous commands is first written to disk (not checked by write_file), then made executable and run as `./script`. Pattern catches `chmod +x ... [;&|]+ ./` combos. Does not solve the deeper architectural issue (write_file not checking content) — that is called out in the PR description as a known limitation. Tests: 23 new cases across 4 test classes, all in test_approval.py: - TestHeredocScriptExecution (7 cases, incl. regressions for -c) - TestPgrepKillExpansion (5 cases, incl. safe kill PID negative) - TestGitDestructiveOps (8 cases, incl. safe git status/push negatives) - TestChmodExecuteCombo (3 cases, incl. safe chmod-only negative) Full suite: 146 passed, 0 failed. * fix(feishu): wrap image bytes in BytesIO before uploading to lark SDK * feat(telegram): support custom base_url for credential proxy When extra.base_url is set in the Telegram platform config, use it as the base URL for all Telegram API requests instead of api.telegram.org. This allows agents to route Telegram traffic through the credential proxy, which injects the real bot token — the VM never sees it. 
Also supports extra.base_file_url for file downloads (defaults to base_url if not set separately). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(cli): make /status show gateway-style session status * fix(matrix): remove eyes reaction on processing complete The on_processing_complete handler was never removing the eyes reaction because _send_reaction didn't return the reaction event_id. Fix: - _send_reaction returns Optional[str] event_id - on_processing_start stores it in _pending_reactions dict - on_processing_complete redacts the eyes reaction before adding completion emoji * test: update Matrix reaction tests for new _send_reaction return type _send_reaction now returns Optional[str] (event_id) instead of bool. Tests updated: - test_send_reaction: assert result == event_id string - test_send_reaction_no_client: assert result is None - test_on_processing_start_sends_eyes: _send_reaction returns event_id, now also asserts _pending_reactions is populated - test_on_processing_complete_sends_check: set up _pending_reactions and mock _redact_reaction, assert eyes reaction is redacted before sending check * fix(matrix): log redact failures and add missing reaction test cases Add debug logging when eyes reaction redaction fails, and add tests for the success=False path and the no-pending-reaction edge case. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(discord): add channel_skill_bindings for auto-loading skills per channel Simplified implementation of the feature from PR #6842 (RunzhouLi). Allows Discord channels/forum threads to auto-bind skills via config: discord: channel_skill_bindings: - id: "123456" skills: ["skill-a", "skill-b"] The run.py auto-skill loader now handles both str and list[str], loading multiple skills in order and concatenating their payloads. Forum threads inherit their parent channel's bindings. 
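A minimal sketch of how the channel_skill_bindings lookup above might resolve skills. `resolve_channel_skills()` is a hypothetical helper, not the actual Discord adapter code. Two details are assumptions: that a thread's own binding takes priority over its parent's, and that a binding's `skills` value may be a single string as well as a list.

```python
from typing import Any, Dict, List, Optional


def resolve_channel_skills(
    bindings: List[Dict[str, Any]],
    channel_id: str,
    parent_id: Optional[str] = None,
) -> List[str]:
    """Return the skills bound to a channel, falling back to the parent channel for forum threads."""
    by_id = {str(b.get("id")): b.get("skills") for b in bindings}
    skills = by_id.get(str(channel_id))
    if skills is None and parent_id is not None:
        skills = by_id.get(str(parent_id))  # forum thread inherits the parent's bindings
    if skills is None:
        return []
    if isinstance(skills, str):
        return [skills]  # normalise a single skill name to a list
    return list(skills)  # preserve the configured order


bindings = [
    {"id": "123456", "skills": ["skill-a", "skill-b"]},
    {"id": "777", "skills": "skill-c"},
]
thread_skills = resolve_channel_skills(bindings, channel_id="999", parent_id="123456")
```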
Co-authored-by: RunzhouLi <RunzhouLi@users.noreply.github.com> * test(discord): add tests for channel_skill_bindings resolution * fix: flush stdin after curses/terminal menus to prevent escape sequence leakage (#7167) After curses.wrapper() or simple_term_menu exits, endwin() restores the terminal but does NOT drain the OS input buffer. Leftover escape-sequence bytes from arrow key navigation remain buffered and get silently consumed by the next input()/getpass.getpass() call. This caused a user-reported bug where selecting Z.AI/GLM as provider wrote ^[^[ (two ESC chars) into .env as the API key, because the buffered escape bytes were consumed by getpass before the user could type anything. Fix: add flush_stdin() helper using termios.tcflush(TCIFLUSH) and call it after every curses.wrapper() and simple_term_menu .show() return across all interactive menu sites: - hermes_cli/curses_ui.py (curses_checklist) - hermes_cli/setup.py (_curses_prompt_choice) - hermes_cli/tools_config.py (_prompt_choice) - hermes_cli/auth.py (_prompt_model_selection) - hermes_cli/main.py (3 simple_term_menu usages) * fix: UTF-8 config encoding, pairing hint, credential_pool key, header normalization (#7174) Four small fixes: (1) UTF-8 encoding for config open (@zhangchn #7063), (2) pairing hint placeholders (@konsisumer #7057), (3) missing credential_pool in cheap route (@kuishou68 #7025), (4) case-insensitive rate limit headers (@kuishou68 #7019). * fix(telegram): use valid reaction emojis for processing completion (#7175) Telegram's Bot API only allows a specific set of emoji for bot reactions (the ReactionEmoji enum). ✅ (U+2705) and ❌ (U+274C) are not in that set, causing on_processing_complete reactions to silently fail with REACTION_INVALID (caught at debug log level). Replace with 👍 (U+1F44D) / 👎 (U+1F44E) which are always available in Telegram's allowed reaction list. The 👀 (eyes) reaction used by on_processing_start was already valid. Based on the fix by @ppdng in PR #6685. 
Fixes #6068 * fix: add Alibaba/DashScope rate-limit pattern to error classifier Port from anomalyco/opencode#21355: Alibaba's DashScope API returns a unique throttling message ('Request rate increased too quickly...') that doesn't match standard rate-limit patterns ('rate limit', 'too many requests'). This caused Alibaba errors to fall through to the 'unknown' category rather than being properly classified as rate_limit with appropriate backoff/rotation. Add 'rate increased too quickly' to _RATE_LIMIT_PATTERNS and test with the exact error message observed from the Alibaba provider. * fix: pass config_context_length to switch_model context compressor When switching models at runtime, the config_context_length override was not being passed to the new context compressor instance. This meant the user-specified context length from config.yaml was lost after a model switch. - Store _config_context_length on AIAgent instance during __init__ - Pass _config_context_length when creating new ContextCompressor in switch_model - Add test to verify config_context_length is preserved across model switches Fixes: when we change the model, the context size is not changed * fix: opencode-go missing from /model list and improve HERMES_OVERLAYS credential check When opencode-go API key is set, it should appear in the /model list. The provider was already in PROVIDER_TO_MODELS_DEV and PROVIDER_REGISTRY, so it appears via Part 1 (built-in source).
Also fixes a potential issue in Part 2 (HERMES_OVERLAYS) where providers with auth_type=api_key but no extra_env_vars would not be detected: - Now also checks api_key_env_vars from PROVIDER_REGISTRY for api_key auth_type - Add test verifying opencode-go appears when OPENCODE_GO_API_KEY is set * fix: always show model selection menu for custom providers Previously, _model_flow_named_custom() returned immediately when a saved model existed, making it impossible to switch models on multi-model endpoints (OpenRouter, vLLM clusters, etc.). Now the function always probes the endpoint and shows the selection menu with the current model pre-selected and marked '(current)'. Falls back to the saved model if endpoint probing fails. Fixes #6862 * test: add regression tests for custom provider model switching Covers: probe always called, model switch works, probe failure fallback, first-time flow unchanged. * fix(test): correct mock target for fetch_api_models in custom provider tests fetch_api_models is imported locally inside _model_flow_named_custom from hermes_cli.models, not defined as a module-level attribute of hermes_cli.main. Patch the source module so the local import picks up the mock. Also force simple_term_menu ImportError so tests reliably use the input() fallback path regardless of environment. Co-Authored-By: Claude <noreply@anthropic.com> * fix(model): normalize native provider-prefixed model ids * fix(model): normalize direct provider ids in auxiliary routing * fix(model): tighten direct-provider fallback normalization * fix: profile paths broken in Docker — profiles go to /root/.hermes instead of mounted volume (#7170) In Docker, HERMES_HOME=/opt/data (set in Dockerfile) and users mount their .hermes directory to /opt/data. However, profile operations used Path.home() / '.hermes' which resolves to /root/.hermes in Docker — an ephemeral container path, not the mounted volume. 
This caused: - Profiles created at /root/.hermes/profiles/ (lost on container recreate) - active_profile sticky file written to wrong location - profile list looking at wrong directory Fix: Add get_default_hermes_root() to hermes_constants.py that detects Docker/custom deployments (HERMES_HOME outside ~/.hermes) and returns HERMES_HOME as the root. Also handles Docker profiles correctly (<root>/profiles/<name> → root is grandparent). Files changed: - hermes_constants.py: new get_default_hermes_root() - hermes_cli/profiles.py: _get_default_hermes_home() delegates to shared fn - hermes_cli/main.py: _apply_profile_override() + _invalidate_update_cache() - hermes_cli/gateway.py: _profile_suffix() + _profile_arg() - Tests: 12 new tests covering Docker scenarios * feat(gateway): add native Weixin/WeChat support via iLink Bot API Add first-class Weixin platform adapter for personal WeChat accounts: - Long-poll inbound delivery via iLink getupdates - AES-128-ECB encrypted CDN media upload/download - QR-code login flow for gateway setup wizard - context_token persistence for reply continuity - DM/group access policies with allowlists - Native text, image, video, file, voice handling - Markdown formatting with header rewriting and table-to-list conversion - Block-aware message chunking (preserves fenced code blocks) - Typing indicators via getconfig/sendtyping - SSRF protection on remote media downloads - Message deduplication with TTL Integration across all gateway touchpoints: - Platform enum, config, env overrides, connected platforms check - Adapter creation in gateway runner - Authorization maps (allowed users, allow all) - Cron delivery routing - send_message tool with native media support - Toolset definition (hermes-weixin) - Channel directory (session-based) - Platform hint in prompt builder - CLI status display - hermes tools default toolset mapping Co-authored-by: Zihan Huang <bravohenry@users.noreply.github.com> * fix: salvage follow-ups for Weixin adapter 
(#6747) - Remove sys.path.insert hack (leftover from standalone dev) - Add token lock (acquire_scoped_lock/release_scoped_lock) in connect()/disconnect() to prevent duplicate pollers across profiles - Fix get_connected_platforms: WEIXIN check must precede generic token/api_key check (requires both token AND account_id) - Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS - Add gateway setup wizard with QR login flow - Add platform status check for partially configured state - Add weixin.md docs page with full adapter documentation - Update environment-variables.md reference with all 11 env vars - Update sidebars.ts to include weixin docs page - Wire all gateway integration points onto current main Salvaged from PR #6747 by Zihan Huang. * fix: complete Weixin platform parity audit — 16 missing integration points Systematic audit found Weixin missing from: Code: - gateway/run.py: early WEIXIN_ALLOW_ALL_USERS env check - gateway/platforms/webhook.py: cross-platform delivery routing - hermes_cli/dump.py: platform detection for config export - hermes_cli/setup.py: hermes setup wizard platform list + _setup_weixin - hermes_cli/skills_config.py: platform labels for skills config UI Docs (11 pages): - developer-guide/architecture.md: platform adapter listing - developer-guide/cron-internals.md: delivery target table - developer-guide/gateway-internals.md: file tree - guides/cron-troubleshooting.md: supported platforms list - integrations/index.md: platform links - reference/toolsets-reference.md: toolset table - user-guide/configuration.md: platform keys for tool_progress - user-guide/features/cron.md: delivery target table - user-guide/messaging/index.md: intro text, feature table, mermaid diagram, toolset table, setup links - user-guide/messaging/webhooks.md: deliver field + routing table - user-guide/sessions.md: platform identifiers table * fix(gateway): handle provider command without config * feat(gateway): add fast mode support to gateway chats * fix: add 
_session_model_overrides to test runner fixture Follow-up for cherry-pick — _session_model_overrides was added to GatewayRunner.__init__ after the fast mode PR was written. * fix: fall back to default certs when CA bundle path doesn't exist (#7352) _resolve_verify() returned stale CA bundle paths fr…

@kshitijk4poor

Wires the agent/account_usage module from the preceding commit into /usage so users see provider-side quota/credit info alongside the existing session token report.

CLI:
- `_show_usage` appends account lines under the token table. Fetch runs in a 1-worker ThreadPoolExecutor with a 10s timeout so a slow provider API can never hang the prompt.

Gateway:
- `_handle_usage_command` resolves provider from the live agent when available, else from the persisted billing_provider/billing_base_url on the SessionDB row, so /usage still returns account info between turns when no agent is resident. Fetch runs via asyncio.to_thread.
- Account section is appended to all three return branches: running agent, no-agent-with-history, and the new no-agent-no-history path (falls back to account-only output instead of "no data").

Tests:
- 2 new tests in tests/gateway/test_usage_command.py cover the live-agent account section and the persisted-billing fallback path.

Salvaged from PR #2486 by @kshitijk4poor. The original branch had drifted ~2615 commits behind main and rewrote _show_usage wholesale, which would have dropped the rate-limit and cached-agent blocks added in PRs #6541 and #7038. This commit re-adds only the new behavior on top of current main.
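The CLI-side fetch described in this commit (one worker thread, 10s timeout) can be sketched roughly as follows. This is a minimal illustration, not the code in `_show_usage`: the names `fetch_with_timeout` and `slow_provider` are hypothetical, and returning `None` on timeout or error is an assumption about how a caller might skip the account section.

```python
import concurrent.futures
import time


def fetch_with_timeout(fetch, timeout_s=10.0):
    """Run fetch() on a single worker thread; return None on timeout or error.

    Hypothetical helper: the real CLI code may report failures differently.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The provider API was too slow; the caller simply omits account info.
        return None
    except Exception:
        # Any provider-side failure is treated the same way in this sketch.
        return None
    finally:
        # Do not wait for a slow call; the prompt returns immediately.
        pool.shutdown(wait=False)


def slow_provider():
    """Stand-in for a provider account API that responds slowly."""
    time.sleep(0.5)
    return {"credits": 1}
```

With these definitions, `fetch_with_timeout(slow_provider, timeout_s=0.05)` gives up after about 50 ms, while the worker thread finishes in the background.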


* fix(tools/delegate): propagate resolved ACP runtime settings to child agents * fix(whatsapp): kill bridge process tree on Windows disconnect * fix: forward auth when probing local model metadata Pass the user's configured api_key through local-server detection and context-length probes (detect_local_server_type, _query_local_context_length, query_ollama_num_ctx) and use LM Studio's native /api/v1/models endpoint in fetch_endpoint_model_metadata when a loaded instance is present — so the probed context length is the actual runtime value the user loaded the model at, not just the model's theoretical max. Helps local-LLM users whose auto-detected context length was wrong, causing compression failures and context-overrun crashes. * fix: thread api_key through ollama num_ctx probe + author map Follow-up for salvaged PR #3185: - run_agent.py: pass self.api_key to query_ollama_num_ctx() so Ollama behind an auth proxy (same issue class as the LM Studio fix) can be probed successfully. - scripts/release.py AUTHOR_MAP: map @tannerfokkens-maker's local-hostname commit email. * fix(tools): keep default-off toolsets disabled * chore(release): map mavrickdeveloper email for attribution * chore: map Es1la contributor email for AUTHOR_MAP (#13294) Credit preserved for PR #13270 (WhatsApp Windows disconnect fix). * feat: shell hooks — wire shell scripts as Hermes hook callbacks Users can declare shell scripts in config.yaml under a hooks: block that fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call, subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on stdout to block tool calls or inject context pre-LLM. 
Key design: - Registers closures on existing PluginManager._hooks dict — zero changes to invoke_hook() call sites - subprocess.run(shell=False) via shlex.split — no shell injection - First-use consent per (event, command) pair, persisted to allowlist JSON - Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept - hermes hooks list/test/revoke/doctor CLI subcommands - Adds subagent_stop hook event fired after delegate_task children exit - Claude Code compatible response shapes accepted Cherry-picked from PR #13143 by @pefontana. * feat: attribution default_headers for ai-gateway provider Requests through Vercel AI Gateway now carry referrerUrl / appName / User-Agent attribution so traffic shows up in the gateway's analytics. Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url. * feat: curated picker with live pricing for ai-gateway provider - Curated AI_GATEWAY_MODELS list in hermes_cli/models.py (OSS first, kimi-k2.5 as recommended default). - fetch_ai_gateway_models() filters the curated list against the live /v1/models catalog; falls back to the snapshot on network failure. - fetch_ai_gateway_pricing() translates Vercel's input/output field names to the prompt/completion shape the shared picker expects; carries input_cache_read / input_cache_write through unchanged. - get_pricing_for_provider() now handles ai-gateway. - _model_flow_ai_gateway() provides a guided URL prompt when no key is set and a pricing-column picker; routes ai-gateway to it instead of the generic api-key flow. * feat: promote ai-gateway in provider picker ordering Moves Vercel AI Gateway from the bottom of the list to near the top, adjacent to other multi-model aggregators. The existing bottom position was a result of the list growing by appending new providers over time — the new position makes it more discoverable. 
* feat: auto-promote free Moonshot models to top of ai-gateway picker When the live Vercel AI Gateway catalog exposes a Moonshot model with zero input AND output pricing, it's promoted to position #1 as the recommended default — even if the exact ID isn't in the curated AI_GATEWAY_MODELS list. This enables dynamic discovery of new free Moonshot variants without requiring a PR to update curation. Paid Moonshot models are unaffected; falls back to the normal curated recommended tag when no free Moonshot is live. * feat: use Vercel's deep-link for ai-gateway API key creation prompt Vercel provides a d?to= redirect URL that routes users through their team picker to the AI Gateway API keys management page. Using this specific URL lands users directly on the "Create key" page instead of the generic AI Gateway dashboard. * chore: register contributor in AUTHOR_MAP for release-note attribution Adds zheng.jerilyn@gmail.com → jerilynzheng to scripts/release.py so the check-attribution CI workflow passes. * fix: correct AI_GATEWAY_MODELS slugs to match Vercel's catalog The original list was copied from OpenRouter conventions and didn't match what Vercel actually hosts. 
Verified against the live /v1/models endpoint (266 models): - qwen/qwen3.6-plus → alibaba/qwen3.6-plus (Vercel hosts Qwen under alibaba/) - z-ai/glm-5.1 → zai/glm-5.1 (no hyphen) - x-ai/grok-4.20 → xai/grok-4.20-reasoning (no hyphen, picks reasoning variant) - google/gemini-3-flash-preview → google/gemini-3-flash (no -preview suffix) - moonshotai/kimi-k2.5 → moonshotai/kimi-k2.6 (newest available) * fix(gateway/api_server): deduplicate concurrent idempotent requests * chore(release): map yukipukikedy@gmail.com to Yukipukii1 * fix(env_loader): warn when non-ASCII stripped from credential env vars (#13300) Load-time sanitizer silently removed non-ASCII codepoints from any env var ending in _API_KEY / _TOKEN / _SECRET / _KEY, turning copy-paste artifacts (Unicode lookalikes, ZWSP, NBSP) into opaque provider-side API_KEY_INVALID errors. Warn once per key to stderr with the offending codepoints (U+XXXX) and guidance to re-copy from the provider dashboard. * fix: restrict provider URL detection to exact hostname matches * fix: extend hostname-match provider detection across remaining call sites Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the two openai/xai sites in run_agent.py. This finishes the sweep: the same substring-match false-positive class (e.g. https://api.openai.com.evil/v1, https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1) existed in eight more call sites, and the hostname helper was duplicated in two modules. - utils: add shared base_url_hostname() (single source of truth). - hermes_cli/runtime_provider, run_agent: drop local duplicates, import from utils. Reuse the cached AIAgent._base_url_hostname attribute everywhere it's already populated. - agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg selection to hostname equality. 
- run_agent: native-anthropic check in the Claude-style model branch and in the AIAgent init provider-auto-detect branch. - agent/model_metadata: Anthropic /v1/models context-length lookup. - hermes_cli/providers.determine_api_mode: anthropic / openai URL heuristics for custom/unknown providers (the /anthropic path-suffix convention for third-party gateways is preserved). - tools/delegate_tool: anthropic detection for delegated subagent runtimes. - hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint native-OpenAI detection (paired with deduping the repeated check into a single is_native_openai boolean per branch). Tests: - tests/test_base_url_hostname.py covers the helper directly (path-containing-host, host-suffix, trailing dot, port, case). - tests/hermes_cli/test_determine_api_mode_hostname.py adds the same regression class for determine_api_mode, plus a test that the /anthropic third-party gateway convention still wins. Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP. * fix: sweep remaining provider-URL substring checks across codebase Completes the hostname-hardening sweep — every substring check against a provider host in live-routing code is now hostname-based. This closes the same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen, ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI, Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI, and Anthropic. New helper: - utils.base_url_host_matches(base_url, domain) — safe counterpart to 'domain in base_url'. Accepts hostname equality and subdomain matches; rejects path segments, host suffixes, and prefix collisions. 
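A helper with the behavior described for `base_url_host_matches` (hostname equality or subdomain match; path segments, host suffixes, and prefix collisions rejected) could be sketched as below. Only the two function names come from the commit message; the bodies are an assumption built on `urllib.parse`, not the code in `utils`.

```python
from urllib.parse import urlparse


def base_url_hostname(base_url):
    """Return the lowercased hostname of base_url, or '' if it cannot be parsed."""
    if not base_url:
        return ""
    url = base_url.strip()
    if "://" not in url:
        # Without a scheme, urlparse would treat the host as part of the path.
        url = "//" + url
    try:
        host = urlparse(url).hostname or ""
    except ValueError:
        return ""
    # Normalize a fully qualified trailing dot ("api.openai.com.").
    return host.rstrip(".").lower()


def base_url_host_matches(base_url, domain):
    """True if the URL's host is `domain` itself or a subdomain of it."""
    host = base_url_hostname(base_url)
    domain = (domain or "").strip().rstrip(".").lower()
    if not host or not domain:
        return False
    # The leading dot in the suffix check rejects "notopenai.com" for "openai.com".
    return host == domain or host.endswith("." + domain)
```

Under this sketch, `https://api.openai.com.evil/v1` and `https://proxy/api.openai.com/v1` both fail to match `openai.com`, which is the false-positive class that the substring check `domain in base_url` allowed.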
Call sites converted (real-code only; tests, optional-skills, red-teaming scripts untouched): run_agent.py (10 sites): - AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check) - header cascade for openrouter / copilot / kimi / qwen / chatgpt - interleaved-thinking trigger (openrouter + claude) - _is_openrouter_url(), _is_qwen_portal() - is_native_anthropic check - github-models-vs-copilot detection (3 sites) - reasoning-capable route gate (nousresearch, vercel, github) - codex-backend detection in API kwargs build - fallback api_mode Bedrock detection agent/auxiliary_client.py (7 sites): - extra-headers cascades in 4 distinct client-construction paths (resolve custom, resolve auto, OpenRouter-fallback-to-custom, _async_client_from_sync, resolve_provider_client explicit-custom, resolve_auto_with_codex) - _is_openrouter_client() base_url sniff agent/usage_pricing.py: - resolve_billing_route openrouter branch agent/model_metadata.py: - _is_openrouter_base_url(), Bedrock context-length lookup hermes_cli/providers.py: - determine_api_mode Bedrock heuristic hermes_cli/runtime_provider.py: - _is_openrouter_url flag for API-key preference (issues #420, #560) hermes_cli/doctor.py: - Kimi User-Agent header for /models probes tools/delegate_tool.py: - subagent Codex endpoint detection trajectory_compressor.py: - _detect_provider() cascade (8 providers: openrouter, nous, codex, zai, kimi-coding, arcee, minimax-cn, minimax) cli.py, gateway/run.py: - /model-switch cache-enabled hint (openrouter + claude) Bedrock detection tightened from 'bedrock-runtime in url' to 'hostname starts with bedrock-runtime. AND host is under amazonaws.com'. ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'. 
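The two tightened rules stated above (Bedrock: hostname starts with `bedrock-runtime.` and is under `amazonaws.com`; ChatGPT/Codex: hostname is `chatgpt.com` and the path contains `/backend-api/codex`) can be expressed as a short sketch. The function names here are hypothetical and the parsing is an assumption based on `urllib.parse`; the real call sites are spread across the modules listed above.

```python
from urllib.parse import urlparse


def _host_and_path(base_url):
    """Split a base URL into (lowercased hostname, path). Illustrative helper."""
    url = base_url if "://" in base_url else "//" + base_url
    parsed = urlparse(url)
    host = (parsed.hostname or "").rstrip(".").lower()
    return host, parsed.path or ""


def looks_like_bedrock(base_url):
    """Hostname starts with 'bedrock-runtime.' AND sits under amazonaws.com."""
    host, _ = _host_and_path(base_url)
    return host.startswith("bedrock-runtime.") and host.endswith(".amazonaws.com")


def looks_like_chatgpt_codex(base_url):
    """Hostname is exactly chatgpt.com AND the path contains /backend-api/codex."""
    host, path = _host_and_path(base_url)
    return host == "chatgpt.com" and "/backend-api/codex" in path
```

Checking host and path separately is what keeps a URL such as `https://evil.example/chatgpt.com/backend-api/codex` from being classified as the Codex backend.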
Tests: - tests/test_base_url_hostname.py extended with a base_url_host_matches suite (exact match, subdomain, path-segment rejection, host-suffix rejection, host-prefix rejection, empty-input, case-insensitivity, trailing dot). Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock, gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback, fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution, delegate, credential_pool, context_compressor, plus the 4 hostname test modules). 26-assertion E2E call-site verification across 6 modules passes. * refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340) The mid-run steer marker was '[USER STEER (injected mid-run, not tool output): <text>]'. Replaced with a plain two-newline-prefixed 'User guidance: <text>' suffix. Rationale: the marker lives inside the tool result's content string regardless of whether the tool returned JSON, plain text, an MCP result, or a plugin result. The bracketed tag read like structured metadata that some tools (terminal, execute_code) could confuse with their own output formatting. A plain labelled suffix works uniformly across every content shape we produce. Behavior unchanged: - Still injected into the last tool-role message's content. - Still preserves multimodal (Anthropic) content-block lists by appending a text block. - Still drained at both sites added in #12959 and #13205 — per-tool drain between individual calls, and pre-API-call drain at the top of each main-loop iteration. Checked Codex's equivalent (pending_input / inject_user_message_without_turn in codex-rs/core): they record mid-turn user input as a real role:user message via record_user_prompt_and_emit_turn_item(). That's cleaner for their Responses-API model but not portable to Chat Completions where role alternation after tool_calls is strict. Embedding the guidance in the last tool result remains the correct placement for us. 
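The placement rule described in this commit (append a two-newline-prefixed `User guidance: <text>` suffix to the last tool-role message, or append a text block when the content is a block list) might be sketched like this. The function name `append_guidance`, the message dict shape, and the exact text-block layout are assumptions for illustration, not the code in run_agent.py.

```python
def append_guidance(messages, text):
    """Attach mid-run user guidance to the last tool-role message, in place.

    Returns True if a tool message was found and updated, else False.
    Hypothetical sketch of the behavior described in the commit message.
    """
    label = "User guidance: " + text
    for msg in reversed(messages):
        if msg.get("role") != "tool":
            continue
        content = msg.get("content")
        if isinstance(content, list):
            # Multimodal content-block list: keep the blocks and add a text block.
            content.append({"type": "text", "text": label})
        else:
            # Plain string content (JSON, text, etc.): add a labelled suffix.
            msg["content"] = (content or "") + "\n\n" + label
        return True
    return False
```

The suffix is the same whether the tool returned JSON or plain text, which is the uniformity argument made in the rationale above.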
Validation: all 21 tests in tests/run_agent/test_steer.py pass. * refactor(ai-gateway): single source of truth for model catalog (#13304) Delete the stale literal `_PROVIDER_MODELS["ai-gateway"]` (gpt-5, gemini-2.5-pro, claude-4.5 — outdated the moment PR #13223 landed with its curated `AI_GATEWAY_MODELS` snapshot) and derive it from `AI_GATEWAY_MODELS` instead, so the picker tuples and the bare-id fallback catalog stay in sync automatically. Also fixes `get_default_model_for_provider('ai-gateway')` to return kimi-k2.6 (the curated recommendation) instead of claude-opus-4.6. * fix(whatsapp): remove 120s timeout on bridge npm install (#13339) The WhatsApp bridge depends on @whiskeysockets/baileys pulled directly from a GitHub commit tarball, which on slower connections or when GitHub is sluggish routinely exceeds 120s. The hardcoded timeout surfaced as a raw TimeoutExpired traceback during 'hermes whatsapp' setup. Switch to the same pattern used by the TUI npm install at line ~945: no timeout, --no-fund/--no-audit/--progress=false to keep output clean, stderr captured and tailed on failure. Also resolve npm via shutil.which so missing Node.js gives a clean error instead of FileNotFoundError, and handle Ctrl+C cleanly. Co-authored-by: teknium1 <teknium@nousresearch.com> * fix(cli): dispatch /steer inline while agent is running (#13354) Classic-CLI /steer typed during an active agent run was queued through self._pending_input alongside ordinary user input. process_loop, which drains that queue, is blocked inside self.chat() for the entire run, so the queued command was not pulled until AFTER _agent_running had flipped back to False — at which point process_command() took the idle fallback ("No agent running; queued as next turn") and delivered the steer as an ordinary next-turn user message. From Utku's bug report on PR #13205: mid-run /steer arrived minutes later at the end of the turn as a /queue-style message, completely defeating its purpose. 
Fix: add _should_handle_steer_command_inline() gating — when _agent_running is True and the user typed /steer, dispatch process_command(text) directly from the prompt_toolkit Enter handler on the UI thread instead of queueing. This mirrors the existing _should_handle_model_command_inline() pattern for /model and is safe because agent.steer() is thread-safe (uses _pending_steer_lock, no prompt_toolkit state mutation, instant return). No changes to the idle-path behavior: /steer typed with no active agent still takes the normal queue-and-drain route so the fallback "No agent running; queued as next turn" message is preserved. Validation: - 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering the detector, dispatch path, and idle-path control behavior. - All 21 existing tests in tests/run_agent/test_steer.py still pass. - Live PTY end-to-end test with real agent + real openrouter model: 22:36:22 API call #1 (model requested execute_code) 22:36:26 ENTER FIRED: agent_running=True, text='/steer ...' 22:36:26 INLINE STEER DISPATCH fired 22:36:43 agent.log: 'Delivered /steer to agent after tool batch' 22:36:44 API call #2 included the steer; response contained marker Same test on the tip of main without this fix shows the steer landing as a new user turn ~20s after the run ended. * feat: add transport types + migrate Anthropic normalize path Add agent/transports/types.py with three shared dataclasses: - NormalizedResponse: content, tool_calls, finish_reason, reasoning, usage, provider_data - ToolCall: id, name, arguments, provider_data (per-tool-call protocol metadata) - Usage: prompt_tokens, completion_tokens, total_tokens, cached_tokens Add normalize_anthropic_response_v2() to anthropic_adapter.py — wraps the existing v1 function and maps its output to NormalizedResponse. One call site in run_agent.py (the main normalize branch) uses v2 with a back-compat shim to SimpleNamespace for downstream code. No ABC, no registry, no streaming, no client lifecycle. 
Those land in PR 3 with the first concrete transport (AnthropicTransport). 46 new tests: - test_types.py: dataclass construction, build_tool_call, map_finish_reason - test_anthropic_normalize_v2.py: v1-vs-v2 regression tests (text, tools, thinking, mixed, stop reasons, mcp prefix stripping, edge cases) Part of the provider transport refactor (PR 2 of 9). * test: stop testing mutable data — convert change-detectors to invariants (#13363) Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture 
so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself). * fix(whatsapp): remove shadowing shutil import in cmd_whatsapp (#13364) The re-pair branch had a redundant 'import shutil' inside cmd_whatsapp, which made shutil a function-local throughout the whole scope. The earlier 'shutil.which("npm")' call at the dependency-install step then crashed with UnboundLocalError before control ever reached the local import. shutil is already imported at module level (line 48), so the local import was dead code anyway. Drop it. * feat(voice): add cli beep toggle * feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384) * feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates When a skill loads, the activation message now exposes the absolute skill directory and substitutes ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with bundled scripts can instruct the agent to run them by absolute path without an extra skill_view round-trip. 
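The `${HERMES_SKILL_DIR}` / `${HERMES_SESSION_ID}` substitution described here can be illustrated with a small sketch. The function name `expand_skill_templates` is hypothetical, and leaving `${HERMES_SESSION_ID}` untouched when no session id is available is an assumption; the commit message only says the missing-session-id case is covered by tests.

```python
import os
import re

# Only the two tokens named in the commit message are recognized.
_TOKEN_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")


def expand_skill_templates(body, skill_dir, session_id=None):
    """Replace known ${...} tokens in a SKILL.md body with concrete values."""
    values = {"HERMES_SKILL_DIR": os.path.abspath(skill_dir)}
    if session_id is not None:
        values["HERMES_SESSION_ID"] = session_id

    def _sub(match):
        # Assumption: tokens without a value stay literal instead of becoming ''.
        return values.get(match.group(1), match.group(0))

    return _TOKEN_RE.sub(_sub, body)
```

A skill body can then say `python ${HERMES_SKILL_DIR}/scripts/run.py` and the agent sees an absolute path it can execute directly.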
Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md are pre-executed (with the skill directory as CWD) and their stdout is inlined into the message before the agent reads it. Off by default — enable via skills.inline_shell in config.yaml — because any snippet runs on the host without approval. Changes: - agent/skill_commands.py: template substitution, inline-shell expansion, absolute skill-dir header, supporting-files list now shows both relative and absolute forms. - hermes_cli/config.py: new skills.template_vars, skills.inline_shell, skills.inline_shell_timeout knobs. - tests/agent/test_skill_commands.py: coverage for header, both template tokens (present and missing session id), template_vars disable, inline-shell default-off, enabled, CWD, and timeout. - website/docs/developer-guide/creating-skills.md: documents the template tokens, the absolute-path header, and the opt-in inline shell with its security caveat. Validation: tests/agent/ 1591 passed (includes 9 new tests). E2E: loaded a real skill in an isolated HERMES_HOME; confirmed ${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID} resolves to the passed task_id, !`date` runs when opt-in is set, and stays literal when it isn't. * feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot bash login shells don't source ~/.bashrc, so tools that install themselves there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to the environment snapshot Hermes builds once per session. Under systemd or any context with a minimal parent env, that surfaces as 'node: command not found' in the terminal tool even though the binary is reachable from every interactive shell on the machine. Changes: - tools/environments/local.py: before the login-shell snapshot bootstrap runs, prepend guarded 'source <file>' lines for each resolved init file. Missing files are skipped, each source is wrapped with a '[ -r ... ] && . ... 
|| true' guard so a broken rc can't abort the bootstrap. - hermes_cli/config.py: new terminal.shell_init_files (explicit list, supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on) knobs. When shell_init_files is set it takes precedence; when it's empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced. - tests/tools/test_local_shell_init.py: 10 tests covering the resolver (auto-bashrc, missing file, explicit override, ~/${VAR} expansion, opt-out) and the prelude builder (quoting, guarded sourcing), plus a real-LocalEnvironment snapshot test that confirms exports in the init file land in subsequent commands' environment. - website/docs/reference/faq.md: documents the fix in Troubleshooting, including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh directly via shell_init_files. Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40 pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py 50/50 pass. E2E in an isolated HERMES_HOME: confirmed that a fake ~/.bashrc setting a marker var and PATH addition shows up in a real LocalEnvironment().execute() call, that auto_source_bashrc=false suppresses it, that an explicit shell_init_files entry wins over the auto default, and that a missing bashrc is silently skipped. * fix(gateway): prevent --replace race condition causing multiple instances When starting the gateway with --replace, concurrent invocations could leave multiple instances running simultaneously. This happened because write_pid_file() used a plain overwrite, so the second racer would silently replace the first process's PID record. Changes: - gateway/status.py: write_pid_file() now uses atomic O_CREAT|O_EXCL creation. If the file already exists, it raises FileExistsError, allowing exactly one process to win the race. - gateway/run.py: before writing the PID file, re-check get_running_pid() and catch FileExistsError from write_pid_file(). 
In both cases, stop the runner and return False so the process exits cleanly. Fixes #11718 * fix(gateway): force-unlink stale PID file after --replace takeover If the old process crashed without firing its atexit handler, remove_pid_file() is a no-op. Force-unlink the stale gateway.pid so write_pid_file() (O_CREAT|O_EXCL) does not hit FileExistsError. * fix(gateway): close --replace race completely by claiming PID before adapter startup Follow-up on top of opriz's atomic PID file fix. The prior change caught the race AFTER runner.start(), so the loser still opened Telegram polling and Discord gateway sockets before detecting the conflict and exiting. Hoist the PID-claim block to BEFORE runner.start(). Now the loser of the O_CREAT|O_EXCL race returns from start_gateway() without ever bringing up any platform adapter — no Telegram conflict, no Discord duplicate session. Also add regression tests: - test_write_pid_file_is_atomic_against_concurrent_writers: second write_pid_file() raises FileExistsError rather than clobbering. - Two existing replace-path tests updated to stateful mocks since the real post-kill state (get_running_pid None after remove_pid_file) is now exercised by the hoisted re-check. * refactor: remove redundant local imports already available at module level Sweep ~74 redundant local imports across 21 files where the same module was already imported at the top level. Also includes type fixes and lint cleanups on the same branch. * refactor: remove remaining redundant local imports (comprehensive sweep) Full AST-based scan of all .py files to find every case where a module or name is imported locally inside a function body but is already available at module level. This is the second pass — the first commit handled the known cases from the lint report; this one catches everything else. 
Files changed (19): cli.py — 16 removals: time as _time/_t/_tmod (×10), re / re as _re (×2), os as _os, sys, partial os from combo import, from model_tools import get_tool_definitions gateway/run.py — 8 removals: MessageEvent as _ME / MessageType as _MT (×3), os as _os2, MessageEvent+MessageType (×2), Platform, BasePlatformAdapter as _BaseAdapter run_agent.py — 6 removals: get_hermes_home as _ghh, partial (contextlib, os as _os), cleanup_vm, cleanup_browser, set_interrupt as _sif (×2), partial get_toolset_for_tool hermes_cli/main.py — 4 removals: get_hermes_home, time as _time, logging as _log, shutil hermes_cli/config.py — 1 removal: get_hermes_home as _ghome hermes_cli/runtime_provider.py — 1 removal: load_config as _load_bedrock_config hermes_cli/setup.py — 2 removals: importlib.util (×2) hermes_cli/nous_subscription.py — 1 removal: from hermes_cli.config import load_config hermes_cli/tools_config.py — 1 removal: from hermes_cli.config import load_config, save_config cron/scheduler.py — 3 removals: concurrent.futures, json as _json, from hermes_cli.config import load_config batch_runner.py — 1 removal: list_distributions as get_all_dists (kept print_distribution_info, not at top level) tools/send_message_tool.py — 2 removals: import os (×2) tools/skills_tool.py — 1 removal: logging as _logging tools/browser_camofox.py — 1 removal: from hermes_cli.config import load_config tools/image_generation_tool.py — 1 removal: import fal_client environments/tool_context.py — 1 removal: concurrent.futures gateway/platforms/bluebubbles.py — 1 removal: httpx as _httpx gateway/platforms/whatsapp.py — 1 removal: import asyncio tui_gateway/server.py — 2 removals: from datetime import datetime, import time All alias references (_time, _t, _tmod, _re, _os, _os2, _json, _ghh, _ghome, _sif, _ME, _MT, _BaseAdapter, _load_bedrock_config, _httpx, _logging, _log, get_all_dists) updated to use the top-level names. 
* fix(update): keep get_hermes_home late-bound in _install_hangup_protection Follow-up to the redundant-imports sweep. _install_hangup_protection used to import get_hermes_home locally; the sweep hoisted it to the module-level binding already present at line 164. test_non_fatal_if_log_setup_fails monkeypatches hermes_cli.config.get_hermes_home to raise, which only works when the function late-binds its lookup. The hoisted version captures the reference at import time and bypasses the monkeypatch. Restore the local import (with a distinct local alias) so the test seam works and the stdio-untouched-on-setup-failure invariant is actually exercised. * feat(maps): add guest_house, camp_site, and dual-key bakery lookup (#13398) Small follow-up inspired by stale PR #2421 (@poojandpatel). - bakery now searches both shop=bakery AND amenity=bakery in one Overpass query so indie bakeries tagged either way are returned. Reproduces #2421's Lawrenceville, NJ test case (The Gingered Peach, WildFlour Bakery). - Adds tourism=guest_house and tourism=camp_site as first-class categories. - CATEGORY_TAGS entries can now be a list of (key, value) tuples; new _tags_for() normaliser + tag_pairs= kwarg on build_overpass_nearby/bbox union the results in one query. Old single-tuple call sites unchanged (back-compat preserved). - SKILL.md: 44 → 46 categories, list updated. * fix(gateway): preserve sender attribution in shared group sessions Generalize shared multi-user session handling so non-thread group sessions (group_sessions_per_user=False) get the same treatment as shared threads: inbound messages are prefixed with [sender name], and the session prompt shows a multi-user note instead of pinning a single **User:** line into the cached system prompt. 
Before: build_session_key already treated these as shared sessions, but _prepare_inbound_message_text and build_session_context_prompt only recognized shared threads — creating cross-user attribution drift and prompt-cache contamination in shared groups. - Add is_shared_multi_user_session() helper alongside build_session_key() so both the session key and the multi-user branches are driven by the same rules (DMs never shared, threads shared unless thread_sessions_per_user, groups shared unless group_sessions_per_user). - Add shared_multi_user_session field to SessionContext, populated by build_session_context() from config. - Use context.shared_multi_user_session in the prompt builder (label is 'Multi-user thread' when a thread is present, 'Multi-user session' otherwise). - Use the helper in _prepare_inbound_message_text so non-thread shared groups also get [sender] prefixes. Default behavior unchanged: DMs stay single-user, groups with group_sessions_per_user=True still show the user normally, shared threads keep their existing multi-user behavior. Tests (65 passed): - tests/gateway/test_session.py: new shared non-thread group prompt case. - tests/gateway/test_shared_group_sender_prefix.py: inbound preprocessing for shared non-thread groups and default groups. * feat: add transport ABC + AnthropicTransport wired to all paths Add ProviderTransport ABC (4 abstract methods: convert_messages, convert_tools, build_kwargs, normalize_response) plus optional hooks (validate_response, extract_cache_stats, map_finish_reason). Add transport registry with lazy discovery — get_transport() auto-imports transport modules on first call. 
Add AnthropicTransport — delegates to existing anthropic_adapter.py functions, wired to ALL Anthropic code paths in run_agent.py: - Main normalize loop (L10775) - Main build_kwargs (L6673) - Response validation (L9366) - Finish reason mapping (L9534) - Cache stats extraction (L9827) - Truncation normalize (L9565) - Memory flush build_kwargs + normalize (L7363, L7395) - Iteration-limit summary + retry (L8465, L8498) Zero direct adapter imports remain for transport methods. Client lifecycle, streaming, auth, and credential management stay on AIAgent. 20 new tests (ABC contract, registry, AnthropicTransport methods). 359 anthropic-related tests pass (0 failures). PR 3 of the provider transport refactor. * feat: Add KittenTTS provider for local TTS synthesis Add support for KittenTTS - a lightweight, local TTS engine with models ranging from 25-80MB that runs on CPU without requiring a GPU or API key. Features: - Support for 8 built-in voices (Jasper, Bella, Luna, etc.) - Configurable model size (nano 25MB, micro 41MB, mini 80MB) - Adjustable speech speed - Model caching for performance - Automatic WAV to Opus conversion for Telegram voice messages Configuration example (config.yaml): tts: provider: kittentts kittentts: model: KittenML/kitten-tts-nano-0.8-int8 voice: Jasper speed: 1.0 clean_text: true Installation: pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl * feat(tts): complete KittenTTS integration (tools/setup/docs/tests) Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the provider behaves like every other TTS backend end to end. - tools/tts_tool.py: `_check_kittentts_available()` helper and wire into `check_tts_requirements()`; extend Opus-conversion list to include kittentts (WAV → Opus for Telegram voice bubbles); point the missing-package error at `hermes setup tts`. 
- hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech" toolset picker, with a `kittentts` post_setup hook that auto-installs the wheel + soundfile via pip. - hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install flow in `_setup_tts_provider()`, provider_labels entry, and status row in the `hermes setup` summary. - website/docs/user-guide/features/tts.md: add KittenTTS to the provider table, config example, ffmpeg note, and the zero-config voice-bubble tip. - tests/tools/test_tts_kittentts.py: 10 unit tests covering generation, model caching, config passthrough, ffmpeg conversion, availability detection, and the missing-package dispatcher branch. E2E verified against the real `kittentts` wheel: - WAV direct output (pcm_s16le, 24kHz mono) - MP3 conversion via ffmpeg (from WAV) - Telegram flow (provider in Opus-conversion list) produces `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the `[[audio_as_voice]]` marker - check_tts_requirements() returns True when kittentts is installed * chore(release): map abdi.moya@gmail.com -> AxDSan for release notes * fix(security): apply file safety to copilot acp fs * chore(release): map fr@tecompanytea.com → ifrederico * test(copilot-acp): patch HERMES_HOME alongside HOME in hub-block test file_safety now uses profile-aware get_hermes_home(), so the test fixture must override HERMES_HOME too — otherwise it resolves to the conftest's isolated tempdir and the hub-cache path doesn't match. * test(conftest): reset module-level state + unset platform allowlists (#13400) Three fixes that close the remaining structural sources of CI flakes after PR #13363. ## 1. Per-test reset of module-level singletons and ContextVars Python modules are singletons per process, and pytest-xdist workers are long-lived. Module-level dicts/sets and ContextVars persist across tests on the same worker. 
A test that sets state in `tools.approval._session_approved` and doesn't explicitly clear it leaks that state to every subsequent test on the same worker. New `_reset_module_state` autouse fixture in `tests/conftest.py` clears: - tools.approval: _session_approved, _session_yolo, _permanent_approved, _pending, _gateway_queues, _gateway_notify_cbs, _approval_session_key - tools.interrupt: _interrupted_threads - gateway.session_context: 10 session/cron ContextVars (reset to _UNSET) - tools.env_passthrough: _allowed_env_vars_var (reset to empty set) - tools.credential_files: _registered_files_var (reset to empty dict) - tools.file_tools: _read_tracker, _file_ops_cache This was the single biggest remaining class of CI flakes. `test_command_guards::test_warn_session_approved` and `test_combined_cli_session_approves_both` were failing 12/15 recent main runs specifically because `_session_approved` carried approvals from a prior test's session into these tests' `"default"` session lookup. ## 2. Unset platform allowlist env vars in hermetic fixture `TELEGRAM_ALLOWED_USERS`, `DISCORD_ALLOWED_USERS`, and 20 other `*_ALLOWED_USERS` / `*_ALLOW_ALL_USERS` vars are now unset per-test in the same place credential env vars already are. These aren't credentials but they change gateway auth behavior; if set from any source (user shell, leaky test, CI env) they flake button-authorization tests. Fixes three `test_telegram_approval_buttons` tests that were failing across recent runs of the full gateway directory. ## 3. Two specific tests with module-level captured state - `test_signal::TestSignalPhoneRedaction`: `agent.redact._REDACT_ENABLED` is captured at module import from `HERMES_REDACT_SECRETS`, not read per-call. `monkeypatch.delenv` at test time is too late. Added `monkeypatch.setattr("agent.redact._REDACT_ENABLED", True)` per skill xdist-cross-test-pollution Pattern 5. 
- `test_internal_event_bypass_pairing::test_non_internal_event_without_user_triggers_pairing`: `gateway.pairing.PAIRING_DIR` is captured at module import from HERMES_HOME, so per-test HERMES_HOME redirection in conftest doesn't retroactively move it. Test now monkeypatches PAIRING_DIR directly to its tmp_path, preventing rate-limit state from prior xdist workers from letting the pairing send-call be suppressed. ## Validation - tests/tools/: 3494 pass (0 fail) including test_command_guards - tests/gateway/: 3504 pass (0 fail) across repeat runs - tests/agent/ + tests/hermes_cli/ + tests/run_agent/ + tests/tools/: 8371 pass, 37 skipped, 0 fail — full suite across directories No production code changed. * fix(auth): hermes auth remove sticks for shell-exported env vars (#13418) Removing an env-seeded credential only cleared ~/.hermes/.env and the current process's os.environ, leaving shell-exported vars (shell profile, systemd EnvironmentFile, launchd plist) to resurrect the entry on the next load_pool() call. This matched the pre-#11485 codex behaviour. Now we suppress env:<VAR> in auth.json on remove, gate _seed_from_env() behind is_source_suppressed(), clear env:* suppressions on auth add, and print a diagnostic pointing at the shell when the var lives there. Applies to every env:* seeded credential (xai, deepseek, moonshot, zai, nvidia, openrouter, anthropic, etc.), not just xai. Reported by @teknium1 from community user 'Artificial Brain' — couldn't remove their xAI key via hermes auth remove. * feat(cli): add numbered keyboard shortcuts to approval and clarify prompts * chore(release): add francip to AUTHOR_MAP * feat(skills): add adversarial-ux-test optional skill Adds a structured adversarial UX testing skill that roleplays the worst-case user for any product. Uses a 6-step workflow: 1. Define a specific grumpy persona (age 50+, tech-resistant) 2. Browse the app in-character attempting real tasks 3. Write visceral in-character feedback (the Rant) 4. 
Apply a pragmatism filter (RED/YELLOW/WHITE/GREEN classification) 5. Create tickets only for real issues (RED + GREEN) 6. Deliver a structured report with screenshots The pragmatism filter is the key differentiator - it prevents raw persona complaints from becoming tickets, separating genuine UX problems from "I hate computers" noise. Includes example personas for 8 industry verticals and practical tips from real-world testing sessions. Ref: https://x.com/Teknium/status/2035708510034641202 * chore: attribution + catalog rows for adversarial-ux-test - AUTHOR_MAP: omni@comelse.com -> omnissiah-comelse - skills-catalog.md: add adversarial-ux-test row under dogfood - optional-skills-catalog.md: add new Dogfood section * fix(auth): unify credential source removal — every source sticks (#13427) Every credential source Hermes reads from now behaves identically on `hermes auth remove`: the pool entry stays gone across fresh load_pool() calls, even when the underlying external state (env var, OAuth file, auth.json block, config entry) is still present. Before this, auth_remove_command was a 110-line if/elif with five special cases, and three more sources (qwen-cli, copilot, custom config) had no removal handler at all — their pool entries silently resurrected on the next invocation. Even the handled cases diverged: codex suppressed, anthropic deleted-without-suppressing, nous cleared without suppressing. Each new provider added a new gap. What's new: agent/credential_sources.py — RemovalStep registry, one entry per source (env, claude_code, hermes_pkce, nous device_code, codex device_code, qwen-cli, copilot gh_cli + env vars, custom config). auth_remove_command dispatches uniformly via find_removal_step(). Changes elsewhere: agent/credential_pool.py — every upsert in _seed_from_env, _seed_from_singletons, and _seed_custom_pool now gates on is_source_suppressed(provider, source) via a shared helper. 
hermes_cli/auth_commands.py — auth_remove_command reduced to 25 lines of dispatch; auth_add_command now clears ALL suppressions for the provider on re-add (was env:* only). Copilot is special: the same token is seeded twice (gh_cli via _seed_from_singletons + env:<VAR> via _seed_from_env), so removing one entry without suppressing the other variants lets the duplicate resurrect. The copilot RemovalStep suppresses gh_cli + all three env variants (COPILOT_GITHUB_TOKEN, GH_TOKEN, GITHUB_TOKEN) at once. Tests: 11 new unit tests + 4059 existing pass. 12 E2E scenarios cover every source in isolated HERMES_HOME with simulated fresh processes. * feat(account-usage): add per-provider account limits module Ports agent/account_usage.py and its tests from the original PR #2486 branch. Defines AccountUsageSnapshot / AccountUsageWindow dataclasses, a shared renderer, and provider-specific fetchers for OpenAI Codex (wham/usage), Anthropic OAuth (oauth/usage), and OpenRouter (/credits and /key). Wiring into /usage lands in a follow-up salvage commit. Authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> * feat(/usage): append account limits section in CLI and gateway Wires the agent/account_usage module from the preceding commit into /usage so users see provider-side quota/credit info alongside the existing session token report. CLI: - `_show_usage` appends account lines under the token table. Fetch runs in a 1-worker ThreadPoolExecutor with a 10s timeout so a slow provider API can never hang the prompt. Gateway: - `_handle_usage_command` resolves provider from the live agent when available, else from the persisted billing_provider/billing_base_url on the SessionDB row, so /usage still returns account info between turns when no agent is resident. Fetch runs via asyncio.to_thread. 
- Account section is appended to all three return branches: running agent, no-agent-with-history, and the new no-agent-no-history path (falls back to account-only output instead of "no data"). Tests: - 2 new tests in tests/gateway/test_usage_command.py cover the live- agent account section and the persisted-billing fallback path. Salvaged from PR #2486 by @kshitijk4poor. The original branch had drifted ~2615 commits behind main and rewrote _show_usage wholesale, which would have dropped the rate-limit and cached-agent blocks added in PRs #6541 and #7038. This commit re-adds only the new behavior on top of current main. * feat(opencode-go): add Kimi K2.6 and Qwen3.5/3.6 Plus to curated catalog (#13429) OpenCode Go's published model list (opencode.ai/docs/go) includes kimi-k2.6, qwen3.5-plus, and qwen3.6-plus, but Hermes' curated lists didn't carry them. When the live /models probe fails during `hermes model`, users fell back to the stale curated list and had to type newer models via 'Enter custom model name'. Adds kimi-k2.6 (now first in the Go list), qwen3.6-plus, and qwen3.5-plus to both the model picker (hermes_cli/models.py) and setup defaults (hermes_cli/setup.py). All routed through the existing opencode-go chat_completions path — no api_mode changes needed. * feat(patch): add 'did you mean?' feedback when patch fails to match When patch_replace() cannot find old_string in a file, the error message now includes the closest matching lines from the file with line numbers and context. This helps the LLM self-correct without a separate read_file call. Implements Phase 1 of #536: enhanced patch error feedback with no architectural changes. - tools/fuzzy_match.py: new find_closest_lines() using SequenceMatcher - tools/file_operations.py: attach closest-lines hint to patch errors - tests/tools/test_fuzzy_match.py: 5 new tests for find_closest_lines * fix(patch): gate 'did you mean?' 
to no-match + extend to v4a/skill_manage Follow-ups on top of @teyrebaz33's cherry-picked commit: 1. New shared helper format_no_match_hint() in fuzzy_match.py with a startswith('Could not find') gate so the snippet only appends to genuine no-match errors — not to 'Found N matches' (ambiguous), 'Escape-drift detected', or 'identical strings' errors, which would all mislead the model. 2. file_tools.patch_tool suppresses the legacy generic '[Hint: old_string not found...]' string when the rich 'Did you mean?' snippet is already attached — no more double-hint. 3. Wire the same helper into patch_parser.py (V4A patch mode, both _validate_operations and _apply_update) and skill_manager_tool.py so all three fuzzy callers surface the hint consistently. Tests: 7 new gating tests in TestFormatNoMatchHint cover every error class (ambiguous, drift, identical, non-zero match count, None error, no similar content, happy path). 34/34 test_fuzzy_match, 96/96 test_file_tools + test_patch_parser + test_skill_manager_tool pass. E2E verified across all four scenarios: no-match-with-similar, no-match-no-similar, ambiguous, success. V4A mode confirmed end-to-end with a non-matching hunk. * Normalize FAL_KEY env handling (ignore whitespace-only values) Treat whitespace-only FAL_KEY the same as unset so users who export FAL_KEY=" " (or CI that leaves a blank token) get the expected 'not set' error path instead of a confusing downstream fal_client failure. Applied to the two direct FAL_KEY checks in image_generation_tool.py: image_generate_tool's upfront credential check and check_fal_api_key(). Both keep the existing managed-gateway fallback intact. Adapted the original whitespace/valid tests to pin the managed gateway to None so the whitespace assertion exercises the direct-key path rather than silently relying on gateway absence. * fix(fal): extend whitespace-only FAL_KEY handling to all call sites Follow-up to PR #2504. 
The original fix covered the two direct FAL_KEY checks in image_generation_tool but left four other call sites intact, including the managed-gateway gate where a whitespace-only FAL_KEY falsely claimed 'user has direct FAL' and *skipped* the Nous managed gateway fallback entirely. Introduce fal_key_is_configured() in tools/tool_backend_helpers.py as a single source of truth (consults os.environ, falls back to .env for CLI-setup paths) and route every FAL_KEY presence check through it: - tools/image_generation_tool.py : _resolve_managed_fal_gateway, image_generate_tool's upfront check, check_fal_api_key - hermes_cli/nous_subscription.py : direct_fal detection, selected toolset gating, tools_ready map - hermes_cli/tools_config.py : image_gen needs-setup check Verified by extending tests/tools/test_image_generation_env.py and by E2E exercising whitespace + managed-gateway composition directly. * fix: slash commands now respect require_mention in Telegram groups When require_mention is enabled, slash commands no longer bypass mention checks. Bare /command without @mention is filtered in groups, while /command@botname (bot menu) and @botname /command still pass. Commands still pass unconditionally when require_mention is disabled, preserving backward compatibility. Closes #6033 * test(telegram): update /cmd@botname assertion for entity-only detection Current main's _message_mentions_bot() uses MessageEntity-only detection (commit e330112a), so the test for '/status@hermes_bot' needs to include a MENTION entity. Real Telegram always emits one for /cmd@botname — the bot menu and CommandHandler rely on this mechanism. * chore(release): add pinion05 to AUTHOR_MAP * docs(xurl skill): document UsernameNotFound workaround (xurl v1.1.0) (#13458) xurl v1.1.0 added an optional USERNAME positional to `xurl auth oauth2` that skips the `/2/users/me` lookup, which has been returning 403/UsernameNotFound for many devs. 
Documents the workaround in both setup (step 5) and troubleshooting. Reported by @itechnologynet. * refactor(acp): validate method_id against advertised provider in authenticate() (#13468) * feat(models): hide OpenRouter models that don't advertise tool support Port from Kilo-Org/kilocode#9068. hermes-agent is tool-calling-first — every provider path assumes the model can invoke tools. Models whose OpenRouter supported_parameters doesn't include 'tools' (e.g. image-only or completion-only models) cannot be driven by the agent loop and fail at the first tool call. Filter them out of fetch_openrouter_models() so they never appear in the model picker (`hermes model`, setup wizard, /model slash command). Permissive when the field is missing — OpenRouter-compatible gateways (Nous Portal, private mirrors, older snapshots) don't always populate supported_parameters. Treat missing as 'unknown → allow' rather than silently emptying the picker on those gateways. Only hide models whose supported_parameters is an explicit list that omits tools. Tests cover: tools present → kept, tools absent → dropped, field missing → kept, malformed non-list → kept, non-dict item → kept, empty list → dropped. * refactor(acp): validate method_id against advertised provider in authenticate() Previously authenticate() accepted any method_id whenever the server had provider credentials configured. This was not a vulnerability under the personal-assistant trust model (ACP is stdio-only, local-trust — anything that can reach the transport is already code-execution-equivalent to the user), but it was sloppy API hygiene: the advertised auth_methods list from initialize() was effectively ignored. Now authenticate() only returns AuthenticateResponse when method_id matches the currently-advertised provider (case-insensitive). Mismatched or missing method_id returns None, consistent with the no-credentials case. Raised by xeloxa via GHSA-g5pf-8w9m-h72x. 
Declined as a CVE (ACP transport is stdio, local-trust model), but the correctness fix is worth having on its own. * test(mcp): add failing tests for circuit-breaker recovery The MCP circuit breaker in tools/mcp_tool.py has no half-open state and no reset-on-reconnect behavior, so once it trips after 3 consecutive failures it stays tripped for the process lifetime. These tests lock in the intended recovery behavior: 1. test_circuit_breaker_half_opens_after_cooldown — after the cooldown elapses, the next call must actually probe the session; success closes the breaker. 2. test_circuit_breaker_reopens_on_probe_failure — a failed probe re-arms the cooldown instead of letting every subsequent call through. 3. test_circuit_breaker_cleared_on_reconnect — a successful OAuth recovery resets the breaker even if the post-reconnect retry fails (a successful reconnect is sufficient evidence the server is viable again). All three currently fail, as expected. * fix(mcp): add half-open state to circuit breaker The MCP circuit breaker previously had no path back to the closed state: once _server_error_counts[srv] reached _CIRCUIT_BREAKER_THRESHOLD the gate short-circuited every subsequent call, so the only reset path (on successful call) was unreachable. A single transient 3-failure blip (bad network, server restart, expired token) permanently disabled every tool on that MCP server for the rest of the agent session. Introduce a classic closed/open/half-open state machine: - Track a per-server breaker-open timestamp in _server_breaker_opened_at alongside the existing failure count. - Add _CIRCUIT_BREAKER_COOLDOWN_SEC (60s). Once the count reaches threshold, calls short-circuit for the cooldown window. - After the cooldown elapses, the *next* call falls through as a half-open probe that actually hits the session. Success resets the breaker via _reset_server_error; failure re-bumps the count via _bump_server_error, which re-stamps the open timestamp and re-arms the cooldown. 
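The closed/open/half-open behavior listed above can be sketched as follows. This is an illustrative stand-alone class, not the code in tools/mcp_tool.py: the class name, method names, and the injectable clock are assumptions, and only the threshold of 3 and the 60s cooldown come from the commit message.

```python
import time

THRESHOLD = 3        # consecutive failures before the breaker opens
COOLDOWN_SEC = 60.0  # how long calls short-circuit once it is open


class CircuitBreaker:
    """Per-server breaker with a half-open probe after the cooldown."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can fake time
        self._error_counts = {}  # server -> consecutive failure count
        self._opened_at = {}     # server -> timestamp the breaker last opened

    def allow(self, server):
        """Return True if a call may reach the server (closed, or half-open probe)."""
        if self._error_counts.get(server, 0) < THRESHOLD:
            return True  # closed
        opened = self._opened_at.get(server)
        # Open: short-circuit until the cooldown has elapsed, then let a probe through.
        return opened is not None and self._clock() - opened >= COOLDOWN_SEC

    def record_failure(self, server):
        count = self._error_counts.get(server, 0) + 1
        self._error_counts[server] = count
        if count >= THRESHOLD:
            # Re-stamping on every failure at or past the threshold re-arms the cooldown.
            self._opened_at[server] = self._clock()

    def record_success(self, server):
        # A successful call (or probe) closes the breaker completely.
        self._error_counts.pop(server, None)
        self._opened_at.pop(server, None)
```

Note one simplification: once the cooldown has elapsed, this sketch lets every call through until a result is recorded, rather than admitting exactly one probe. A stricter variant would track an in-flight probe flag.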
The error message now includes the live failure count and an "Auto-retry available in ~Ns" hint so the model knows the breaker will self-heal rather than giving up on the tool for the whole session. Covers tests 1 (half-opens after cooldown) and 2 (reopens on probe failure); test 3 (cleared on reconnect) still fails pending fix #2. * fix(mcp): reset circuit breaker on successful OAuth reconnect Previously the breaker was only cleared when the post-reconnect retry call itself succeeded (via _reset_server_error at the end of the try block). If OAuth recovery succeeded but the retry call happened to fail for a different reason, control fell through to the needs_reauth path which called _bump_server_error — adding to an already-tripped count instead of the fresh count the reconnect justified. With fix #1 in place this would still self-heal on the next cooldown, but we should not pay a 60s stall when we already have positive evidence the server is viable. Move _reset_server_error(server_name) up to immediately after the reconnect-and-ready-wait block, before the retry_call. The subsequent retry still goes through _bump_server_error on failure, so a genuinely broken server re-trips the breaker as normal — but the retry starts from a clean count (1 after a failure), not a stale one. * fix(/model): accept provider switches when /models is unreachable Gateway /model <name> --provider opencode-go (or any provider whose /models endpoint is down, 404s, or doesn't exist) silently failed. validate_requested_model returned accepted=False whenever fetch_api_models returned None, switch_model returned success=False, and the gateway never wrote _session_model_overrides — so the switch appeared to succeed in the error message flow but the next turn kept calling the old provider. The validator already had static-catalog fallbacks for MiniMax and Codex (providers without a /models endpoint). 
Extended the same pattern as the terminal fallback: when the live probe fails, consult provider_model_ids() for the curated catalog. Known models → accepted+recognized. Close typos → auto-corrected. Unknown models → soft-accepted with a 'Not in curated catalog' warning. Providers with no catalog at all → soft-accepted with a generic 'Note:' warning, finally honoring the in-code comment ('Accept and persist, but warn') that had been lying since it was written. Tests: 7 new tests in test_opencode_go_validation_fallback.py covering the catalog lookup, case-insensitive match, auto-correct, unknown-with-suggestion, unknown-without-suggestion, and no-catalog paths. TestValidateApiFallback in test_model_validation.py updated — its four 'rejected_when_api_down' tests were encoding exactly the bug being fixed. * fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot Kimi/Moonshot endpoints require explicit parameters that Hermes was not sending, causing 'Response truncated due to output length limit' errors and inconsistent reasoning behavior. Root cause analysis against Kimi CLI source (MoonshotAI/kimi-cli, packages/kosong/src/kosong/chat_provider/kimi.py): 1. max_tokens: Kimi's API defaults to a very low value when omitted. Reasoning tokens share the output budget — the model exhausts it on thinking alone. Send 32000, matching Kimi CLI's generate() default. 2. reasoning_effort: Kimi CLI sends this as a top-level parameter (not inside extra_body). Hermes was not sending it at all because _supports_reasoning_extra_body() returns False for non-OpenRouter endpoints. 3. extra_body.thinking: Kimi CLI uses with_thinking() which sets extra_body.thinking={"type":"enabled"} alongside reasoning_effort. This is a separate control from the OpenAI-style reasoning extra_body that Hermes sends for OpenRouter/GitHub. Without it, the Kimi gateway may not activate reasoning mode correctly. Covers api.kimi.com (Kimi Code) and api.moonshot.ai/cn (Moonshot). 
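As a rough illustration of the three parameters above, the sketch below builds the extra request kwargs for a Kimi/Moonshot base URL. The function name, the host tuple, the default effort value, and the "omit reasoning fields when effort is None" rule are all assumptions made for the example; only `max_tokens=32000`, the top-level `reasoning_effort`, and `extra_body.thinking={"type": "enabled"}` come from the commit message.

```python
from urllib.parse import urlparse

# Hosts named in the commit message ("api.moonshot.ai/cn" read as .ai and .cn).
KIMI_HOSTS = ("api.kimi.com", "api.moonshot.ai", "api.moonshot.cn")
KIMI_DEFAULT_MAX_TOKENS = 32000  # value the commit says matches Kimi CLI


def kimi_request_extras(base_url, reasoning_effort="medium", max_tokens=None):
    """Return extra chat-completions kwargs for Kimi/Moonshot, else an empty dict."""
    host = (urlparse(base_url).hostname or "").lower()
    if host not in KIMI_HOSTS:
        return {}
    extras = {"max_tokens": max_tokens or KIMI_DEFAULT_MAX_TOKENS}
    if reasoning_effort is not None:  # assumption: None means reasoning is off
        extras["reasoning_effort"] = reasoning_effort        # top-level parameter
        extras["extra_body"] = {"thinking": {"type": "enabled"}}  # separate control
    return extras
```

The host is compared exactly rather than by substring, so a look-alike host or a path containing `api.kimi.com` does not pick up these parameters.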
Tests: 6 new test cases for max_tokens, reasoning_effort, and extra_body.thinking under various configs. * chore(release): add mengjian-github to AUTHOR_MAP * fix(skills): respect HERMES_SESSION_PLATFORM in _is_skill_disabled Fixes #13027 Previously, `_is_skill_disabled()` only checked the explicit `platform` argument and `os.getenv('HERMES_PLATFORM')`, missing the gateway session context (`HERMES_SESSION_PLATFORM`). This caused `skill_view()` to expose skills that were platform-disabled for the active gateway session. Add `_get_session_platform()` helper that resolves the platform from `gateway.session_context.get_session_env`, mirroring the logic in `agent.skill_utils.get_disabled_skill_names()`. Now the platform resolution follows the same precedence as skill_utils: 1. Explicit `platform` argument 2. `HERMES_PLATFORM` environment variable 3. `HERMES_SESSION_PLATFORM` from gateway session context * chore: register VTRiot in AUTHOR_MAP * fix(cron): cancel orphan coroutine on delivery timeout before standalone fallback When the live adapter delivery path (_deliver_result) or media send path (_send_media_via_adapter) times out at future.result(timeout=N), the underlying coroutine scheduled via asyncio.run_coroutine_threadsafe can still complete on the event loop, causing a duplicate send after the standalone fallback runs. Cancel the future on TimeoutError before re-raising, so the standalone fallback is the sole delivery path. Adds TestDeliverResultTimeoutCancelsFuture and TestSendMediaTimeoutCancelsFuture. * test(cron): exercise _deliver_result and _send_media_via_adapter directly for timeout-cancel The original tests replicated the try/except/cancel/raise pattern inline with a mocked future, which tested Python's try/except semantics rather than the scheduler's behavior. Rewrite them to invoke _deliver_result and _send_media_via_adapter end-to-end with a real concurrent.futures.Future whose .result() raises TimeoutError. 
Mutation-verified: both tests fail when the try/except wrappers are removed from cron/scheduler.py, pass with them in place. * chore: remove stale requirements.txt in favor of pyproject.toml (#13515) The root requirements.txt has drifted from pyproject.toml for years (unpinned, missing deps like slack-bolt, slack-sdk, exa-py, anthropic) and no part of the codebase (CI, Dockerfiles, scripts, docs) consumes it. It exists only for drive-by 'pip install -r requirements.txt' users and will drift again within weeks of any sync. Canonical install remains: pip install -e ".[all]" Closes #13488 (thanks @hobostay — your sync was correct, we're just deleting the drift trap instead of patching it). * fix(agent): normalize socks:// env proxies for httpx/anthropic WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed. Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them: - run_agent._get_proxy_from_env - agent.auxiliary_client._validate_proxy_env_urls - agent.anthropic_adapter.build_anthropic_client - gateway.platforms.base.resolve_proxy_url Regression coverage: - run_agent proxy env resolution - auxiliary proxy env normalization - gateway proxy URL resolution Verified with: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py 39 passed. 
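A minimal sketch of what the socks:// normalization might look like, for anyone hitting the same "Unknown scheme for proxy URL" error elsewhere. The function names mirror the ones in the commit message, but the bodies are a hypothetical re-implementation and the real helpers in utils.py may differ; the list of env var names and the `environ` parameter are assumptions.

```python
import os

# Assumption: the usual proxy variables in both upper and lower case.
PROXY_ENV_VARS = (
    "ALL_PROXY", "HTTPS_PROXY", "HTTP_PROXY",
    "all_proxy", "https_proxy", "http_proxy",
)


def normalize_proxy_url(url):
    """Rewrite the socks:// alias to socks5://; leave every other URL untouched."""
    if not url:
        return url
    scheme, sep, rest = url.partition("://")
    if sep and scheme.lower() == "socks":
        return "socks5://" + rest
    return url


def normalize_proxy_env_vars(environ=None):
    """Normalize proxy URLs in place in the given mapping (defaults to os.environ)."""
    environ = os.environ if environ is None else environ
    for name in PROXY_ENV_VARS:
        value = environ.get(name)
        fixed = normalize_proxy_url(value)
        if fixed != value:
            environ[name] = fixed
```

Calling `normalize_proxy_env_vars()` once before any HTTP client is constructed would cover clients that read the environment themselves.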
* chore(release): add UNLINEARITY to AUTHOR_MAP

* fix(permissions): handle None response from ACP request_permission

* fix: support pagination and cwd filtering in list_sessions

* fix(acp): follow-up - named-const page size, alias kwarg, tests

- Replace kwargs.get('limit', 50) with module-level _LIST_SESSIONS_PAGE_SIZE constant. ListSessionsRequest schema has no 'limit' field, so the kwarg path was dead. Constant is the single source of truth for the page cap.
- Use next_cursor= (field name) instead of nextCursor= (alias). Both work under the schema's populate_by_name config, but using the declared Python field name is the consistent style in this file.
- Add docstring explaining cwd pass-through and cursor semantics.
- Add 4 tests: first-page with next_cursor, single-page no next_cursor, cursor resumes after match, unknown cursor returns empty page.

* feat: add buttons to update hermes and restart gateway

* security(runtime_provider): close OLLAMA_API_KEY substring-leak sweep miss (#13522)

Two call sites still used a raw substring check to identify ollama.com:

hermes_cli/runtime_provider.py:496: _is_ollama_url = "ollama.com" in base_url.lower()
run_agent.py:6127: if fb_base_url_hint and "ollama.com" in fb_base_url_hint.lower() ...

Same bug class as GHSA-xf8p-v2cg-h7h5 (OpenRouter substring leak), which was fixed in commit dbb7e00e via base_url_host_matches() across the codebase. The earlier sweep missed these two Ollama sites. Self-discovered during April 2026 security-advisory triage; filed as GHSA-76xc-57q6-vm5m.

Impact is narrow: it requires a user with OLLAMA_API_KEY configured AND a custom base_url whose path or look-alike host contains 'ollama.com'. Users on default provider flows are unaffected. Filed as a draft advisory to use the private-fork flow; not CVE-worthy on its own.

Fix is mechanical: replace substring check with base_url_host_matches at both sites. Same helper the rest of the codebase uses.

Tests: 67 -> 71 passing.
7 new host-matcher cases in tests/test_base_url_hostname.py (path injection, lookalike host, localtest.me subdomain, ollama.ai TLD confusion, localhost, genuine ollama.com, api.ollama.com subdomain) + 4 call-site tests in tests/hermes_cli/test_runtime_provider_resolution.py verifying OLLAMA_API_KEY is selected only when base_url actually targets ollama.com.

Fixes GHSA-76xc-57q6-vm5m

* fix(env_passthrough): reject Hermes provider credentials from skill passthrough (#13523)

A skill declaring `required_environment_variables: [ANTHROPIC_TOKEN]` in its SKILL.md frontmatter silently bypassed the `execute_code` sandbox's credential-scrubbing guarantee. `register_env_passthrough` had no blocklist, so any name a skill chose flipped `is_env_passthrough(name) => True`, which shortcircuits the sandbox's secret filter.

Fix: reject registration when the name appears in `_HERMES_PROVIDER_ENV_BLOCKLIST` (the canonical list of Hermes-managed credentials: provider keys, gateway tokens, etc.). Log a warning naming GHSA-rhgp-j443-p4rf so operators see the rejection in logs.

Non-Hermes third-party API keys (TENOR_API_KEY for gif-search, NOTION_TOKEN for notion skills, etc.) remain legitimately registerable, since they were never in the sandbox scrub list in the first place.

Tests: 16 -> 17 passing. Two old tests that documented the bypass (`test_passthrough_allows_blocklisted_var`, `test_make_run_env_passthrough`) are rewritten to assert the new fail-closed behavior. New `test_non_hermes_api_key_still_registerable` locks in that legitimate third-party keys are unaffected.

Reported in GHSA-rhgp-j443-p4rf by @q1uf3ng. Hardening; not CVE-worthy on its own per the decision matrix (attacker must already have operator consent to install a malicious skill).
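The fail-closed registration described in the env_passthrough commit above could look roughly like this. It is a minimal sketch under stated assumptions: the function and constant names come from the commit message, but the blocklist contents, the return value, and the module-level set are illustrative guesses rather than the project's actual code.

```python
import logging

logger = logging.getLogger(__name__)

# Assumption: illustrative contents only. The commit message implies
# ANTHROPIC_TOKEN is covered; OLLAMA_API_KEY is a guess.
_HERMES_PROVIDER_ENV_BLOCKLIST = frozenset({"ANTHROPIC_TOKEN", "OLLAMA_API_KEY"})

# Hypothetical registry of names that skills asked to pass into the sandbox.
_registered_passthrough = set()


def register_env_passthrough(name: str) -> bool:
    """Register a passthrough variable unless it is a Hermes-managed credential."""
    if name in _HERMES_PROVIDER_ENV_BLOCKLIST:
        # Fail closed and leave a trace for operators.
        logger.warning(
            "Rejected env passthrough for %s (see GHSA-rhgp-j443-p4rf)", name
        )
        return False
    _registered_passthrough.add(name)
    return True


def is_env_passthrough(name: str) -> bool:
    """Return True only for names that were successfully registered."""
    return name in _registered_passthrough
```

Under these assumptions, `register_env_passthrough("ANTHROPIC_TOKEN")` is refused and the sandbox's secret filter keeps scrubbing it, while a third-party key such as `TENOR_API_KEY` still registers normally.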
* fix(acp): wire approval callback + make it thread-local (#13525)

Two related ACP approval issues:

GHSA-96vc-wcxf-jjff: ACP's _run_agent never set HERMES_INTERACTIVE (or any other flag recognized by tools.approval), so check_all_command_guards took the non-interactive auto-approve path and never consulted the ACP-supplied ap…

…etween turns (NousResearch#7038)

The gateway /usage handler only looked in _running_agents for the agent object, which is only populated while the agent is actively processing a message. Between turns (when users actually type /usage), the dict is empty and the handler fell through to a rough message-count estimate.

The agent object actually lives in _agent_cache between turns (kept for prompt caching). This fix checks both dicts, with _running_agents taking priority (mid-turn) and _agent_cache as the between-turns fallback.

Also brings the gateway output to parity with the CLI /usage:
- Model name
- Detailed token breakdown (input, output, cache read, cache write)
- Cost estimation (estimated amount or 'included' for subscriptions)
- Cache token lines hidden when zero (cleaner output)

This fixes Nous Portal rate limit headers not showing up for gateway users: the data was being captured correctly but the handler could never see it.

@kshitijk4poor

Wires the agent/account_usage module from the preceding commit into /usage so users see provider-side quota/credit info alongside the existing session token report.

CLI:
- `_show_usage` appends account lines under the token table. Fetch runs in a 1-worker ThreadPoolExecutor with a 10s timeout so a slow provider API can never hang the prompt.

Gateway:
- `_handle_usage_command` resolves provider from the live agent when available, else from the persisted billing_provider/billing_base_url on the SessionDB row, so /usage still returns account info between turns when no agent is resident. Fetch runs via asyncio.to_thread.
- Account section is appended to all three return branches: running agent, no-agent-with-history, and the new no-agent-no-history path (falls back to account-only output instead of "no data").

Tests:
- 2 new tests in tests/gateway/test_usage_command.py cover the live-agent account section and the persisted-billing fallback path.

Salvaged from PR NousResearch#2486 by @kshitijk4poor. The original branch had drifted ~2615 commits behind main and rewrote _show_usage wholesale, which would have dropped the rate-limit and cached-agent blocks added in PRs NousResearch#6541 and NousResearch#7038. This commit re-adds only the new behavior on top of current main.
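The two fetch patterns named in this commit (a 1-worker ThreadPoolExecutor with a timeout for the CLI, and asyncio.to_thread for the gateway) might be sketched as follows. The helper names `fetch_with_timeout` and `fetch_in_thread` are hypothetical, and returning `None` on failure is an assumption about how a caller could degrade gracefully, not a description of the project's behavior.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Optional


def fetch_with_timeout(
    fetch: Callable[[], Any], timeout: float = 10.0
) -> Optional[Any]:
    """CLI-style: run a blocking fetch in one worker, give up after `timeout`."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fetch)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            return None  # Slow provider: skip the account section.
        except Exception:
            return None  # Provider error: also skip rather than crash.
    finally:
        # wait=False so a stuck fetch cannot block the caller here.
        pool.shutdown(wait=False)


async def fetch_in_thread(fetch: Callable[[], Any]) -> Optional[Any]:
    """Gateway-style: keep the event loop free while the fetch blocks."""
    try:
        return await asyncio.to_thread(fetch)
    except Exception:
        return None
```

Avoiding the `with ThreadPoolExecutor(...)` form is deliberate in this sketch: leaving a `with` block waits for running work to finish, which would reintroduce the hang the timeout is meant to prevent. The worker thread itself still runs to completion in the background.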


teknium1 merged commit 6da952b into main Apr 10, 2026
4 checks passed

github-actions Bot mentioned this pull request Apr 15, 2026

chore: bump NousResearch/hermes-agent version from v2026.4.8 to v2026.4.13 Docker-Hub-sirmark/docker-hermes-agent#1

Merged

teknium1 mentioned this pull request Apr 21, 2026

feat: add account limits section to /usage #13428

Merged

teknium1 mentioned this pull request Apr 21, 2026

feat: add account limits to /usage #2486

Closed


teknium1 merged 1 commit into main from hermes/hermes-c711558a
