docs: round 2 audit — messaging, developer-guide, guides, integrations by teknium1 · Pull Request #22858 · NousResearch/hermes-agent

teknium1 · 2026-05-09T21:59:07Z

Summary

Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/, guides/, and integrations/ against the live registries and gateway code. 24 files changed, +132 / -89.

Same audit method as PR #22784 (round 1): every CLI flag, slash command, config key, env var, provider name, toolset, and tool name verified against COMMAND_REGISTRY, PROVIDER_REGISTRY, DEFAULT_CONFIG, OPTIONAL_ENV_VARS, TOOLSETS, tools.registry, gateway/platforms/ adapters, and --help output.

Highlights of what was wrong

messaging/

google_chat.md: pip install 'hermes-agent[google_chat]' doesn't resolve — no such extra in pyproject.toml. Replaced with the actual deps the adapter needs.
qqbot.md: config namespace was platforms.qq; actual key is platforms.qqbot (the adapter silently ignores the wrong one). QQ_STT_BASE_URL was documented as a real env var; the adapter only reads it via config, not env.
teams-meetings.md: hermes teams-pipeline is plugin-gated (the teams_pipeline plugin must be enabled), not a built-in subcommand.
open-webui.md: same hermes config set API_SERVER_* issue as round 1's api-server.md — API_SERVER_* are env vars, not YAML keys, so config set writes them where the API server doesn't read. Also bumped example ports to 8650+ to dodge the default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646) collision.
index.md: API Server toolset is hermes-api-server (was hermes (default)); Google Chat slug is hermes-google_chat (underscore — plugin name uses _).
sms.md: example log line had 0.0.0.0:8080 but the default SMS_WEBHOOK_HOST is 127.0.0.1.
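
The qqbot.md item above describes a failure that is easy to miss: a block under the wrong namespace is simply never read. A minimal sketch of how such a key could be surfaced follows. It is not the real Hermes config loader; `KNOWN_PLATFORM_KEYS`, `qqbot_settings`, and `unknown_platform_keys` are hypothetical names, and the known-key set is a one-entry illustration.

```python
# Hypothetical sketch, not the actual Hermes loader.
# The real set of platform keys is defined by the gateway adapters.
KNOWN_PLATFORM_KEYS = {"qqbot"}


def qqbot_settings(config: dict) -> dict:
    """Mimic an adapter that only ever looks at platforms.qqbot."""
    return (config.get("platforms") or {}).get("qqbot") or {}


def unknown_platform_keys(config: dict) -> list:
    """Return keys under `platforms` that no known adapter would read."""
    platforms = config.get("platforms") or {}
    return sorted(key for key in platforms if key not in KNOWN_PLATFORM_KEYS)


stale = {"platforms": {"qq": {"enabled": True}}}     # old docs
fixed = {"platforms": {"qqbot": {"enabled": True}}}  # corrected docs

print(qqbot_settings(stale))         # {}  (settings silently missing)
print(unknown_platform_keys(stale))  # ['qq']
print(qqbot_settings(fixed))         # {'enabled': True}
```

A check like this only helps if it runs at load time and warns; whether Hermes does so is not stated in this PR.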

developer-guide/

acp-internals.md: bridged-callbacks list included a fictional message_callback. Drop it; clarify thinking_callback is currently set to None and reasoning flows through step_callback.
gateway-internals.md: 'gateway/builtin_hooks/ (always active)' is wrong — the directory is empty, _register_builtin_hooks() is a no-op stub. Platform tree listed qqbot.py (it's a sub-package), and was missing yuanbao.py, feishu_comment.py, msgraph_webhook.py. Adapter count was "14+" — actual is 20+.
provider-runtime.md: 'Current provider families' list was missing AWS Bedrock, Azure Foundry, NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama Cloud, LM Studio, Tencent TokenHub. Fallback section described only the legacy single-pair fallback_model — corrected to the canonical list-form fallback_providers chain.
environments.md: parsers list missed llama4_json and the deepseek_v31 alias.
browser-supervisor.md: pointed at scripts/browser_supervisor_e2e.py which doesn't exist.
architecture.md + agent-loop.md + gateway-internals.md: stale LOC numbers everywhere (~13,700 → 15k+, ~12,200 → 16k, ~10,400 → 11.5k, etc.). Replaced specific counts with 'large file' to stop them drifting again.
architecture.md: tool/toolset count drifted (61 tools / 52 toolsets → 70+ / ~28).
contributing.md: uv pip install -e ./tinker-atropos quietly assumes submodules were initialized.

guides/

operate-teams-meeting-pipeline.md: cron flags were all wrong. The schedule is positional (not --schedule), the script-only flag is --no-agent (not --script-only), and there's no --command flag — --script must point at a file under ~/.hermes/scripts/. Also replaced fictional hermes cron show <name> with the real hermes cron status.
automation-templates.md: cron create --skills "a,b" doesn't work — the flag is --skill (singular, repeatable). Fixed 5 occurrences.
minimax-oauth.md: hermes auth add minimax-oauth --region cn silently fails because --region isn't registered on the auth-add argparse spec. Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for China-region access.
cron-script-only.md: hermes send is fictional. Replaced the comparison-table mention with a webhook-subscription pointer; fixed dead link to non-existent /guides/pipe-script-output.
cron-troubleshooting.md: hermes serve isn't a real subcommand. Pointed at hermes gateway (foreground) / hermes gateway start (service).
local-ollama-setup.md: agent.api_timeout is not a config key. Right knob is the HERMES_API_TIMEOUT env var.
python-library.md: run_conversation() return dict has only final_response and messages — task_id is stored on the agent instance, not echoed back.
use-mcp-with-hermes.md: --args /c "npx -y …" wraps the npx command in one quoted string, so cmd.exe gets a single arg. Removed the surrounding quotes so argparse nargs='*' collects each token correctly.
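
The cron fixes above (positional schedule, --no-agent, --script, repeatable --skill) map onto a small argparse spec. The following is a sketch under assumptions, not the actual `hermes cron create` parser: the program name, help strings, and example values are invented, and only the flag shapes come from this PR.

```python
import argparse


def build_cron_create_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flag shapes described in this PR."""
    parser = argparse.ArgumentParser(prog="cron-create-sketch")
    # The schedule is positional, not --schedule.
    parser.add_argument("schedule", help="cron expression")
    # Singular and repeatable: each occurrence appends one skill.
    parser.add_argument("--skill", action="append", default=[])
    # Name of a script file (the PR says it must live under ~/.hermes/scripts/).
    parser.add_argument("--script")
    # The script-only switch is --no-agent, not --script-only.
    parser.add_argument("--no-agent", action="store_true")
    return parser


parser = build_cron_create_parser()
ok = parser.parse_args(
    ["0 9 * * *", "--skill", "a", "--skill", "b", "--script", "report.sh", "--no-agent"]
)
print(ok.schedule, ok.skill, ok.script, ok.no_agent)
# 0 9 * * * ['a', 'b'] report.sh True

try:
    # The old docs form: plural flag with a comma-separated value.
    parser.parse_args(["0 9 * * *", "--skills", "a,b"])
except SystemExit as exc:
    print("rejected with exit code", exc.code)  # rejected with exit code 2
```

With `action="append"`, a comma-separated value would be stored as the single string "a,b", which is why the docs needed one `--skill` per skill.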

integrations/

providers.md: Bedrock guardrail YAML keys were id/version (don't exist); actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 → api.gmi-serving.com/v1) and portal URL (inference.gmi.ai → www.gmicloud.ai) refreshed. Fallback section rewritten to lead with the canonical fallback_providers list form (was leading with the legacy single-dict fallback_model); supported-providers list extended (azure-foundry, alibaba-coding-plan, lmstudio).
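
The difference between the legacy single-dict fallback_model and the list-form fallback_providers can be sketched as a normalization step followed by an ordered retry. This is an illustration only: the entry shape (`provider` and `model` keys), the function names, and the model strings are assumptions, not the schema Hermes actually uses.

```python
from typing import Any, Callable, Dict, List

Entry = Dict[str, str]  # assumed shape: {"provider": ..., "model": ...}


def fallback_chain(config: Dict[str, Any]) -> List[Entry]:
    """Prefer the list-form key; wrap the legacy single dict in a list."""
    chain = config.get("fallback_providers")
    if chain:
        return list(chain)
    legacy = config.get("fallback_model")
    return [legacy] if legacy else []


def call_with_fallback(primary: Entry, config: Dict[str, Any],
                       call: Callable[[Entry], str]) -> str:
    """Try the primary entry, then each fallback in order."""
    last_error = None
    for entry in [primary, *fallback_chain(config)]:
        try:
            return call(entry)
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError("every provider in the chain failed") from last_error


def fake_call(entry: Entry) -> str:
    """Stand-in for a provider request; two providers are 'down'."""
    if entry["provider"] in {"primary-provider", "lmstudio"}:
        raise RuntimeError(f"{entry['provider']} unavailable")
    return entry["provider"]


config = {
    "fallback_providers": [
        {"provider": "lmstudio", "model": "example-model"},
        {"provider": "azure-foundry", "model": "example-model"},
    ]
}
primary = {"provider": "primary-provider", "model": "example-model"}
print(call_with_fallback(primary, config, fake_call))  # azure-foundry
```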

index.md

'68 built-in tools' → '70+'. '15+ platforms' was both inconsistent with integrations/index.md ('19+') and undercounted — bumped to 20+ and added Weixin / QQ Bot / Yuanbao / Google Chat to the list.

Validation

| | Before | After |
|---|---|---|
| Build (`npm run build`) | clean | clean |
| Broken-link warnings | 155 | 155 (unchanged; same as round-1 post-skill-regen baseline) |
| Files changed | n/a | 24 |
| Lines | n/a | +132 / -89 |

Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/, guides/, and integrations/ against the live registries and gateway code.

messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)'); Google Chat slug is hermes-google_chat (underscore; the plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such extra); list the actual deps (google-cloud-pubsub, google-api-python-client, google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is silently ignored by the adapter); QQ_STT_BASE_URL is not read directly; baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys, so write them to per-profile .env, not 'hermes config set' (same pattern fixed in api-server.md last round). Also bumped example ports to 8650+ to dodge the default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646) collision.

developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py, gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway platform tree updated (qqbot is a sub-package, not qqbot.py; added yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/ (always active)' was wrong: it's an empty extension point and _register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry, NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama Cloud, LM Studio, Tencent TokenHub. Fallback section described only the legacy single-pair model; corrected to the canonical list-form fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31 alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule; note that 'git submodule update --init' is required if cloning without --recurse-submodules.

guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong: schedule is positional (not --schedule), the script-only flag is --no-agent (not --script-only), and there's no --command flag. Replaced with a real example that creates the script under ~/.hermes/scripts/ and uses the actual flags. Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work; the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently fails because --region isn't registered on the auth-add argparse spec. Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for China-region access.
- cron-script-only.md: 'hermes send' is fictional; replaced the comparison-table mention with a webhook-subscription pointer; also fixed the dead link to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response and messages; task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in one quoted string, so cmd.exe gets a single arg instead of the multi-token command line it needs. Removed the surrounding quotes; argparse nargs='*' collects each token correctly.

integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist); actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 -> api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai) refreshed. Fallback section rewritten to lead with the canonical fallback_providers list form (was leading with the legacy fallback_model single dict); supported-providers list extended to include azure-foundry, alibaba-coding-plan, lmstudio.

index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with integrations/index.md ('19+') and undercounted; bumped to 20+ and added Weixin/QQ Bot/Yuanbao/Google Chat to the list.

Validation: 'npm run build' clean (exit 0); broken-link count unchanged at 155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.


* chore(release): add Zhekinmaksim to AUTHOR_MAP (#22449) Maps zhekinmaksim@gmail.com to GitHub login Zhekinmaksim so contributor_audit.py recognizes their authored commit in the upcoming #21930 salvage PR. * fix(async): replace get_event_loop() with get_running_loop() in async contexts Follow-up to PR #21293 (cli.py), which fixed the same anti-pattern. `asyncio.get_event_loop()` is documented as effectively "always returns the running loop when called from a coroutine" and emits DeprecationWarning/RuntimeWarning in some interpreter configurations. The Python docs explicitly recommend get_running_loop() inside coroutines. Replaces the remaining 9 call sites that are unconditionally inside async def bodies: - tools/browser_cdp_tool.py — _cdp_call() (4 sites): deadline + remaining computations inside the async websockets.connect context manager. - hermes_cli/web_server.py — get_status, _start_device_code_flow, submit_oauth_code (3 sites): all FastAPI async endpoints offloading blocking httpx / PKCE work to run_in_executor. - environments/agent_loop.py — HermesAgentLoop (1 site): tool dispatch inside the async rollout loop. - environments/benchmarks/terminalbench_2/terminalbench2_env.py — rollout_and_score_eval (1 site): test verification thread offload. All 9 sites are unconditionally inside async def bodies, so a running loop is guaranteed and no try/except RuntimeError fallback is needed (unlike the cli.py case in #21293, which ran from a background thread). Behavior is identical on supported Python versions; aligns the codebase with the post-#21293 idiom and avoids future warnings as the deprecation hardens. Salvaged from PR #21930 by @Zhekinmaksim onto current main (the original branch was 109 commits behind and carried unintended stale-branch reverts of unrelated landed changes — _tail_lines encoding=utf-8 and the Windows PTY bridge guard). Only the 9 swaps from the PR's intended scope are applied here. 
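
The fix(async) entry above swaps `asyncio.get_event_loop()` for `asyncio.get_running_loop()` inside coroutines. A minimal sketch of the two patterns it mentions (executor offload and deadline arithmetic) is shown here; the function names are illustrative and are not the ones in the Hermes call sites.

```python
import asyncio
import time


def blocking_work(x: int) -> int:
    """Stand-in for blocking I/O such as an HTTP call."""
    time.sleep(0.01)
    return x * 2


async def offload(x: int) -> int:
    # Inside a coroutine a loop is always running, so no fallback is needed.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_work, x)


async def remaining_after_yield(timeout: float) -> float:
    # Deadline math uses the loop's monotonic clock.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    await asyncio.sleep(0)
    return deadline - loop.time()


print(asyncio.run(offload(21)))  # 42
print(0 < asyncio.run(remaining_after_yield(5.0)) <= 5.0)  # True
```

Called from plain synchronous code with no loop running, `get_running_loop()` raises `RuntimeError`, which is the case the entry says needed a fallback in cli.py.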
* chore(release): add KvnGz to AUTHOR_MAP (#22458) Maps obafemiferanmi1999@gmail.com (the commit-author email used on PR #21473's branch) to GitHub login KvnGz (the PR/branch owner) so contributor_audit.py recognizes the authored commit in the upcoming salvage PR. * fix(tests): pin UTF-8 encoding when reading source files on Windows Three tests in tests/agent/test_auxiliary_config_bridge.py read in-tree source files (gateway/run.py and cli.py) via Path.read_text() with no encoding argument. The default falls back to the system locale, which on Western Windows installs is cp1252, and the read fails as soon as the source contains any byte that isn't valid cp1252 (e.g. an em-dash in a comment): UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 41190: character maps to <undefined> Linux CI doesn't catch this because the default Linux locale is UTF-8. Windows contributors hit it on every run of the test suite. Pin encoding="utf-8" on the three call sites that read repo source files. This matches the existing precedent in hermes_cli/doctor.py:363, where the same pattern (with an explanatory comment) was applied to fix the .env read on non-UTF-8 Windows locales. Affected tests now pass on Windows + Python 3.12: - TestGatewayBridgeCodeParity.test_gateway_has_auxiliary_bridge - TestGatewayBridgeCodeParity.test_gateway_no_compression_env_bridge - TestCLIDefaultsHaveAuxiliaryKeys.test_cli_defaults_can_merge_auxiliary * feat(plugins): add standalone_sender_fn for out-of-process cron delivery Plugin platforms (IRC, Teams, Google Chat) currently fail with `No live adapter for platform '<name>'` when a `deliver=<plugin>` cron job runs in a separate process from the gateway, even though the platforms are eligible cron targets via `cron_deliver_env_var` (added in #21306). Built-in platforms (Telegram, Discord, Slack, etc.) 
use direct REST helpers in `tools/send_message_tool.py` so cron can deliver without holding the gateway in the same process; plugin platforms historically depended on `_gateway_runner_ref()` which returns `None` out of process. This change adds an optional `standalone_sender_fn` field to `PlatformEntry` so plugins can register an ephemeral send path that opens its own connection, sends, and closes without needing the live adapter. The dispatch site in `_send_via_adapter` falls through to the hook when the gateway runner is unavailable, with a descriptive error when neither path applies. The hook is optional, so existing plugins are unaffected. Reference migrations land in the same change for IRC, Teams, and Google Chat, exercising the hook across stdlib (asyncio + IRC protocol), Bot Framework OAuth client_credentials, and Google service-account flows respectively. Security hardening on the new code paths: * IRC: control-character stripping on chat_id and message body to block CRLF command injection; bounded nick-collision retries; JOIN before PRIVMSG so channels with the default `+n` mode accept the delivery. * Teams: TEAMS_SERVICE_URL validated against an allowlist of known Bot Framework hosts (`smba.trafficmanager.net`, `smba.infra.gov.teams.microsoft.us`) to block SSRF; chat_id and tenant_id constrained to the documented Bot Framework character set; per-request timeouts so a slow STS endpoint cannot starve the activity POST. * Google Chat: chat_id and thread_id validated against strict resource-name regexes; service-account refresh wrapped in `asyncio.wait_for` so a hung token endpoint cannot stall the scheduler. Test coverage: 20 new tests covering happy path, missing-config errors, network failure modes, and each defensive validation. Existing tests unchanged. `bash scripts/run_tests.sh tests/tools/test_send_message_tool.py tests/gateway/test_irc_adapter.py tests/gateway/test_teams.py tests/gateway/test_google_chat.py` reports 341 passed, 0 regressions. 
Documentation: new "Out-of-process cron delivery" section in website/docs/developer-guide/adding-platform-adapters.md and an entry in gateway/platforms/ADDING_A_PLATFORM.md naming the hook. * fix(profiles): exclude infrastructure artifacts when cloning with --clone-all When the source profile is the default (~/.hermes), shutil.copytree() was copying multi-GB infrastructure alongside the ~40 MB of actual profile data: hermes-agent/ (repo checkout + 3 GB venv), .worktrees/, profiles/ (sibling profiles — recursive!), bin/ (installed binaries), node_modules/ (hundreds of MB). Add _CLONE_ALL_DEFAULT_EXCLUDE_ROOT frozenset with these five entries and pass an ignore callback to copytree(). Exclusions are gated on the source actually being the default profile (is_default_source) so named-profile sources are never affected. Also exclude at any depth: __pycache__/, *.pyc, *.pyo, *.sock, *.tmp. Profile data (config.yaml, .env, auth.json, state.db, sessions/, skills/, logs/) is preserved intact — clone-all means 'complete snapshot minus infrastructure'. Mirrors the approach already used by _default_export_ignore() and _DEFAULT_EXPORT_EXCLUDE_ROOT (the export-side exclusion set which is broader because it produces a portable archive, not a live clone). Co-authored-by: MustafaKara7 <karamusti912@gmail.com> Co-authored-by: fahdad <30740087+fahdad@users.noreply.github.com> Fixes #5022 Based on PRs #5025, #5026, and #21728 * fix(banner): resolve update-check repo from running code, not profile-scoped path check_for_updates() and _resolve_repo_dir() were preferring $HERMES_HOME/hermes-agent/ over Path(__file__).parent.parent.resolve() when looking for a .git checkout. For profiles created with --clone-all, $HERMES_HOME/hermes-agent/ points to a stale copy with a frozen HEAD, causing persistent "N commits behind" banners that never resolved. 
Flip the resolution order: prefer the running code's location first, fall back to $HERMES_HOME/hermes-agent/ only when the live checkout doesn't have a .git (system-wide pip installs, distro packages). The embedded-rev branch (HERMES_REVISION env var, set by nix builds) is unaffected — it uses git ls-remote against upstream, never reads the local checkout's HEAD. Based on PR #21728 by @fahdad * fix(gateway): clear slash-confirm state during session boundary cleanup * feat(gateway): stream Telegram edits safely * chore: add nik1t7n to AUTHOR_MAP Nikita Nosov (nik1t7n, PR #22264) — first-time contributor email and noreply alias. * fix(telegram): exclude row-label column from bullet items in table rendering When a GFM table has a row-label column (first column with no header), _render_table_block_for_telegram incorrectly included the row-label cell in the bullet zip alongside the data cells, producing a spurious bullet like '• 維度: 核心賣點' before the real data rows. Detect the row-label column by comparing the first data row cell count against the header count (has_row_label_col = len(first_data_row) == len(headers) + 1). When present, use cells[0] as the heading and zip headers against cells[1:] only, correctly excluding the row-label from the bullet list. Fixes #22604 * feat: confirm prompt for destructive slash commands (#4069) (#22687) /clear, /new, /reset, and /undo now ask the user to confirm before discarding conversation state — three-option prompt routed through the existing tools.slash_confirm primitive. Native yes/no buttons render on Telegram, Discord, and Slack (their adapters already implement send_slash_confirm); other platforms get a text-fallback prompt and reply with /approve, /always, or /cancel. The classic prompt_toolkit CLI uses the same three-option flow via the established _prompt_text_input pattern (see _confirm_and_reload_mcp). TUI keeps its existing modal overlay (#12312). 
Gated by new config key approvals.destructive_slash_confirm (default true). Picking 'Always Approve' flips the gate to false so subsequent destructive commands run silently — matches the established mcp_reload_confirm UX. Out of scope: /cron remove (separate domain — scheduled jobs, not session history). Existing TUI overlay env-var (HERMES_TUI_NO_CONFIRM) left unchanged; cosmetic unification can come later. Closes #4069. * fix(kanban): call recompute_ready after unlink_tasks removes a dependency Problem: unlink_tasks() removes a parent→child dependency edge but does not trigger recompute_ready(). A child whose last blocking parent is unlinked stays stuck in 'todo' indefinitely — it only promotes to 'ready' on the next dispatcher tick or a manual 'hermes kanban recompute'. For CLI-only users without a dispatcher, the child is permanently stuck. Root cause: complete_task() and unblock_task() both call recompute_ready() after their write transaction so downstream children are evaluated immediately. unlink_tasks() was missing this call — removing a dependency is semantically equivalent to completing one, so the same recompute is needed. Fix: Capture the rowcount result before the write_txn exits, then call recompute_ready(conn) outside the transaction when a row was actually deleted (so the child sees the updated task_links state). Tests: Added test_unlink_tasks_triggers_recompute_ready in tests/hermes_cli/test_kanban_db.py: creates parent A (done) + parent C (running), child B with both parents (todo), unlinks C→B, asserts B is ready immediately. Stash-verified: FAILS without fix (child stays todo), PASSES with fix. 62/62 tests green in tests/hermes_cli/test_kanban_db.py. Closes #22459. * chore: add wesleysimplicio to AUTHOR_MAP * perf(google_chat): defer heavy google-cloud imports to first adapter use (#22681) Plugin discovery imports every bundled platform plugin at model_tools import time. 
The google_chat adapter unconditionally pulled in google.cloud.pubsub_v1, googleapiclient, grpc, httplib2, and friends at module top: about 33 MB RSS and 110 ms wall on every CLI invocation, even ones that never construct a gateway adapter.

Wrap the heavy imports in _load_google_modules(): an idempotent loader that rebinds the module-level globals (pubsub_v1, service_account, HttpError, MediaFileUpload, …) on first call and is invoked from GoogleChatAdapter.__init__, connect(), and check_google_chat_requirements(). The HttpError = Exception placeholder is preserved for the brief window before the loader runs, so 'except HttpError as exc:' clauses stay correct (Python looks up the name at try/except evaluation time, not at function definition time).

Measured impact on a 9950X3D, 7-run medians:

  import cli:            895 → 787 ms (-108 ms / -12%)   133 → 110 MB (-23 MB / -17%)
  import model_tools:    491 → 400 ms (-91 ms / -19%)    95 → 66 MB (-29 MB / -31%)
  google_chat alone:     244 → 132 ms (-112 ms / -46%)   83 → 50 MB (-33 MB / -40%)
  hermes chat -q (cold): 177 → 145 MB (-32 MB / -18%)

Real-world win lands on every path that imports cli.py: hermes chat, hermes gateway, cron jobs, batch runs, subagents. Long-lived gateway processes save ~30 MB resident. All 157 google_chat tests pass; full gateway suite (5050 tests) green.

* feat(plugins): HERMES_PLUGINS_DEBUG=1 surfaces plugin discovery logs (#22684)

Plugin authors had no easy way to figure out why their plugin wasn't loading: failures were buried in agent.log at WARNING, and skip reasons (disabled, not enabled, depth cap, exclusive) were DEBUG-only and invisible by default. Set HERMES_PLUGINS_DEBUG=1 to attach a stderr handler at DEBUG to the hermes_cli.plugins logger only.
Surfaces: - which directories were scanned + manifest counts per source - per manifest: resolved key, name, kind, source, on-disk path - skip reasons (disabled, not enabled, exclusive, depth cap, no register) - per load: tools/hooks/slash/CLI commands the plugin registered - full traceback on YAML parse failure (exc_info on the existing warning) - full traceback on register() exceptions, pointing at the plugin author's line Env var off (default) → zero new stderr output, same as before. Touches only hermes_cli/plugins.py + a doc section in the plugin-build guide + an entry in the env-vars reference. 3 new tests lock the attach/idempotent/no-attach behavior. * fix(kanban): gate claim + unblock on parent completion Enforce the parent-completion invariant at claim_task (the single ready->running chokepoint) and re-gate unblock_task so blocked->ready only fires when parents are done. Prevents child tasks from running ahead of in-progress parents under the create-then-link race. Also adds a stress test that races concurrent create+link against hammered claim_task and asserts no child runs while any parent is undone. Ref: kanban/boards/cookai/workspaces/t_a6acd07d/root-cause.md Refs: t_8d6af9d6 * chore: add SiliconID to AUTHOR_MAP * feat(delegate): show user's actual concurrency / spawn-depth limits in tool description (#22694) The delegate_task tool description hardcoded 'default 3' / 'default 2' for max_concurrent_children / max_spawn_depth, which misled the model on any install that raised these limits — the schema text said 'default 3' even when the user had set max_concurrent_children=15 / max_spawn_depth=3, so the model would self-cap at 3 and never use the headroom. Make the description dynamic. ToolEntry gains an optional dynamic_schema_overrides callable; registry.get_definitions() merges its output on top of the static schema before returning it. 
delegate_tool registers a builder that reads the current delegation.* config and emits: - 'up to N items concurrently for this user' (N = max_concurrent_children) - 'Nested delegation IS enabled / OFF for this user (max_spawn_depth=N)' - 'orchestrator children can themselves delegate up to M more level(s)' - 'orchestrator_enabled=false' when the kill switch is set The model_tools cache key already includes config.yaml mtime+size, so edits to delegation.* in config invalidate the cached tool definitions without an explicit hook. CLI_CONFIG staleness within a process is a pre-existing limitation of _load_config and out of scope here. Static description / tasks.description / role.description in DELEGATE_TASK_SCHEMA are placeholders so module import doesn't trigger cli.CLI_CONFIG load before the test conftest can redirect HERMES_HOME. * fix(cli): expand composite toolset when mixed with configurables in platform_toolsets When platform_toolsets[<platform>] contains both a composite (e.g. hermes-cli) and at least one configurable opt-in (e.g. spotify), the has_explicit_config branch in _get_platform_tools silently dropped the composite, leaving sessions with only the configurable + plugin tools and no native tools (terminal, file, web, browser, memory, etc.). Mirror the else-branch's subset inference for composites that sit alongside the configurables, but apply _DEFAULT_OFF_TOOLSETS only to the implicit expansion so user-listed default-off toolsets (spotify, discord) survive. * fix(gateway): refresh runtime argv metadata * fix(plugins): resolve Git binary for installs under minimal PATH Resolve git via shutil.which with POSIX and Git-for-Windows fallbacks before clone and pull so Dashboard/API installs do not misreport Git as missing. Add regression tests for the resolver and pull subprocess invocation. 
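
The fix(plugins) entry above resolves git via shutil.which with fallbacks when PATH is minimal. One possible shape for such a resolver is sketched below. The fallback paths and the `resolve_git` name are assumptions for illustration; the PR text does not list the actual locations it checks.

```python
import os
import shutil
from typing import Callable, Optional, Sequence

# Hypothetical fallback locations (POSIX and Git-for-Windows style).
FALLBACK_GIT_PATHS = (
    "/usr/bin/git",
    "/usr/local/bin/git",
    r"C:\Program Files\Git\cmd\git.exe",
)


def _is_executable_file(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def resolve_git(
    which: Callable[[str], Optional[str]] = shutil.which,
    candidates: Sequence[str] = FALLBACK_GIT_PATHS,
    exists: Callable[[str], bool] = _is_executable_file,
) -> Optional[str]:
    """Return a usable git path, or None if nothing is found."""
    found = which("git")  # honours the current PATH first
    if found:
        return found
    for path in candidates:  # PATH was too minimal; probe known spots
        if exists(path):
            return path
    return None


git_path = resolve_git()
print(git_path if git_path else "git not found")
```

The lookup functions are parameters so the resolver can be tested without touching the real filesystem.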
* chore: add xieNniu to AUTHOR_MAP * fix(telegram): honor message.quote for partial-quote reply context When a Telegram user replies using the native quote feature to select only part of a prior message, _build_message_event was injecting the ENTIRE replied-to message into reply_to_text via message.reply_to_message.text/caption. python-telegram-bot exposes the user-selected substring as message.quote (TextQuote.text); we now prefer that and fall back to the full replied-to text only when no native quote is present. The agent-visible "[Replying to: \"...\"]" prefix can otherwise expand the user's narrow quote into the full prior message, causing the agent to act on unrelated actionable-looking text the user did not select (e.g. multi-item briefings where the user quotes one bullet but the prefix injects every bullet). Falls back cleanly when message.quote is absent (PTB <21 or replies that don't quote a substring). Fixes #22619 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(profiles): honour active_profile when HERMES_HOME points to hermes root Problem: After `hermes profile use NAME`, the gateway (started via systemd with HERMES_HOME=/root/.hermes hardcoded) ignores the active profile and always runs as the Default profile. WebUI, Telegram, and all non-CLI platforms are affected. Root cause: _apply_profile_override() contained an early-return guard: if profile_name is None and os.environ.get("HERMES_HOME"): return # trust the inherited value The intent was to let child processes inherit their parent's profile via HERMES_HOME without redundantly re-reading active_profile. But systemd also sets HERMES_HOME — to the hermes root (/root/.hermes), not a profile directory — so the guard fired and silently skipped the active_profile check. The user's `hermes profile use NAME` write to ~/.hermes/active_profile was never seen by the gateway process. 
Fix: Only skip the active_profile check when HERMES_HOME is already a profile directory, identified by its immediate parent directory being named "profiles" (e.g. ~/.hermes/profiles/coder or /opt/data/profiles/coder). When HERMES_HOME points to a root directory (parent name != "profiles"), continue to read active_profile.

Tests:
- test_hermes_home_at_root_with_active_profile_is_redirected: the bug scenario. HERMES_HOME=/root/.hermes + active_profile=coder → HERMES_HOME must be redirected to .../profiles/coder. Stash-verified: FAILS without fix, PASSES with fix.
- test_hermes_home_already_profile_dir_is_trusted: child-process inheritance contract unchanged; .../profiles/coder is trusted as-is.
- test_hermes_home_unset_reads_active_profile: classic path unchanged.
- test_hermes_home_unset_default_profile_no_redirect: "default" still produces no redirect.

4/4 tests green.

Closes #22502.

* fix(dingtalk): clarify webhook media behavior

* fix(dingtalk): align override signatures with base + guard Optional[error] in tests

* feat(mcp): add codex preset for built-in MCP server discovery

Adds 'codex' to the _MCP_PRESETS registry so users can add it via the preset, without manually specifying the command and args. Session output:

    Connecting to 'codex'...
    ✓ Connected! Found 2 tool(s) from 'codex':
      codex        Run a Codex session. Accepts configuration parameters matchi...
      codex-reply  Continue a Codex conversation by providing the thread id and...
    Enable all 2 tools? [Y/n/select]: Cancelled.

Enables: codex mcp-server → Hermes native MCP client → Codex tools available as first-class Hermes tools.

* fix(cron): avoid github skill false positives in scanner

* fix(cron): keep auth-header exfiltration blocked

* fix(cron): allow quoted URL in github auth-header allowlist

The github-pr-workflow skill wraps the URL in double quotes ('curl -H ... "https://api.github.com/..."'), which the original allowlist regex (\s+https://api...) did not match.
Without this, the bundled github-pr-workflow skill is still blocked at every cron tick despite #22605's fix landing for the bare-URL form. Make the leading quote optional and add a regression test pinning both single- and double-quoted forms. * tests: add Windows skip guards for UNIX-only stdlib imports * fix: move pytest.importorskip below pytest import in skip-guarded tests The original PR placed 'pwd = pytest.importorskip("pwd")' on line 4 but 'import pytest' on line 9 — NameError on module load. Same for test_file_sync_back.py. Plus, the in-function 'pwd = pytest.importorskip' calls in test_auto_detected_root_is_rejected confused Python's scope analysis (later 'import pytest' made pytest local everywhere in the function) and caused UnboundLocalError. Drop the now-redundant in-function importorskip calls and rely on the module-level guard. * chore: add wali-reheman to AUTHOR_MAP * feat(gateway): add Telegram guest mention mode * fix: follow-up for salvaged PR #22263 - Restore allowed_chats gate before thread_id check so ignored_threads applies universally (even to guest mentions). - Compute _message_mentions_bot once in _should_process_message to eliminate redundant second entity scan when guest_mode=true and the message does not mention the bot. - Remove redundant _is_group_chat from _is_guest_mention (caller already verified the message is a group chat). - Update _telegram_allowed_chats docstring to note guest_mode exception. - Add test coverage: bot_command entity, text_mention entity, caption_entities, and ignored_threads + guest_mode interaction. - Add nik1t7n to AUTHOR_MAP. * fix(agent): notify context engine on commit_memory_session (#22764) When session_id rotates (e.g. /new), commit_memory_session was firing MemoryManager.on_session_end but skipping ContextEngine.on_session_end. 
Engines that accumulate per-session state (LCM-style DAGs, summary stores) leaked that state from the rotated-out session into whatever continued under the same compressor instance. Mirror the call shutdown_memory_provider already makes — same lifecycle moment, same hook contract ("real session boundaries (CLI exit, /reset, gateway expiry)"). /new is a real boundary for the old session_id; providers keep their state but the rotated-out session_id is done. 6 regression tests covering both-hooks-fire, no-memory-manager, no-context-engine, both failure-tolerant paths. Closes #22394. * fix(tests): harden run_tests.sh — uv-aware bootstrap + scrub HERMES_CRON_SESSION (#22767) Two unrelated but co-located fixes to scripts/run_tests.sh: 1. pytest-split bootstrap (#22401): the script tried '$PYTHON -m pip install pytest-split' on first run, but uv-created venvs ship without pip. Result: 'No module named pip' before any test ran. Add a uv fallback (uv pip install --python $PYTHON), keep pip as a secondary path, and emit a clear error pointing at 'uv pip install -e ".[dev]"' when neither is available. Also declare pytest-split in pyproject.toml dev extra so a normal '.[dev]' install provisions it. 2. HERMES_CRON_SESSION leak (#22400): the hermetic env scrub already unsets HERMES_GATEWAY_SESSION and HERMES_INTERACTIVE but missed the sibling HERMES_CRON_SESSION. When run_tests.sh is invoked from a Hermes cron job, that variable leaks into pytest, flipping tools/approval.py into cron-deny mode and breaking tests/acp/test_approval_isolation.py and friends. Closes #22400. Closes #22401. * fix(kanban): sanitize comment author rendering in build_worker_context (#22769) Operator-controlled HERMES_PROFILE values were rendered as '**${author}** (${ts}):' — markdown bold with no provenance prefix. Worker comment bodies render directly underneath. 
A misleading profile name like 'hermes-system' or 'operator' could be misread by the next worker as a system directive above attacker-influenced content (confused-deputy primitive gated on operator misconfig). The LLM-controlled author-forgery surface was already closed in #22435 (author removed from KANBAN_COMMENT_SCHEMA). This is defense-in-depth: render with an explicit 'comment from worker `<author>` at <ts>:' prefix so even 'hermes-system' resolves to 'comment from worker `hermes-system` at ...' — parseable as worker-comment metadata, not a system directive. Strip backticks from author so they can't break out of the fence. Update test_build_worker_context_caps_comments to count by body regex since the rendered author line now also starts with 'comment '. Closes #22452. * fix(agent): hydrate memory-nudge counters from conversation_history (#22774) Gateway creates a fresh AIAgent per inbound message in several common scenarios: cache miss, idle eviction (1h TTL), config-signature mismatch, process restart. A freshly-built AIAgent has _turns_since_memory=0 and _user_turn_count=0, so the memory.nudge_interval trigger ('_turns_since_memory >= _memory_nudge_interval') can never be reached when these reconstructions happen on roughly the cadence of the interval. A user can chat for hours on Telegram without ever seeing a self-improvement review fire. Reconstruct the counters from conversation_history at the top of run_conversation(), right after the existing _hydrate_todo_store call. Idempotent guard ('if self._user_turn_count == 0') means a cached agent that already accumulated counters keeps them; only freshly-built agents hydrate. Modulo arithmetic preserves the original 1-in-N cadence rather than firing a review immediately on resume. 7 regression tests pinning the contract (mid-cycle history, modulo wrap, idempotency, zero-interval skip, role==user filtering, production-code anchor). Closes #22357. 
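The counter hydration described in the last commit above can be sketched like this. It assumes a simplified history format (a list of dicts with a `role` key) and a hypothetical free function `hydrate_nudge_counters`; the real logic lives on AIAgent and may differ in detail.

```python
from typing import Dict, List, Tuple


def hydrate_nudge_counters(
    conversation_history: List[Dict[str, str]],
    nudge_interval: int,
    user_turn_count: int = 0,
    turns_since_memory: int = 0,
) -> Tuple[int, int]:
    """Return (user_turn_count, turns_since_memory) after hydration."""
    # Idempotent guard: an agent that already counted turns keeps its counters.
    if user_turn_count != 0:
        return user_turn_count, turns_since_memory
    # A zero (or negative) interval disables nudging, so nothing is rebuilt.
    if nudge_interval <= 0:
        return user_turn_count, turns_since_memory
    # Only user messages count as turns (assumed role filter).
    user_turns = sum(1 for m in conversation_history if m.get("role") == "user")
    # Modulo keeps the 1-in-N cadence instead of firing immediately on resume.
    return user_turns, user_turns % nudge_interval
```

With an interval of 5 and 7 user turns in the history, the rebuilt agent resumes 2 turns into the current cycle, so the next review fires 3 user turns later rather than on the first message.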
* fix(api-server): emit length/error finish_reason for truncation/failure (#22775)

Non-streaming /v1/chat/completions wrapped any AIAgent result (including partial/failed runs) as a successful 200 with finish_reason='stop' and the internal failure string substituted into message.content. API clients had no way to distinguish 'agent answered: X' from 'agent crashed and the X you see is its error message'.

After the fix:
- completed: True → 200 finish_reason='stop' (unchanged)
- partial + truncated text → 200 finish_reason='length' + hermes extras
- partial + no text / failed → 502 OpenAI error envelope (SDKs raise)
- other failures → 200 finish_reason='error' + hermes extras

Adds X-Hermes-Completed / X-Hermes-Partial / X-Hermes-Error headers plus a 'hermes' extras object on partial responses for clients that want the full picture.

Closes #22496.

* fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777)

Native Windows, WSL, SSH sessions, and Windows Terminal all send Ctrl+Enter as bare LF (c-j). Hermes was binding c-j as submit on every POSIX platform, so Ctrl+Enter submitted instead of inserting a newline on those terminals. Reported in #22379.

Add _preserve_ctrl_enter_newline() predicate that detects the environments where Ctrl+Enter must produce a newline (sys.platform == 'win32', SSH_CONNECTION/SSH_CLIENT/SSH_TTY env, WT_SESSION, WSL_DISTRO_NAME, /proc/version 'microsoft' marker). Gate the c-j-as-submit binding off in those environments and gate the c-j-as-newline handler on. Local POSIX TTYs without those markers (docker exec, plain ssh from a Mac) keep c-j as submit so plain Enter still works on thin PTYs.

Add install_ctrl_enter_alias() in hermes_cli/pt_input_extras.py mapping the three CSI-u / modifyOtherKeys variants of Ctrl+Enter ('\x1b[13;5u', '\x1b[27;5;13~', '\x1b[27;5;13u') to the (Escape, ControlM) tuple Alt+Enter produces.
This lets Kitty / mintty / xterm-with-modifyOtherKeys users over SSH get a Ctrl+Enter newline through the existing Alt+Enter handler.

9 new tests + extended existing test_lf_enter_binds_to_submit_handler_posix to cover bare-local vs SSH branches.

Closes #22379.

* fix(fallback): skip chain entries matching current provider/model/base_url (#22780)

_try_activate_fallback() walked the chain by index without comparing the candidate entry against the currently-failing backend. So a misconfigured chain that listed the same provider+model as the primary, or two custom_providers entries pointing at the same shim URL, would loop the same failure 3x for the same backend.

After the fix, advance() skips:
- entries where (provider, model) match the current agent's
- entries with a base_url + model matching the current backend (catches two custom_providers names pointing at the same shim)

Recursing through self._try_activate_fallback() continues to the next chain entry; if everything matches, it returns False and the caller moves on without retrying the same broken path.

3 regression tests covering same-provider-same-model skip, same-base_url-same-model skip, and the all-self-matching-returns-False exhaustion path.

Closes #22548 (the Hermes-side portion). The 120s timeout itself in the downstream claude-cli shim is a deployment concern documented in that issue's wherewolf87 comment.

* fix(memory): tighten MEMORY_GUIDANCE against ephemeral PR/issue/SHA notes (#22781)

The model regularly writes session-outcome facts to MEMORY.md despite the existing 'Do NOT save task progress' line: entries like 'Submitted PR #22577 for the kanban dedup fix' or 'Fixed bug X in file Y'. These are stale within days, pollute the system prompt, and crowd out durable user preferences (the issue #22563 reporter saw 9 sections of bug-fix notes injected on a brand-new task).
Add explicit examples of what NOT to save (PR numbers, issue numbers, commit SHAs, 'fixed/submitted/Phase N done', file counts) plus the 7-day-staleness heuristic so the model has a concrete calibration target rather than guessing what counts as 'task progress'. Closes #22563 (the prompt-side, low-risk portion). The bigger relevance-based-injection / vector-retrieval feature requested in #22563 is tracked under #2184 (Richer local memory). Per skill rule on prompt caching, dynamic memory injection breaks the frozen-snapshot invariant and needs a separate design call. * fix(tools): install cua-driver when Computer Use is enabled via 'hermes tools' (#22765) Returning users who enabled '🖱️ Computer Use (macOS)' via 'hermes tools' saw '✓ Saved configuration' but no install — cua-driver was never on PATH and the toolset failed at first use. Two compounding causes: 1. _toolset_needs_configuration_prompt fell through to _toolset_has_keys, which returned True for any provider with empty env_vars. cua-driver has no env vars, so the gate skipped _configure_toolset entirely and _run_post_setup('cua_driver') never ran. 2. No stable CLI entry-point existed for re-running the install when the picker no-op'd it (e.g. when toggling the toolset off+on inside one picker session, where 'added' is empty). Changes: - hermes_cli/tools_config.py: add _POST_SETUP_INSTALLED registry mapping post_setup keys to installed-state predicates. The gate now returns True when any visible provider has a registered post_setup whose predicate fails. cua_driver is the only opt-in for now; other post_setup hooks keep their existing behaviour. - hermes_cli/main.py: add 'hermes computer-use install' and 'hermes computer-use status' as a stable docs target. install reuses the same _run_post_setup('cua_driver') path that the picker invokes; status reports whether cua-driver is on PATH. - tools/computer_use/cua_backend.py: install hint now points users at 'hermes computer-use install' first. 
- website/docs/user-guide/features/computer-use.md: document the new command as the primary install path. - website/docs/reference/cli-commands.md: catalog 'hermes computer-use' alongside 'hermes tools'. - tests/hermes_cli/test_post_setup_gating.py: regression coverage for the gate predicate (missing -> setup forced, installed -> setup skipped, broken predicate -> non-blocking, unregistered keys -> behaviour unchanged). Fixes #22737. Reported by @f-trycua. * perf(doctor): parallelize API connectivity checks and disable IMDS (#22766) `hermes doctor` ran every connectivity probe sequentially and on a typical developer laptop spent ~2s of its ~5s wall time inside boto3's EC2 instance-metadata-service lookup (169.254.169.254) — the default AWS credential chain probes IMDS even when AWS_BEARER_TOKEN_BEDROCK or AWS_ACCESS_KEY_ID is the only legitimate source. Refactor the API Connectivity section so every probe (OpenRouter, Anthropic, ~16 static API-key providers + dynamic profiles, AWS Bedrock) is a pure function returning a structured result, then fan them out through a ThreadPoolExecutor(max_workers=8). Output order, glyphs, colours, padding, and issue strings stay byte-for-byte identical to the sequential implementation; results are gathered in submission order. Also disable IMDS for the parallel block by setting AWS_EC2_METADATA_DISABLED=true on the parent thread before submitting work (and restoring its prior value in a finally block). Bedrock's real-API call gets a Config(connect_timeout=5, read_timeout=10, retries={max_attempts:1}) so a transient regional failure can't pad the run by 30+ seconds. Measured impact (5-run medians, 9950X3D): hermes doctor: 5.07 → 2.16 s (-57%) Doctor tests: 48 passed (test_doctor.py + test_doctor_command_install.py). The remaining ~2s of wall is import overhead + a couple of one-off network calls outside the API Connectivity section (`fetch_models_dev` provider catalog refresh, Nous OAuth refresh in `Auth Providers`). 
Those are next-tier targets, not part of this change. * fix: send correct resolution param to xAI image generation API The xAI /v1/images/generations endpoint expects resolution as a literal string ('1k' or '2k'), not the numeric value ('1024'). - Change _XAI_RESOLUTIONS from a dict mapping to a validation set - Use the resolution key directly instead of the mapped value - Fall back to DEFAULT_RESOLUTION on invalid config values Fixes 422 Unprocessable Entity errors when resolution was sent. * chore: add A-kamal to AUTHOR_MAP for PR #18678 * test(xai-image): regression-guard literal '1k'/'2k' resolution payload The xAI image-gen provider was DOA from PR #14765 onward — every request 422'd because the resolution param was being mapped to '1024'/'2048' but xAI's API expects the literal strings '1k'/'2k'. PR #18678 fixed the mapping; this test asserts the wire payload carries the literal so the regression cannot recur silently. * perf(gateway): defer QQAdapter and YuanbaoAdapter imports via PEP 562 (#22790) `gateway/platforms/__init__.py` eagerly imported `QQAdapter` and `YuanbaoAdapter` at package-init time, which transitively pulled in qqbot's chunked-upload + keyboards + onboard machinery and yuanbao's websocket stack. About 84 ms wall and 23 MB RSS on every fresh process that touched anything under `gateway.platforms` — including `hermes chat` (via run_agent → cli's plugin discovery transitive import). Nothing in the codebase actually consumes these symbols from the package root; every real call site uses the long-form path (`from gateway.platforms.qqbot import QQAdapter`, `from gateway.platforms.yuanbao import YuanbaoAdapter` in gateway/run.py). The eager re-export was only there for convenience. Replace with a PEP 562 module-level `__getattr__` that lazily imports on first attribute access. Public API stays identical: `from gateway.platforms import QQAdapter` keeps working but only pays the import cost when the symbol is actually touched. 
`__dir__` preserves help() / autocomplete behavior.

Measured impact (7-run medians, 9950X3D):

    import gateway.platforms         127 → 43 ms  (-66%)    50 → 27 MB (-46%)
    import gateway.platforms.base    127 → 44 ms  (-65%)    50 → 27 MB (-46%)
    import cli (full chat path)      745 → 710 ms ( -5%)    96 → 90 MB ( -6%)
    hermes chat -q (cold)                                    -5 MB

The per-import win is biggest because qqbot/yuanbao deps don't overlap with anything on the gateway-platforms path; full `import cli` already loads aiohttp/websockets transitively from other places, so the marginal CLI win is smaller than the isolated import benchmark.

The `gateway.platforms.base` win is what matters most for long-lived gateway processes: every gateway boot saves 23 MB resident.

All 144 qqbot tests pass; broader gateway suite (5132 tests) passes modulo 4 pre-existing flakes also failing on main without this change.

* docs: deep audit — fix stale config keys, missing commands, and registry drift (#22784)

* docs: deep audit — fix stale config keys, missing commands, and registry drift

Cross-checked ~80 high-impact docs pages (getting-started, reference, top-level user-guide, user-guide/features) against the live registries:

    hermes_cli/commands.py    COMMAND_REGISTRY        (slash commands)
    hermes_cli/auth.py        PROVIDER_REGISTRY       (providers)
    hermes_cli/config.py      DEFAULT_CONFIG          (config keys)
    toolsets.py               TOOLSETS                (toolsets)
    tools/registry.py         get_all_tool_names()    (tools)
    python -m hermes_cli.main <subcmd> --help         (CLI args)

reference/
- cli-commands.md: drop duplicate hermes fallback row + duplicate section, add stepfun/lmstudio to --provider enum, expand auth/mcp/curator subcommand lists to match --help output (status/logout/spotify, login, archive/prune/list-archived).
- slash-commands.md: add missing /sessions and /reload-skills entries + correct the cross-platform Notes line.
- tools-reference.md: drop bogus '68 tools' headline, drop fictional 'browser-cdp toolset' (these tools live in 'browser' and are runtime-gated), add missing 'kanban' and 'video' toolset sections, fix MCP example to use the real mcp_<server>_<tool> prefix. - toolsets-reference.md: list browser_cdp/browser_dialog inside the 'browser' row, add missing 'kanban' and 'video' toolset rows, drop the stale '38 tools' count for hermes-cli. - profile-commands.md: add missing install/update/info subcommands, document fish completion. - environment-variables.md: dedupe GMI_API_KEY/GMI_BASE_URL rows (kept the one with the correct gmi-serving.com default). - faq.md: Anthropic/Google/OpenAI examples — direct providers exist (not just via OpenRouter), refresh the OpenAI model list. getting-started/ - installation.md: PortableGit (not MinGit) is what the Windows installer fetches; document the 32-bit MinGit fallback. - installation.md / termux.md: installer prefers .[termux-all] then falls back to .[termux]. - nix-setup.md: Python 3.12 (not 3.11), Node.js 22 (not 20); fix invalid 'nix flake update --flake' invocation. - updating.md: 'hermes backup restore --state pre-update' doesn't exist — point at the snapshot/quick-snapshot flow; correct config key 'updates.pre_update_backup' (was 'update.backup'). user-guide/ - configuration.md: api_max_retries default 3 (not 2); display.runtime_footer is the real key (not display.runtime_metadata_footer); checkpoints defaults enabled=false / max_snapshots=20 (not true / 50). - configuring-models.md: 'hermes model list' / 'hermes model set ...' don't exist — hermes model is interactive only. - tui.md: busy_indicator -> tui_status_indicator with values kaomoji|emoji|unicode|ascii (not kawaii|minimal|dots|wings|none). - security.md: SSH backend keys (TERMINAL_SSH_HOST/USER/KEY) live in .env, not config.yaml. - windows-wsl-quickstart.md: there is no 'hermes api' subcommand — the OpenAI-compatible API server runs inside hermes gateway. 
user-guide/features/ - computer-use.md: approvals.mode (not security.approval_level); fix broken ./browser-use.md link to ./browser.md. - fallback-providers.md: top-level fallback_providers (not model.fallback_providers); the picker is subcommand-based, not modal. - api-server.md: API_SERVER_* are env vars — write to per-profile .env, not 'hermes config set' which targets YAML. - web-search.md: drop web_crawl as a registered tool (it isn't); deep-crawl modes are exposed through web_extract. - kanban.md: failure_limit default is 2, not '~5'. - plugins.md: drop hard-coded '33 providers' count. - honcho.md: fix unclosed quote in echo HONCHO_API_KEY snippet; document that 'hermes honcho' subcommand is gated on memory.provider=honcho; reconcile subcommand list with actual --help output. - memory-providers.md: legacy 'hermes honcho setup' redirect documented. Verified via 'npm run build' — site builds cleanly; broken-link count went from 149 to 146 (no regressions, fixed a few in passing). * docs: round 2 audit fixes + regenerate skill catalogs Follow-up to the previous commit on this branch: Round 2 manual fixes: - quickstart.md: KIMI_CODING_API_KEY mentioned alongside KIMI_API_KEY; voice-mode and ACP install commands rewritten — bare 'pip install ...' doesn't work for curl-installed setups (no pip on PATH, not in repo dir); replaced with 'cd ~/.hermes/hermes-agent && uv pip install -e ".[voice]"'. ACP already ships in [all] so the curl install includes it. - cli.md / configuration.md: 'auxiliary.compression.model' shown as 'google/gemini-3-flash-preview' (the doc's own claimed default); actual default is empty (= use main model). Reworded as 'leave empty (default) or pin a cheap model'. - built-in-plugins.md: added the bundled 'kanban/dashboard' plugin row that was missing from the table. 
Regenerated skill catalogs: - ran website/scripts/generate-skill-docs.py to refresh all 163 per-skill pages and both reference catalogs (skills-catalog.md, optional-skills-catalog.md). This adds the entries that were genuinely missing — productivity/teams-meeting-pipeline (bundled), optional/finance/* (entire category — 7 skills: 3-statement-model, comps-analysis, dcf-model, excel-author, lbo-model, merger-model, pptx-author), creative/hyperframes, creative/kanban-video-orchestrator, devops/watchers, productivity/shop-app, research/searxng-search, apple/macos-computer-use — and rewrites every other per-skill page from the current SKILL.md. Most diffs are tiny (one line of refreshed metadata). Validation: - 'npm run build' succeeded. - Broken-link count moved 146 -> 155 — the +9 are zh-Hans translation shells that lag every newly-added skill page (pre-existing pattern). No regressions on any en/ page. * feat(transports/codex): pass reasoning.effort to xAI Responses API The is_xai_responses branch only sent include=[reasoning.encrypted_content] without forwarding the resolved reasoning_effort. Other Responses providers (OpenAI, GitHub) already get effort forwarded — this aligns the xAI path. Without this, agent.reasoning_effort is silently dropped on the xAI direct path, making Hermes unable to control reasoning depth on grok-4.x via api.x.ai. Tests added to TestCodexBuildKwargs cover effort passthrough, disabled state, and minimal-clamp parity with non-xAI. * perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808) `fetch_models_dev()` is on the hot path of every `AIAgent.__init__` (via `context_compressor → get_model_context_length`). The previous policy was "always try network first, only fall back to disk if network fails," so every fresh `hermes chat` / `hermes gateway` / batch / cron process paid 250-500 ms re-fetching a 2 MB JSON registry that was already on disk from earlier runs. 
Add a stage 2 between in-mem and network: if `models_dev_cache.json` exists and its mtime is younger than the existing `_MODELS_DEV_CACHE_TTL` (1 hour, same TTL the in-mem cache already uses), load from disk and skip the network call. The in-mem TTL is anchored to the disk file's age, so a 50-min-old cache stays in-memory for only 10 more minutes — no surprise extension of staleness window. Invariants preserved: - `force_refresh=True` still always hits the network and only falls back to disk on failure (`hermes config refresh` semantics). - Missing disk cache → fall through to network (first-ever run). - Stale disk cache (mtime > TTL) → fall through to network. - Negative file age (clock skew) → fall through to network. - Network failure → existing stage-4 stale-disk fallback unchanged. Measured impact (3-run medians, 9950X3D, fresh process per run): fetch_models_dev cold: 256 → 17 ms (-93%) hermes chat -q wall: 4.00 → 3.73 s (-7% median) 3.99 → 3.60 s (-10% min) The chat-end-to-end win is bounded below by API latency variance, but the fetch_models_dev microbenchmark is the cleanest signal: 239 ms shaved off every fresh-process agent construction. Win compounds with the previous perf PRs: #22681 google_chat lazy-load #22766 doctor parallel + IMDS off #22790 gateway.platforms PEP 562 Tests: all 30 `tests/agent/test_models_dev.py` pass (added 4 new ones covering the new disk-cache-first path, force_refresh override, stale disk fallback, and missing-disk-cache fall-through). Full `tests/agent/` suite: 2560 passed, 0 failed. * fix(auxiliary): rotate pooled auth after quota failures * chore: add Qwinty to AUTHOR_MAP * fix(browser_tool): do not cache transient None cloud provider resolution Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True` before resolution. If credentials were briefly unavailable on the first call (e.g. 
a managed Nous Portal token mid-refresh), the resolver pinned the entire process to local mode forever, even after credentials self-healed seconds later. Root cause: bookkeeping was set up-front, so any code path that fell through to `return _cached_cloud_provider` (config read failure, no credentials yet, explicit-provider instantiation failure) committed the transient `None` to the cache permanently. Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now set only when (a) the user explicitly chose `cloud_provider: local`, or (b) a provider was successfully resolved. All transient `None` paths return without poisoning the cache, so the next call retries. Explicit provider instantiation failures now log at warning level with stack trace so operators can diagnose them. Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py covering explicit local, successful resolution, no-credentials-yet, config read failure, and explicit provider instantiation failure. Stash-verify confirmed the 3 transient-None tests fail without the fix. All 320 existing browser tests still green. Closes #22324 * fix(browser_tool): fall through to autodetect on config read failure * fix(email): send IMAP ID extension to support 163/NetEase mailbox 163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe Login` unless the client first identifies itself via the RFC 2971 ID command after LOGIN. Without this, the email gateway logs in OK but then fails on the very first poll and the connection is torn down. Send the ID payload best-effort after both `imap.login()` sites (`EmailAdapter.connect` and `_fetch_new_messages`). Failures are swallowed at debug level so non-supporting IMAP servers (Gmail, Outlook, Fastmail, Yahoo, etc.) keep working unchanged. 
Closes #22271 * fix(email): use real hermes version in IMAP ID command * fix(deps): declare youtube-transcript-api in pyproject.toml [youtube] extra skills/media/youtube-content/scripts/fetch_transcript.py and optional-skills/productivity/memento-flashcards/scripts/youtube_quiz.py both import youtube-transcript-api at runtime, but the package was not listed in pyproject.toml. A fresh `uv sync` therefore omits it, and both skills fail on first invocation with: ModuleNotFoundError: No module named 'youtube_transcript_api' Add a new [youtube] optional-dependency group with youtube-transcript-api>=1.2.0 (the v1.x API surface the scripts already use) and include it in [all] so standard installs pick it up. Regression tests: TestPyprojectDeclaresYoutubeExtra verifies the extra is present in pyproject.toml and included in [all]. Closes #22243 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro DeepSeek V4 Pro returns thinking content as typed blocks inside the content array rather than as a top-level reasoning_content field: [{"type": "thinking", "thinking": "..."}, {"type": "output", ...}] _extract_reasoning only handled content as a plain string, so the thinking text was silently dropped. On the next turn the session was replayed without the thinking block, causing: HTTP 400: The content[].thinking in the thinking mode must be passed back to the API. Fix: when content is a list and no structured reasoning field was found, scan for items with type=='thinking' and accumulate their 'thinking' (or 'text') value into reasoning_parts. Structured fields (reasoning, reasoning_content, reasoning_details) still take priority so existing provider behaviour is unchanged. Closes #21944 * fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a snapshot of PRAGMA table_info taken at function entry. 
When the gateway dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board and once via init_db's discard-and-reconnect path), a second connection can run the same migration before the first one commits, causing: sqlite3.OperationalError: duplicate column name: consecutive_failures This crashed the dispatcher on every first tick after a gateway restart (subsequent ticks succeeded because the columns were then present). Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a try/except that swallows OperationalError whose message contains 'duplicate column name'. All ALTER TABLE calls in _migrate_add_optional_columns are routed through this helper. Closes #21708 * fix(doctor): skip pluggable provider profiles when a dedicated check exists (#22346) Problem ------- `hermes doctor` ran two health checks for Anthropic: a dedicated one with the correct `x-api-key` + `anthropic-version` headers, and a generic Bearer-auth one driven by the pluggable `ProviderProfile` for "anthropic". The generic check called `https://api.anthropic.com/v1/models` with `Authorization: Bearer ...`, which Anthropic answers with HTTP 404, producing a noisy duplicate warning even when the dedicated check passed. Root cause ---------- `hermes_cli/doctor.py:_build_apikey_providers_list` deduplicated profiles against a `_known_canonical` set built from the static list (Z.AI/GLM, Kimi, DeepSeek, …). Providers with their own dedicated check above the generic loop (Anthropic, OpenRouter, Bedrock) were not in that set, so their profiles were appended and ran a second, broken check. Fix --- Add `{"anthropic", "openrouter", "bedrock"}` to the skip set, and also skip profiles whose aliases match any of those names (e.g. `claude`, `claude-oauth` → anthropic). 
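A rough sketch of the skip logic described in this fix, under stated assumptions: `Profile` is a hypothetical stand-in for the real `ProviderProfile`, and `profiles_for_generic_check` is an illustrative name rather than the function in `hermes_cli/doctor.py`. Names and aliases are normalized before the comparison so that casing or stray whitespace cannot let a dedicated-check provider slip through.

```python
from dataclasses import dataclass

# Providers that already have a dedicated health check (from the commit text).
DEDICATED_CHECK_PROVIDERS = frozenset({"anthropic", "openrouter", "bedrock"})


@dataclass
class Profile:
    """Hypothetical stand-in for a pluggable provider profile."""
    name: str
    aliases: tuple = ()


def _normalize(value):
    return value.strip().lower()


def profiles_for_generic_check(profiles, skip=DEDICATED_CHECK_PROVIDERS):
    """Return only the profiles that should run the generic Bearer-auth check."""
    kept = []
    for profile in profiles:
        names = {_normalize(profile.name)}
        names.update(_normalize(alias) for alias in profile.aliases)
        if names & skip:
            # A dedicated check exists; a second generic probe would be noise.
            continue
        kept.append(profile)
    return kept
```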
Tests ----- tests/hermes_cli/test_doctor_dedicated_provider_skip.py: - test_build_apikey_providers_list_skips_dedicated_check_providers: asserts the assembled list does not contain anthropic, openrouter, or bedrock entries. - test_build_apikey_providers_list_includes_non_dedicated_providers: sanity guard that legitimate providers (DeepSeek, Z.AI/GLM) survive. Both confirmed via stash-verify (fail pre-fix with anthropic/openrouter leaking, pass post-fix). Fixes #22346 * fix(doctor): normalize provider name and aliases before dedicated-skip check * fix(completion): use valid zsh _arguments exclusion-group syntax The generated zsh completion script used `(-h --help)` as the exclusion group for `_arguments`, which zsh rejects with: _arguments:comparguments: invalid argument: (-h --help){-h,--help}[...] Exclusion groups in `_arguments` cannot contain long options. Use the canonical `(-)` form (exclude all other options) which correctly handles flag pairs like `-h`/`--help`. Fixes NousResearch/hermes-agent#22686 * fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805) Two co-located fixes: 1. agent/model_metadata.py: bump hy3-preview static fallback from 256000 to 262144 (256 * 1024) to match OpenRouter live metadata so cache and offline both agree (issue #22268). 2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the exact-value change-detector (assert ctx == 256000) with an invariant assertion (registered + >= 4096). Per AGENTS.md 'Don't write change-detector tests': pinning the upstream-controlled context length is exactly the test class the rule forbids — it breaks every time the provider bumps the published value, with zero behavioral coverage gained. Salvage of #22574 with a redirect on the test approach. The contributor's diff bumped the integer and added a SECOND change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144, which would re-break on the next published bump. 
We instead delete the change-detector entirely and assert the relationship. Closes #22268. * fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680) acp_command / acp_args descriptions previously primed the model to populate them — "Per-task ACP command override (e.g. 'copilot')" — even when no ACP CLI was installed. Models with weaker schema-following discipline would set them and the spawn would fail. Add explicit "Do NOT set unless the user has explicitly told you" guidance at both the top-level acp_command and the per-task override. Strengthen acp_args to mention it's empty unless acp_command is set. Adds 2 tests pinning the descriptions. Note: this is a cosmetic prompt-engineering fix — the params remain exposed in the schema. The fully-correct fix is to gate them behind a config flag or runtime ACP-CLI detection so the schema only emits them when an ACP harness is available. Tracked as a follow-up; this PR ships the low-cost stopgap. Salvage of #22680 (delegate schema only). The original PR also bundled unrelated fixes for #22548, #21944, #22150 — those need separate PRs since #22548 and #21944 are already addressed on main (#22780 + #22798 in flight) and #22150 deserves its own review. Closes #22013. * feat(gateway): add Telegram notification mode to suppress intermediate push notifications Add a configurable notifications mode for the Telegram platform adapter that controls which messages trigger push notifications. 
- display.platforms.telegram.notifications: "all" (default) | "important" - HERMES_TELEGRAM_NOTIFICATIONS env var override - In "important" mode, all sends use disable_notification=True except: - Approvals (send_exec_approval) and slash confirmations - Final response messages (metadata["notify"]=True) - Zero overhead in default "all" mode - Zero impact on non-Telegram platforms Closes #22771 * chore: add CalmProton to AUTHOR_MAP * fix(telegram): default notifications to 'important' (silence intermediate) Per-tool-call push notifications on Telegram are noisy enough that 'all' is the wrong default — long agent runs spam the user's notification shade with status messages they didn't ask to be pinged about. Final responses, approval prompts, and slash confirmations still notify; intermediate progress, streaming, and tool-progress messages now deliver silently via disable_notification. Users who want the legacy behavior can opt back in with: display: platforms: telegram: notifications: all or HERMES_TELEGRAM_NOTIFICATIONS=all. * fix(gateway): adopt unit's HERMES_HOME for --system CLI ops When systemd_restart / systemd_status / systemd_stop run under sudo, HERMES_HOME is stripped and HOME=/root, so get_hermes_home() resolves to /root/.hermes instead of the unit's pinned home. read_runtime_status and get_running_pid then look at the wrong gateway_state.json — the 60s status poll never sees "running", times out, and forces another systemctl restart that SIGTERMs the in-progress new gateway. Read the unit's pinned HERMES_HOME from `systemctl show -p Environment` and mirror it into os.environ before any HERMES_HOME-derived read. Early-out when system=False (user-scope inherits naturally). Errors swallowed so a transient systemctl failure doesn't break unrelated CLI ops. Closes #22035. 
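The HERMES_HOME lookup in the `--system` fix can be illustrated with a small parser for `systemctl show -p Environment` output. This is a sketch only: the function names are hypothetical, the unit name is supplied by the caller, and it assumes the property is printed as a single `Environment=KEY=VALUE ...` line with shell-style quoting for values that contain spaces.

```python
import shlex
import subprocess


def parse_unit_environment(show_output):
    """Parse `systemctl show -p Environment` output into a dict."""
    env = {}
    for line in show_output.splitlines():
        if not line.startswith("Environment="):
            continue
        try:
            tokens = shlex.split(line[len("Environment="):])
        except ValueError:
            # Unbalanced quotes: skip the line instead of failing the CLI op.
            continue
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep:
                env[key] = value
    return env


def read_unit_hermes_home(unit, system):
    """Return the unit's pinned HERMES_HOME, or None if unknown."""
    if not system:
        # User-scope units inherit the caller's environment naturally.
        return None
    try:
        result = subprocess.run(
            ["systemctl", "show", "-p", "Environment", unit],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # A transient systemctl failure should not break unrelated CLI ops.
        return None
    return parse_unit_environment(result.stdout).get("HERMES_HOME")
```

A caller would then mirror a non-None result into `os.environ["HERMES_HOME"]` before any code derives paths from it.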
* chore: add mbac to AUTHOR_MAP * fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708) Pass session_id through to provider profile build_api_kwargs_extras so the OpenRouter profile can attach an xAI cache-affinity header (x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt cache requires server affinity via this header — without it the cache is poisoned and Grok prompt-cache hit rates drop dramatically on multi-turn sessions. Carve-out of #22708 by Ninso112. The original PR bundled a /diff slash command, a zsh completion fix (already on main via #22802), and holographic memory null-guards. This salvage keeps just the Grok header work — small, targeted, and well-tested. Other contributors and changes preserved for separate review. Closes #22705. * chore: add Ninso112 to AUTHOR_MAP * fix(gateway): preserve Ctrl+C for Windows foreground runs * fix: make session search initialize session db * perf(teams): defer httpx import to first webhook call (#22831) Same pattern as the google_chat lazy-load (PR #22681), applied to the Teams plugin. The bundled `plugins/platforms/teams/adapter.py` did `import httpx` at module top, which dragged the entire httpx + httpcore stack into every process that triggered plugin discovery — including `hermes` invocations that never instantiate the Teams adapter. `httpx` is only needed inside one method (`TeamsMeetingPipeline._write_summary_via_incoming_webhook`), and the `httpx.AsyncBaseTransport` parameter annotation is already string-only thanks to the existing `from __future__ import annotations`. Move the runtime import inside the method. Measured impact (7-run medians, 9950X3D): teams plugin alone: 118 → 89 ms (-25%) 46 → 38 MB (-17%) import cli (full): unchanged import model_tools: unchanged The full-CLI numbers are flat because httpx is loaded transitively from many other modules on that path. 
The microbench win is the real signal: 29 ms / 8 MB shaved off any process that touches the teams plugin without otherwise pulling httpx — primarily future workflows where the gateway is enabled but Teams is not configured. Tests: 44/44 `tests/gateway/test_teams.py` pass; 345 across all plugin-platform suites (teams + qqbot + google_chat). The test file imports `httpx` itself for the `MockTransport` fixture, which is correct — tests legitimately use httpx, only the plugin's module-level import was the issue. * fix(acp): honor task cwd for foreground terminal commands * feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838) Pick openrouter/pareto-code as your model and OpenRouter auto-routes each request to the cheapest model meeting your coding-quality bar (ranked by Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0, default 0.65) tunes the floor. - hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so it shows up in the picker with a description - hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands on a mid-tier coder on the current Pareto frontier) - plugins/model-providers/openrouter: emit extra_body.plugins = [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code AND the score is a valid float in [0.0, 1.0] - agent/transports/chat_completions.py: same emission on the legacy flag path (when no provider profile is loaded) - run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into both build_kwargs() invocations and the context-summary extra_body path - cli.py: read openrouter.min_coding_score once at init, validate float in [0,1], pass to AIAgent constructions (CLI + background-task paths) - cron/scheduler.py, batch_runner.py, tools/delegate_tool.py, tui_gateway/server.py: propagate the kwarg (mirrors providers_order plumbing — subagents inherit, cron/batch read from config) - tests: profile-level + transport-level coverage of the 
model gating, unset/empty/out-of-range handling, and the legacy flag path - docs: new 'OpenRouter Pareto Code Router' section in providers.md Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a mid-tier coder, at omission we get the strongest. Score is silently dropped on any model other than openrouter/pareto-code, so it's safe to leave set. * fix(gateway): preserve reasoning_content, codex_message_items, finish_reason on transcript replay (#22839) PR #2974 whitelisted three reasoning fields (reasoning, reasoning_details, codex_reasoning_items) for the gateway's simple-text replay branch. Three more fields were added to the DB later but the whitelist was never updated: - reasoning_content: provider-facing thinking text. _copy_reasoning_content_for_api promotes 'reasoning' -> 'reasoning_content' at send time only when the strings happen to match. Carrying the original verbatim avoids loss for providers that return them as distinct fields (DeepSeek/Kimi/ Moonshot thinking modes), and preserves the empty-string sentinel that DeepSeek V4 Pro requires for thinking-mode replay. - codex_message_items: exact assistant message items with 'phase'. OpenAI docs: 'preserve and resend phase on all assistant messages — dropping it can degrade performance.' Required for prefix cache hits. No recovery path exists — once dropped, gone. - finish_reason: informational; cheap to keep so transcripts replay identically across CLI and gateway. The CLI is unaffected because cli.py keeps the live in-memory message list across turns (cli.py:10046 'self.conversation_history = result["messages"]'). The gateway rebuilds agent_history from the SQLite transcript on every turn, so any field stripped during replay is silently lost. Refactors the inline whitelist into a module-level _build_replay_entry() helper so the contract can be unit-tested. 16 new tests pin the field set and falsy-value handling. 
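The replay whitelist described above might look roughly like this. It is a sketch, not the real `_build_replay_entry()`: the field names come from the commit message, but the row shape and the exact falsy-value rule are assumptions. Here a field is copied whenever it is present and not `None`, so an empty-string `reasoning_content` sentinel survives replay.

```python
# Optional fields to carry across replay, as listed in the commit message.
REPLAY_FIELDS = (
    "reasoning",
    "reasoning_details",
    "codex_reasoning_items",
    "reasoning_content",
    "codex_message_items",
    "finish_reason",
)


def build_replay_entry(row):
    """Build one replayed message from a stored transcript row (a dict)."""
    entry = {"role": row["role"], "content": row.get("content")}
    for name in REPLAY_FIELDS:
        value = row.get(name)
        # Assumption: only None means "absent". Empty strings and empty lists
        # are meaningful and must be passed back verbatim.
        if value is not None:
            entry[name] = value
    return entry
```

Checking `is not None` rather than truthiness is the design point: a truthiness test would silently drop the empty-string sentinel the commit calls out.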
Verified end-to-end: DB stores all 8 fields, replay now preserves all 8 (was preserving only 5 for assistant text turns). * docs(openrouter): document auxiliary.<task>.extra_body for OR routing and Pareto (#22844) The plumbing for setting OpenRouter provider preferences and the Pareto Code router on auxiliary tasks already exists — auxiliary.<task>.extra_body is forwarded verbatim by call_llm() / async_call_llm(). It just wasn't documented, so users who wanted (e.g.) Pareto Code routing for compression but the strongest coder for the main agent had no way to discover the escape hatch. - hermes_cli/config.py: expand the auxiliary section header with a YAML example showing provider routing plus plugins under extra_body, and an explicit note that main-agent provider_routing / openrouter.min_coding_score do NOT propagate to aux calls (each task is independent by design) - website/docs/user-guide/configuration.md: new 'OpenRouter routing and Pareto Code for auxiliary tasks' subsection with worked example - website/docs/integrations/providers.md: cross-link from the Pareto Code Router section to the aux-side doc E2E verified that auxiliary.<task>.extra_body reaches the OpenRouter API with the configured provider routing and plugins blocks intact. * docs: round 2 audit — messaging, developer-guide, guides, integrations (#22858) Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/, guides/, and integrations/ against the live registries and gateway code. messaging/ - index.md: API Server toolset is hermes-api-server (was 'hermes (default)'); Google Chat slug is hermes-google_chat (underscore — plugin name uses _). - google_chat.md: drop bogus 'pip install hermes-agent[google…

NousResearch#22858) Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/, guides/, and integrations/ against the live registries and gateway code.

messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)'); Google Chat slug is hermes-google_chat (underscore; the plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such extra); list the actual deps (google-cloud-pubsub, google-api-python-client, google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is silently ignored by the adapter); QQ_STT_BASE_URL is not read directly; baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys; write them to per-profile .env, not 'hermes config set' (same pattern fixed in api-server.md last round). Also bumped example ports to 8650+ to dodge the default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646) collision.

developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py, gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway platform tree updated (qqbot is a sub-package, not qqbot.py; added yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/ (always active)' was wrong: it's an empty extension point and _register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry, NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama Cloud, LM Studio, Tencent TokenHub. Fallback section described only the legacy single-pair model; corrected to the canonical list-form fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31 alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule; note that 'git submodule update --init' is required if cloning without --recurse-submodules.

guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong: schedule is positional (not --schedule), the script-only flag is --no-agent (not --script-only), and there's no --command flag. Replaced with a real example that creates the script under ~/.hermes/scripts/ and uses the actual flags. Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work; the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently fails because --region isn't registered on the auth-add argparse spec. Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for China-region access.
- cron-script-only.md: 'hermes send' is fictional; replaced the comparison-table mention with a webhook-subscription pointer; also fixed the dead link to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response and messages; task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in one quoted string, so cmd.exe gets a single arg instead of the multi-token command line it needs. Removed the surrounding quotes; argparse nargs='*' collects each token correctly.

integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist); actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 -> api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai) refreshed. Fallback section rewritten to lead with the canonical fallback_providers list form (was leading with the legacy fallback_model single dict); supported-providers list extended to include azure-foundry, alibaba-coding-plan, lmstudio.

index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with integrations/index.md ('19+') and undercounted; bumped to 20+ and added Weixin/QQ Bot/Yuanbao/Google Chat to the list.

Validation: 'npm run build' clean (exit 0); broken-link count unchanged at 155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.

* chore: add Qwinty to AUTHOR_MAP * fix(browser_tool): do not cache transient None cloud provider resolution Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True` before resolution. If credentials were briefly unavailable on the first call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned the entire process to local mode forever, even after credentials self-healed seconds later. Root cause: bookkeeping was set up-front, so any code path that fell through to `return _cached_cloud_provider` (config read failure, no credentials yet, explicit-provider instantiation failure) committed the transient `None` to the cache permanently. Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now set only when (a) the user explicitly chose `cloud_provider: local`, or (b) a provider was successfully resolved. All transient `None` paths return without poisoning the cache, so the next call retries. Explicit provider instantiation failures now log at warning level with stack trace so operators can diagnose them. Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py covering explicit local, successful resolution, no-credentials-yet, config read failure, and explicit provider instantiation failure. Stash-verify confirmed the 3 transient-None tests fail without the fix. All 320 existing browser tests still green. Closes #22324 * fix(browser_tool): fall through to autodetect on config read failure * fix(email): send IMAP ID extension to support 163/NetEase mailbox 163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe Login` unless the client first identifies itself via the RFC 2971 ID command after LOGIN. Without this, the email gateway logs in OK but then fails on the very first poll and the connection is torn down. Send the ID payload best-effort after both `imap.login()` sites (`EmailAdapter.connect` and `_fetch_new_messages`). 
Failures are swallowed at debug level so non-supporting IMAP servers (Gmail, Outlook, Fastmail, Yahoo, etc.) keep working unchanged. Closes #22271 * fix(email): use real hermes version in IMAP ID command * fix(deps): declare youtube-transcript-api in pyproject.toml [youtube] extra skills/media/youtube-content/scripts/fetch_transcript.py and optional-skills/productivity/memento-flashcards/scripts/youtube_quiz.py both import youtube-transcript-api at runtime, but the package was not listed in pyproject.toml. A fresh `uv sync` therefore omits it, and both skills fail on first invocation with: ModuleNotFoundError: No module named 'youtube_transcript_api' Add a new [youtube] optional-dependency group with youtube-transcript-api>=1.2.0 (the v1.x API surface the scripts already use) and include it in [all] so standard installs pick it up. Regression tests: TestPyprojectDeclaresYoutubeExtra verifies the extra is present in pyproject.toml and included in [all]. Closes #22243 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro DeepSeek V4 Pro returns thinking content as typed blocks inside the content array rather than as a top-level reasoning_content field: [{"type": "thinking", "thinking": "..."}, {"type": "output", ...}] _extract_reasoning only handled content as a plain string, so the thinking text was silently dropped. On the next turn the session was replayed without the thinking block, causing: HTTP 400: The content[].thinking in the thinking mode must be passed back to the API. Fix: when content is a list and no structured reasoning field was found, scan for items with type=='thinking' and accumulate their 'thinking' (or 'text') value into reasoning_parts. Structured fields (reasoning, reasoning_content, reasoning_details) still take priority so existing provider behaviour is unchanged. 
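The content-list handling in the DeepSeek V4 Pro fix can be sketched as below. This is an illustration under assumptions, not the project's `_extract_reasoning`: the function name is hypothetical, only string-valued `reasoning` and `reasoning_content` are treated as structured fields (`reasoning_details` is left out for brevity), and joining multiple thinking blocks with a blank line is a guess.

```python
def extract_reasoning(message):
    """Return reasoning text from an assistant message dict, or None."""
    # Structured top-level fields take priority, so providers that already
    # return them behave exactly as before.
    for name in ("reasoning", "reasoning_content"):
        value = message.get(name)
        if isinstance(value, str) and value:
            return value

    content = message.get("content")
    if not isinstance(content, list):
        # Plain-string content carries no typed thinking blocks.
        return None

    parts = []
    for block in content:
        if isinstance(block, dict) and block.get("type") == "thinking":
            text = block.get("thinking") or block.get("text")
            if text:
                parts.append(text)
    # Assumption: blocks are joined with a blank line.
    return "\n\n".join(parts) or None
```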
Closes #21944 * fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a snapshot of PRAGMA table_info taken at function entry. When the gateway dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board and once via init_db's discard-and-reconnect path), a second connection can run the same migration before the first one commits, causing: sqlite3.OperationalError: duplicate column name: consecutive_failures This crashed the dispatcher on every first tick after a gateway restart (subsequent ticks succeeded because the columns were then present). Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a try/except that swallows OperationalError whose message contains 'duplicate column name'. All ALTER TABLE calls in _migrate_add_optional_columns are routed through this helper. Closes #21708 * fix(doctor): skip pluggable provider profiles when a dedicated check exists (#22346) Problem ------- `hermes doctor` ran two health checks for Anthropic: a dedicated one with the correct `x-api-key` + `anthropic-version` headers, and a generic Bearer-auth one driven by the pluggable `ProviderProfile` for "anthropic". The generic check called `https://api.anthropic.com/v1/models` with `Authorization: Bearer ...`, which Anthropic answers with HTTP 404, producing a noisy duplicate warning even when the dedicated check passed. Root cause ---------- `hermes_cli/doctor.py:_build_apikey_providers_list` deduplicated profiles against a `_known_canonical` set built from the static list (Z.AI/GLM, Kimi, DeepSeek, …). Providers with their own dedicated check above the generic loop (Anthropic, OpenRouter, Bedrock) were not in that set, so their profiles were appended and ran a second, broken check. Fix --- Add `{"anthropic", "openrouter", "bedrock"}` to the skip set, and also skip profiles whose aliases match any of those names (e.g. 
`claude`, `claude-oauth` → anthropic). Tests ----- tests/hermes_cli/test_doctor_dedicated_provider_skip.py: - test_build_apikey_providers_list_skips_dedicated_check_providers: asserts the assembled list does not contain anthropic, openrouter, or bedrock entries. - test_build_apikey_providers_list_includes_non_dedicated_providers: sanity guard that legitimate providers (DeepSeek, Z.AI/GLM) survive. Both confirmed via stash-verify (fail pre-fix with anthropic/openrouter leaking, pass post-fix). Fixes #22346 * fix(doctor): normalize provider name and aliases before dedicated-skip check * fix(completion): use valid zsh _arguments exclusion-group syntax The generated zsh completion script used `(-h --help)` as the exclusion group for `_arguments`, which zsh rejects with: _arguments:comparguments: invalid argument: (-h --help){-h,--help}[...] Exclusion groups in `_arguments` cannot contain long options. Use the canonical `(-)` form (exclude all other options) which correctly handles flag pairs like `-h`/`--help`. Fixes NousResearch/hermes-agent#22686 * fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805) Two co-located fixes: 1. agent/model_metadata.py: bump hy3-preview static fallback from 256000 to 262144 (256 * 1024) to match OpenRouter live metadata so cache and offline both agree (issue #22268). 2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the exact-value change-detector (assert ctx == 256000) with an invariant assertion (registered + >= 4096). Per AGENTS.md 'Don't write change-detector tests': pinning the upstream-controlled context length is exactly the test class the rule forbids — it breaks every time the provider bumps the published value, with zero behavioral coverage gained. Salvage of #22574 with a redirect on the test approach. 
The contributor's diff bumped the integer and added a SECOND change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144, which would re-break on the next published bump. We instead delete the change-detector entirely and assert the relationship. Closes #22268. * fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680) acp_command / acp_args descriptions previously primed the model to populate them — "Per-task ACP command override (e.g. 'copilot')" — even when no ACP CLI was installed. Models with weaker schema-following discipline would set them and the spawn would fail. Add explicit "Do NOT set unless the user has explicitly told you" guidance at both the top-level acp_command and the per-task override. Strengthen acp_args to mention it's empty unless acp_command is set. Adds 2 tests pinning the descriptions. Note: this is a cosmetic prompt-engineering fix — the params remain exposed in the schema. The fully-correct fix is to gate them behind a config flag or runtime ACP-CLI detection so the schema only emits them when an ACP harness is available. Tracked as a follow-up; this PR ships the low-cost stopgap. Salvage of #22680 (delegate schema only). The original PR also bundled unrelated fixes for #22548, #21944, #22150 — those need separate PRs since #22548 and #21944 are already addressed on main (#22780 + #22798 in flight) and #22150 deserves its own review. Closes #22013. * feat(gateway): add Telegram notification mode to suppress intermediate push notifications Add a configurable notifications mode for the Telegram platform adapter that controls which messages trigger push notifications. 
- display.platforms.telegram.notifications: "all" (default) | "important" - HERMES_TELEGRAM_NOTIFICATIONS env var override - In "important" mode, all sends use disable_notification=True except: - Approvals (send_exec_approval) and slash confirmations - Final response messages (metadata["notify"]=True) - Zero overhead in default "all" mode - Zero impact on non-Telegram platforms Closes #22771 * chore: add CalmProton to AUTHOR_MAP * fix(telegram): default notifications to 'important' (silence intermediate) Per-tool-call push notifications on Telegram are noisy enough that 'all' is the wrong default — long agent runs spam the user's notification shade with status messages they didn't ask to be pinged about. Final responses, approval prompts, and slash confirmations still notify; intermediate progress, streaming, and tool-progress messages now deliver silently via disable_notification. Users who want the legacy behavior can opt back in with: display: platforms: telegram: notifications: all or HERMES_TELEGRAM_NOTIFICATIONS=all. * fix(gateway): adopt unit's HERMES_HOME for --system CLI ops When systemd_restart / systemd_status / systemd_stop run under sudo, HERMES_HOME is stripped and HOME=/root, so get_hermes_home() resolves to /root/.hermes instead of the unit's pinned home. read_runtime_status and get_running_pid then look at the wrong gateway_state.json — the 60s status poll never sees "running", times out, and forces another systemctl restart that SIGTERMs the in-progress new gateway. Read the unit's pinned HERMES_HOME from `systemctl show -p Environment` and mirror it into os.environ before any HERMES_HOME-derived read. Early-out when system=False (user-scope inherits naturally). Errors swallowed so a transient systemctl failure doesn't break unrelated CLI ops. Closes #22035. 
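For the Telegram notification mode described earlier in this entry, the decision logic might be sketched as follows. The function names are hypothetical, the config is assumed to be a plain nested dict, and the fallback of `"important"` follows the later commit that changed the default. Only the config path, the env var name, `metadata["notify"]`, and the `disable_notification` send parameter come from the text.

```python
import os

VALID_MODES = ("all", "important")


def resolve_notification_mode(config, environ=None):
    """Env var overrides config; unknown or missing values fall back."""
    environ = os.environ if environ is None else environ
    mode = environ.get("HERMES_TELEGRAM_NOTIFICATIONS") or (
        config.get("display", {})
        .get("platforms", {})
        .get("telegram", {})
        .get("notifications")
    )
    return mode if mode in VALID_MODES else "important"


def should_silence(mode, *, is_interactive=False, metadata=None):
    """Return the value to pass as disable_notification for one send."""
    if mode == "all":
        return False
    if is_interactive:
        # Approvals and slash confirmations always notify.
        return False
    # Final responses are marked with metadata["notify"] = True.
    return not (metadata or {}).get("notify", False)
```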
* chore: add mbac to AUTHOR_MAP * fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708) Pass session_id through to provider profile build_api_kwargs_extras so the OpenRouter profile can attach an xAI cache-affinity header (x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt cache requires server affinity via this header — without it the cache is poisoned and Grok prompt-cache hit rates drop dramatically on multi-turn sessions. Carve-out of #22708 by Ninso112. The original PR bundled a /diff slash command, a zsh completion fix (already on main via #22802), and holographic memory null-guards. This salvage keeps just the Grok header work — small, targeted, and well-tested. Other contributors and changes preserved for separate review. Closes #22705. * chore: add Ninso112 to AUTHOR_MAP * fix(gateway): preserve Ctrl+C for Windows foreground runs * fix: make session search initialize session db * perf(teams): defer httpx import to first webhook call (#22831) Same pattern as the google_chat lazy-load (PR #22681), applied to the Teams plugin. The bundled `plugins/platforms/teams/adapter.py` did `import httpx` at module top, which dragged the entire httpx + httpcore stack into every process that triggered plugin discovery — including `hermes` invocations that never instantiate the Teams adapter. `httpx` is only needed inside one method (`TeamsMeetingPipeline._write_summary_via_incoming_webhook`), and the `httpx.AsyncBaseTransport` parameter annotation is already string-only thanks to the existing `from __future__ import annotations`. Move the runtime import inside the method. Measured impact (7-run medians, 9950X3D): teams plugin alone: 118 → 89 ms (-25%) 46 → 38 MB (-17%) import cli (full): unchanged import model_tools: unchanged The full-CLI numbers are flat because httpx is loaded transitively from many other modules on that path. 
The microbench win is the real signal: 29 ms / 8 MB shaved off any process that touches the teams plugin without otherwise pulling httpx — primarily future workflows where the gateway is enabled but Teams is not configured. Tests: 44/44 `tests/gateway/test_teams.py` pass; 345 across all plugin-platform suites (teams + qqbot + google_chat). The test file imports `httpx` itself for the `MockTransport` fixture, which is correct — tests legitimately use httpx, only the plugin's module-level import was the issue. * fix(acp): honor task cwd for foreground terminal commands * feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838) Pick openrouter/pareto-code as your model and OpenRouter auto-routes each request to the cheapest model meeting your coding-quality bar (ranked by Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0, default 0.65) tunes the floor. - hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so it shows up in the picker with a description - hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands on a mid-tier coder on the current Pareto frontier) - plugins/model-providers/openrouter: emit extra_body.plugins = [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code AND the score is a valid float in [0.0, 1.0] - agent/transports/chat_completions.py: same emission on the legacy flag path (when no provider profile is loaded) - run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into both build_kwargs() invocations and the context-summary extra_body path - cli.py: read openrouter.min_coding_score once at init, validate float in [0,1], pass to AIAgent constructions (CLI + background-task paths) - cron/scheduler.py, batch_runner.py, tools/delegate_tool.py, tui_gateway/server.py: propagate the kwarg (mirrors providers_order plumbing — subagents inherit, cron/batch read from config) - tests: profile-level + transport-level coverage of the 
model gating, unset/empty/out-of-range handling, and the legacy flag path - docs: new 'OpenRouter Pareto Code Router' section in providers.md Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a mid-tier coder, at omission we get the strongest. Score is silently dropped on any model other than openrouter/pareto-code, so it's safe to leave set. * fix(gateway): preserve reasoning_content, codex_message_items, finish_reason on transcript replay (#22839) PR #2974 whitelisted three reasoning fields (reasoning, reasoning_details, codex_reasoning_items) for the gateway's simple-text replay branch. Three more fields were added to the DB later but the whitelist was never updated: - reasoning_content: provider-facing thinking text. _copy_reasoning_content_for_api promotes 'reasoning' -> 'reasoning_content' at send time only when the strings happen to match. Carrying the original verbatim avoids loss for providers that return them as distinct fields (DeepSeek/Kimi/ Moonshot thinking modes), and preserves the empty-string sentinel that DeepSeek V4 Pro requires for thinking-mode replay. - codex_message_items: exact assistant message items with 'phase'. OpenAI docs: 'preserve and resend phase on all assistant messages — dropping it can degrade performance.' Required for prefix cache hits. No recovery path exists — once dropped, gone. - finish_reason: informational; cheap to keep so transcripts replay identically across CLI and gateway. The CLI is unaffected because cli.py keeps the live in-memory message list across turns (cli.py:10046 'self.conversation_history = result["messages"]'). The gateway rebuilds agent_history from the SQLite transcript on every turn, so any field stripped during replay is silently lost. Refactors the inline whitelist into a module-level _build_replay_entry() helper so the contract can be unit-tested. 16 new tests pin the field set and falsy-value handling. 
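A rough sketch of what a replay helper along the lines of `_build_replay_entry()` could look like. The body is a guess for illustration only: it assumes `role` and `content` are the two always-present fields, and it assumes falsy values are kept whenever they are not `None`, so that an empty-string `reasoning_content` sentinel survives replay.

```python
from typing import Any, Mapping

# Optional fields named in the commit message above.
_OPTIONAL_REPLAY_FIELDS = (
    "reasoning",
    "reasoning_details",
    "codex_reasoning_items",
    "reasoning_content",
    "codex_message_items",
    "finish_reason",
)


def build_replay_entry(row: Mapping[str, Any]) -> dict[str, Any]:
    """Rebuild one history message from a stored transcript row."""
    entry: dict[str, Any] = {"role": row["role"], "content": row.get("content")}
    for field in _OPTIONAL_REPLAY_FIELDS:
        value = row.get(field)
        # `is not None` rather than truthiness: "" is a meaningful sentinel.
        if value is not None:
            entry[field] = value
    return entry
```

Using a module-level tuple for the field set means a unit test can pin the contract in one place.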
Verified end-to-end: DB stores all 8 fields, replay now preserves all 8 (was preserving only 5 for assistant text turns). * docs(openrouter): document auxiliary.<task>.extra_body for OR routing and Pareto (#22844) The plumbing for setting OpenRouter provider preferences and the Pareto Code router on auxiliary tasks already exists — auxiliary.<task>.extra_body is forwarded verbatim by call_llm() / async_call_llm(). It just wasn't documented, so users who wanted (e.g.) Pareto Code routing for compression but the strongest coder for the main agent had no way to discover the escape hatch. - hermes_cli/config.py: expand the auxiliary section header with a YAML example showing provider routing plus plugins under extra_body, and an explicit note that main-agent provider_routing / openrouter.min_coding_score do NOT propagate to aux calls (each task is independent by design) - website/docs/user-guide/configuration.md: new 'OpenRouter routing and Pareto Code for auxiliary tasks' subsection with worked example - website/docs/integrations/providers.md: cross-link from the Pareto Code Router section to the aux-side doc E2E verified that auxiliary.<task>.extra_body reaches the OpenRouter API with the configured provider routing and plugins blocks intact. * docs: round 2 audit — messaging, developer-guide, guides, integrations (#22858) Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/, guides/, and integrations/ against the live registries and gateway code. messaging/ - index.md: API Server toolset is hermes-api-server (was 'hermes (default)'); Google Chat slug is hermes-google_chat (underscore — plugin name uses _). - google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such extra); list the actual deps (google-cloud-pubsub, google-api-python-client, google-auth, google-auth-oauthlib). 
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is silently ignored by the adapter); QQ_STT_BASE_URL is not read directly — baseUrl lives under platforms.qqbot.extra.stt. - teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline plugin must be enabled), not a built-in subcommand. - sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default SMS_WEBHOOK_HOST). - open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to per-profile .env, not 'hermes config set' (same pattern fixed in api-server.md last round). Also bumped example ports to 8650+ to dodge the default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646) collision. developer-guide/ - architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py, gateway/run.py replaced with 'large file' to stop drifting. - agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)'). - gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway platform tree updated (qqbot is a sub-package, not qqbot.py; added yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/ (always active)' was wrong — it's an empty extension point and _register_builtin_hooks() is a no-op stub. - acp-internals.md: drop fictional 'message_callback' from the bridged- callbacks list; clarify thinking_callback is currently set to None. - provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry, NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama Cloud, LM Studio, Tencent TokenHub. Fallback section described only the legacy single-pair model — corrected to the canonical list-form fallback_providers chain. - environments.md: parsers list missing llama4_json and the deepseek_v31 alias; both register via @register_parser. - browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py which doesn't exist in-repo. 
- contributing.md: tinker-atropos is a git submodule — note that 'git submodule update --init' is required if cloning without --recurse-submodules. guides/ - operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is positional (not --schedule), the script-only flag is --no-agent (not --script-only), and there's no --command flag. Replaced with a real example that creates the script under ~/.hermes/scripts/ and uses the actual flags. Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'. - automation-templates.md: 'cron create --skills "a,b"' doesn't work — the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST rewrite. - minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently fails because --region isn't registered on the auth-add argparse spec. Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for China-region access. - cron-script-only.md: 'hermes send' is fictional — replaced the comparison- table mention with a webhook-subscription pointer; also fixed the dead link to /guides/pipe-script-output (page doesn't exist). - cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed at 'hermes gateway' (foreground) / 'hermes gateway start' (service). - local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right knob is the HERMES_API_TIMEOUT env var. - python-library.md: run_conversation() return dict has only final_response and messages — task_id is stored on the agent instance, not echoed back. - use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in one quoted string, so cmd.exe gets a single arg instead of the multi-token command line it needs. Removed the surrounding quotes — argparse nargs='*' collects each token correctly. integrations/ - providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist); actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG and the run_agent.py reader). 
GMI default base URL (api.gmi.ai/v1 -> api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai) refreshed. Fallback section rewritten to lead with the canonical fallback_providers list form (was leading with the legacy fallback_model single dict); supported-providers list extended to include azure-foundry, alibaba-coding-plan, lmstudio. index.md - '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with integrations/index.md ('19+') and undercounted — bumped to 20+ and added Weixin/QQ Bot/Yuanbao/Google Chat to the list. Validation: 'npm run build' clean (exit 0); broken-link count unchanged at 155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89. * perf(image_gen): defer fal_client import to first generation request (#22859) `tools/image_generation_tool.py` did `import fal_client` at module top, which pulled the entire fal_client + httpx + rich stack on every process that ran `discover_builtin_tools()` — every `hermes` cold start, even ones that never touch image generation. Make the import lazy: replace the eager import with a placeholder (`fal_client: Any = None`) and add an idempotent `_load_fal_client()` that rebinds the module global on first use. Call it from the two runtime entry points (`_ManagedFalSyncClient.__init__` and `_submit_fal_request`) and from the SDK-presence check in `check_image_generation_requirements`. The loader short-circuits if the global is already truthy, which preserves the test pattern of monkeypatching `fal_client` to install a mock — the `monkeypatch.setattr(image_tool, "fal_client", ...)` calls in test_image_generation.py keep working unchanged. 
Measured impact (15-run min times, 9950X3D): tools.image_generation_tool alone: 77 → 20 ms (-74%) 36 → 20 MB (-44%) import cli (full): 734 → 720 ms (-2%) import model_tools: 372 → 366 ms (-2%) The microbench is dramatic but the full-CLI win is small — fal_client shares its httpx + rich dependencies with the rest of the agent, so on a real cold start most of the 16 MB / 64 ms is already paid by other imports. The win matters mostly for processes that touch this tool without otherwise loading httpx (rare) and for architectural consistency with the previous lazy-load PRs (#22681 google_chat, #22831 teams). Tests: 55/55 `tests/tools/test_image_generation.py` pass, including the cases that monkeypatch the module global to install a mock fal_client. End-to-end verification confirms `import model_tools` no longer pulls `fal_client` into `sys.modules`. * fix(gateway): finalize final stream edit on done * chore: add kidonng to AUTHOR_MAP * fix(skills-hub): cover remaining SSRF fetch paths after #10029 * fix(context_compressor): treat streaming premature-close as transient error Problem: When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue #18458. Root cause: _is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. context_compressor.py's _generate_summary except block never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts. Fix: 1. 
auxiliary_client.py — extend _is_connection_error keyword list with: "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package. 2. context_compressor.py — import _is_connection_error and call it in _generate_summary's except block as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors. Tests: 4 new regression tests in TestStreamingClosedFallback: - test_incomplete_chunked_read_falls_back_to_main - test_peer_closed_connection_falls_back_to_main - test_streaming_closed_on_main_uses_short_cooldown (stash-verified) - test_non_streaming_unknown_error_still_uses_long_cooldown Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(session): route OR-combined short CJK tokens to LIKE fallback (#20494) The FTS5 trigram tokenizer requires >=3 CJK characters per individual token to produce matchable trigrams. A query like "广西 OR 桂林 OR 漓江" has cjk_count=6 (passes the existing >=3 guard) but each token is only 2 CJK chars, so the trigram index returns 0 results. Fix: - Add per-token check: if any non-operator CJK token has <3 CJK chars, force the LIKE fallback path regardless of total cjk_count. - Expand the LIKE fallback to build one LIKE condition per non-operator token joined with OR, so each term is matched independently. 
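The per-token CJK check and the OR-joined LIKE fallback described above can be sketched as follows. The function names, the `content` column, and the character ranges used to detect CJK are assumptions for illustration; the real session search code may differ, and this sketch does not escape `%` or `_` inside tokens.

```python
import re

# Assumption: a simple CJK range (kana, Han, Hangul); the real check may differ.
_CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")
_FTS_OPERATORS = {"AND", "OR", "NOT"}


def _terms(query: str) -> list[str]:
    """Return query tokens with FTS5 boolean operators removed."""
    return [tok for tok in query.split() if tok not in _FTS_OPERATORS]


def needs_like_fallback(query: str) -> bool:
    """True if any CJK token is too short to yield a trigram."""
    for tok in _terms(query):
        cjk_chars = len(_CJK_CHAR.findall(tok))
        if 0 < cjk_chars < 3:
            return True
    return False


def build_like_clause(query: str, column: str = "content") -> tuple[str, list[str]]:
    """Build one LIKE condition per term, joined with OR, plus its parameters."""
    terms = _terms(query)
    clause = " OR ".join(f"{column} LIKE ?" for _ in terms)
    return clause, [f"%{term}%" for term in terms]
```

Passing the terms as bound parameters keeps user input out of the SQL text itself.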
Regression tests added in TestCJKSearchFallback:
- test_cjk_or_combined_short_tokens_returns_results
- test_cjk_short_token_or_query_preserves_filters

* fix(checkpoint): guard _touch_project against non-dict project metadata

Problem
=======
`tools.checkpoint_manager._touch_project` reads the project metadata file with `json.loads(meta_path.read_text(...))`, then immediately does:

    meta["workdir"] = str(_normalize_path(working_dir))

The `except` block only catches `(OSError, ValueError)`. When the file parses successfully but returns a non-dict value (a list `[]`, `null`, or a scalar from a corrupted or hand-truncated write), `json.loads` succeeds without error and `meta` is set to, e.g., `[]`. The subsequent subscript assignment then raises `TypeError: list indices must be integers or slices, not str`, which is NOT caught by the narrow except clause.

This TypeError propagates up through `_take` to `ensure_checkpoint`, where the broad `except Exception` safety net swallows it. The effect is that `ensure_checkpoint` silently returns False for the entire session: all checkpoints are skipped for the affected working directory without any user-visible error.

Root cause
==========
Missing `isinstance(meta, dict)` guard after `json.loads`, identical in pattern to bugs fixed in `cron/jobs.py` (#22569) and `tools/process_registry.py` (#22544). The same guard is already present one function below in `_list_projects` (line 506), but was inadvertently omitted in `_touch_project`.

Fix
===
Add two lines after the try/except:

```python
if not isinstance(meta, dict):
    meta = {}
```

This matches the existing guard in `_list_projects` and ensures a fresh empty dict is used whenever the persisted value is not a mapping, preserving the `created_at` semantics via `setdefault` on the next line.

Tests
=====
`TestTouchProjectMalformedMeta` covers four non-dict root values (`[]`, `null`, `42`, `"oops"`).
Each writes a corrupted metadata file, calls `_touch_project`, and asserts: (a) no exception raised, (b) the metadata file is rewritten as a valid dict containing `last_touch` and `workdir`. All four fail on main with `TypeError`, pass with fix. Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed. * fix(update): prebuild psutil on Termux Android via Linux path shim * fix(update): use termux-all uv fallback path on Termux * fix(install): also patch psutil on Termux fresh-install path The Termux update path (PR #22814) prebuilds psutil from a marker-patched sdist so 'platform android is not supported' doesn't kill it. The same psutil setup.py error blocks fresh installs via scripts/install.sh — only the update path was wired up. Without this, a brand-new Termux user can't get past the very first 'pip install -e .[termux-all]' call. - New scripts/install_psutil_android.py — standalone version of the same patcher hermes_cli/main.py uses, callable from bash. - scripts/install.sh detects sys.platform == 'android' and runs the patcher before pip install. - TODO note added to both copies pointing at upstream https://github.com/giampaolo/psutil/pull/2762; remove both when that ships. Note: we keep psutil as a base dep on Android (do not adopt the proposed sys_platform != 'android' marker in pyproject). Removing it would crash five unguarded 'import psutil' sites at runtime (tools/code_execution_tool.py, tools/tts_tool.py, tools/process_registry.py (2x), gateway/platforms/whatsapp.py). * fix(process_registry): kill orphaned Popen on post-spawn setup failure After Popen succeeds with os.setsid (detached process group), 5 things happen with no try/except: Thread construction, reader.start(), lock acquisition, prune+register, checkpoint write. If any raises, the Popen object goes unregistered and the detached process group leaks indefinitely. Wrap the post-spawn setup in try/except. 
On failure: - os.killpg(getpgid(pid), SIGKILL) takes down the entire process group (not just the shell - important because of detached PG + -lic shell wrapper that may have spawned children) - proc.kill() fallback for ProcessLookupError/PermissionError/OSError - proc.wait(timeout=5) reaps with a bound - re-raise to preserve original traceback Nested try/except around cleanup so a secondary failure can't mask the original. Closes #2749. * fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV Problem: terminal.docker_env set in config.yaml was silently ignored. Docker containers never received the user-specified env vars. Root cause: docker_env was missing from all three config→env bridging maps (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync) and from the terminal_tool _get_env_config() reader. _create_environment() consumed the key from container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV was never set. Also extend the list-serialisation branches in cli.py and gateway/run.py to handle dict values via json.dumps (lists already used json.dumps; plain str() on a dict produces undecodable output). Fix: - cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings; serialise dict values with json.dumps alongside existing list path - gateway/run.py: same additions to _terminal_env_map and serialisation - hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV" to _config_to_env_sync so `hermes config set terminal.docker_env …` persists to .env correctly - tools/terminal_tool.py: add docker_env key to _get_env_config() reading TERMINAL_DOCKER_ENV via _parse_env_var with default "{}" Tests: add test_docker_env_is_bridged_everywhere to tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on origin/main, passes with fix. 
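To illustrate why dict values need `json.dumps` rather than `str()` when bridged into an env var, here is a small sketch. The helper names `to_env_value` and `parse_env_dict` are hypothetical stand-ins for the bridging maps and `_parse_env_var`; only the serialisation idea is taken from the description above.

```python
import json
from typing import Any


def to_env_value(value: Any) -> str:
    """Serialise a config value for storage in an environment variable."""
    if isinstance(value, (list, dict)):
        return json.dumps(value)  # round-trips through json.loads
    return str(value)


def parse_env_dict(raw: str, default: str = "{}") -> dict:
    """Decode a JSON object from an env var, falling back to an empty dict."""
    try:
        parsed = json.loads(raw or default)
    except ValueError:
        return {}
    return parsed if isinstance(parsed, dict) else {}
```

`str()` on a dict yields single-quoted Python repr, which `json.loads` rejects, so the reader would see an empty mapping.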
Fixes #20537 * fix(gateway): degrade gracefully when all platform adapters are missing When connected_count == 0 AND enabled_platform_count > 0, the gateway treated 'all adapters returned None' identically to 'all adapters failed to connect' — both as fatal startup errors. The 'returned None' case happens when imports fail silently or when adapters are present in config but their dependencies aren't installed (e.g. discord.py missing). Cron jobs and other gateway-runtime work would unnecessarily fail to start. Split: only return False when startup_retryable_errors is non-empty (real connection attempt failed). When the list is empty AND enabled > 0, log a warning and continue running, matching the 'no platforms enabled' cron path. Salvage of #22642's gateway slice. Drops the bundled run_agent.py memory-nudge counter hydration block (issue #22357 territory) which wasn't mentioned in the PR description. Closes #5196. * fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665) Fallback chain entries with 'api_key_env: ENV_VAR_NAME' weren't being resolved by either the init-time fallback path (line ~1660) or the runtime _try_activate_fallback path (line ~8045). Only literal 'api_key' was honored; the snake_case 'api_key_env' alias documented elsewhere in the config was silently dropped, so a 'provider: custom' fallback with base_url + api_key_env worked as primary but failed as fallback with 'no endpoint credentials found' / 401. Adds 'or fb.get("api_key_env")' to the existing 'key_env' lookup in both call sites, with empty-string-to-None coercion so unset env vars don't poison the resolver. Salvage of #22665's fallback portion. The original PR also bundled gateway-degrade-on-no-adapters changes (those land via the carve-out in #22853 which is the same code) and run_agent.py memory-nudge counter hydration (issue #22357 territory, not mentioned in the title). Drops both bundled pieces; keeps just the api_key_env fix. Closes #5392. 
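The `api_key_env` resolution described above amounts to something like the sketch below. The function name and the precedence of a literal `api_key` over the env lookup are assumptions for illustration; the commit only states that `api_key_env` is now consulted alongside `key_env`, with empty strings coerced to None.

```python
import os
from typing import Mapping, Optional


def resolve_fallback_api_key(
    fb: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Resolve the API key for one fallback chain entry."""
    environ = os.environ if environ is None else environ
    literal = fb.get("api_key")
    if isinstance(literal, str) and literal.strip():
        return literal.strip()
    # Accept both spellings of the env-var pointer.
    env_name = fb.get("key_env") or fb.get("api_key_env")
    if not isinstance(env_name, str) or not env_name:
        return None
    # An unset or empty env var becomes None so it cannot poison the resolver.
    return environ.get(env_name, "").strip() or None
```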
* fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664) RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible shim was falling through to FailoverReason.unknown, surfacing as 'Empty response from model' and burning 3 retry slots on the same failing endpoint. _classify_by_message had no timeout-message branch — only billing/rate_limit/auth/context_overflow/model_not_found patterns. The type-based check at line 565 also requires isinstance(error, (TimeoutError, ConnectionError, OSError)) — a plain RuntimeError doesn't match. Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded', 'request timed out', 'operation timed out', 'upstream timed out', 'turn timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True) when any pattern matches. Salvage of #22664's classifier portion. The original PR also bundled a fallback self-selection guard which is now redundant (already on main via #22780) plus DeepSeek thinking and session_search fixes that are their own separate concerns. Follow-up to #22780 — fixes the still-broken classification of generic-typed provider-shim timeouts that #22780's dedup didn't cover. * fix(test_gateway): stop run_gateway() tests from rewriting the dev's installed systemd unit (#22900) run_gateway() calls refresh_systemd_unit_if_needed() on every invocation so restart settings stay current after exit-code-75 respawns. The user-scope unit path resolves under Path.home() (NOT sandboxed by conftest, only HERMES_HOME is), and generate_systemd_unit() bakes the current HERMES_HOME into the unit's Environment= line. Result: any test that exercises run_gateway() end-to-end on a real Linux dev box silently rewrites the developer's installed ~/.config/systemd/user/hermes-gateway.service with a polluted HERMES_HOME pointing at /tmp/pytest-of-<user>/.../hermes_test. 
On the next reboot, systemd loads that unit, the gateway starts looking at an empty tmp dir, and Telegram/Discord/etc. all show as 'No messaging platforms enabled' even though the user's real config is fine. Three tests in tests/hermes_cli/test_gateway.py hit this path: test_run_gateway_exits_cleanly_on_keyboard_interrupt, test_run_gateway_exits_nonzero_when_start_gateway_reports_failure, and test_run_gateway_root_guard_has_escape_hatch. Two-layer fix: 1. _install_fake_gateway_run helper (covers all four run_gateway() call sites in test_gateway.py and any future ones) now also stubs supports_systemd_services and refresh_systemd_unit_if_needed. 2. refresh_systemd_unit_if_needed() itself sniffs the generated unit body for /pytest-of- and /hermes_test markers and refuses to write when present. Defense in depth so a future test that bypasses the helper still can't corrupt the dev's gateway. Tests that legitimately exercise the refresh flow (test_run_gateway_refreshes_outdated_unit_on_boot) patch generate_systemd_unit to return synthetic content that doesn't carry those markers, so they keep working. Adds test_refresh_refuses_to_bake_pytest_tmpdir_into_real_user_unit as a regression test for the source-side guard. * fix(gateway): detect gateway process via /proc in Docker without procps Salvage of NousResearch/hermes-agent#7622. Docker images often lack procps so `ps` is unavailable. Try reading /proc/*/cmdline first (works in any Linux container) and fall back to `ps -A eww` only when /proc is not present. PermissionError on individual PIDs is silently skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(gateway): stub /proc unavailability in find_gateway_pids fallback test Follow-up test fix for #22693 — the existing test for ps-failure + pid-file fallback needed the /proc walk path stubbed too since /proc is now consulted first. 
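A minimal sketch of the /proc-first process scan described above. The function name, the `needle` substring match, and the `proc_root` parameter are assumptions for illustration; the real `find_gateway_pids` may match differently and falls back to `ps` where this sketch simply returns an empty list.

```python
from pathlib import Path


def find_pids_by_cmdline(needle: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs whose command line contains `needle`, read from /proc."""
    root = Path(proc_root)
    if not root.is_dir():
        return []  # a real caller would fall back to `ps` here
    pids: list[int] = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            # Covers PermissionError and processes that exited mid-scan.
            continue
        # Arguments are NUL-separated in /proc/<pid>/cmdline.
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", errors="replace").strip()
        if needle in cmdline:
            pids.append(int(entry.name))
    return sorted(pids)
```

Taking the root as a parameter lets a test point the scan at a temporary directory or a missing path, which mirrors the stubbed-/proc case in the follow-up test.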
* fix(gateway): pass max_total_size_mb and max_file_size_mb to CheckpointManager The /rollback command handler in gateway/run.py was constructing CheckpointManager with only enabled and max_snapshots, omitting max_total_size_mb and max_file_size_mb that the __init__ expects. This caused a TypeError on every /rollback invocation when checkpoints were enabled. Fixes: NousResearch/hermes-agent#18841 * chore: add DanielLSM to AUTHOR_MAP * fix: use credential_pool for custom endpoint model listing probes Same-provider /model switches on a 'custom' endpoint kept stale credentials because (a) _resolve_named_custom_runtime's bare-custom + explicit_base_url path went straight to OPENAI_API_KEY/OPENROUTER_API_KEY env fallbacks without consulting the credential pool, and (b) switch_model() guarded against custom-provider re-resolution to preserve base_url, locking in the prior api_key. Now the bare-custom path queries the credential pool first (mirroring the named-custom-provider branch behavior), and the same-provider switch guard is removed since resolve_runtime_provider has since grown a robust custom-resolution path that preserves base_url from model_cfg. Refs #18681 (the gateway-side api_key wiring is still separate), #16254, #12919. * chore: add v1b3coder to AUTHOR_MAP * fix(cli): preserve config comments on setting writes * chore: add ming1523 to AUTHOR_MAP * feat(docs): richer info panels on the Skills Hub for built-in + optional skills (#22905) The Skills Hub at /skills had cards that, when expanded, showed only the one-line description, tags, author, version, and an install command. For the 163 bundled and optional skills shipped with the repo, this was thinner than the data we already have on disk. Three changes, all under website/: 1. 
extract-skills.py now pulls four extra fields per local skill: - 'overview' — first non-heading body paragraph from SKILL.md (stripped of admonitions/code fences, capped at ~500 chars at a sentence boundary) - 'envVars' / 'commands' — from the prerequisites: block in frontmatter - 'license' — from the top-level frontmatter - 'docsPath' — slug to the per-skill /docs/user-guide/skills/.../* page, computed with the same logic as generate-skill-docs.py 162 of 163 local skills get a non-empty overview automatically. The remaining one (media/heartmula) has only headings/code in its body and falls through to the description. 2. Skill TS interface + SkillCard expanded-panel render the new fields: - Overview paragraph at the top of the panel - Prerequisites box (env vars + required commands) when frontmatter declares them - License row alongside author/version - 'View full documentation →' link to the per-skill docs page Search now covers the overview text too, so users can find skills by matching content from inside SKILL.md, not just the one-line description. 3. styles.module.css gains six new classes (overviewBlock, detailLabel, overviewText, prereqBlock/Row/Kind/List/Item, docsLink) styled to match the existing dark panel aesthetic. External / community skills (Anthropic, LobeHub, Claude Marketplace cached indexes) keep the old behavior — overview is empty, no prereqs, no docsPath. Validation: 'npm run build' clean (exit 0); broken-link count unchanged at 155 baseline; all 163 generated docsPath values resolve to existing pages under website/docs/user-guide/skills/. * perf(cli): skip welcome banner on `chat -q` single-query mode (#22904) `hermes chat -q "..."` printed the full welcome banner before running the query — kawaii ASCII logo, available toolsets list, available skills list, model name, session ID, working directory, update-available notice. 
Building it took ~420 ms on cold start (~200 ms version-update probe, the rest is toolset / skill enumeration plus Rich panel rendering). For a one-shot `-q` query the banner is noise: the user already picked the prompt, doesn't need a toolset reference, and gets the session ID + resume hint from `_print_exit_summary()` after the response prints. The fully-quiet `-Q` / `--quiet` machine-readable path was already banner-free; this brings the human-facing single-query path in line so all non-interactive invocations are fast. Measured impact (`hermes chat -q "ok" --max-turns 1`, 10-run percentiles, 9950X3D): median: 1.90 → 1.75 s (-150 ms) min: 1.80 → 1.73 s ( -70 ms) P25: 1.82 → 1.74 s ( -80 ms) Wider variance than expected; the banner cost overlaps with API latency on real `chat -q` runs. Min-time delta of 70 ms is the cleanest signal — that's the deterministic banner-build cost gone. The 150 ms median delta picks up cases where the version-update probe also finishes during the wait. Interactive mode (`hermes` with no `-q`) and the `--list-tools` / `--list-toolsets` one-shot listing commands still show the banner — those are the contexts where it's actually wanted. Tests: 656/656 `tests/cli/` pass on top of latest main (modulo 5 pre- existing flakes in `test_cli_save_config_value.py` that fail with `No module named 'ruamel'` both with and without this change). * feat(curator): show rename map in user-visible summary (#22910) * feat(curator): show rename map (where skills went) in user-visible summary The full data has always been on disk in REPORT.md, but the user-visible curator summary (gateway 💾 line, CLI session-start panel, `hermes curator status`) was counts-only — "consolidated 4 into 2 umbrellas" with no names. Users only discovered renames when something they expected was gone. 
New `_build_rename_summary()` formats the rename map and appends it to `final_summary`: auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status Empty on no-op ticks (no archives), so most ticks add zero log noise. Cap of 10 entries keeps agent.log readable when a 50-skill consolidation lands; the full list is always in REPORT.md. `hermes curator status` indents continuation lines so the multi-line summary reads as one logical field. 5 new tests in tests/agent/test_curator_classification.py covering empty / consolidation / pruning / cap / mixed cases. * feat(curator): show recent run summary once on `hermes update` The rename map is now visible from where users actually look — the update flow they explicitly run, instead of just the live gateway log or transient CLI session-start panel. Behavior: - After `hermes update`, if the most recent curator run produced a rename map (multi-line summary) that the user hasn't seen yet, print it once with a 'last run Xh ago' header and a one-time-message footer. - Stamp `last_run_summary_shown_at = last_run_at` after printing so subsequent `hermes update` invocations are silent until a newer curator run lands. - Silent on no-op runs (single-line summary like 'auto: no changes; llm: no change'). Still stamps shown so we don't reconsider on every update. - Silent when the curator has never run (the existing first-run notice handles that case). Output: ℹ Skill curator — last run 4h ago auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status (This message shows once per curator run. View anytime: hermes curator status) State migration: - `_default_state()` gains `last_run_summary_shown_at: None`. 
Existing state files lack the field; `.get()` returns None; the comparison treats any prior run as 'not yet shown' and prints once on next update. Self-healing. Wiring: - Both `hermes update` paths in main.py call the new `_print_curator_recent_run_notice()` right after the existing first-run notice. Best-effort try/except so a state-load bug never breaks the update flow. 6 tests in tests/hermes_cli/test_curator_recent_run_notice.py: no-run / single-line / multi-line / show-once / new-run-resets / time-formatter buckets. * chore(skills): move heavy training skills + outlines to optional-skills (#22912) These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't be active by default. Moved to optional-skills/ where users opt-in via `hermes skills install official/...`. Moved: - mlops/training/axolotl - mlops/training/trl-fine-tuning - mlops/training/unsloth - mlops/inference/outlines Counts: 91 -> 87 built-in, 72 -> 76 optional. Auto-regenerated docs (per-skill pages + catalogs) reflect the move. * fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913) Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB (32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach in _write_to_sandbox put the entire tool result inside the 'bash -c' arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument list too long' before the heredoc ever ran. The caller logged a warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the warning never reached agent.log either, and the agent saw a 1.5 KB preview tagged 'Full output could not be saved to sandbox'. Hits delegate_task with 3+ subagent outputs routinely now. Switch to passing content via env.execute(stdin_data=...). cmd is now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight payload travels through stdin where there is no argv-element limit. 
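The stdin-based persistence fix (#22913) described above can be sketched as follows. This is a minimal illustration using plain `subprocess`, not the project's `_write_to_sandbox` or `env.execute(stdin_data=...)` code; the function name `write_via_stdin` and its parameters are hypothetical. It assumes a POSIX `sh` and `cat` are available.

```python
import os
import subprocess
import tempfile


def write_via_stdin(content: str, directory: str, filename: str) -> str:
    """Persist `content` with a short shell command, sending the payload on stdin.

    Hypothetical helper for illustration only. The command string stays tiny,
    so the per-argument size limit on argv never applies to the payload.
    """
    target = os.path.join(directory, filename)
    subprocess.run(
        # $1 and $2 are positional parameters, so the paths need no manual quoting.
        ["sh", "-c", 'mkdir -p "$1" && cat > "$2"', "sh", directory, target],
        input=content.encode("utf-8"),  # the heavyweight payload travels via stdin
        check=True,
    )
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        payload = "x" * 200_000  # larger than the 128 KB single-argument cap
        path = write_via_stdin(payload, os.path.join(tmp, "results"), "out.txt")
        print(os.path.getsize(path))
```

The design point is the same one the commit makes: the command line carries only the paths, while the size of the result no longer depends on any argv limit.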
E2E reproduced the user's exact 144,778-char delegate_task envelope: old code OSError'd, new code round-trips cleanly to disk with all three task summaries intact. * docs(skills): clarify kanban fan-out decomposition * chore: AUTHOR_MAP entry for eloklam (#22898) * fix(kanban): request default board explicitly (#21819) * test(kanban): assert re-block notification is delivered after unblock cycle Adds test_notifier_second_blocked_delivers to cover the case where a task is blocked, unblocked, then blocked again — the second blocked event must still deliver a gateway notification. Currently fails because blocked is treated as a terminal event kind, causing the subscription to be dropped after the first block. * fix(kanban): remove blocked kind from unsub * chore(test): comment of test case rewrite to english * docs(user-stories): add 18 verified social entries (99 → 117) (#22920) Found 18 real Hermes-Agent stories from HN, X, and Reddit not yet captured on the page. All URLs HTTP-verified to return 200 with matching titles. Reddit (15): r/hermesagent (Obsidian-as-memory writeup at 794 upvotes, LLM cheatsheet at 635 upvotes, Kanban game-changer post, OpenRouter #1 ranking, AMA from the Nous team, etc.); r/LocalLLaMA, r/Rag, r/openclaw, r/SideProject, r/LocalLLM threads where users describe their actual setups (Qwen3.5-9b on 16gb VRAM, 5060Ti + Telegram, smart routing tiers). X (3): @vmiss33's 'what I use Hermes for' guide, @HeyYanvi's X-to-NotebookLM podcast workflow, @ExileAI_0's spare-laptop Iris running RenPy + ComfyUI, @brucexu_eth's Hermes Inc. Telegram startup sim from the hackathon, Hype's deep-dive blog. HN (1): 'I'm using Hermes — sandbox it like any agent.' No component changes — all new entries fit the existing schema (real URL, real author, real date). 
* feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. 
Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images). * fix(stream-retry): collapse two-line drop status, name provider, and let agent.log capture diagnostics (#22993) Subagent stream drops were spamming the parent terminal with two lines per blip ('Connection dropped...' + 'Reconnected...') while leaving zero breadcrumb in agent.log to debug them. Two underlying bugs, fixed together: 1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which filters records before root-logger file handlers see them. The comment claimed 'File handlers still capture everything' — that was wrong. Removed in both run_agent.py and cli.py; console quietness already comes from hermes_logging not installing a console StreamHandler in non-verbose mode. 2. The stream-retry blocks emitted two _emit_status calls per drop ('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected — resuming…') with no provider name, so multi-provider sessions had to dig through agent.log to attribute a drop. Replaced both call sites with a single _emit_stream_drop helper that emits ONE line naming the provider and error class, and always writes a structured WARNING to agent.log with subagent_id, depth, provider, base_url, error_type. Net UX change: 6 lines per triple-subagent drop → 3 lines, each naming the provider. agent.log now has a structured breadcrumb per retry that didn't exist before. Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py covering the logger-level guard, structured WARNING content, single status line per drop (no Reconnected follow-up), and provider naming. 
* fix(kanban): drop redundant init_db() in gateway watchers (#21378) Both `_kanban_notifier_watcher` and `_kanban_dispatcher_watcher`'s `_tick_once_for_board` called `_kb.connect(board=slug)` immediately followed by `_kb.init_db(board=slug)`. Since `connect()` already runs the schema + idempotent migration on first open per process, the explicit `init_db()` was redundant — and worse, `init_db()` deliberately busts the per-process `_INITIALIZED_PATHS` cache and re-runs the migration on a *second* connection that races the first. On every cold gateway start against a legacy DB this surfaced as either `sqlite3.OperationalError: duplicate column name: <col>` or intermittent `database is locked` errors logged at the first tick. The duplicate-column case is now tolerated by `_add_column_if_missing` (commit 78698381a), but the wasted second migration plus the database-is-locked race remain fixable by skipping the redundant call entirely. Drops `_kb.init_db(board=slug)` at both call sites and adds a regression test in `tests/hermes_cli/test_kanban_notify.py` that pins the absence via source inspection plus a runtime spy. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com> * chore: AUTHOR_MAP entry for li0near gmail (#21378) * chore(models): refresh OpenRouter + Nous fallback lists (#23001) Reorder Anthropic Opus 4.7/4.6 + Sonnet 4.6 to the top, cluster free models at the bottom of the OpenRouter list, and mirror the same ordering into the Nous portal list (paid models only). - Add inclusionai/ring-2.6-1t:free - Drop minimax-m2.5, minimax-m2.5:free, sonnet-4.5, mimo-v2.5, glm-5v-turbo, glm-5-turbo, trinity-large-preview:free, trinity-large-thinking, qwen3.5-plus-02-15 - Replace qwen3.5-35b-a3b with qwen3.6-35b-a3b - Drop x-ai/grok-4.20-beta from the Nous list * fix(kanban): /kanban slash command emits argparse garbage instead of help Closes #21794. 
`/kanban`, `/kanban help`, `/kanban --help`, and `/kanban <sub> -h` all returned broken output to the gateway and interactive CLI. Three underlying bugs in `hermes_cli.kanban.run_slash`: 1. argparse writes help to **stdout** but `run_slash` only captured stderr at parse time, so `-h` text was silently swallowed and replaced with the `(usage error: 0)` sentinel. 2. The wrapping parser used `prog="/"` and routed via a synthetic "_top → kanban" subparser, producing `usage: / kanban …` (stray space) and `usage: /kanban kanban …` (doubled token) in error text. 3. Bare `/kanban` and `/kanban help` dumped argparse's full ~3KB usage tree, which reads as visual garbage in a chat bubble. Fix: drive the kanban_parser directly (no double-wrap), rewrite prog strings on every leaf subparser, capture stdout AND stderr around parse_args, distinguish SystemExit(0) (help — return captured stdout) from SystemExit(2) (error — return single-line ⚠-prefixed message), and add an explicit chat-friendly short-help block returned for bare invocation and the help aliases (`help`, `--help`, `-h`, `?`). Added 5 regression tests covering bare invocation, every help alias, subcommand help, unknown action, and missing required arg. Affects every chat platform via gateway/run.py::_handle_kanban_command and the interactive CLI via cli.py::_handle_kanban_command. Co-Authored-By: Nagatha (Claude Opus 4.7) <noreply@anthropic.com> * chore: AUTHOR_MAP entry for tymrtn (#21794) * feat(stream-retry): add upstream + timing diagnostics to drop log (#23005) The previous PR (#22993) gave us a structured WARNING per stream drop but the only diagnostic was 'error_type=APIError error=Network connection lost.' — same nothing the user started with. To actually diagnose why subagents drop streams disproportionately we need to know WHERE the drop happened. Adds three breadcrumbs to the agent.log WARNING: 1. Inner exception chain. 
openai SDK wraps httpx errors as APIConnectionError / APIError so the catch site only sees the wrapper. _flatten_exception_chain walks __cause__/__context__ up to 4 levels deep and renders 'Outer(msg) <- Inner(msg)' so we can tell ConnectError vs RemoteProtocolError vs ReadError vs ProxyError without enabling verbose mode. 2. Upstream HTTP headers. Snapshots cf-ray, x-openrouter-provider, x-openrouter-model, x-openrouter-id, x-request-id, server, via, etc. from stream.response immediately after open (so they survive even when the stream dies before the first chunk). These answer 'is one CF edge / one downstream provider responsible, or random?' 3. Per-attempt counters. bytes streamed, chunk count, elapsed time on the dying attempt, and time-to-first-byte. Distinguishes 'couldn't connect at all' (0s, 0 bytes) from 'died after 30s mid-stream' (very different root causes — first is auth/routing, second is upstream idle-kill or proxy timeout). Plumbing: - _stream_diag_init / _stream_diag_capture_response live on AIAgent and produce a per-attempt dict held on request_client_holder['diag'] for closure access from the retry block. - _call_chat_completions and _call_anthropic both initialize the diag and increment counters per chunk/event (best-effort, never raises in the streaming hot path). - _log_stream_retry / _emit_stream_drop accept an optional diag and render the new fields. Final-exhaustion log goes through the same helper so it gets the same diagnostic dump. - UI status line gains a brief 'after Xs' suffix when timing is available — distinguishes 'connect failed' from 'died mid-stream' at a glance without grepping logs. Sample WARNING after this change: Stream drop mid tool-call on attempt 2/3 — retrying. subagent_id=sa-2-cafef00d depth=1 provider=openrouter base_url=https://openrouter.ai/api/v1 error_type=APIError error=Connection error. chain=APIError(Connection error.) 
<- RemoteProtocolError(peer closed connection without sending complete message body) http_status=200 bytes=12400 chunks=47 elapsed=12.00s ttfb=0.83s upstream=[cf-ray=8f1a2b3c4d5e6f7g-LAX x-openrouter-provider=Anthropic x-openrouter-id=gen-abc123 server=cloudflare] Tests: 10 covering diag init, header capture (whitelist enforced for PII), exception-chain walking + depth cap, log content with full diag, log content without diag (placeholders), UI elapsed-suffix on/off. * fix(review): tell background reviewer not to capture transient env failures as skills (#23004) Closes #6051. Reported failure mode: agent migrated to WSL2, browser launch failed because Playwright wasn't installed yet. Background reviewer captured the failure as a durable skill (`browser-tool-launch-issue`) and the agent kept refusing the browser tool for weeks after Playwright was installed and verified working. Negative claims also propagated into unrelated skills ("browser tools do not work", "cannot use Y from execute_code"). Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both lean hard on "be active, save things, a pass that does nothing is a missed learning opportunity." Neither distinguished durable knowledge from transient environment state. The reviewer was doing what it was told. 
Fix at the write site — both prompts now carry a "Do NOT capture" section calling out: • Environment-dependent failures (missing binaries, fresh-install errors, post-migration path mismatches, 'command not found', unconfigured credentials, uninstalled packages) • Negative claims about tools or features ("X does not work") that harden into self-cited refusals • Session-specific transient errors that resolved before the conversation ended • One-off task narratives ("summarize today's market", "analyze this PR") — also addresses the #12812 / #4538 family Plus a positive-reframing line: when a tool fails because of setup state, capture the FIX (install command, config step, env var) under an existing setup/troubleshooting skill — never "this tool doesn't work" as a standalone constraint. Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py (2 new + all existing review-prompt assertions). Substring-based checks so future prompt edits don't false-fail. * feat(codex): add gpt-5.3-codex-spark model * fix(model-metadata): restore gpt-5.3-codex-spark fallback context * fix(model-metadata): set codex-spark fallback context to 128k * fix: surface Codex CLI-only models * chore: add codex-spark salvage contributors to AUTHOR_MAP Maps olegwn@gmail.com → nederev (PR #18286) and vesper@askclaw.dev → askclaw-vesper (PR #19530) so the contributor attribution check passes when their commits land via this salvage. * docs(codex-spark): document ChatGPT Pro entitlement gating PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed via the Codex OAuth backend at chatgpt.com/backend-api/codex/models — not via the public OpenAI API. 
Add explanatory comments in: - DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py) - _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py) - list_authenticated_providers' live-discovery branch (model_switch.py) so future maintainers don't strip the entry again. Also documents the intentional asymmetry that Spark stays out of the "openai" provider catalog (it isn't on the public API) and why the supported_in_api filter is *not* applied for the openai-codex route. * test(codex-spark): add live-API regression and make picker test deterministic Two follow-ups from self-review: 1. Add unit test for _fetch_models_from_api covering the live HTTP path. The salvaged PR #19530 dropped the supported_in_api:false filter in both _fetch_models_from_api and _read_cache_models, but only the cache path had a regression test. This adds the symmetric live-fetch test (mocked httpx) so a future drive-by cha…
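The exception-chain walking described for #23005 (following `__cause__` / `__context__` and rendering `Outer(msg) <- Inner(msg)`) might look roughly like the sketch below. It is not the project's `_flatten_exception_chain`; the function name, the interpretation of the depth cap as "at most 4 rendered exceptions", and the two stand-in exception classes are assumptions made for illustration.

```python
class APIError(Exception):
    """Stand-in for an SDK wrapper exception (illustrative only)."""


class RemoteProtocolError(Exception):
    """Stand-in for a transport-level exception (illustrative only)."""


def flatten_exception_chain(exc: BaseException, max_depth: int = 4) -> str:
    """Render an exception and its causes as 'Outer(msg) <- Inner(msg)'."""
    parts = []
    seen = set()  # guards against cycles in __context__ links
    current = exc
    while current is not None and len(parts) < max_depth and id(current) not in seen:
        seen.add(id(current))
        parts.append(f"{type(current).__name__}({current})")
        # Prefer the explicit cause (raise ... from ...), else the implicit context.
        current = current.__cause__ or current.__context__
    return " <- ".join(parts)


if __name__ == "__main__":
    try:
        try:
            raise RemoteProtocolError("peer closed connection")
        except RemoteProtocolError as inner:
            raise APIError("Connection error.") from inner
    except APIError as outer:
        print(flatten_exception_chain(outer))
```

Capping the walk keeps a log line bounded even when a library nests wrappers deeply, which is presumably why the commit mentions a depth limit.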

NousResearch#22858)

Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/, guides/, and integrations/ against the live registries and gateway code.

messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)'); Google Chat slug is hermes-google_chat (underscore, because the plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such extra); list the actual deps (google-cloud-pubsub, google-api-python-client, google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is silently ignored by the adapter); QQ_STT_BASE_URL is not read directly; baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys; write them to the per-profile .env, not via 'hermes config set' (same pattern fixed in api-server.md last round). Also bumped example ports to 8650+ to dodge the default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646) collision.

developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py, gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway platform tree updated (qqbot is a sub-package, not qqbot.py; added yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/ (always active)' was wrong: it's an empty extension point and _register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry, NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama Cloud, LM Studio, Tencent TokenHub. Fallback section described only the legacy single-pair model; corrected to the canonical list-form fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31 alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py, which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule; note that 'git submodule update --init' is required if cloning without --recurse-submodules.

guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong: schedule is positional (not --schedule), the script-only flag is --no-agent (not --script-only), and there's no --command flag. Replaced with a real example that creates the script under ~/.hermes/scripts/ and uses the actual flags. Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work; the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently fails because --region isn't registered on the auth-add argparse spec. Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for China-region access.
- cron-script-only.md: 'hermes send' is fictional; replaced the comparison-table mention with a webhook-subscription pointer; also fixed the dead link to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response and messages; task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in one quoted string, so cmd.exe gets a single arg instead of the multi-token command line it needs. Removed the surrounding quotes; argparse nargs='*' collects each token correctly.

integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist); actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 -> api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai) refreshed. Fallback section rewritten to lead with the canonical fallback_providers list form (was leading with the legacy fallback_model single dict); supported-providers list extended to include azure-foundry, alibaba-coding-plan, lmstudio.

index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with integrations/index.md ('19+') and undercounted; bumped to 20+ and added Weixin/QQ Bot/Yuanbao/Google Chat to the list.

Validation: 'npm run build' clean (exit 0); broken-link count unchanged at 155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.
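The cron flag corrections above (positional schedule, singular repeatable `--skill`, `--no-agent`) can be illustrated with a small argparse sketch. This parser is hypothetical and only mirrors the shape described in the commit message; it is not the actual `hermes cron create` argparse spec, and the `prog` name is made up.

```python
import argparse

# Hypothetical parser that mirrors the flag shape described above.
parser = argparse.ArgumentParser(prog="cron-create-demo", allow_abbrev=False)
parser.add_argument("schedule", help="cron expression, passed positionally")
parser.add_argument(
    "--skill",
    action="append",  # repeatable: each occurrence appends one value
    default=None,
    help="skill name; repeat the flag for several skills",
)
parser.add_argument("--no-agent", action="store_true", help="script-only run")

if __name__ == "__main__":
    ok = parser.parse_args(["0 9 * * *", "--skill", "a", "--skill", "b", "--no-agent"])
    print(ok.schedule, ok.skill, ok.no_agent)

    # In this sketch the plural, comma-separated spelling is not recognized;
    # parse_known_args leaves it in the leftover list instead of filling `skill`.
    ns, extras = parser.parse_known_args(["0 9 * * *", "--skills", "a,b"])
    print(ns.skill, extras)
```

With `action="append"`, a comma-separated value would be stored as one string rather than split, which is one reason a repeatable singular flag and a plural comma-list flag are not interchangeable.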

teknium1 force-pushed the docs-audit-round-2 branch from 995e188 to 825d37a Compare May 9, 2026 22:00

teknium1 merged commit fef1a41 into main May 9, 2026
4 checks passed

teknium1 deleted the docs-audit-round-2 branch May 9, 2026 22:00

alt-glitch added the labels type/docs (Documentation improvements), P3 (Low: cosmetic, nice to have), and comp/gateway (Gateway runner, session dispatch, delivery) on May 9, 2026

This was referenced May 31, 2026

[Docs]: 27 documentation/code inconsistencies — wrong commands, env vars, config keys & defaults (audit round 3) #36048

Open

docs: fix 25 documentation/code inconsistencies (audit round 3) #36051

Open
