fix(security): close TOCTOU window when saving Claude Code OAuth credentials by Gutslabs · Pull Request #21152 · NousResearch/hermes-agent

Gutslabs · 2026-05-07T10:51:03Z

Summary

_write_claude_code_credentials in agent/anthropic_adapter.py writes ~/.claude/.credentials.json via Path.write_text → replace → post-write chmod(0o600). Both the temp file and the destination briefly carry the default mode left by the process umask (commonly umask 0o022, giving 0o644 = world-readable) between create/replace and the chmod, exposing the OAuth access/refresh tokens to other local users on multi-user hosts.
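
The exposure window can be illustrated with a short sketch. This is not code from the PR: the helper names are hypothetical, and it assumes a POSIX host and pins the umask to 0o022 for the duration of the demo. It records the file mode right after Path.write_text and again after the follow-up chmod.

```python
import os
import stat
import tempfile
from pathlib import Path


def mode_of(path: Path) -> int:
    """Return only the permission bits of a file."""
    return stat.S_IMODE(path.stat().st_mode)


def legacy_style_write(path: Path, data: str) -> tuple[int, int]:
    """Hypothetical stand-in for the write_text + chmod sequence.

    Returns (mode before chmod, mode after chmod).
    """
    path.write_text(data)          # created with 0o666 & ~umask
    before = mode_of(path)         # what other local users could see
    os.chmod(path, 0o600)          # tightened only after the fact
    return before, mode_of(path)


old_umask = os.umask(0o022)        # assumption: a typical default umask
try:
    with tempfile.TemporaryDirectory() as d:
        exposed, locked = legacy_style_write(Path(d) / "creds.json", "{}")
finally:
    os.umask(old_umask)

print(oct(exposed), oct(locked))
```

On a POSIX system the first value is the umask-derived mode, which is the state any concurrent reader could observe before the chmod lands.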

Replace the unconditional write_text + post-write chmod with os.open(O_WRONLY | O_CREAT | O_EXCL, mode=0o600) so the temp file is created atomically with restrictive mode. os.replace then carries the 0o600 mode onto the destination, so the trailing chmod becomes unnecessary. The temp filename also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a prior crashed write.
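
A minimal sketch of the pattern described above, under stated assumptions: write_private_json, the temp-name layout, and the JSON payload are illustrative and are not the actual body of _write_claude_code_credentials. Only standard-library calls are used (os.open, os.fdopen, os.fsync, os.replace, secrets.token_hex).

```python
import json
import os
import secrets
import stat
import tempfile
from pathlib import Path


def write_private_json(dest: Path, payload: dict) -> None:
    """Hypothetical helper: the file never exists with a mode wider than 0o600."""
    # Random suffix avoids collisions with concurrent writers and stale temps.
    tmp = dest.with_name(f"{dest.name}.{os.getpid()}.{secrets.token_hex(4)}.tmp")
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL   # fail if the name already exists
    fd = os.open(tmp, flags, stat.S_IRUSR | stat.S_IWUSR)  # created at 0o600
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())                  # data on disk before the rename
        os.replace(tmp, dest)                      # destination takes the temp's mode
    except BaseException:
        # Best-effort cleanup so a failed write leaves no temp file behind.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / ".credentials.json"
    write_private_json(target, {"accessToken": "example"})
    final_mode = stat.S_IMODE(target.stat().st_mode)
    leftovers = sorted(p.name for p in Path(d).iterdir())
```

Because the mode is passed to os.open, the umask can only narrow it further, never widen it, so no post-write chmod is needed in this sketch.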

Parent dir (~/.claude/) is owned by Claude Code itself and shared with its native auth, so we deliberately don't tighten its mode here (unlike #21148 which owns the HERMES_HOME/mcp-tokens/ subtree).

Same pattern as the merged fix for agent/google_oauth.py in #19673 and the parallel fix for tools/mcp_oauth.py in #21148.

Type of Change

Bug fix
Security fix
Tests

Changes Made

agent/anthropic_adapter.py: Replace write_text + replace + post-write chmod in _write_claude_code_credentials with os.open(O_EXCL, mode=0o600) + os.fdopen + fsync + os.replace. Randomize temp filename per process. Add secrets and stat imports.
tests/agent/test_anthropic_adapter.py: Add test_credentials_file_created_with_0o600 to TestWriteClaudeCodeCredentials asserting the resulting file mode is 0o600 (skipped on Windows where POSIX mode bits aren't enforced).

How to Test

pytest -q tests/agent/test_anthropic_adapter.py::TestWriteClaudeCodeCredentials
Confirm the new test_credentials_file_created_with_0o600 passes.
Confirm pre-existing test_writes_new_file and test_preserves_existing_fields still pass (no behavior change for callers of _write_claude_code_credentials).
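
For orientation, a regression check of this kind might look like the sketch below. It is an assumption-laden illustration, not the body of test_credentials_file_created_with_0o600: write_private is a stand-in writer rather than the real function, and unittest is used only so the sketch runs without third-party packages (pytest collects unittest cases as well).

```python
import os
import stat
import sys
import tempfile
import unittest
from pathlib import Path


def write_private(dest: Path, text: str) -> None:
    """Stand-in writer (hypothetical), not the real _write_claude_code_credentials."""
    tmp = dest.with_name(dest.name + ".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    os.replace(tmp, dest)


@unittest.skipIf(sys.platform == "win32", "POSIX mode bits are not enforced on Windows")
class CredentialModeTest(unittest.TestCase):
    def setUp(self):
        self._dir = tempfile.TemporaryDirectory()
        self.addCleanup(self._dir.cleanup)
        self.dest = Path(self._dir.name) / ".credentials.json"
        # Force a permissive umask so the test cannot pass by accident.
        old = os.umask(0o022)
        self.addCleanup(os.umask, old)

    def test_new_file_is_0o600(self):
        write_private(self.dest, "{}")
        self.assertEqual(stat.S_IMODE(self.dest.stat().st_mode), 0o600)

    def test_existing_permissive_file_is_narrowed(self):
        # A pre-existing 0o644 destination is replaced by the 0o600 temp file.
        self.dest.write_text("{}")
        os.chmod(self.dest, 0o644)
        write_private(self.dest, '{"k": 1}')
        self.assertEqual(stat.S_IMODE(self.dest.stat().st_mode), 0o600)


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(CredentialModeTest)
)
```

The second case is an extra illustration of the claim that os.replace carries the temp file's mode onto the destination; the source does not say whether the PR's own test covers it.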

Reproducing the original race directly is timing-sensitive but observable on Linux with inotifywait -m -e create -e attrib ~/.claude/: the legacy code shows the temp file appearing with the umask-derived mode (e.g. 0o644) before the chmod, while the new code shows it created at 0o600 in a single step.

Validation

pytest -q tests/agent/test_anthropic_adapter.py::TestWriteClaudeCodeCredentials -> 3 passed
Tested on macOS 25.4 / Python 3.14.4. POSIX-only assertion is gated on sys.platform, so Windows/CI shouldn't see false failures.
Three unrelated tests in TestRunOauthSetupToken were already failing on main before this change (xdist cross-test pollution per the in-source comment); not addressed here.

…entials

_write_claude_code_credentials wrote ~/.claude/.credentials.json via Path.write_text + replace + post-write chmod(0o600). Both the temp file and the destination briefly carried the default mode left by the process umask (commonly umask 0o022, giving 0o644 = world-readable) between create/replace and chmod, exposing the OAuth access/refresh tokens to other local users on multi-user hosts.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR mode so the temp file is created atomically at 0o600. After os.replace, the destination inherits the temp's mode, so the post-write chmod is no longer needed. The temp name also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a crashed prior write.

Parent dir (~/.claude/) is owned by Claude Code itself and shared with its native auth, so we deliberately don't tighten its mode here (unlike the mcp_oauth fix which owns its own subtree under HERMES_HOME).

Mirrors the fix shipped for agent/google_oauth.py in NousResearch#19673 and the parallel fix for tools/mcp_oauth.py in NousResearch#21148.

Adds a regression test in TestWriteClaudeCodeCredentials asserting the resulting file mode is 0o600 (skipped on Windows where POSIX mode bits aren't enforced).


* fix(cli): reuse canonical root model key normalization in load_cli_config * feat(cli): kanban promote verb for manual todo->ready recovery Adds `hermes kanban promote <task_id>` for manual lifecycle recovery when an auto-promote daemon misses the parent-done transition (issue #28822). Refuses promotion unless every parent dep is done/archived (override with --force). Emits a `promoted_manual` audit event distinct from the automatic `promoted` kind, so audit consumers can filter human-driven from system-driven promotions. Supports --dry-run and --json for orchestration. Does not mutate assignee/claim state — the dispatcher picks the card up via its normal ready polling path. Closes #28822. * feat(kanban): --ids bulk promote + AUTHOR_MAP entry for #29464 Adds an --ids flag to 'hermes kanban promote' mirroring the existing block/schedule convention, so the marquee use case from issue #28822 (promote all children of a closed organizational parent in one shot) doesn't require a shell loop. Single-id JSON output stays a flat object for back-compat; bulk emits a list. Dedupes positional + --ids so the same id can't be promoted twice in one call. 5 new CLI-level tests cover bulk happy path, partial-failure exit code, JSON shapes, and dedup. Also adds the thedavidmurray noreply-email -> github-login mapping in scripts/release.py so the salvage cherry-pick passes the AUTHOR_MAP contributor-credit check. * fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290) * fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint Adds a soft guard so an agent running under one Hermes profile cannot silently edit a different profile's skills/plugins/cron/memories. Three layers: A. agent/file_safety.classify_cross_profile_target Classifies a write target against the active HERMES_HOME. Returns a {active_profile, target_profile, area, target_path} dict when the path lands in another profile's scoped area. 
PROFILE_SCOPED_AREAS = (skills, plugins, cron, memories). get_cross_profile_warning() wraps it into a model-facing error string that names both profiles, names the area, and points at the cross_profile=True bypass. Defense-in-depth, NOT a security boundary — the terminal tool runs as the same OS user and can write any of these paths directly. The guard exists to prevent confused-agent corruption, not to stop a determined attacker. SECURITY.md §3.2 (terminal-bypass posture) still applies. Wired into tools/file_tools.write_file_tool and patch_tool with a cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both advertise cross_profile so the model can pass it after explicit user direction. patch_tool extracts target paths from V4A patch bodies before checking (same shape as the existing sensitive-path check). skill_manage is already scoped to the active profile's SKILLS_DIR by construction, so no extra guard wiring is needed there. The D-side error message (below) still names other profiles when the skill exists elsewhere. B. agent/system_prompt One deterministic line near the environment-hints block names the active profile and tells the model not to modify another profile's skills/plugins/cron/memories without explicit direction. Profile name is stable for the lifetime of the AIAgent, so the line is prompt-cache-safe. D. tools/skill_manager_tool._skill_not_found_error Replaces the bare "Skill 'X' not found." with a message that: - names the active profile, - searches OTHER profiles' skills dirs for the same name, - names the profile(s) where the skill exists and the path, - suggests `hermes -p <name>` to switch profiles, or cross_profile=True for an explicit edit. All 5 "not found" sites in skill_manager_tool (edit, patch, delete, write_file, remove_file) now go through the helper. 
Reference incident (May 2026): a hermes-security profile session edited skills under both ~/.hermes/profiles/hermes-security/skills/ AND ~/.hermes/skills/ (the default profile's skills) without realizing the second path belonged to a different profile. Three of the four skill files needed manual restoration afterward. What this PR does NOT do: * No hard block. The terminal tool can still touch any of these paths with no guard — same posture as the dangerous-command approval flow. SECURITY.md §3.2 applies. * No regex sweep on terminal commands for cross-profile paths. That direction is a Skills-Guard-style arms race (cd + relative paths, base64, etc.) and would false-positive on legitimate cross-profile reads. Filed as a follow-up. * No on-disk path migration. ~/.hermes/skills/ remains the default profile's skills dir; this PR is about telling the agent about that boundary, not changing the layout. Tests: tests/agent/test_file_safety_cross_profile.py (16 tests) - _resolve_active_profile_name covers default/named/failure paths - classify_cross_profile_target covers all four scoped areas, both directions (default → named, named → default, named → named), non-Hermes paths, and root-level config files - get_cross_profile_warning covers in-profile no-op, cross-profile message shape, and the defense-in-depth self-documentation tests/tools/test_cross_profile_guard.py (12 tests) - write_file: in-profile allow, cross-profile block, cross_profile=True bypass, non-Hermes pass-through - patch: replace-mode block, cross_profile=True bypass, V4A patch path extraction - skill_manage: error names the other profile (single + multiple), missing-everywhere falls back to skills_list hint - system prompt: contract-level checks (both branches present, cross_profile=True mentioned, ~/.hermes/profiles/ referenced) All 207 existing tests in file_safety/file_operations/skill_manager still pass. 10 system-prompt tests still pass. 
E2E verified: the exact incident scenario (security profile editing default's hermes-agent-dev skill) is now blocked with the warning message; cross_profile=True unblocks. * fix(code_execution): add cross_profile to write_file/patch stubs The cross_profile kwarg added to write_file_tool/patch_tool needs to flow through the execute_code sandbox stubs in _TOOL_STUBS so the test_stubs_cover_all_schema_params drift test passes. Without this, scripts running inside execute_code couldn't pass cross_profile=True through hermes_tools.write_file(). Caught by CI on PR #31290. * fix(wecom-callback): retry send with fresh token on errcode 40001/42001 When WeCom returns errcode=40001 (invalid credential) or 42001 (token expired), send() was returning a failure without evicting the bad token from _access_tokens. All subsequent sends then kept using the same invalid cached token until its TTL naturally expired (~7200s). Fix: on the first token-rejection errcode, evict the cache entry and retry once with a freshly fetched token. Non-token errcodes fail immediately as before. If the refreshed token also fails, the error is returned without looping further. Adds four regression tests covering: successful retry on 40001, successful retry on 42001, no retry on unrelated errcode, and clean failure when the refresh does not help. * gateway: debounce queued text follow-ups * chore: map pnascimento9596@gmail.com for PR #31235 salvage * fix(gateway): drop text snippet from debounce debug log (CodeQL) CodeQL py/clear-text-logging-sensitive-data flagged the candidate-accept debug log including event.text[:60]. Log text_len instead — sufficient for debugging burst behavior without surfacing message contents. 
Co-authored-by: Paulo Nascimento <pnascimento9596@gmail.com> * fix(wecom): guard flush task against cancel-delivery race to prevent message loss When asyncio.sleep() fires just before Task.cancel() is called, CPython sets _must_cancel=True but cannot cancel the already-completed sleep future, so CancelledError is delivered at the next await (handle_message) rather than at the sleep. By that point the superseded task has already popped the merged event from _pending_text_batches, so the superseding task sees an empty batch and silently drops the message. Fix: add a synchronous task-registry check between the sleep and the pop. No await between the check and the pop means no other coroutine can interleave, so the guard is race-free. * fix(cli): decouple tool_progress=verbose from global DEBUG logging (#31379) PR #6a1aa420e coupled `display.tool_progress: verbose` (a per-tool display toggle for full args / results / think blocks) to `self.verbose` — which controls root-logger DEBUG level. Result: setting tool_progress: verbose in config silently flipped every module in the process to DEBUG and flooded the terminal with internal logging, far beyond just full tool calls. The two concepts are separate: - `tool_progress_mode == 'verbose'` → display behavior (tool rendering) - `self.verbose` → logging behavior (root logger → DEBUG, line 9795) This change keeps PR #6a1aa420e's argparse.SUPPRESS / config-fallback plumbing but severs the verbose-display → debug-logging link. Changes: - cli.py:2868 — `self.verbose` only follows explicit `verbose=` arg; no longer auto-True when tool_progress_mode == 'verbose'. - cli.py:_toggle_verbose — slash-cycle through tool progress modes no longer flips `self.verbose` / `agent.verbose_logging` / `agent.quiet_mode`. - cli.py:9355 — fix misleading label (drop 'and debug logs'). - tui_gateway/server.py:_make_agent — same decoupling on the TUI side (verbose_logging no longer derived from tool_progress_mode). 
- tests/cli/test_tool_progress_scrollback.py — invert the test that asserted the broken coupling; add coverage for explicit `--verbose` still enabling DEBUG independent of tool_progress. Live verified: - tool_progress: verbose, no --verbose flag → 0 DEBUG/INFO log lines - --verbose flag explicit → 32 DEBUG/INFO log lines (as expected) * feat(secrets/bitwarden): EU Cloud + self-hosted server URL support (#31378) Closes #31370. bws defaults to the US identity endpoint, so EU Cloud and self-hosted machine-account tokens fail with [400 Bad Request] {"error":"invalid_client"} during 'hermes secrets bitwarden setup'. The token is valid — it's just being checked against the wrong region. Add a Bitwarden region step to the wizard between the access-token and project-list steps: Step 1 Install bws Step 2 Provide access token Step 3 Pick region <-- new (US / EU / self-hosted-custom-URL) Step 4 Pick project (now talks to the right endpoint) Step 5 Test fetch Region is stored in config.yaml as secrets.bitwarden.server_url and plumbed into every bws subprocess as BWS_SERVER_URL (project list, secret list, test fetch, and the env_loader startup pull). Also: - Non-interactive: 'hermes secrets bitwarden setup --server-url ...' 
- Pre-existing BWS_SERVER_URL in the shell is detected and reused - Cache key includes server_url so EU/US fetches don't collide - 'hermes secrets bitwarden status' shows the configured region - 'invalid_client' / '400 Bad Request' from bws now triggers a hint pointing at the region setting instead of looking like a bad token * fix(security): restrict dashboard websockets to loopback clients (#30741) * fix(gateway): stop enabling dingtalk allow-all during setup (#30743) * fix(gateway): remove discord role allowlist auth bypass (#30742) * security: restrict default webhook toolset capabilities (#30745) * fix(qqbot): authorize approval button interactions by session owner (#30737) * Harden msgraph webhook auth requirements (#30169) * fix(docker): keep dashboard side-process loopback by default (#30740) * security: harden API server key placeholder handling (#30738) * fix(feishu): authorize interactive exec approval callbacks (#30739) * fix(feishu): enforce auth and chat binding for approval buttons (#30744) * fix(feishu): require webhook auth secret and honor config extras (#30746) * fix(acp): deliver final_response after streaming — transform_llm_output hook now visible When streaming is active, streamed_message=True skipped the final_response update, causing plugin hooks like transform_llm_output to be silently invisible. Remove the `not streamed_message` guard so the final response (possibly transformed by plugins) is always delivered to the ACP client. * fix: propagate response_transformed flag — plugin hook output survives streaming suppression When a transform_llm_output hook modifies final_response after streaming, the gateway was silently discarding the transformed content because streamed=True / content_delivered=True triggered the final-send suppression. Three changes: 1. conversation_loop: set `_response_transformed=True` when a transform_llm_output hook returns a non-empty string, and expose it as `response_transformed` in the result dict. 2. 
gateway/run: skip the final-send suppression when `response_transformed` is True — the transformed response must reach the client even if streaming already sent the original text. 3. acp_adapter/server: remove `not streamed_message` guard so final_response is always delivered (ACP path fixed separately). * fix(gateway): propagate response_transformed flag through run_sync return dict run_sync() cherry-picks fields from the run_conversation result dict into a new response dict for the gateway. response_transformed was missing from the cherry-pick list, so the gateway always saw it as False and suppressed the final send even though a transform_llm_output hook had modified the content. * fix(gateway): edit streamed message instead of sending duplicate when response_transformed When a transform_llm_output hook appends content after streaming, the previous fix skipped the final-send suppression which caused the full response to be sent as a NEW message (duplicate). Instead, edit the existing streamed message in-place to append the transformed content, then set already_sent=True. Added stream_consumer.message_id and .accumulated_text public properties. * test(gateway): regression for plugin-transformed response after streaming Adds a test that fails without the gateway fix, exercising the response_transformed=True branch in _finalize_response: a streamed response whose final text was modified by a transform_llm_output plugin hook must be edit_message'd in place (not duplicate-sent), with already_sent=True so the normal final-send is skipped. 
Also drops two minor leftovers from the salvaged PR #29119: * accumulated_text property on GatewayStreamConsumer (unused) * duplicate _response_transformed=False inside the hook try block * chore: map kenyon1977@gmail.com for PR #29119 salvage * fix(acp): only deliver final_response after streaming when transformed PR #29119 dropped the 'not streamed_message' guard unconditionally so that plugin-transformed responses (transform_llm_output hook) would reach ACP clients. That regressed test_prompt_does_not_duplicate_streamed_final_message: when no transform happened, the streamed text was re-sent as a duplicate final delivery. Tighten the condition to mirror the gateway side: deliver after streaming only when response_transformed=True. Otherwise keep the old guard. Adds test_prompt_delivers_transformed_response_after_streaming so the transformed path stays covered. * fix(streaming): emit finish_reason=length on text-only partial-stream stub When the API connection drops mid-stream after text deltas have already been delivered, chat_completion_helpers returned a stub response with finish_reason=stop. The conversation loop then classified the stub as a clean text completion (text_response(finish_reason=stop)) and exited with iteration budget remaining — even when the goal-judge verdict came back as "continue" milliseconds later (issue #30963). Switch the text-only partial-stream stub to finish_reason=length. The existing length-continuation path (length_continue_retries up to 3, "continue exactly where you left off" prompt, partial parts merged into final_response) then fires automatically: the partial assistant content is persisted, the model is asked to continue from the cut point, and the loop keeps making progress against the goal. The mid-tool-call branch keeps finish_reason=stop on purpose — its user-facing warning ("Ask me to retry if you want to continue") asks the user to drive the retry rather than auto-replaying a tool call with possible side effects. 
#5544's "no duplicate message" contract is preserved verbatim: the partial content is reused, never re-emitted as a fresh API call, so the user never sees two copies of the same delta. Refs: NousResearch/hermes-agent#30963 * fix(conversation-loop): tailor length-continuation prompt for partial stream The length-continue path's user-facing vprint and continuation prompt both told the model "your response was truncated by the output length limit." That's a lie when the stub came from a partial-stream network error (issue #30963) — and a lie the model can detect, leading to "I wasn't truncated, I'm done" no-op responses that defeat the continuation entirely. Detect the partial-stream-stub via response.id and swap in: - vprint: "Stream interrupted by network error (finish_reason='length' on partial-stream-stub)" - prompt: "[System: The previous response was cut off by a network error mid-stream. Continue exactly where you left off. Do not restart or repeat prior text. Finish the answer directly.]" Real length truncations still see the original "truncated by output length limit" prompt — the model needs to know which class of failure it's recovering from. Same length_continue_retries=3 budget, truncated_response_parts merging, and final-response stitching infrastructure on both branches. Refs: NousResearch/hermes-agent#30963 * test(streaming): pin partial-stream-stub finish_reason + continuation contract Three test classes lock in the #30963 fix: 1. TestPartialStreamStubFinishReason — drives _interruptible_streaming_api_call through the two recovery branches and asserts: - text-only partial → finish_reason="length" (the new behaviour), - mid-tool-call partial → finish_reason="stop" (unchanged on purpose). 2. TestLengthContinuationPromptBranching — pure-Python check on the branch that picks the continuation prompt by response.id. Locks the network error wording for partial-stream-stub vs. the output-length wording for everything else. 3. 
TestConversationLoopPartialStreamContinuation — feeds a stub + continuation pair into run_conversation, verifies the loop makes a second API call (instead of exiting with text_response(stop)), confirms the network-error continuation prompt actually reaches the model on call #2, and that final_response stitches both halves. Refs: NousResearch/hermes-agent#30963 * fix(mcp): raise ImportError instead of NameError when stdio SDK missing (#31450) When the 'mcp' Python SDK isn't installed, _run_stdio leaked a bare 'NameError: name StdioServerParameters is not defined' because the top-level 'from mcp import ...' fails inside try/except ImportError, leaving the names unbound at module scope. Mirror the _MCP_HTTP_AVAILABLE gate that _run_http already had: raise a clear ImportError with install instructions instead. Fixes #30904 * fix(dashboard): require auth for plugin rescan (#27340) * fix(gateway): validate Svix webhook signatures (#30200) * fix: fail closed for webhook routes without secrets Reject unsigned webhook requests when a route has no effective HMAC secret, even if the request handler is reached without the normal connect-time validation. Add regression coverage for the direct-handler path. * fix(webhook): use 403 not 500 for missing-secret rejection Operator misconfiguration is a client/setup error, not an internal server exception. 403 "forbidden" more accurately reflects "this route refuses to authenticate" than 500 "internal server error" — the latter triggers incident alerting on operator monitoring and conflates real bugs with config drift. Follow-up tweak to PR #29629 by @m0n3r0. 
* chore(release): map m0n3r0 for PR #29629 salvage * fix(feishu): validate verification token before reflecting url_verification challenge When FEISHU_VERIFICATION_TOKEN is configured, an unauthenticated remote could previously prove endpoint control by sending a url_verification payload with any attacker-controlled challenge string — the handler reflected the challenge BEFORE running the token check. Move the verification_token check ahead of the url_verification echo so the challenge response is gated on a valid token. Add a regression test covering the wrong-token case. Also fix the stale test_connect_webhook_mode_starts_local_server fixture to set FEISHU_VERIFICATION_TOKEN (post #30746 webhook mode requires a secret). Salvaged from PR #29663 by @m0n3r0 — kept the url_verification reorder and its regression test; dropped the host-conditional weakening of the #30746 secret guard (we want webhook secrets required regardless of bind host, not only on 0.0.0.0/::). Docs updated to call out the gating. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com> * fix(state): restrict sensitive store file permissions response_store.db (api server) holds conversation history including tool payloads, prompts, and results. webhook_subscriptions.json holds per-route HMAC secrets. Under a permissive umask (e.g. 0o022, default on most distros) both files were created mode 0o644 — readable by other local users on shared boxes. - gateway/platforms/api_server.py: ResponseStore tightens itself + WAL/SHM sidecars to 0o600 after __init__, then trusts the inode. (Original contributor patch chmod'd after every _commit() — wasteful on a hot api_server path; chmod-on-create is sufficient since SQLite preserves mode bits across writes.) 
- hermes_cli/webhook.py: _save_subscriptions writes via tempfile.mkstemp (which itself creates the file with 0o600), chmods the temp before the atomic rename, and re-asserts 0o600 on the destination so an existing permissive file from before this fix gets narrowed. Tests cover (a) creation under permissive umask leaves 0o600 and (b) an existing 0o644 webhook_subscriptions.json gets narrowed on next save. Tests guarded with skipif os.name=='nt' since POSIX mode bits don't apply on Windows. Salvaged from PR #30917 by @Hinotoi-agent. Reworked the api_server.py side from chmod-on-every-commit to chmod-on-create. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com> * fix(tests): align CI tests with recent security hardening (#31470) Four recent security PRs landed on main with stale/missing test updates, breaking 4 test shards on every subsequent PR's CI run: - test_discord_bot_auth_bypass.py (PR #30742 c3caca658): DISCORD_ALLOWED_ROLES no longer bypasses _is_user_authorized. Inverted 3 tests to assert the new (correct) behavior: role config alone does NOT authorize at the gateway layer. - test_msgraph_webhook.py (PR #30169 4ca77f105): adapter.is_connected is a @property, not a method. Test was calling it with () after the connect() change; TypeError: 'bool' is not callable. Removed the parens. - test_feishu_approval_buttons.py (PR #30744 bdb97b857): Card-action callbacks now go through _allow_group_message authorization. 3 tests in TestCardActionCallbackResponse didn't populate adapter._allowed_group_users so the operator's open_id got rejected. Added the allowlist setup to each test, matching the existing pattern in test_returns_card_for_approve_action. Also raise tolerance on test_wait_for_process_kills_subprocess_on_keyboardinterrupt: the SIGTERM → 3s TimeoutStopSec → SIGKILL → reap chain can exceed 10s under loaded xdist (40 workers). Bumped _wait_for_pgid_exit timeout 10→30s and worker join timeout 5→15s. 
Passes 100% in isolation already; this just makes it tolerant of CI-host load. Validation: 270/270 tests pass across the 5 affected files. * fix: emit guardrail halt message to client before closing stream When the tool loop guardrail fires (max_tool_failures, etc.), the turn exits with guardrail_halt but no final assistant message was emitted to the client. The SSE stream closed silently — indistinguishable from a crash. The stream_delta_callback(None) before tool execution is a display flush, not a hard close. After generating the halt response, emit it through both _safe_print (CLI) and stream_delta_callback (SSE) so clients see the explanation. Fixes #30770 * test(guardrail): assert halt message reaches stream_delta_callback Regression guard for #30770 — verifies the guardrail-halt branch in agent/conversation_loop.py pushes the synthesized halt message through stream_delta_callback before breaking out of the loop. Without the emit, chat-completions SSE writers drain an empty queue and clients (Open WebUI, etc.) see a finish chunk with zero content delta — indistinguishable from a crash. Verified: the test fails when the production fix is reverted. * fix(dashboard): validate WebSocket Host and Origin * test(dashboard): send loopback headers for WebSocket sidecar test * fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452) * fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main for vision Fixes #31179. Three coupled fixes so a configured aux vision backend actually serves vision tasks instead of silently routing images to the user's main provider: 1. agent/auxiliary_client.py: `auxiliary.<task>.provider: openai` resolves to `custom` + `https://api.openai.com/v1`. "openai" was not in PROVIDER_REGISTRY (we have `openai-codex` for OAuth and `custom` for manual base_url), so the obvious config name silently failed to build a client. 
User-supplied base_url is still preserved; only the provider name normalises to `custom` so resolution doesn't hit the PROVIDER_REGISTRY-only path. 2. agent/auxiliary_client.py: the vision auto-detect chain now skips the user's main provider when models.dev reports `supports_vision=False`. Without this guard, a misconfigured aux provider would fall back to `auto`, which happily returned the main-provider client. The caller would then send image content to e.g. api.deepseek.com with model `gpt-4o-mini` and get a cryptic `unknown variant 'image_url', expected 'text'` from the provider's parser. 3. tools/vision_tools.py + tools/browser_tool.py: `check_vision_requirements` now mirrors the runtime fallback chain (explicit provider, then auto), so `vision_analyze` shows up whenever vision is actually serviceable. `browser_vision` gets a new `check_browser_vision_requirements` check_fn that AND-gates browser + vision availability, so it doesn't get advertised to the model when the call would fail at runtime. Reproduction (config from the bug report): model.provider: deepseek model.default: deepseek-v4-pro auxiliary.vision.provider: openai auxiliary.vision.model: gpt-4o-mini Before: resolve_vision_provider_client() returns None for the explicit provider, fallback auto returns the deepseek client with model='gpt-4o-mini', image hits api.deepseek.com → 'unknown variant image_url'. vision_analyze hidden from tool list; browser_vision exposed but fails at call time. After: resolves to custom + api.openai.com/v1 with model gpt-4o-mini. vision_analyze and browser_vision both gate correctly on capability. Tests: tests/agent/test_vision_routing_31179.py covers all three fixes (12 cases including the user's exact scenario, base_url preservation, text-only-main skip, capability-unknown permissive fallback, and tool gating parity). Existing 382 tests across auxiliary/vision/image_routing suites still pass. 
* test(vision): use exact hostname check to silence CodeQL substring-sanitization alert

* fix(auxiliary): drop model name from vision-skip debug log to silence CodeQL

The new `logger.debug(...)` added in the previous commit interpolated both `main_provider` and `vision_model` (a public model slug, not sensitive). CodeQL's `py/clear-text-logging-sensitive-data` heuristic re-flagged it twice because the rule mis-detects multi-value interpolations near tainted-via-config provider strings. Drop the model from the log args (provider alone is enough to diagnose the skip; the same sibling branch a few lines up already logs provider only). Behavior unchanged; CodeQL false positive cleared.

* fix(gateway): swallow transient Telegram TimedOut at loop level

Closes #31066. Closes #31110.

An unhandled `telegram.error.TimedOut` (or peer `NetworkError` / `httpx` connection error) propagating to the asyncio event loop killed the entire gateway process, taking down every profile attached to the same runner. systemd restarted the service after ~5s, but the active conversation turn was lost.

Public adapter methods (`adapter.send`, `adapter.edit_message`, `adapter.send_voice`, …) are individually try/except-wrapped on current main, but at least one async path was reaching the loop with TimedOut unhandled; the report's traceback ends at the deepest httpx frame and doesn't pinpoint the caller.

Rather than audit 30+ call sites blind, install a loop-level safety net: `_gateway_loop_exception_handler` is set as the loop's exception handler in `start_gateway()` after `asyncio.get_running_loop()`. It classifies the exception via `_is_transient_network_error()` (walks the `__cause__`/`__context__` chain and matches on class name, so the test suite doesn't need the real telegram/httpx packages installed).
Transient errors are logged at WARNING with full traceback so the originating call site stays diagnosable; everything else forwards to `loop.default_exception_handler` so real bugs still surface. Tests cover the classifier (known transients accepted, real bugs rejected, cause/context chain unwrap, cyclic-cause termination) and the handler (swallow + log warning, forward unknowns, missing-exception context). One end-to-end test schedules an orphan task raising TimedOut and asserts `asyncio.run` returns cleanly. * fix(agent): abort on HTTP 402 after pool rotation and fallback fail (#31443) Closes #31273. HTTP 402 (insufficient credits) was retried up to agent.api_max_retries times (default 3), burning paid requests against an exhausted balance. Real-world impact: ~$40 in 48h on a 24/7 Telegram+Discord gateway. Root cause: FailoverReason.billing was in the is_client_error exclusion set in agent/conversation_loop.py, which prevents the non-retryable-abort branch from firing. By the time control reaches that predicate: * credential-pool rotation has already run for billing and either continued the loop or returned False (pool exhausted/absent) * the eager-fallback branch has also fired on billing and either continued the loop or fell through (no fallback configured) Falling through to the backoff retry from here has no recovery mechanism left — it just burns more paid requests. Removing billing from the exclusion set makes 402 abort cleanly once pool+fallback recovery has failed, mirroring how 401/403 (also should_fallback=True) already behave. Added tests/run_agent/test_31273_402_not_retried.py which mirrors the is_client_error predicate shape from the source and asserts the invariant (plus a source-inspection guard against accidental re-introduction). * feat(security): on-demand supply-chain audit via OSV.dev (#31460) Adds 'hermes security audit' — a one-shot vulnerability scan against OSV.dev covering three surfaces a Hermes user actually controls: 1. 
The running Python's installed PyPI dists (importlib.metadata) 2. Plugin requirements.txt / pyproject.toml pins under ~/.hermes/plugins/ 3. Pinned npx/uvx MCP servers in config.yaml Zero new dependencies (stdlib urllib + importlib.metadata + tomllib + concurrent.futures). No auth required for OSV's public batch API. Flags: --json, --fail-on {low,moderate,high,critical} (default: critical), --skip-venv, --skip-plugins, --skip-mcp Output groups findings by source, sorts by severity descending, surfaces fixed-versions inline. Exit 1 when any finding meets the --fail-on tier. Deliberately out of scope: globally-installed pip/npm, editor/browser extensions, daily background scans, auto-blocking of installs. The audit is on-demand by design — daily scans become noise the user trains themselves to ignore. * fix(transport): strip Hermes-internal scaffolding keys before chat.completions The empty-response recovery path in run_agent.py appends synthetic messages tagged with _empty_recovery_synthetic (and the agent loop uses _thinking_prefill / _empty_terminal_sentinel similarly). These are internal bookkeeping markers — they must never reach the wire. chat_completions' convert_messages only stripped Codex Responses leak fields (codex_reasoning_items, call_id, etc.), not these _-prefixed markers. Permissive providers (real OpenAI, Anthropic) silently ignore unknown message keys so the bug stayed hidden, but strict OpenAI-compatible gateways reject them outright. Observed against codex.nekos.me: 502: [ObjectParam] [input[617]._empty_recovery_synthetic] [unknown_parameter] Unknown parameter: '_empty_recovery_synthetic' Because the synthetic messages persist in the session, every subsequent request in that session carries the poisoned key and fails identically — a deterministic 502 the retry loop mistakes for a transient server error. Fix: convert_messages now drops any top-level message key starting with '_'. 
OpenAI's message schema has no '_'-prefixed fields, so this is safe and future-proofs against new internal markers. Origin: local-author Upstream-PR: none Patch-State: local-only * fix(error-classifier): treat 5xx request-validation errors as non-retryable Standard OpenAI returns request-validation failures (unknown/ unsupported parameter, malformed request) as 4xx. Some OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me returns 502 for an unknown parameter. The generic '5xx -> retryable server_error' rule then misfires: the error is deterministic (every retry gets the identical rejection), so the retry loop burns all 3 attempts, the transport-recovery path resets the counter and burns 3 more, and the result is a request flood against a request that can never succeed. Fix: when a 500/502 body carries an unambiguous request-validation signal — 'unknown parameter' / 'unsupported parameter' / 'invalid_request_error' in the message text, or invalid_request_error / unknown_parameter / unsupported_parameter as the structured error code — classify as a non-retryable format_error so the loop fails fast and falls back. Genuine 502 Bad Gateway with no such signal stays retryable as before. Origin: local-author Upstream-PR: none Patch-State: local-only * chore: map soju06@users.noreply.github.com for PR #26054 salvage * fix(matrix,gateway): Matrix E2EE installs full dep set; plugins respect is_connected Fixes #31116 — two distinct bugs in fresh-install Matrix gateway: 1. Matrix E2EE setup installed only mautrix[encryption], leaving asyncpg / aiosqlite / Markdown / aiohttp-socks uninstalled. The first encrypted connect failed with 'No module named asyncpg' deep inside MatrixAdapter.connect(). Root cause: the setup wizard hand-rolled a pip install of one package instead of using lazy_deps.ensure( 'platform.matrix'), and check_matrix_requirements() short-circuited the runtime installer on 'import mautrix' alone — so the other 4 packages were never pulled in. 
2. Discord auto-enabled itself on every gateway start, even when the user never selected Discord and had no DISCORD_BOT_TOKEN. Root cause: gateway/config.py plugin-enablement loop gated enablement on entry.check_fn() (just 'is the SDK importable?') and ignored entry.is_connected (the 'did the user configure credentials?' probe). Same bug class as commit 7849a3d73 fixed for _platform_status in the setup wizard; this is the runtime counterpart. Affects Discord, Teams, and Google Chat. Changes: - hermes_cli/setup.py::_setup_matrix — install via lazy_deps.ensure('platform.matrix') to pull the full feature group. - gateway/platforms/matrix.py::_check_e2ee_deps — verify asyncpg + aiosqlite + PgCryptoStore in addition to OlmMachine, so E2EE failures surface at startup instead of at first encrypted-room connect. - gateway/platforms/matrix.py::check_matrix_requirements — use feature_missing('platform.matrix') as the install gate instead of a single 'import mautrix' check, so partial installs trigger the lazy installer correctly. - gateway/config.py plugin-enablement loop — consult entry.is_connected before flipping enabled=True. Explicit YAML enabled=true still wins. Tests: 3 new in tests/gateway/test_matrix.py (asyncpg-required, aiosqlite-required, partial-install lazy-runs), 5 new in tests/gateway/test_platform_registry.py (is_connected=False blocks, is_connected=True enables, is_connected=None falls back to check_fn, raising probe doesn't enable, explicit YAML wins). Validation: 310 tests across affected test modules pass. * fix(telegram): gate send() on send-path health after reconnect storms (#31165) After sustained Bad Gateway / TimedOut reconnect cycles, the PTB httpx client can enter a state where bot.send_message() returns a valid Message (real message_id) but the message never reaches the recipient. TelegramAdapter.send returns SendResult(success=True) and cron's live-adapter branch marks the run delivered while the message is silently dropped. 
Add a _send_path_degraded flag. _handle_polling_network_error sets it on reconnect storms; the existing _verify_polling_after_reconnect heartbeat probe clears it once getMe() confirms the Bot client is healthy. While the flag is set, send() short-circuits with SendResult(success=False, retryable=True) so cron falls through to the standalone delivery path (fresh HTTP session). Closes #31165. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com> * fix(agent): only strip mcp_ prefix for OAuth-injected tools (GH-25255) When strip_tool_prefix=True (Anthropic OAuth path), normalize_response unconditionally stripped the mcp_ prefix from ALL tool names starting with mcp_. This broke Hermes-native MCP server tools (registered under their full mcp_<server>_<tool> name in the registry) because the stripped name doesn't match any registry entry. Fix: check the tool registry before stripping. Only strip when: - The stripped name EXISTS in the registry (OAuth-injected tool) - The full name does NOT exist in the registry This preserves backward compatibility for OAuth-injected tools while protecting native MCP server tools from incorrect prefix removal. 7 new tests covering: OAuth strip, native preserve, no-flag, non-mcp, unknown tools, mixed responses, and dual-registration edge case. Signed-off-by: HKPA <hayka-pacha@users.noreply.github.com> * fix(anthropic): skip mcp_ prefix on outgoing tool schemas when already prefixed Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool name in step 3, so a native MCP server tool registered as 'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The incoming strip only removes ONE prefix, which still worked on first call, but on subsequent calls the model pattern-matched the single-prefixed form from message history and produced names that stripped to 'composio_X' — registry miss, dispatch fail. 
The history-rewrite block (#4) already has this guard. Apply the same guard to the schema-rewrite block (#3) so round-trip is symmetric. Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass. Author map: hayka-pacha added for PR #25270 salvage attribution. Refs GH-25255. * fix(telegram): preserve new DM topic lanes * test(telegram): add brand-new-topic regression for #31086 The cherry-picked fix from #28605 inverts an existing test (an unknown non-lobby thread_id no longer rewrites to the most-recent binding), but that test only seeds two bindings and queries a third thread_id. Add a second regression test that more closely mirrors the live failure mode: seed exactly one prior binding, then query a brand-new thread_id and assert recovery returns None — so the new topic is allowed to get its own session row instead of being silently merged into the previous topic's session. Co-authored-by: Fábio Siqueira <fabioxxx@gmail.com> Co-authored-by: dillweed <dillweed@users.noreply.github.com> * fix: show recap after in-session resume * fix(cli): skip tool-call-only entries in resume recap, expose limits as config options * chore(release): map zhangsamuel12@gmail.com to SamuelZ12 (PR #7480) * test(cli): reconcile resume-recap tests with skip-tool-only default and compression-chain helper - test_tool_calls_shown_as_summary: explicitly disable resume_skip_tool_only (#4434 made True the default; the legacy assertion relied on tool-only entries being rendered as a summary). - test_tool_only_message_skipped_by_default: add coverage for the new default skip behavior. - test_resume_command_*: mock_db.resolve_resume_session_id now returns the same id (no compression chain) so the post-#15000 redirect block doesn't shove a MagicMock into HERMES_SESSION_ID. 
* feat(config): document resume-recap tuning keys in DEFAULT_CONFIG The hardcoded constants in _display_resumed_history were exposed as config in PR #4434; declare them in DEFAULT_CONFIG and the CLI fallback dict so they show up in 'hermes config' diagnostics and the schema validator. * fix(debug): redact BlueBubbles webhook secrets * fix(gateway): seed plugin extras before is_connected gate (#31703) Follow-up to 54e61f933. The plugin enablement gate calls ``entry.is_connected(probe_cfg)`` BEFORE ``env_enablement_fn`` runs, and the probe is built as ``existing_cfg or PlatformConfig()`` — empty extras, ``enabled=False``. For plugins whose ``is_connected`` reads ``config.extra`` instead of env vars directly, that probe is a misrepresentation of what the platform will look like after enablement. Google Chat's ``_is_connected`` short-circuits on ``config.enabled`` and inspects ``config.extra["project_id"]`` / ``config.extra["subscription_name"]`` — both False on the default probe even when the user has set ``GOOGLE_CHAT_PROJECT_ID`` and ``GOOGLE_CHAT_SUBSCRIPTION_NAME``. Result: Google Chat silently fails the gate on every env-var-only setup. Build a candidate probe that mirrors what the platform will look like post-enablement: - pre-call ``env_enablement_fn`` and layer its result into the probe's ``extra`` (without mutating any existing platform config) - pass ``enabled=True`` on the probe — we're asking "would this BE configured if we let it in?" not "is it currently enabled?" - reuse the same seeded extras when we commit the platform to ``config.platforms`` (avoids calling ``env_enablement_fn`` twice) Discord/IRC/Teams/LINE/ntfy/Simplex ``_is_connected`` hooks read env vars directly, so they are unaffected. This change only restores Google Chat on env-var-only setups while keeping the original #31116 Discord-no-token block intact. 
All 6 shipped ``env_enablement_fn`` implementations were audited and are pure reads (no ``os.environ`` writes), so running them earlier in the loop has no observable side effects. Tests: 2 new in tests/gateway/test_platform_registry.py covering extras-seeded-before-is_connected and don't-leak-extras-on-gate-fail. 693 tests across 11 adjacent suites pass (platform_registry, config, google_chat, matrix, discord_connect, ntfy_plugin, simplex_plugin, line_plugin, irc_adapter, teams, gateway_platform_gating). Refs #31116. * fix(kanban): refuse to rmtree workspace_path outside managed scratch root (#28818) A board's ``default_workdir`` (e.g. ``hermes kanban boards set-default-workdir my-board /path/to/real/source``) is copied into ``tasks.workspace_path`` for tasks created without an explicit ``workspace_kind``. Those tasks default to ``workspace_kind='scratch'``, so completion calls ``_cleanup_workspace`` and unconditionally runs ``shutil.rmtree(wp, ignore_errors=True)`` — deleting the user's real source tree as if it were disposable scratch storage. Add ``_is_managed_scratch_path()`` and gate ``_cleanup_workspace`` on it: only delete paths under ``HERMES_KANBAN_WORKSPACES_ROOT`` (the worker-side override the dispatcher injects) or under the active kanban home's ``kanban/`` subtree (covering both the legacy default-board root and per-board ``kanban/boards/<slug>/workspaces`` roots). Anything else gets a warning log and is left alone, so a misconfigured ``default_workdir`` can no longer destroy user data on task completion. * fix(kanban): restrict managed-scratch roots to workspaces/ dirs only Copilot review on PR #28819 flagged that `_is_managed_scratch_path` accepted the entire `<kanban_home>/kanban` subtree as managed scratch storage. With that, a task whose `workspace_kind='scratch'` and `workspace_path` was mis-set to `<kanban_home>/kanban`, `.../kanban/logs`, or a board's metadata directory (e.g. 
`.../kanban/boards/<slug>` without the `workspaces/` child) would pass the containment guard and let task completion `shutil.rmtree` Hermes' own DB, metadata, and log subtrees. Tighten the guard: * Allowed roots are now exclusively `workspaces/` directories — the `HERMES_KANBAN_WORKSPACES_ROOT` override, `<kanban_home>/kanban/workspaces`, and each `<kanban_home>/kanban/boards/<slug>/workspaces` discovered on disk. * Require strict descendancy: a path equal to a root itself is rejected too, because deleting a workspaces root would wipe every task's scratch dir at once. Add a regression test covering the three Copilot-named attack paths (kanban root, kanban/logs, board root without `workspaces/`) plus the workspaces-root-itself case, and confirm the inner task-id dir still matches. * fix(kanban): scratch tasks must not inherit board.default_workdir (#28818) Board defaults represent persistent project checkouts. Scratch workspaces are auto-deleted on completion and must stay under the per-board scratch root that resolve_workspace() creates. Inheriting default_workdir for a scratch task pointed the cleanup path at the user's source tree — the data-loss vector documented in #28818. The containment guard in _cleanup_workspace (just added) is the safety rail. This commit prevents the bad state from being created in the first place: only persistent kinds (dir/worktree) inherit board defaults. Tests updated to cover the new semantics: scratch with default_workdir set keeps workspace_path=None; dir/worktree still inherits the board default. Salvaged from PR #31315 by @leeseoki0 — prevention layer on top of the #28819 containment fix by @briandevans. 
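The containment guard described in the kanban fixes above, where only strict descendants of a `workspaces/` root may be deleted, can be sketched as follows. This is an illustration under stated assumptions, not the repository's `_is_managed_scratch_path`: the function names and the way allowed roots are passed in are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def is_strict_descendant(path: PathLike, root: PathLike) -> bool:
    """True only if path lies strictly below root after resolving '..' and symlinks.

    A path equal to the root itself is rejected, because deleting a
    workspaces root would wipe every task's scratch dir at once.
    """
    resolved_path = Path(path).resolve()
    resolved_root = Path(root).resolve()
    return resolved_path != resolved_root and resolved_root in resolved_path.parents


def is_managed_scratch_path(workspace_path: PathLike,
                            allowed_roots: Iterable[PathLike]) -> bool:
    """Hypothetical guard: allow cleanup only under one of the workspaces/ roots."""
    return any(is_strict_descendant(workspace_path, root) for root in allowed_roots)
```

A cleanup routine would call the guard before `shutil.rmtree` and log a warning instead of deleting when it returns False.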
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com> * chore(release): map leeseoki0 for PR #31315 salvage * fix(cli): add inline --yes/now skip for destructive slash commands (#30768) Issue #30768 reports that on native Windows PowerShell the destructive-slash confirmation modal renders but never registers keypresses, leaving the user unable to confirm or cancel /reset, /new, /clear, or /undo. The modal works on macOS, Linux, and WSL; PR #23907 (merged May 11) replaced the daemon-thread input() pattern with a prompt_toolkit-native keybinding modal but the win32 input pipeline apparently doesn't dispatch keys to the filter-conditioned handlers. The modal investigation is ongoing. This change ships the immediate escape hatch: append `now`, `--yes`, or `-y` to any destructive slash command to bypass the modal and run the action immediately. Works on every platform without touching the broken Windows code path. /reset now -> reset, no modal /new --yes my-session -> new session titled "my-session", no modal /clear -y -> clear, no modal /undo -y -> undo, no modal The default behavior (modal prompts when approvals.destructive_slash_confirm is True) is unchanged for users who don't pass a skip token. Implementation: - New classmethod HermesCLI._split_destructive_skip(text) -> (remainder, skip) parses a destructive-slash command string, strips the leading "/cmd" word and any recognized skip tokens (case-insensitive exact match, not substring), and reports whether a skip was requested. - HermesCLI._confirm_destructive_slash gains an optional cmd_original= arg. When the arg contains a skip token, it returns "once" immediately — before the gate check and before any modal rendering. - The /clear, /new, /undo handlers in process_command pass cmd_original through. 
/new additionally uses _split_destructive_skip to strip skip tokens from the remaining text before deriving the session title, so "/new now My Session" yields title="My Session" (not "now My Session"). Tests: - 7 new unit tests in tests/cli/test_destructive_slash_confirm.py covering the helper (recognized tokens, command-word stripping, case-insensitive exact match, None/empty input) and the modal bypass (now and --yes both skip; no-skip-token still consults the modal). - 3 new integration tests in tests/cli/test_destructive_slash_inline_skip_e2e.py driving HermesCLI.process_command end-to-end and asserting (a) new_session is invoked, (b) the modal is never reached, (c) the skip token does not leak into the session title, and (d) the no-skip-token path still reaches the modal as a sanity check that we haven't accidentally short-circuited the normal flow. All 31 tests across the destructive-slash test surface pass. Docs: - website/docs/reference/slash-commands.md documents the new flags both in the destructive-commands table and the dedicated approval section, with a link back to issue #30768 explaining why the escape hatch exists. * fix(cli): show full session titles in /resume list * fix(file-safety): write-deny pairing/ directory to prevent approved-list injection The gateway pairing directory (~/.hermes/pairing/) stores per-platform access-control files (telegram-approved.json, discord-approved.json, etc.). A prompt-injected agent using write_file could add arbitrary user IDs to an approved file, granting persistent gateway access without going through the pairing code flow — the same threat class that motivated protecting webhook_subscriptions.json (#14157). The pairing directory was not included in the original control-plane protection because it postdates PR #14157. PR #30383 introduced the hashed-pending schema and made the approved files the sole source of truth for gateway access, raising the security sensitivity of the directory. 
Apply the same mcp-tokens pattern: block writes to pairing/ and any path within it, under both the active hermes_home and the root path (for profile-mode parity with the fix in #30382). Regression tests verify denial for pairing/telegram-approved.json, pairing/discord-pending.json, and the directory itself, in both normal and profile-mode layouts. * feat: support numbered resume selection in cli and gateway * chore(release): map 490408354@qq.com to daizhonggeng (PR #9020) * i18n+tests: add list_item_numbered, list_footer_numbered, out_of_range for 15 locales The numbered /resume feature added new i18n keys to en.yaml; the catalog parity tests require every locale to carry matching keys and placeholders, so add translations to all 15 supported locales. Also unblock tests/cli/test_cli_resume_command.py: - _make_cli stub now sets self.resume_display = 'minimal' since _handle_resume_command (post-#31695) calls _display_resumed_history. - mock_db.resolve_resume_session_id returns the input id (no compression chain) so HERMES_SESSION_ID is set to a real string, not a MagicMock. * test(cli): update resume usage-hint assertion for numbered selection PR #9020's salvage changed the /resume list footer from 'Use /resume <session id or title> to continue.' to 'Use /resume <number>, /resume <session id>, or /resume <session title> to continue.\n Example: /resume 2'. test_resume_without_target_lists_recent_sessions still pinned the old string verbatim and failed in CI. Relax to substring assertions that allow both the new numbered footer and any future tweaks while still verifying the hint is shown. * fix(security): add missing credential paths to write denylist (#27217) The write denylist already protects SSH keys, AWS, GPG, npm, PyPI, Docker, Azure, and GitHub CLI credentials. Two common credential stores were missing: ~/.git-credentials stores plaintext git tokens in the format https://username:token@github.com when using git credential-store. 
It is directly analogous to ~/.netrc, which was already protected.

~/.config/gcloud/ contains Google Cloud OAuth tokens and service account credentials. It is directly analogous to ~/.aws/, which was already protected.

Under prompt injection, an agent could be instructed to overwrite these files, destroying credentials or planting malicious ones. Verified before and after with is_write_denied() on both paths.

* fix(file-safety): deny reads of Google OAuth tokens (#30972)

* fix(security): close TOCTOU window when saving Claude Code OAuth credentials (#21152)

_write_claude_code_credentials wrote ~/.claude/.credentials.json via Path.write_text + replace + post-write chmod(0o600). Both the temp file and the destination briefly carried the mode left by the process umask (commonly umask 0o022, giving 0o644 = world-readable) between create/replace and chmod, exposing the OAuth access/refresh tokens to other local users on multi-user hosts.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR mode so the temp file is created atomically at 0o600. After os.replace, the destination inherits the temp's mode, so the post-write chmod is no longer needed. The temp name also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a crashed prior write.

Parent dir (~/.claude/) is owned by Claude Code itself and shared with its native auth, so we deliberately don't tighten its mode here (unlike the mcp_oauth fix, which owns its own subtree under HERMES_HOME).

Mirrors the fix shipped for agent/google_oauth.py in #19673 and the parallel fix for tools/mcp_oauth.py in #21148. Adds a regression test in TestWriteClaudeCodeCredentials asserting the resulting file mode is 0o600 (skipped on Windows, where POSIX mode bits aren't enforced).

* ci(supply-chain): anchor install-hook regex at repo root (#31744)

The SETUP_HITS check matched any file ending in setup.py/setup.cfg/sitecustomize.py/usercustomize.py at any path depth.
This produced false positives on every PR touching hermes_cli/setup.py (the CLI setup wizard), which is unrelated to pip/site install hooks. Only the top-level setup.py/setup.cfg execute during 'pip install', and only top-level sitecustomize.py/usercustomize.py are auto-loaded by site.py at interpreter startup. Anchor the regex with '^' so only repo-root matches fire. Symptom: PR #30916 (Mattermost plugin migration) flagged purely because it deletes _setup_mattermost() from hermes_cli/setup.py. Discord migration (#30591) hit the same false positive yesterday. * fix(security): restrict write access to Anthropic OAuth credential store * chore(release): map kronexoi for PR #30553 salvage * Protect dashboard OAuth credentials with the same file-safety guarantees as other auth paths The web dashboard's Anthropic OAuth helper wrote the credential file straight to its final destination and relied on the process umask for permissions. That left the dashboard-specific path weaker than the existing auth writers, which already use owner-only permissions and safer write semantics. This change keeps the scope narrow: make the dashboard helper write via a temp file + replace, chmod the final file to owner-only, and add a focused regression test for both permission handling and atomic-write behavior. 
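For comparison, the stricter variant used by the #21152 credential fix earlier in this log creates the temp file at 0o600 from the start, so no chmod is needed after the rename. The sketch below illustrates that pattern only; `write_private_json` and the temp-name format are hypothetical, and this is not the dashboard helper or `_write_claude_code_credentials` itself.

```python
import json
import os
import secrets
import stat
from pathlib import Path


def write_private_json(dest: Path, payload: dict) -> None:
    """Write payload to dest so it is never readable by other users.

    The temp file is created with O_EXCL and mode 0o600 in a single
    os.open call, then renamed over dest with os.replace.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Random suffix avoids collisions with concurrent writers and with
    # stale leftovers from a crashed earlier write (format is hypothetical).
    tmp = dest.with_name(f"{dest.name}.tmp.{os.getpid()}.{secrets.token_hex(4)}")
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(tmp, flags, stat.S_IRUSR | stat.S_IWUSR)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # data is on disk before the rename
        os.replace(tmp, dest)  # dest takes over the temp file's 0o600 mode
    except BaseException:
        # Best-effort cleanup so a failed write leaves no token-bearing temp file.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Because the mode is passed to `os.open`, there is no interval in which the file exists with wider permissions. On Windows the POSIX mode bits are not enforced, so the permission guarantee applies to POSIX hosts only.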
Constraint: Must preserve the existing dashboard OAuth flow and credential-pool side effects Rejected: Broader auth-storage refactor | unnecessary scope for a single verified inconsistency Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep dashboard credential writes aligned with existing auth storage semantics; do not reintroduce direct write_text() here without matching chmod/atomic behavior Tested: pytest -o addopts='' tests/hermes_cli/test_web_server_oauth_write.py tests/hermes_cli/test_web_server.py -q (78 passed) Not-tested: Cross-platform permission semantics on Windows-managed filesystems * fix(security): redact credentials before persistence in session capture Two-layer redaction at the persistence boundary so credentials never reach state.db, session_*.json, or compression: 1. agent/chat_completion_helpers.py :: build_assistant_message - Redact assistant content before the message dict is constructed (catches PATs / API keys the model inlines into natural language) - Redact tool_call.function.arguments at the same site (catches secrets inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...') Tool execution uses the raw API response object, not this dict, so redacting the persisted shape is safe. 2. run_agent.py :: _save_session_log - Add _redact_message_content() static helper that handles both string content and OpenAI/Anthropic multimodal list-of-parts (image parts pass through untouched, only text/content fields are redacted) - Apply to every message + the cached system prompt before writing session_*.json Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text — no-op when disabled. Tests (TestSaveSessionLogRedactsSecrets, 4 cases): - api key in tool content - api key in user message - api key in system prompt - multimodal list-of-parts (image part preserved, text redacted) Tests use an autouse fixture to force _REDACT_ENABLED=True because the hermetic conftest defaults the env var to false. 
Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log) + PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction). Kept only the redaction concern from #19855; its unrelated whatsapp npm timeout + PATCH_SCHEMA changes are out of scope and dropped.

Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture credential leak).

Co-authored-by: liuhao1024 <liuhao03@bilibili.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>

* chore(release): map vgocoder for PR #24758 salvage

* fix(google_chat): harden oauth credential persistence with atomic private writes (#24788)

* feat(tts): add register_tts_provider() plugin hook (closes #30398)

Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843.

This is additive: the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).
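The four-step resolution order above can be sketched as a single pure function. This is a simplified illustration, not the dispatcher in tools/tts_tool.py: `resolve_tts_route`, its parameters, and the returned route labels are hypothetical names.

```python
from typing import Any, Mapping, Set


def resolve_tts_route(name: str,
                      builtin_names: Set[str],
                      command_providers: Mapping[str, Mapping[str, Any]],
                      plugin_names: Set[str]) -> str:
    """Return which dispatch path a configured tts.provider name would take."""
    # 1. Built-in names always win, even if a plugin registered the same name.
    if name in builtin_names:
        return "builtin"
    # 2. A config-driven command provider beats a same-named plugin.
    config = command_providers.get(name)
    if config and config.get("command"):
        return "command"
    # 3. Plugin-registered providers come next.
    if name in plugin_names:
        return "plugin"
    # 4. No match: legacy fallthrough to the Edge TTS default.
    return "edge-default"


builtins = {"edge", "openai", "elevenlabs"}
commands = {"mycli": {"type": "command", "command": "say {text}"}}
plugins = {"cartesia", "mycli", "openai"}

for provider in ("openai", "mycli", "cartesia", "unknown"):
    print(provider, "->", resolve_tts_route(provider, builtins, commands, plugins))
```

Keeping the order in one function makes the invariant easy to test in isolation; the commit itself enforces it redundantly at several layers.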
Built-ins-always-win is enforced at THREE layers:

- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:

- The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant.

## New files

- `agent/tts_provider.py`: `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py`: `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py`: `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py`: `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py`: `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py`: 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py`: 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py`: 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py`: 3 end-to-end tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py`: 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario.

## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md`: new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md`: TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming).

Closes #30398

* docs(plans): add s6-overlay supervision plan (v3)

Replace tini with s6-overlay as PID 1 in the Hermes Docker image so that main hermes, the dashboard, and dynamically-created per-profile gateways all run as supervised services. Includes container-boot reconciliation (Task 4.0) so per-profile gateways survive docker restart.
Plan history:

- v1 (2026-05-07): original design (subagent gateways scope)
- v2 (2026-05-18): re-validated, scope narrowed to per-profile gateways, WindowsServiceManager added to protocol
- v3 (2026-05-21): re-validated in docker_s6 worktree, install-method stamp preservation noted in Task 2.3, Task 4.0 added for container restart survival

12.5 engineering days estimated across 7 phases.

* test(docker): add conftest fixtures for docker harness

Task 0.1 of the s6-overlay supervision plan. Establishes the test infrastructure for tests/docker/: skip-on-missing-Docker collection hook, session-scoped image-build fixture (overridable via the HERMES_TEST_IMAGE env var for faster local iteration), and a container_name fixture that ensures cleanup on test exit.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md

* test(docker): lock baseline behavior for Phase 0 harness

Tasks 0.2-0.6 of the s6-overlay supervision plan. Locks the user-visible behavior we must preserve through the Phase 2 init-system swap:

- test_main_invocation.py (Task 0.2): docker run <image> with no args, chat subcommand passthrough, bare executable passthrough, bash pattern, exit-code propagation
- test_tui_passthrough.py (Task 0.3): T…

…entials (NousResearch#21152)

_write_claude_code_credentials wrote ~/.claude/.credentials.json via Path.write_text + replace + post-write chmod(0o600). Both the temp file and the destination briefly carried the default mode left by the process umask (commonly 0o644 = world-readable, from a umask of 0o022) between create/replace and chmod, exposing the OAuth access/refresh tokens to other local users on multi-user hosts.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR mode so the temp file is created atomically at 0o600. After os.replace, the destination inherits the temp's mode, so the post-write chmod is no longer needed. The temp name also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a crashed prior write.

Parent dir (~/.claude/) is owned by Claude Code itself and shared with its native auth, so we deliberately don't tighten its mode here (unlike the mcp_oauth fix which owns its own subtree under HERMES_HOME).

Mirrors the fix shipped for agent/google_oauth.py in NousResearch#19673 and the parallel fix for tools/mcp_oauth.py in NousResearch#21148. Adds a regression test in TestWriteClaudeCodeCredentials asserting the resulting file mode is 0o600 (skipped on Windows where POSIX mode bits aren't enforced).
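
The write pattern this commit message describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in `agent/anthropic_adapter.py`: the helper name `write_private_json` and the exact temp-name format are hypothetical, and the parent directory's mode is left untouched, as in the PR.

```python
import json
import os
import secrets
import stat
from pathlib import Path


def write_private_json(dest: Path, payload: dict) -> None:
    """Write `payload` to `dest` without ever exposing it at a mode wider than 0o600."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical temp-name format: pid plus random suffix, so concurrent writers
    # and stale leftovers from a crashed write cannot collide with this one.
    tmp = dest.with_name(f"{dest.name}.tmp.{os.getpid()}.{secrets.token_hex(4)}")
    # O_EXCL makes creation fail if the name already exists; the mode argument is
    # applied at creation time, so there is no window before a later chmod.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, stat.S_IRUSR | stat.S_IWUSR)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before the rename
        os.replace(tmp, dest)  # atomic rename; dest takes over the temp file's mode
    except BaseException:
        # Best-effort cleanup of the temp file, then re-raise the original error.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Because the umask can only clear permission bits, never add them, a file created with an explicit 0o600 mode cannot end up group- or world-readable regardless of the process umask. A check in the spirit of the regression test would compare `stat.S_IMODE(dest.stat().st_mode)` against `0o600` on POSIX hosts.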

Gutslabs mentioned this pull request May 7, 2026

fix(security): close TOCTOU windows when saving credentials in hermes_cli/auth #21154

Closed

3 tasks

alt-glitch added the type/security, P2, comp/agent, and area/auth labels May 7, 2026

teknium1 merged commit 223a397 into NousResearch:main May 25, 2026
5 of 7 checks passed

teknium1 mentioned this pull request May 25, 2026

fix(security): salvage #30553 + #11004 OAuth file-safety hardening #31747

Merged

Haderach-Ram mentioned this pull request May 25, 2026

Ecosystem Digest — 2026-05-25 Haderach-Ram/openclaw-radar#18

Open

teknium1 merged 1 commit into NousResearch:main from Gutslabs:fix/anthropic-claude-creds-toctou

Gutslabs commented May 7, 2026


3 participants