fix(security): restore approval/sudo context in execute_code RPC threads + guard entry points#34131
Conversation
Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape. Refs NousResearch#33057
…+ guard entry Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle. Refs NousResearch#4146, NousResearch#27303, NousResearch#30882, NousResearch#33057
When approvals.mode=manual with security.tirith_enabled off and no auxiliary.approval model, dangerous commands and execute_code scripts can only be gated by live in-chat approval; with routing fixed they now fail closed (block) rather than silently auto-run. Surface that at startup so operators knowingly enable tirith or auxiliary.approval for unattended gateways. Refs NousResearch#30882
Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial. Refs NousResearch#4146, NousResearch#27303, NousResearch#30882, NousResearch#33057
|
Closing as redundant — the fix this PR implements has already landed on main via a parallel contribution, so there's nothing left to merge here. Your diagnosis was exactly right: the It landed across two commits, both by @firefly:
That closed #4146, #27303, #30882, and #33057. Your PR is now This was a genuine multi-contributor race on a real P0, and your independent diagnosis confirming the same root cause and fix direction was valuable corroboration. Thank you for the thorough writeup — sorry it got pipped at the post by a parallel PR. For visibility, I've also documented the one user-facing behavior change that came with the env-scrub hardening (the narrowed |
Closes #4146 · Closes #27303 · Closes #33057 · Closes #30882 · supersedes #27304, #30893, #33246
execute_code's RPC worker threads were rawthreading.Threads — they began with an empty context and dropped both theHERMES_SESSION_PLATFORMContextVar (routing gateway sessions intocheck_dangerous_command's non-interactive auto-approve branch) and the thread-local approval/sudo callbacks (so the CLI prompt couldn't reach the user). Dangerous commands a sandbox script issued therefore ran without approval.The approval gate is a documented heuristic in
SECURITY.md, not a security boundary; this restores its intended behavior across execution entry points — it doesn't add one.Fixes
terminal()skips the approval guardterminalstays in the allowlistHERMES_*env passthrough leaks config + unguardedterminal()How
tools/thread_context.py— newpropagate_context_to_threadfactoring the ContextVar + thread-local approval/sudo capture→install→clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern). Applied to both execute_code RPC threads;agent/tool_executor.pyrefactored onto it (one audited copy, not three). Strictly more complete than [codex] Preserve context in execute_code RPC threads #33246, which propagated the ContextVar only and would leave the CLI callbacks unrestored.check_execute_code_guard(adapted from [codex] Guard execute_code behind gateway approvals #30893) — one-shot whole-script approval in gateway/ask/cron-deny before the child spawns; fail-closed on deny/timeout/missing-notify; skips isolated container backends.HERMES_prefix → explicit operational allowlist (HERMES_HOME/PROFILE/CONFIG/ENV) plusDSN/WEBHOOKsecret substrings. Skill/configenv_passthroughvars remain an intentional opt-in (checked first).approvals.mode=manualwith neither tirith norauxiliary.approvalconfigured.Limitation
Purely-local, non-interactive, non-gateway sessions still auto-run code — there is no operator surface to prompt against, matching the existing
terminalauto-approve contract; the hardline floor still blocks catastrophic patterns. This makes the gateway case consistent with interactive sessions; it does not make headless-local fail-closed (see below).Out of scope
tirithas default when a gateway platform is activeNotes
terminalstays in the sandbox allowlist and is guarded, not removed.Validation
451 passed / 3 skipped across the cluster suites; ruff clean. Repro (#30882): a gateway session whose script ran
terminal("<dangerous>")was silently auto-approved before this change; now it routes to gateway approval and blocks on deny/timeout.