Summary
This preserves Hermes session/approval context when execute_code sandbox scripts call back into Hermes tools through the parent-process RPC bridge.
Root Cause
execute_code starts raw threading.Thread workers for its RPC paths:
the local Unix-domain-socket RPC server thread;
the remote file-polling RPC thread.
Raw Python threads start with an empty contextvars.Context. That means any ContextVar bound around the agent turn, including approval/session routing state, is not visible when the sandbox calls back into model_tools.handle_function_call(...).
For approval-sensitive flows, that is the wrong boundary: a sandboxed script can call hermes_tools.terminal(...), and the resulting terminal approval checks need the same session context as the agent turn that launched execute_code.
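The root cause can be reproduced in isolation. The sketch below is not Hermes code: `session_key` is a hypothetical stand-in for the approval/session routing state, and the behavior shown assumes a standard CPython build, where a thread started without an explicit context does not inherit the caller's ContextVar values.

```python
import contextvars
import threading

# Hypothetical stand-in for the session/approval routing state.
session_key = contextvars.ContextVar("session_key", default=None)


def read_in_raw_thread():
    """Read session_key from a plain threading.Thread and return what it saw."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(session_key.get()))
    worker.start()
    worker.join()
    return seen[0]


# Bind the value in the launching thread, as the agent turn would.
session_key.set("turn-123")

parent_value = session_key.get()
raw_thread_value = read_in_raw_thread()

print("launching thread sees:", parent_value)
print("raw worker thread sees:", raw_thread_value)
```

The worker falls back to the variable's default, which is the "missing context" case the approval checks would hit.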
What Changed
Added a small _context_thread_target(...) helper in tools/code_execution_tool.py.
Wrapped both execute-code RPC thread targets in contextvars.copy_context().run(...).
Added regression coverage proving a sandbox tool call sees:
the caller's ContextVar value;
the tools.approval session key;
the task_id.
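A minimal sketch of what such a helper might look like follows. Only the helper name and the use of contextvars.copy_context().run(...) come from this PR; the signature, the `session_key` variable, and the `handle_rpc_call` stand-in are assumptions made for illustration.

```python
import contextvars
import threading

# Hypothetical stand-in for the session/approval routing state.
session_key = contextvars.ContextVar("session_key", default=None)


def _context_thread_target(target, *args, **kwargs):
    """Return a callable that runs `target` inside a snapshot of the caller's context.

    Assumed signature; the real helper may differ.
    """
    ctx = contextvars.copy_context()  # snapshot taken here, in the calling thread

    def runner():
        return ctx.run(target, *args, **kwargs)

    return runner


def handle_rpc_call(results):
    # Stand-in for the RPC loop that would dispatch into the tool handler.
    results.append(session_key.get())


session_key.set("turn-123")
results = []

worker = threading.Thread(
    target=_context_thread_target(handle_rpc_call, results),
    daemon=True,
)
worker.start()
worker.join()

print(results)
```

Because the snapshot is taken before the thread starts, the worker reads the value bound by the launching turn rather than the default.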
Relationship To #30893
#30893 guards the execute_code entry point in gateway approval contexts: the script itself must be approved before it starts.
This PR is complementary. It keeps the already-running sandbox's RPC tool calls attached to the same turn context, so approval/session-sensitive tool dispatch inside execute_code does not fall back to missing or stale context.
Note: the focused/full execute_code pytest runs were executed outside my local Codex filesystem sandbox because the sandbox blocks Unix-domain-socket bind with PermissionError: [Errno 1] Operation not permitted. The same tests pass when run in the normal local environment.
Good catch — the ContextVar leakage through RPC threads is a real correctness issue.
I verified the two threading.Thread call sites at lines 908 and 1155 in code_execution_tool.py — both are RPC-handling threads (poll loop and server loop) that invoke handle_function_call, which depends on session context. The two other threads at lines 1307/1313 are stdout/stderr drain threads that don't call tool functions, so they correctly don't need the wrapper. The fix is well-scoped.
One note: _context_thread_target returns a closure that captures the context at call time. If execute_code is ever called from multiple threads concurrently (unlikely today but worth noting), each call site will snapshot its own context independently — which is the correct behavior.
The test properly validates both ContextVar propagation and session_key inheritance. LGTM.
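The snapshot-at-call-time behavior noted in this review can be illustrated with two concurrent callers. This is a sketch under assumptions: `wrap_with_current_context` is a hypothetical equivalent of the PR's helper, and `session_key` is a stand-in variable.

```python
import contextvars
import threading

# Hypothetical stand-in for per-turn session state.
session_key = contextvars.ContextVar("session_key", default=None)


def wrap_with_current_context(target):
    """Snapshot the calling thread's context and run `target` inside it later."""
    ctx = contextvars.copy_context()
    return lambda: ctx.run(target)


def caller(name, seen):
    # Each caller binds its own session, then wraps a worker target.
    session_key.set(name)
    wrapped = wrap_with_current_context(
        lambda: seen.__setitem__(name, session_key.get())
    )
    # A later change in the caller is not part of the snapshot.
    session_key.set(name + "-changed-after-snapshot")
    worker = threading.Thread(target=wrapped)
    worker.start()
    worker.join()


seen = {}
callers = [
    threading.Thread(target=caller, args=(name, seen))
    for name in ("session-a", "session-b")
]
for t in callers:
    t.start()
for t in callers:
    t.join()

print(seen)
```

Each worker observes the value its own caller had bound when the wrapper was created, regardless of the other caller or of later changes.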
Superseded by #34497 (merged). Same root cause you identified — raw RPC threads start with an empty contextvars.Context. The merged fix also restores the thread-local approval/sudo callbacks (not just the ContextVar) via a shared propagate_context_to_thread helper, so the CLI approval prompt reaches the user too. Thanks; credited in the salvage.
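The distinction between ContextVar state and thread-local state can be sketched as follows. This is not the merged propagate_context_to_thread helper, whose signature is not shown here; `propagate_sketch`, `_callbacks`, and the approval callback are hypothetical names used only to show why copying the context alone does not carry threading.local values into a worker.

```python
import contextvars
import threading

session_key = contextvars.ContextVar("session_key", default=None)
_callbacks = threading.local()  # hypothetical thread-local callback registry


def context_only(target):
    """Propagate only the ContextVar snapshot."""
    ctx = contextvars.copy_context()
    return lambda: ctx.run(target)


def propagate_sketch(target):
    """Propagate the ContextVar snapshot and the thread-local approval callback."""
    ctx = contextvars.copy_context()
    approval_cb = getattr(_callbacks, "approval", None)  # read in the parent thread

    def runner():
        _callbacks.approval = approval_cb  # reinstall in the worker thread
        return ctx.run(target)

    return runner


out = []


def rpc_body():
    cb = getattr(_callbacks, "approval", None)
    out.append((session_key.get(), cb("ls") if cb else None))


def run_in_thread(wrapped):
    worker = threading.Thread(target=wrapped)
    worker.start()
    worker.join()


session_key.set("turn-123")
_callbacks.approval = lambda command: f"ask user about: {command}"

run_in_thread(context_only(rpc_body))      # session visible, callback missing
run_in_thread(propagate_sketch(rpc_body))  # both visible

print(out)
```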
Labels: comp/agent (Core agent loop, run_agent.py, prompt builder); P2 (Medium: degraded but workaround exists); tool/code-exec (execute_code sandbox); type/security (Security vulnerability or hardening)
4 participants
Disclaimer: this is a fully-automated contribution trying to expand on the success of #30432.
Fixes #33057.
Validation
.venv/bin/ruff check tools/code_execution_tool.py tests/tools/test_code_execution.py
.venv/bin/python -m py_compile tools/code_execution_tool.py tests/tools/test_code_execution.py
.venv/bin/python -m pytest tests/tools/test_code_execution.py::TestExecuteCodeEdgeCases::test_rpc_thread_preserves_contextvars -q --tb=short
.venv/bin/python -m pytest tests/tools/test_code_execution.py -q --tb=short