[codex] Guard execute_code behind gateway approvals#30893
Closed
egilewski wants to merge 2 commits into
Closed
Conversation
bf39ea6 to
b3941fd
Compare
Contributor
Author
|
Follow-up CI check: current PR head The signed follow-up commit stabilized the two unrelated failures from the previous run:
Verification observed on GitHub Actions:
Non-blocking note: the Docker workflow still emits GitHub's Node.js 20 action deprecation warning for the pinned |
This was referenced May 26, 2026
Contributor
|
Superseded by #34497 (merged). The whole-script entry guard ( |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a fully-automated contribution trying to expand on the success of #30432.
Fixes #30882.
Summary
This closes the remaining
execute_codebypass in gateway manual approval mode. Currentmainalready has a blocking gateway approval queue for terminal commands, butexecute_codecan spawn a local Python child before any terminal guard sees the subprocess/os calls inside that script.Root Cause
execute_codescripts run arbitrary Python. A script can callsubprocess.run(...),os.system(...),ctypes, or other process/file APIs directly, so dangerous shell behavior can happen without going throughterminal()and without matchingDANGEROUS_PATTERNSas a shell command.What Changed
check_execute_code_guard()intools/approval.py.execute_codespawns the child process, gateway/ask contexts now submit a one-shot approval request through the existing blocking gateway approval queue.approvals.mode: offand session/process YOLO still bypass this guard intentionally.approvals.cron_mode: denynow block local/SSHexecute_code, because no user is present to approve arbitrary script execution.Regression Coverage
Added
tests/tools/test_code_execution.pycoverage for:execute_codebefore the child process is spawned, verified with a marker file that never appears;Validation
.venv/bin/python -m ruff check tools/approval.py tools/code_execution_tool.py tests/tools/test_code_execution.py.venv/bin/python -m py_compile tools/approval.py tools/code_execution_tool.py tests/tools/test_code_execution.pyHOME=/tmp/hermes-test-home scripts/run_tests.sh tests/gateway/test_approve_deny_commands.pyHOME=/tmp/hermes-test-home scripts/run_tests.sh tests/tools/test_cron_approval_mode.pyHOME=/tmp/hermes-test-home .venv/bin/python -m pytest tests/tools/test_code_execution.py::TestExecuteCodeEdgeCases::test_gateway_execute_code_denial_blocks_child_process -qHOME=/tmp/hermes-test-home .venv/bin/python -m pytest tests/tools/test_code_execution.py::TestExecuteCodeEdgeCases::test_gateway_execute_code_runs_after_one_shot_approval -qHOME=/tmp/hermes-test-home scripts/run_tests.sh tests/tools/test_code_execution.pypassed outside the local filesystem sandbox; inside the sandbox this file hit the existing UDS bind restriction (PermissionError: [Errno 1] Operation not permitted) across pre-existing execute_code tests.Assumptions
execute_code;/approve sessionor/approve alwaysresolves the current wait but this guard does not persist a broad allowlist for future scripts.