
[codex] Guard execute_code behind gateway approvals #30893

Closed
egilewski wants to merge 2 commits into NousResearch:main from egilewski:codex/fix-gateway-execute-code-approval

Conversation

@egilewski

Contributor

This is a fully automated contribution that tries to build on the success of #30432.

Fixes #30882.

Summary

This closes the remaining execute_code bypass in gateway manual approval mode. Current main already has a blocking gateway approval queue for terminal commands, but execute_code can spawn a local Python child before any terminal guard sees the subprocess/os calls inside that script.

Root Cause

execute_code scripts run arbitrary Python. A script can call subprocess.run(...), os.system(...), ctypes, or other process/file APIs directly, so dangerous shell behavior can happen without going through terminal() and without matching DANGEROUS_PATTERNS as a shell command.
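As a minimal illustration of the gap, assume a command-string check along these lines. The pattern list below is a hypothetical stand-in, not the repository's actual DANGEROUS_PATTERNS, and `looks_dangerous` is an illustrative name. The script is only inspected as text here, never executed.

```python
import re

# Hypothetical stand-in for a shell-command pattern list (not the real DANGEROUS_PATTERNS).
HYPOTHETICAL_PATTERNS = [re.compile(r"\brm\s+-rf\b")]


def looks_dangerous(shell_command: str) -> bool:
    """Return True if the text matches any pattern written for shell command strings."""
    return any(p.search(shell_command) for p in HYPOTHETICAL_PATTERNS)


# terminal() path: the guard receives the command string and can match it.
terminal_command = "rm -rf /tmp/example"

# execute_code path: the same operation expressed as Python, using an argv list.
# No shell command string ever reaches a terminal guard, and even a textual scan
# of the script does not match a pattern written for shell syntax.
script = (
    "import subprocess\n"
    'subprocess.run(["rm", "-rf", "/tmp/example"])\n'
)

print(looks_dangerous(terminal_command))  # True
print(looks_dangerous(script))            # False
```

This is why the fix gates the whole script at entry rather than trying to extend string inspection into Python source.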

What Changed

  • Added check_execute_code_guard() in tools/approval.py.
  • Before local/SSH execute_code spawns the child process, gateway/ask contexts now submit a one-shot approval request through the existing blocking gateway approval queue.
  • User denial, timeout, missing notify callback, or guard failure all fail closed before the script runs.
  • approvals.mode: off and session/process YOLO still bypass this guard intentionally.
  • Container/cloud backends keep the existing approval behavior, matching terminal command approval's existing skip for isolated backends.
  • Cron sessions with approvals.cron_mode: deny now block local/SSH execute_code, because no user is present to approve arbitrary script execution.
  • Shared the gateway approval wait helper with the existing terminal approval path so approval waits continue feeding the inactivity watchdog.
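The decision order described above can be sketched roughly as follows. This is not the code in tools/approval.py: `GuardContext`, `guard_allows`, the backend identifiers, and the mode string "manual" are all assumptions made for illustration, and cron modes other than deny are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumed identifiers for the isolated backends named in this PR.
ISOLATED_BACKENDS = {"docker", "singularity", "modal", "daytona", "vercel_sandbox"}


@dataclass
class GuardContext:
    """Hypothetical snapshot of the state the guard would consult."""
    backend: str                      # e.g. "local" or "ssh"
    approvals_mode: str               # e.g. "manual" or "off"
    yolo: bool = False                # session/process YOLO flag
    is_cron: bool = False
    cron_mode: str = "deny"
    # Blocking approval request: True = approved, False = denied, None = timed out.
    request_approval: Optional[Callable[[str], Optional[bool]]] = None


def guard_allows(ctx: GuardContext, script: str) -> Tuple[bool, str]:
    """Return (allowed, reason); every uncertain outcome fails closed."""
    if ctx.approvals_mode == "off" or ctx.yolo:
        return True, "approvals bypassed intentionally"
    if ctx.backend in ISOLATED_BACKENDS:
        return True, "isolated backend, guard not applied"
    if ctx.is_cron and ctx.cron_mode == "deny":
        return False, "cron session, no user present"
    if ctx.request_approval is None:
        return False, "missing notify callback"
    try:
        decision = ctx.request_approval(script)
    except Exception:
        return False, "guard failure"
    if decision is True:
        return True, "approved once"
    return False, "denied or timed out"
```

The point of the ordering is that the child process is spawned only on an explicit `True`; a denial, a timeout, a missing callback, or an exception inside the guard all return before the script runs.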

Regression Coverage

Added tests/tools/test_code_execution.py coverage for:

  • gateway denial blocks execute_code before the child process is spawned, verified with a marker file that never appears;
  • one-shot gateway approval allows the script to continue and return normal output.
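The marker-file technique can be sketched in a self-contained form as below. This is not the test code added in tests/tools/test_code_execution.py: `run_guarded` is a hypothetical stand-in for the guarded execute_code path, and the approval decision is passed in as a plain boolean rather than going through the gateway queue.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_guarded(script: str, approved: bool) -> str:
    """Hypothetical guarded runner: spawn the child only after approval."""
    if not approved:
        return "blocked: approval denied"
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout


def marker_script(marker: Path) -> str:
    """Build a script whose only side effect is creating the marker file."""
    return (
        "from pathlib import Path\n"
        f"Path({str(marker)!r}).write_text('ran')\n"
        "print('done')\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "marker.txt"

    # Denial: the child must never start, so the marker must not exist.
    denied_output = run_guarded(marker_script(marker), approved=False)
    marker_after_denial = marker.exists()

    # One-shot approval: the script runs and returns its normal output.
    approved_output = run_guarded(marker_script(marker), approved=True)
    marker_after_approval = marker.exists()
```

Checking for the absent marker file verifies that nothing ran at all, which is a stronger claim than checking the returned error message.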

Validation

  • .venv/bin/python -m ruff check tools/approval.py tools/code_execution_tool.py tests/tools/test_code_execution.py
  • .venv/bin/python -m py_compile tools/approval.py tools/code_execution_tool.py tests/tools/test_code_execution.py
  • HOME=/tmp/hermes-test-home scripts/run_tests.sh tests/gateway/test_approve_deny_commands.py
  • HOME=/tmp/hermes-test-home scripts/run_tests.sh tests/tools/test_cron_approval_mode.py
  • HOME=/tmp/hermes-test-home .venv/bin/python -m pytest tests/tools/test_code_execution.py::TestExecuteCodeEdgeCases::test_gateway_execute_code_denial_blocks_child_process -q
  • HOME=/tmp/hermes-test-home .venv/bin/python -m pytest tests/tools/test_code_execution.py::TestExecuteCodeEdgeCases::test_gateway_execute_code_runs_after_one_shot_approval -q
  • HOME=/tmp/hermes-test-home scripts/run_tests.sh tests/tools/test_code_execution.py (passed outside the local filesystem sandbox; inside the sandbox, the pre-existing execute_code tests in this file hit the existing UDS bind restriction, PermissionError: [Errno 1] Operation not permitted)

Assumptions

  • The intended security contract for gateway manual approvals is fail-closed for local/SSH arbitrary code execution, because the generated script can bypass command-string inspection.
  • Docker, Singularity, Modal, Daytona, and Vercel Sandbox should retain the existing container/cloud approval behavior used by terminal commands.
  • A one-shot approval is the right scope for execute_code; /approve session or /approve always resolves the current wait but this guard does not persist a broad allowlist for future scripts.
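The one-shot scope in the last assumption can be pictured as below. The function name, the choice strings, and the allowlist set are hypothetical; the sketch only shows that a broader approval choice resolves the current wait without being recorded for later scripts.

```python
from typing import Set


def resolve_execute_code_wait(choice: str, session_allowlist: Set[str]) -> bool:
    """Hypothetical resolver: any approval choice approves this script only."""
    approved = choice in {"once", "session", "always"}
    # Deliberately no session_allowlist.add(...): "session" and "always"
    # are treated like "once" for execute_code in this sketch.
    return approved


allowlist: Set[str] = set()
first = resolve_execute_code_wait("always", allowlist)
denied = resolve_execute_code_wait("deny", allowlist)
```

Under this sketch every later script would trigger a fresh approval request, because nothing was persisted by the first one.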

@alt-glitch added the labels type/security (Security vulnerability or hardening), comp/gateway (Gateway runner, session dispatch, delivery), tool/code-exec (execute_code sandbox), and P1 (High: major feature broken, no workaround) on May 23, 2026
@egilewski force-pushed the codex/fix-gateway-execute-code-approval branch from bf39ea6 to b3941fd on May 23, 2026 11:46
@egilewski marked this pull request as ready for review on May 23, 2026 11:46
@egilewski

Contributor Author

Follow-up CI check: current PR head 72ff96b16 is green.

The signed follow-up commit stabilized the two unrelated failures from the previous run:

  • tests/tools/test_browser_supervisor.py: Chrome startup failures now force-kill the process before skipping when CDP never becomes available.
  • tests/acp/test_server.py: ACP model-switch handoff tests now patch the ACP resolver directly, so unrelated provider registry state cannot shadow the requested provider.

Verification observed on GitHub Actions:

  • test (1) through test (6): success
  • ruff enforcement (blocking) and ruff + ty diff: success
  • Nix on Ubuntu and macOS: success
  • Docker build amd64 and arm64: success
  • Supply-chain, attribution, history, and e2e checks: success

Non-blocking note: the Docker workflow still emits GitHub's Node.js 20 action deprecation warning for the pinned docker/setup-buildx-action; it did not fail the run.

@teknium1

Contributor

Superseded by #34497 (merged). The whole-script entry guard (check_execute_code_guard) is adapted from your approach — one-shot gateway approval before the child spawns, fail-closed on deny/timeout/missing-notify, container/cloud backends skipped, cron-deny blocks. Thanks; credited in the salvage.


@teknium1 closed this on May 29, 2026


3 participants