feat(gateway): add gateway.terminal_backend sandbox override for messaging sessions#7
feat(gateway): add gateway.terminal_backend sandbox override for messaging sessions#7dsr-restyn wants to merge 2 commits into
Conversation
dsr-restyn
left a comment
There was a problem hiding this comment.
Hermes Agent Code Review
Verdict: COMMENT — draft is solid; two real issues to fix before upstream submission, plus a few warnings.
🔴 Issues (fix before upstream)
1. Zombie containers on timeout (tools/execute_code_docker.py:43-47)
proc.kill() kills the Docker CLI client process, but the container itself keeps running inside Docker. This is a resource leak — long timeout scripts will accumulate orphaned containers. Fix: use docker run --name <uuid> and docker kill <uuid> on timeout.
2. container_id = None is declared but never used (tools/execute_code_docker.py:30)
Stale variable from incomplete kill logic (the intended approach was to capture the container ID for targeted kill). Remove it, or implement the container-ID-based kill properly.
⚠️ Warnings
3. Warning block re-reads config.yaml from disk (gateway/run.py:1040-1052)
The should_warn_insecure_gateway() call inside GatewayRunner.start() opens and parses config.yaml a second time, even though it was already read at module import as _cfg. Use _cfg (module-level) or pass the already-loaded config into GatewayRunner instead of re-reading from disk. If the file changes between startup and start(), the warning can see a different config than what was actually applied.
4. No validation of terminal_backend value (gateway/sandbox_config.py)
apply_gateway_backend_to_env() sets TERMINAL_ENV to whatever the config says, including typos. A value like "docer" would silently break all terminal execution. Add validation against the known set: {"local", "docker", "modal", "daytona", "ssh", "singularity"}.
💡 Suggestions
5. Docker path can't be interrupted by user messages (tools/code_execution_tool.py)
The local subprocess path checks _interrupt_event.is_set() in its poll loop. The Docker path uses proc.communicate(timeout=...) which blocks without checking the interrupt event. This means a user sending a message mid-execution won't cancel the Docker job. Acceptable for a first pass, but worth a docstring note and a follow-up issue.
6. No log message when gateway override is applied (gateway/sandbox_config.py)
apply_gateway_backend_to_env() sets TERMINAL_ENV silently. A logger.info("Gateway terminal backend override: %s (image: %s)", backend, image) would make startup logs clearly show that sandboxing is active.
✅ Looks Good
gateway/sandbox_config.pyis a clean, well-tested helper module — good separation of concerns- The if/else structure in
code_execution_tool.pycorrectly scopes the docker path; ANSI stripping and secret redaction still apply to both paths (they come after the if/else) - No collision with existing
gateway/config.pykeys — our new DEFAULT_CONFIGgateway:block is additive - 26 tests, good coverage of sandbox_config helpers and the docker subprocess wrapper
- Commit is atomic and well-scoped — no rewrites of working internals
Reviewed by Hermes Agent
|
|
||
| cmd.extend([image, "python3", script_path]) | ||
|
|
||
| container_id = None |
There was a problem hiding this comment.
🔴 Stale variable — container_id is declared here but never assigned or used. This is a leftover from planned container-kill-by-ID logic that wasn't completed. Either remove it or implement the intended pattern:
import uuid
container_name = f"hermes-sandbox-{uuid.uuid4().hex[:8]}"
cmd = ["docker", "run", "--name", container_name, "--rm", ...]
# then on timeout: subprocess.run(["docker", "kill", container_name])| except subprocess.TimeoutExpired: | ||
| logger.warning("Docker sandbox timed out after %s seconds, killing container", timeout) | ||
| # Try to kill via docker kill using the process | ||
| proc.kill() |
There was a problem hiding this comment.
🔴 Zombie container leak — proc.kill() terminates the Docker CLI process (the client), but the container itself keeps running on the Docker host. On a long timeout, orphaned containers accumulate.
Fix: use a named container so you can kill it:
container_name = f"hermes-sandbox-{uuid.uuid4().hex[:8]}"
# Add --name container_name to the docker run cmd
# Then on timeout:
subprocess.run(["docker", "kill", container_name], timeout=5, capture_output=True)
proc.kill()| # Warn if running with local backend (no sandbox isolation) | ||
| try: | ||
| from gateway.sandbox_config import should_warn_insecure_gateway | ||
| import yaml as _yaml |
There was a problem hiding this comment.
_cfg (around line 165). The two reads could see different values if the file changes between startup and .start(). Consider passing _cfg into GatewayRunner.__init__ or referencing the module-level variable:
if should_warn_insecure_gateway(_cfg): # use already-loaded config
logger.warning(...)|
|
||
|
|
||
| def apply_gateway_backend_to_env(config: dict) -> None: | ||
| """Set TERMINAL_ENV to gateway.terminal_backend if configured.""" |
There was a problem hiding this comment.
"docer") will be blindly set as TERMINAL_ENV and silently break all terminal tool execution for the session. Add:
_VALID_BACKENDS = {"local", "docker", "modal", "daytona", "ssh", "singularity"}
if backend not in _VALID_BACKENDS:
import logging
logging.getLogger(__name__).warning(
"Unknown gateway.terminal_backend %r — ignoring. Valid values: %s",
backend, ", ".join(sorted(_VALID_BACKENDS))
)
return| exit_code = _exit_code | ||
| status = "timeout" if _exit_code == -1 else "success" | ||
| else: | ||
| proc = subprocess.Popen( |
There was a problem hiding this comment.
💡 Missing interrupt support — the local path checks _interrupt_event.is_set() in the poll loop so a user message can cancel mid-run. The Docker path uses blocking proc.communicate(timeout=...) and has no interrupt check. Consider wrapping run_script_in_docker in a thread and polling _interrupt_event alongside it, or at minimum add a docstring note marking this as a known limitation.
8bd36cd to
2cd39e7
Compare
2cd39e7 to
e597c1b
Compare
…aging sessions Closes NousResearch#4281. When gateway.terminal_backend is set in config.yaml, all gateway sessions use that backend instead of the global terminal.backend. The execute_code tool also respects the Docker backend when TERMINAL_ENV=docker, running the Python subprocess inside the configured container. A startup warning is emitted when the gateway runs with the local backend and no gateway override is configured. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- execute_code_docker: use named containers for proper timeout kill (zombie fix) - execute_code_docker: remove stale container_id variable - execute_code_docker: document interrupt limitation in docstring - sandbox_config: validate terminal_backend against known backends before applying - sandbox_config: log info when gateway backend override is applied - gateway/run.py: use module-level _gateway_raw_cfg instead of re-reading config.yaml - tests: add assertions for named container, docker kill on timeout, invalid backend rejection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e597c1b to
b13061a
Compare
Summary
gateway.terminal_backendconfig key that overridesterminal.backendfor all gateway (Telegram, Discord, etc.) sessionsexecute_code— whenTERMINAL_ENV=docker, the Python subprocess runs inside the container via a UDS socket bind-mountgateway.sandbox_imageandgateway.sandbox_lifetimeconfig keysMotivation
Closes NousResearch#4281.
The dangerous-command approval system explicitly skips approval checks when a container backend is active (the container is the security boundary). This means running the gateway with
terminal.backend: localgives no sandbox isolation AND no approval gate. This PR makes it easy to enforce Docker sandboxing for all gateway sessions without affecting local CLI usage.Changes
gateway/sandbox_config.py(new): helpers —apply_gateway_backend_to_env(),should_warn_insecure_gateway(),get_gateway_terminal_backend()tools/execute_code_docker.py(new): runs theexecute_codechild subprocess inside Docker with UDS socket bind-mounted for tool RPCgateway/run.py: apply gateway backend override at startup; emit warning when running with local backendhermes_cli/config.py: addgateway.terminal_backend,gateway.sandbox_image,gateway.sandbox_lifetimedefaultshermes_cli/status.py: show gateway sandbox backend + image in/statusoutputtests/gateway/test_sandbox_config.py(new): 12 unit tests for sandbox config helperstests/tools/test_execute_code_docker.py(new): 14 unit tests for Docker subprocess wrapper and routing logicTest Plan
pytest tests/tools/test_execute_code_docker.py tests/gateway/test_sandbox_config.py)TERMINAL_ENV=dockerenv varUsage
Notes for Reviewers
execute_code_docker.pywrapper uses--network=hostso the UDS RPC socket path is identical inside and outside the containerTERMINAL_ENV=dockerANDTERMINAL_DOCKER_IMAGEare set — so local CLI sessions are never affectedgateway.sandbox_lifetimekey is stored in config but lifetime enforcement (container cleanup) is left for a follow-up (the per-session task_id already provides isolation via existing DockerEnvironment)