fix(security): fail-closed when client_host is empty in WebSocket localhost check#15544
fix(security): fail-closed when client_host is empty in WebSocket localhost check#15544memosr wants to merge 1 commit into
Conversation
… is empty When client_host is empty (proxy/unknown), reject the WebSocket connection rather than silently allowing it. This prevents potential SSRF via WebSocket connections when the client host cannot be determined. Refs: PR NousResearch#15544
|
I recommend rework before merge. The security intent is valid: on current Checked PR head The blocker is current-main drift: Signed: GPT-5.5-xhigh in Codex |
Per @egilewski's audit on this PR (NousResearch#15544), the original fix was correct but the file has refactored since: the four endpoint-local empty-peer checks have been consolidated into _ws_client_is_allowed and _ws_client_reason, but the helpers were left fail-open ('no peer host known means allow' / 'no reason to block'). On a loopback-bound dashboard with auth disabled, an ASGI server behind a misconfigured proxy or a unix-socket transport can deliver ws.client == None or ws.client.host == ''. The helpers were treating that as 'allowed', so the loopback-only peer gate could be bypassed by anything that suppressed the client tuple in transit. All four WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap applied uniformly. Fix: * _ws_client_is_allowed: return False when client_host is empty instead of True. Only reached on loopback bind with auth disabled (auth_required=True and explicit non-loopback binds short-circuit earlier), so the fail-closed behavior is scoped to the surface that needs it. * _ws_client_reason: return a 'missing_or_empty_peer bound=...' block reason instead of None, so the dispatcher's existing reason-based rejection path picks it up and the close gets logged with a machine-parseable token for diagnosability. Behavior unchanged for: * gated mode (auth_required=True) — early-returns True before the empty-peer check runs. The OAuth ticket is the auth at that point. * explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN address, always with --insecure) — early-returns True before the empty-peer check runs. DNS-rebinding is still blocked by the Host/Origin guard in _ws_host_origin_is_allowed. * legitimate loopback peers (client_host == '127.0.0.1' / '::1') — not affected by the empty-peer branch. Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py: * test_empty_client_host_rejected_in_loopback_mode * test_missing_client_object_rejected_in_loopback_mode * test_empty_client_host_reason_is_block Plus two regression guards to ensure the fix does not over-reach: * test_empty_client_host_still_allowed_in_insecure_public_mode * test_empty_client_host_still_allowed_in_gated_mode All three new fail-closed tests fail without this patch (the helpers return True / None for an empty peer) and pass with it. The 45 pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.
196b471 to
225ac49
Compare
…ded in TUI shell.exec
tui_gateway/server.py exposes a shell.exec JSON-RPC method that
runs shell commands via subprocess.run(shell=True). It tries to
block dangerous commands using tools.approval.detect_dangerous_command,
but on ImportError silently falls through to executing the command:
try:
from tools.approval import detect_dangerous_command
is_dangerous, _, desc = detect_dangerous_command(cmd)
if is_dangerous:
return _err(rid, 4005, f'blocked: {desc}...')
except ImportError:
pass # <-- silently disables the guard, then runs anyway
If tools.approval cannot be imported (missing module, circular
import, syntax error in a dependency, partial deploy), the screener
is skipped and arbitrary shell commands reach subprocess.run.
Fix:
* Replace the silent ImportError pass with a fail-closed return:
error code 4006 with message 'dangerous-command guard unavailable;
refusing to execute'. Consistent with the fail-closed pattern
applied to the cron SSRF check in NousResearch#10180 and the WebSocket
empty-peer guard in NousResearch#15544.
No behavior change when tools.approval is available -- the existing
detect_dangerous_command path runs unchanged, dangerous commands
still return 4005, and safe commands still flow through to
subprocess.run.
Regression tests added in tests/test_tui_gateway_server.py:
* test_shell_exec_fails_closed_when_guard_unavailable -- injects an
ImportError for tools.approval via a patched builtins.__import__
and a tripwire on subprocess.run; asserts the response is error
4006 with 'guard unavailable' in the message and that
subprocess.run is never reached.
* test_shell_exec_still_runs_when_guard_available -- guards against
over-reach: with tools.approval present and the command not
flagged dangerous, shell.exec must still execute. Without this
the fail-closed change could regress into 'refuses everything'.
Verified locally that the fail-closed test fails without this
patch (the response carries error code 5003 from the tripwire-raised
AssertionError inside subprocess.run, proving the silent-skip path
was reached) and passes with the fix in place. The 205 pre-existing
tests in test_tui_gateway_server.py are unaffected by this change.
Per @egilewski's audit on this PR (NousResearch#15542), the original April patch
targeted the same fail-open path; the file has refactored heavily
since (tui_gateway/server.py is now ~8k lines vs ~4.6k at the
original PR head), so this commit re-applies the fail-closed
behavior at the current shell.exec call site and adds the
regression coverage that the original PR was missing.
|
Looks mergeable from this re-review. Checked current Validation:
Residual note: CodeRabbit could not complete because the service reported its review limit was reached. Signed: GPT-5.5-xhigh in Codex |
What does this PR do?
Four WebSocket endpoints in
hermes_cli/web_server.pyhad a fail-openlocalhost check:
The
if client_host and ...short-circuits whenclient_hostis anempty string. This happens when:
that strips the original client info
In those cases the localhost check is silently bypassed, allowing
remote attackers to connect to endpoints that should be localhost-only:
/api/pty— terminal access/api/ws— JSON-RPC sidecar (full TUI gateway access)/api/pub— publish channel/api/events— event streamFix
Made the check fail-closed: if
client_hostis empty, reject theconnection rather than allowing it through:
Applied to all 4 affected WebSocket endpoints.
Type of Change
Checklist