Skip to content

fix(security): fail-closed when client_host is empty in WebSocket localhost check#15544

Open
memosr wants to merge 1 commit into
NousResearch:mainfrom
memosr:fix/tui-gateway-ws-localhost-check
Open

fix(security): fail-closed when client_host is empty in WebSocket localhost check#15544
memosr wants to merge 1 commit into
NousResearch:mainfrom
memosr:fix/tui-gateway-ws-localhost-check

Conversation

@memosr

@memosr memosr commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Four WebSocket endpoints in hermes_cli/web_server.py had a fail-open
localhost check:

# Before (vulnerable)
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
    await ws.close(code=4403)
    return

The if client_host and ... short-circuits when client_host is an
empty string. This happens when:

  • WebSocket arrives via a reverse proxy (nginx, Caddy, Cloudflare)
    that strips the original client info
  • Connection metadata is unavailable for any reason

In those cases the localhost check is silently bypassed, allowing
remote attackers to connect to endpoints that should be localhost-only:

  • /api/pty — terminal access
  • /api/ws — JSON-RPC sidecar (full TUI gateway access)
  • /api/pub — publish channel
  • /api/events — event stream

Fix

Made the check fail-closed: if client_host is empty, reject the
connection rather than allowing it through:

# After (safe)
if not client_host or client_host not in _LOOPBACK_HOSTS:
    await ws.close(code=4403)
    return

Applied to all 4 affected WebSocket endpoints.

Type of Change

  • 🔒 Security fix (HIGH — fail-open WebSocket auth)

Checklist

  • Read the Contributing Guide
  • Commit messages follow Conventional Commits
  • Fail-closed default — safer when client metadata is unavailable
  • Consistent with the fail-closed pattern used in other security checks
  • No behavior change for legitimate localhost connections

@alt-glitch alt-glitch added type/security Security vulnerability or hardening P0 Critical — data loss, security, crash loop comp/cli CLI entry point, hermes_cli/, setup wizard labels Apr 25, 2026
Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 4, 2026
… is empty

When client_host is empty (proxy/unknown), reject the WebSocket connection
rather than silently allowing it. This prevents potential SSRF via
WebSocket connections when the client host cannot be determined.

Refs: PR NousResearch#15544
@egilewski

Copy link
Copy Markdown
Contributor

I recommend rework before merge. The security intent is valid: on current main (899ee8c23dfd029fdfd7b669ac3ac82fd8388f55), a synthetic loopback-mode WebSocket with missing or empty ws.client.host still returns allowed from _ws_client_is_allowed, while the PR replay on its base rejects missing/empty peers in /api/pty, /api/ws, /api/pub, and /api/events.

Checked PR head 196b47104ad51afaec938064ede5c0ffbcfa6bce. Focused validation used synthetic WebSocket-shaped objects only, plus python -m pytest -o addopts='' -p no:cacheprovider tests/hermes_cli/test_dashboard_auth_ws_auth.py -q on current main (45 passed) and python -m compileall -q hermes_cli/web_server.py on both current main and the PR replay. CodeRabbit on the uncommitted replay against current main reported No findings.

The blocker is current-main drift: git apply --check for this patch fails at hermes_cli/web_server.py:2390, and current main now routes these checks through _ws_client_is_allowed / _ws_request_is_allowed. Please rebase and port the fail-closed empty-peer hardening into that helper flow so all four WebSocket endpoints inherit it consistently.

Signed: GPT-5.5-xhigh in Codex

Per @egilewski's audit on this PR (NousResearch#15544), the original fix was
correct but the file has refactored since: the four endpoint-local
empty-peer checks have been consolidated into _ws_client_is_allowed
and _ws_client_reason, but the helpers were left fail-open ('no peer
host known means allow' / 'no reason to block').

On a loopback-bound dashboard with auth disabled, an ASGI server
behind a misconfigured proxy or a unix-socket transport can deliver
ws.client == None or ws.client.host == ''. The helpers were treating
that as 'allowed', so the loopback-only peer gate could be bypassed
by anything that suppressed the client tuple in transit. All four
WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route
through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap
applied uniformly.

Fix:

* _ws_client_is_allowed: return False when client_host is empty
  instead of True. Only reached on loopback bind with auth disabled
  (auth_required=True and explicit non-loopback binds short-circuit
  earlier), so the fail-closed behavior is scoped to the surface
  that needs it.

* _ws_client_reason: return a 'missing_or_empty_peer bound=...'
  block reason instead of None, so the dispatcher's existing
  reason-based rejection path picks it up and the close gets logged
  with a machine-parseable token for diagnosability.

Behavior unchanged for:

* gated mode (auth_required=True) — early-returns True before the
  empty-peer check runs. The OAuth ticket is the auth at that point.
* explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN
  address, always with --insecure) — early-returns True before the
  empty-peer check runs. DNS-rebinding is still blocked by the
  Host/Origin guard in _ws_host_origin_is_allowed.
* legitimate loopback peers (client_host == '127.0.0.1' / '::1') —
  not affected by the empty-peer branch.

Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py:

* test_empty_client_host_rejected_in_loopback_mode
* test_missing_client_object_rejected_in_loopback_mode
* test_empty_client_host_reason_is_block

Plus two regression guards to ensure the fix does not over-reach:

* test_empty_client_host_still_allowed_in_insecure_public_mode
* test_empty_client_host_still_allowed_in_gated_mode

All three new fail-closed tests fail without this patch (the helpers
return True / None for an empty peer) and pass with it. The 45
pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.
@memosr memosr force-pushed the fix/tui-gateway-ws-localhost-check branch from 196b471 to 225ac49 Compare June 5, 2026 13:45
memosr added a commit to memosr/hermes-agent that referenced this pull request Jun 5, 2026
…ded in TUI shell.exec

tui_gateway/server.py exposes a shell.exec JSON-RPC method that
runs shell commands via subprocess.run(shell=True). It tries to
block dangerous commands using tools.approval.detect_dangerous_command,
but on ImportError silently falls through to executing the command:

    try:
        from tools.approval import detect_dangerous_command
        is_dangerous, _, desc = detect_dangerous_command(cmd)
        if is_dangerous:
            return _err(rid, 4005, f'blocked: {desc}...')
    except ImportError:
        pass    # <-- silently disables the guard, then runs anyway

If tools.approval cannot be imported (missing module, circular
import, syntax error in a dependency, partial deploy), the screener
is skipped and arbitrary shell commands reach subprocess.run.

Fix:

* Replace the silent ImportError pass with a fail-closed return:
  error code 4006 with message 'dangerous-command guard unavailable;
  refusing to execute'. Consistent with the fail-closed pattern
  applied to the cron SSRF check in NousResearch#10180 and the WebSocket
  empty-peer guard in NousResearch#15544.

No behavior change when tools.approval is available -- the existing
detect_dangerous_command path runs unchanged, dangerous commands
still return 4005, and safe commands still flow through to
subprocess.run.

Regression tests added in tests/test_tui_gateway_server.py:

* test_shell_exec_fails_closed_when_guard_unavailable -- injects an
  ImportError for tools.approval via a patched builtins.__import__
  and a tripwire on subprocess.run; asserts the response is error
  4006 with 'guard unavailable' in the message and that
  subprocess.run is never reached.

* test_shell_exec_still_runs_when_guard_available -- guards against
  over-reach: with tools.approval present and the command not
  flagged dangerous, shell.exec must still execute. Without this
  the fail-closed change could regress into 'refuses everything'.

Verified locally that the fail-closed test fails without this
patch (the response carries error code 5003 from the tripwire-raised
AssertionError inside subprocess.run, proving the silent-skip path
was reached) and passes with the fix in place. The 205 pre-existing
tests in test_tui_gateway_server.py are unaffected by this change.

Per @egilewski's audit on this PR (NousResearch#15542), the original April patch
targeted the same fail-open path; the file has refactored heavily
since (tui_gateway/server.py is now ~8k lines vs ~4.6k at the
original PR head), so this commit re-applies the fail-closed
behavior at the current shell.exec call site and adds the
regression coverage that the original PR was missing.
@egilewski

egilewski commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Looks mergeable from this re-review.

Checked current upstream/main c3055d61857751ad82a2bf9e4f5de5d26a8f2a16 against PR head 225ac495637a6c1aad94e52c45b5b6dc9ecb1e10. The baseline still allows empty/missing ws.client in loopback/no-auth mode, while the PR head rejects both with missing_or_empty_peer and keeps loopback, explicit --insecure public bind, gated mode, and Host mismatch behavior intact.

Validation:

  • git diff --check upstream/main...HEAD: pass.
  • git merge-tree --write-tree upstream/main HEAD: pass.
  • /home/mac/hermes-agent/.venv/bin/python -m compileall -q hermes_cli/web_server.py tests/hermes_cli/test_dashboard_auth_ws_auth.py: pass.
  • /home/mac/hermes-agent/.venv/bin/python -m pytest -o addopts="" -p no:cacheprovider tests/hermes_cli/test_dashboard_auth_ws_auth.py -q: 50 passed.
  • GitHub currently reports mergeable=MERGEABLE / mergeStateStatus=CLEAN, with required checks successful or skipped.

Residual note: CodeRabbit could not complete because the service reported its review limit was reached.

Signed: GPT-5.5-xhigh in Codex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard P0 Critical — data loss, security, crash loop type/security Security vulnerability or hardening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants