Skip to content

fix(web_server): propagate --insecure flag to WebSocket security guards#31779

Closed
pranvgarg wants to merge 2 commits into
NousResearch:mainfrom
pranvgarg:fix/insecure-flag-ws-propagation
Closed

fix(web_server): propagate --insecure flag to WebSocket security guards#31779
pranvgarg wants to merge 2 commits into
NousResearch:mainfrom
pranvgarg:fix/insecure-flag-ws-propagation

Conversation

@pranvgarg

Copy link
Copy Markdown

Problem

When the dashboard is bound to a non-loopback address using --insecure, all WebSocket connections to /api/pty, /api/ws, /api/pub, and /api/events fail with 403 Forbidden. HTTP endpoints work fine.

Root cause

start_server() accepts allow_public (the runtime value of --insecure) and uses it to skip the startup SystemExit guard — but never writes it to app.state:

app.state.bound_host = host
app.state.bound_port = port
# app.state.insecure is never set

Every getattr(app.state, "insecure", False) call in the runtime security guards therefore always returns False, making --insecure a no-op at request time.

The direct consequence: _ws_client_is_allowed() enforces loopback-only peer IPs unconditionally. When the server is bound to a non-loopback address, the connecting client also has a non-loopback IP, so every WebSocket upgrade is rejected. HTTP works because _is_accepted_host() resolves the host match from app.state.bound_host directly, which is correctly set.

Fix

  • Set app.state.insecure = allow_public in start_server().
  • Short-circuit _ws_client_is_allowed() when app.state.insecure is True. The operator has already accepted the security trade-off by passing --insecure; enforcing the loopback IP gate on top of that contradicts the intent of the flag.

Tests

Three new cases in TestInsecureMode:

Test What it checks
test_non_loopback_client_rejected_without_insecure Baseline — non-loopback IP is still blocked in normal mode
test_non_loopback_client_allowed_with_insecure Non-loopback IP passes when --insecure is active
test_start_server_writes_insecure_to_app_state start_server() correctly persists allow_public to app.state

@alt-glitch alt-glitch added type/bug Something isn't working comp/cli CLI entry point, hermes_cli/, setup wizard P2 Medium — degraded but workaround exists labels May 25, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Resubmit of closed #31772 by the same author. Same root cause: start_server() never writes allow_public to app.state.insecure. Prior merged fix #18633 only handled 0.0.0.0/:: bind via _is_public_bind(), not specific non-loopback IPs (e.g. Tailscale).

pranvgarg added 2 commits May 29, 2026 16:34
start_server() accepts allow_public (the value of --insecure) and uses it
to skip the startup SystemExit guard, but never writes it to app.state.
As a result, every getattr(app.state, "insecure", False) call in the
runtime security guards always evaluates to False — making the flag a
no-op at request time.

The practical effect: when the dashboard is bound to a non-loopback
address via --insecure, _ws_client_is_allowed() still enforces the
loopback-only IP check and rejects every WebSocket upgrade with 403.
HTTP endpoints are unaffected because _is_accepted_host() resolves the
bound host match independently of app.state.insecure.

Fix:
- Write app.state.insecure = allow_public in start_server() so the flag
  is visible to all request handlers at runtime.
- Short-circuit _ws_client_is_allowed() when insecure mode is active.
  The operator has already accepted the security trade-off by passing
  --insecure; enforcing the loopback IP restriction on top of that is
  inconsistent with the intent of the flag.

Tests added to TestInsecureMode:
- Baseline: non-loopback client is blocked without --insecure.
- Non-loopback client is accepted when --insecure is active.
- start_server() correctly persists allow_public to app.state.insecure.
The original TestInsecureMode.test_start_server_writes_insecure_to_app_state
patched "hermes_cli.web_server.uvicorn.run", which fails because uvicorn
is imported lazily inside start_server() — there is no module-level
attribute named uvicorn on hermes_cli.web_server. Patch uvicorn.run on
the uvicorn module directly; the test assertions are unchanged.

Caught when rebasing onto v0.15.1 and running the test suite locally.
@pranvgarg pranvgarg force-pushed the fix/insecure-flag-ws-propagation branch from 20eb675 to a6443c7 Compare May 29, 2026 21:48
@pranvgarg

Copy link
Copy Markdown
Author

Force-pushed a rebase onto current main — the prior version was conflicting against the v0.15.1 restructure of _ws_client_is_allowed. All 14 tests in tests/hermes_cli/test_web_server_host_header.py pass locally, including the 3 new TestInsecureMode cases.

Relationship to other open PRs

This overlaps with #33278, #32614, and #25072 — all targeting the --insecure WebSocket gate. Brief comparison of the two competing approaches:

Bind This PR #33278 / #32614
--host 0.0.0.0 --insecure
--host :: --insecure
--host <specific-non-loopback> --insecure

#33278 / #32614 short-circuit on bound_host in {"0.0.0.0", "::"}, which mirrors the prior merged fix #18633 (later lost in refactor). That covers the wildcard bind but leaves --insecure as a no-op when the operator pins the dashboard to a specific interface. The semantic of --insecure is "operator has explicitly opted out of localhost-only mode" — that intent should hold regardless of which specific non-loopback address is bound.

This PR instead writes app.state.insecure = allow_public in start_server() and short-circuits _ws_client_is_allowed() on that flag. One logical guard covers all --insecure binds rather than a literal-host list that needs maintenance as new bind patterns appear.

User-visible symptom is tracked in #33265 (Chat WebSocket rejected, [session ended] on the embedded TUI).

Happy to close this in favor of any consolidating fix the maintainers prefer, or to have it superseded if a broader PR (e.g. #25072) lands first. Otherwise, would appreciate workflow approval to verify CI on the rebase.

@teknium1

Copy link
Copy Markdown
Contributor

Superseded by PR #35141 (commit e8076c1), already on main. It solves the same all-interfaces insecure-bind case your PR targets — the merged version keys off app.state.bound_host in {"0.0.0.0", "::"} rather than a separate app.state.insecure flag, but the outcome is identical: non-loopback WS peers are accepted when the operator explicitly binds --host 0.0.0.0 --insecure.

Closing as a duplicate. Thanks for the fix and the thorough host-header test coverage.

@teknium1 teknium1 closed this May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants