Skip to content

fix(dashboard): validate WebSocket proxy peers and origins#20136

Open
MrJPlayGround wants to merge 2 commits into
NousResearch:mainfrom
MrJPlayGround:fix/dashboard-trusted-proxy-hosts
Open

fix(dashboard): validate WebSocket proxy peers and origins#20136
MrJPlayGround wants to merge 2 commits into
NousResearch:mainfrom
MrJPlayGround:fix/dashboard-trusted-proxy-hosts

Conversation

@MrJPlayGround

@MrJPlayGround MrJPlayGround commented May 5, 2026

Copy link
Copy Markdown

Summary

Narrows this PR to the dashboard WebSocket hardening portion of the original change, leaving the broader HTTP Host-header allowlist overlap for separate review / the narrower competing PRs.

  • Add configurable dashboard WebSocket peer allowlisting via dashboard.trusted_proxy_hosts.
  • Support exact host/IP values and CIDR ranges for trusted WebSocket peers.
  • Keep defaults loopback-only.
  • Validate WebSocket Host and Origin headers for /api/pty, /api/ws, /api/pub, and /api/events.
  • Add tests for default rejection, trusted proxy host/CIDR matching, Origin rejection, and configured reverse-proxy WS access.

Motivation

The dashboard can be safely exposed inside a private reverse-proxy/tailnet setup by keeping Hermes bound to loopback and explicitly trusting the reverse proxy for dashboard WebSocket upgrades.

The previous version bundled this with a broader HTTP Host-header allowlist. That overlapped with narrower Host-header PRs, so this version focuses on the WebSocket peer / Origin validation security boundary.

Test plan

scripts/run_tests.sh tests/hermes_cli/test_web_server.py -q

Result:

149 passed

@StartupBros

Copy link
Copy Markdown

Hey @MrJPlayGround — the duplicate-detector bot grouped several dashboard-host PRs together; wanted to acknowledge this one specifically because it's the most ambitious scope of the bunch and IMO mostly orthogonal to the others.

The Host-header allowlist piece overlaps with the narrower env-only PRs (#29195, #25173, #27113, #28578, #28954) — but the WebSocket peer allowlist (allowed_ws_clients with CIDR) and Origin-header validation are work nobody else is tackling. Those feel like they deserve their own PR rather than being grouped with the Host-header allowlist, which is a smaller change.

If you wanted to split:

  • PR1: Host-header allowlist via config — competes with the env-only PRs, maintainer picks one.
  • PR2: WebSocket peer / Origin validation — much harder to land bundled, much easier to land focused.

(I opened #29195 for the env-only narrow case with loopback-only placement + tests. Not trying to displace this PR's scope — just sharing in case splitting helps land at least the WebSocket-peer piece.)

@MrJPlayGround

Copy link
Copy Markdown
Author

Thanks, this is helpful context.

I agree the Host-header allowlist overlaps with the narrower env-only PRs, while the WebSocket peer allowlist / CIDR support and Origin validation are a separate concern.

I’ll take a look at splitting this so the maintainer can evaluate:

  • the Host-header config piece separately from the existing narrower PRs
  • the WebSocket peer + Origin validation as a focused security hardening PR

Appreciate you calling out the overlap without treating the whole PR as duplicate.

@MrJPlayGround MrJPlayGround force-pushed the fix/dashboard-trusted-proxy-hosts branch from 551a5d7 to 25cb2cc Compare May 22, 2026 10:30
@MrJPlayGround MrJPlayGround changed the title fix(dashboard): support trusted reverse proxy hosts fix(dashboard): validate WebSocket proxy peers and origins May 22, 2026
@MrJPlayGround

Copy link
Copy Markdown
Author

Thanks, this was good guidance.

I split the scope and force-pushed this PR down to the WebSocket peer / Origin validation piece only:

  • dashboard.trusted_proxy_hosts for configured WS peers
  • exact host/IP + CIDR matching
  • Host/Origin validation on /api/pty, /api/ws, /api/pub, and /api/events
  • tests for default rejection, configured proxy access, CIDR matching, and Origin rejection

I intentionally left the broader HTTP Host-header allowlist out of this PR now, since that overlaps with the narrower Host-header PRs you called out.

Tested with:

scripts/run_tests.sh tests/hermes_cli/test_web_server.py -q

Result: 149 passed.

@pxdsgnco

Copy link
Copy Markdown
Contributor

Adding a real-world deployment datapoint here in case it helps evaluation.

We hit this class of failure on a Hermes Dashboard deployment that is intentionally tailnet-only and served through Tailscale Serve. Hermes was bound to an explicit tailnet hostname/IP path rather than exposed publicly.

Observed failure mode before the fix:

  • /api/events disconnected
  • /api/ws disconnected
  • /api/pty disconnected
  • the dashboard page itself could still load, but live chat/tool-call WebSocket behaviour broke

The important part was that the breakage showed up in the actual browser/Tailscale Serve path, not just in a synthetic localhost-only setup.

After applying the narrower bound-host peer acceptance fix locally, we verified successful WebSocket upgrades end-to-end from the tailnet path:

  • /api/events -> 101 Switching Protocols
  • /api/ws -> 101 Switching Protocols
  • /api/pty -> 101 Switching Protocols

So from an operator standpoint, there is definitely a real deployment need here for reverse-proxy / tailnet WebSocket handling beyond the current loopback-only peer assumption.

Not trying to bikeshed the exact implementation shape — just confirming that this problem is reproducible in practice and worth landing in some upstream form.

…ed-proxy-hosts

# Conflicts:
#	hermes_cli/web_server.py
#	tests/hermes_cli/test_web_server.py
@thunderchick3n

Copy link
Copy Markdown

Cross-linking from related PRs

I have a tested --allowed-hosts CLI-flag implementation of the HTTP Host-header allowlist portion of this broader change. Tracking the cluster: #27113, #25173, #28954, #29195, #31304 (closed), #32109 (closed), #18415 (closed), #34390.

My patch (local, not pushed):

  • hermes_cli/web_server.py — adds allowed_hosts: Optional[Iterable[str]] to _is_accepted_host() and start_server(). HTTP host_header_middleware and WebSocket _ws_host_origin_reason() both honor it. The allowlist is loopback-specific: the DNS-rebinding defense is preserved for 0.0.0.0 binds, explicit non-loopback binds, and lookalike subdomains (evil.ts.net) / suffix tricks (host.ts.net.attacker.test).
  • hermes_cli/main.py — adds --allowed-hosts CLI flag (comma-separated) to dashboard_parser, parsed in cmd_dashboard.
  • tests/hermes_cli/test_web_server_host_header.py — 4 new unit tests: positive accept, lookalike-subdomain reject, 0.0.0.0 ignore, explicit-non-loopback ignore. All 15 tests in the file pass.

Why I didn't open a competing PR:

  1. Two prior CLI-flag PRs (fix(dashboard): allow explicit proxy hostnames #31304, Allow reverse-proxied hostnames via dashboard --trusted-host allowlist #32109) were closed without merging.
  2. This PR (fix(dashboard): validate WebSocket proxy peers and origins #20136) is the canonical config-based approach; the maintainers' apparent preferred shape is a config setting, not a CLI flag.

What I'd offer:

  1. If you want to expand this PR to cover the HTTP Host-header piece, my diff is ~125 lines across 3 files and uses the same _is_accepted_host() helper you're already touching. I can produce a unified diff against this branch.
  2. If you want me to add a dashboard.allowed_hosts config-setting variant to match the maintainers' preferred shape, I'm happy to do that too.
  3. Otherwise, I'll stay out of the way and let this PR land as-is.

Let me know which (or none).

@MrJPlayGround MrJPlayGround force-pushed the fix/dashboard-trusted-proxy-hosts branch from e9668c9 to cd27694 Compare June 5, 2026 10:07
@MrJPlayGround

Copy link
Copy Markdown
Author

@thunderchick3n
Thanks for mapping the related PRs and for not opening a competing one.

I agree with your read: based on the earlier closed CLI-flag PRs, I think the safer shape here is config-based rather than another dashboard CLI flag. This PR is intentionally scoped to the WebSocket peer/origin validation piece, but the HTTP Host-header allowlist is clearly adjacent and probably wants to share the same helper semantics.

If you’re willing, the most useful next step would be the dashboard.allowed_hosts config-setting variant against this branch. I’d prefer that over --allowed-hosts, mainly to keep the dashboard exposure policy in one config surface and avoid adding another CLI-only path.

A unified diff is totally fine. I can either fold it into this PR if it stays small and keeps the current DNS-rebinding behavior intact, or we can split it into a follow-up if maintainers want this PR to remain WebSocket-only.

The constraints I’d want to preserve are exactly the ones you called out:

  • default loopback-only behavior remains unchanged
  • 0.0.0.0 / explicit non-loopback binds do not become broadly trusted just because an allowlist exists
  • lookalike/suffix tricks like evil.ts.net and host.ts.net.attacker.test stay rejected
  • HTTP and WebSocket validation share the same host acceptance semantics where possible

So: yes, please send the config-setting variant/diff if you have it handy.

@tmchow

tmchow commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Adding another real-world Tailscale Serve datapoint.

Setup:

  • macOS
  • Homebrew tailscaled running as root LaunchDaemon, no official Tailscale macOS app
  • MagicDNS resolves my-box.mytailnet.ts.net -> 100.99.63.21
  • Hermes dashboard auth enabled
  • Goal: tailnet-only HTTPS via Tailscale Serve, no Funnel, no --insecure

Observed:

  • hermes dashboard --host 127.0.0.1 --port 9119

    • tailscale serve --https=443 http://127.0.0.1:9119
      fails with:
      {"detail":"Invalid Host header. Dashboard requests must use the hostname the server was bound to."}
      because browser requests arrive with Host: my-box.mytailnet.ts.net.
  • hermes dashboard --host my-box.mytailnet.ts.net --port 9119
    works over plain HTTP at http://amos.hippo-vibe.ts.net:9119.

  • Putting Tailscale Serve directly in front of that tailnet IP/hostname was unreliable/hung locally, likely because Serve's happy path expects a localhost backend.

Working workaround:

  • Hermes bound to my-box.mytailnet.ts.net:9119
  • local TCP bridge 127.0.0.1:9120 -> 100.99.63.21:9119
  • tailscale serve --https=443 http://127.0.0.1:9120
  • Browser: https://my-box.mytailnet.ts.net/

This proves the desired deployment works if Hermes can accept a configured public/reverse-proxy Host while preserving the dashboard security boundary.

I think the durable product shape should be config-based rather than another CLI-only flag, e.g.:

dashboard:
  allowed_hosts:
    - my-box.mytailnet.ts.net
  trusted_proxy_hosts:
    - 127.0.0.1

The important security properties:

  • default remains fail-closed
  • no wildcard hosts
  • allowed hosts do not widen explicit non-loopback binds unintentionally
  • suffix/lookalike hosts stay rejected
  • WebSocket Host/Origin checks use the same policy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard P2 Medium — degraded but workaround exists type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants