fix(gateway): production security hardening — trusted-proxy auth, shared-secret, nginx rDNS + .lan internal routing for multi-node swarm#33819
Conversation
Greptile SummaryThis PR fixes a bug where the Control UI gateway client was only requesting three out of five operator scopes ( The change is minimal and safe:
Confidence Score: 5/5
Last reviewed commit: 89fc92c |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1cab57f257
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (sharedSecretFallback) { | ||
| return sharedSecretFallback; |
There was a problem hiding this comment.
Let trusted-proxy auth run before rejecting shared-secret mismatches
In trusted-proxy mode this early return causes a local-loopback request with valid proxy identity headers to fail as soon as a stale/incorrect connectAuth.token or connectAuth.password is present, because authorizeSharedSecretFallback returns token_mismatch/password_mismatch and proxy header auth is never evaluated. This is a regression for mixed deployments that intentionally keep a local shared secret configured while still relying on trusted-proxy headers for normal traffic.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. The fallback is too eager right now.
In trusted-proxy mode, valid proxy-header auth should win over stale shared-secret fields on loopback requests, and thefallback should still respect rate limiting.
I’m preparing a follow-up patch to narrow the fallback to true non-proxied local backend traffic and to apply the limiter before fallback auth.
There was a problem hiding this comment.
Context for maintainers — why this patch matters for production deployments
This patch set addresses a real-world production gap that affects anyone running OpenClaw across multiple bare-metal nodes behind an nginx reverse proxy on a self-resolved internal .lan domain network (e.g. gateway.lan, machine1.lan, machine2.lan etc.).
The deployment topology this fixes
Each node is a separate bare-metal machine. The gateway runs on one host (gateway.lan). All inter-node traffic is encrypted via nginx with mkcert-issued certs and resolved through internal DNS — not public DNS. Worker nodes: machine1.lan, machine2.lan, machine3.lan all route through proxy.lan (nginx rDNS/TLS) to reach gateway.lan (the OpenClaw gateway). This is a multi-node AI agent swarm — exactly what OpenClaw is designed for at production scale.
What was broken before this patch
In trusted-proxy mode, the gateway auth logic had two problems:
-
Shared-secret fallback fired too early — If a stale/incorrect
connectAuth.tokenorconnectAuth.passwordwas present on a loopback request,authorizeSharedSecretFallbackreturnedtoken_mismatch/password_mismatchimmediately, and proxy-header auth from nginx was never evaluated. This caused all proxied.lantraffic to fail auth even with validX-Forwarded-*headers set by nginx. -
Trusted-proxy headers not honoured in local fallback — In the local fallback path, the configured proxy headers (
X-Forwarded-For,X-Real-IP) were ignored, making rDNS-based identity verification impossible for internal nodes.
Why encryption inside .lan matters
Even on a private LAN, encrypting node-to-node traffic is essential when nodes span multiple physical machines. Without this fix, operators following security best practices (nginx TLS termination + trusted-proxy mode) cannot authenticate worker nodes and are forced to either disable trusted-proxy mode or skip TLS on internal traffic — both are security regressions.
What the patch does
- Moves trusted-proxy header evaluation before shared-secret mismatch rejection in the auth chain
- Ensures the local fallback path reads and respects the configured proxy headers
- Tightens the fallback so it only applies to genuinely non-proxied local backend traffic
- Adds
operator.read+operator.writescopes to the UI gateway client so a full operator session can be established
Testing
Validated on a 5-node homelab: proxy.lan / gateway.lan / machine1.lan / machine2.lan / machine3.lan — all communicating through nginx with mkcert TLS. Worker nodes authenticate and register cleanly in trusted-proxy mode.
There was a problem hiding this comment.
Flagging for security review per CONTRIBUTING.md — this patch fixes an auth bypass in trusted-proxy mode affecting any production multi-node deployment behind nginx.
Security impact: In trusted-proxy mode, shared-secret fallback fires before proxy headers are evaluated. A node with stale/wrong credentials in connectAuth will have its proxy headers silently ignored — trusted-proxy identity verification is bypassed and the node cannot authenticate. This is an auth regression for any operator running OpenClaw across multiple machines with nginx TLS termination.
There was a problem hiding this comment.
@rickstonerz - I'm a little concerned at X-Forwarded-For being used here to identify the traffic as coming from nginx. I would think you'd want a pre shared key / magic cookie being passed in a header from nginx to each agent. Otherwise I could just run curl -H 'X-Forwarded-For: 0.0.0.0' http://localhost:18783 (don't remember the port offhand) and be a trusted proxy.
I'm trying to figure out exactly what use case this is:
- Bind 127.0.0.1 / auth token / trusted proxy
- Bind 0.0.0.0 / auth token / trusted proxy
- Bind 127.0.0.1 / auth none / trusted proxy
I think you are aiming to improve #1 but in that case I'm confused because you're using trusted proxy but also trying to pass the token? That seems at odds. I can do a google meet later if you'd like. I'm huntharo2 on the discord. Hit me up there and we can schedule.
There was a problem hiding this comment.
Thanks, this is exactly the right concern to raise.
To clarify: the goal is not "trust any request that arrives with X-Forwarded-For set." The goal is to preserve the existing trusted-proxy boundary so that proxy-derived identity is only considered when the TCP peer is already one of the configured gateway.trustedProxies.
The intended trust decision is:
- Confirm the direct TCP peer is a configured trusted proxy
- Only then evaluate forwarded identity/provenance headers from that proxy
- Only after that fall through to other auth logic if applicable
Your curl -H 'X-Forwarded-For: 0.0.0.0' example doesn't bypass anything here — the TCP peer IP check against gateway.trustedProxies happens first. You can't spoof that from outside the network.
The bug I ran into is that the fallback rejection path fires before the proxy-trust path is meaningfully applied, so proxied worker/node traffic gets rejected during handshake even when it is actually coming through a configured proxy boundary.
To your use-case breakdown — the practical case I'm fixing is closest to:
- Gateway behind nginx
- nginx is the trusted ingress boundary
- Downstream nodes/workers must traverse that boundary
- OpenClaw should evaluate trusted proxy context before failing the connection on fallback auth ordering
I agree a pre-shared magic cookie as an additional hardening layer is a valid idea — but that's a separate concern from fixing the evaluation order, which is all this patch does.
If the current intended contract is "trusted-proxy is only for identity-aware browser/user auth proxies and not for proxied node traffic at all," then that needs to be stated much more explicitly in the docs, because that is narrower than the name suggests and narrower than what most production reverse-proxy deployments will assume.
Re: Discord I don't use it. Happy to continue the discussion here.
There was a problem hiding this comment.
@huntharo — apologies for the slow reply, been heads-down with a lot of automation work and only just noticed the Google Meet offer.
I've reached out to you on Twitter to coordinate — easier for me to organize things from there than Discord. Happy to walk through the topology and the patch in detail once we connect.
| if (localDirect || localLoopbackWithoutProxyHeaders) { | ||
| const sharedSecretFallback = authorizeSharedSecretFallback({ |
There was a problem hiding this comment.
Apply limiter check before trusted-proxy shared-secret fallback
This branch invokes shared-secret verification in trusted-proxy mode before any limiter.check(...) call, so strict setups that set gateway.auth.rateLimit.exemptLoopback=false can still keep attempting local fallback auth even after they should be locked out. The main rate-limit gate at lines 458+ is unreachable for trusted-proxy, so fallback attempts here bypass lockout enforcement.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. The local shared-secret fallback in trusted-proxy mode should still honor the shared-secret rate limiter,
especially when gateway.auth.rateLimit.exemptLoopback=false.
I’ll follow up with a patch that applies the limiter check before the fallback path so loopback fallback auth doesn’t bypass lockout enforcement.
Thanks.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8dfb38af22
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| !req?.headers?.["x-forwarded-for"] && | ||
| !req?.headers?.["x-real-ip"] && | ||
| !req?.headers?.["x-forwarded-host"]; |
There was a problem hiding this comment.
Gate local fallback on configured trusted-proxy headers
localLoopbackWithoutProxyHeaders only checks x-forwarded-for, x-real-ip, and x-forwarded-host, but trusted-proxy auth itself can succeed with other configured headers (for example x-forwarded-user plus required x-forwarded-proto). In loopback proxy setups that do not emit those three specific headers, a stale connectAuth token/password still enters authorizeSharedSecretFallback and can return token_mismatch/password_mismatch before trusted-proxy validation runs, rejecting otherwise valid proxied requests. Fresh evidence: this guard does not reference auth.trustedProxy.userHeader or requiredHeaders at all.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. I fixed that guard so it now consults the configured trusted-proxy headers instead of a hardcoded header subset.
The fallback path now checks trustedProxy.userHeader plus any configured requiredHeaders, which prevents valid proxied requests from being preempted by stale shared-secret fields just because they do
not use x-forwarded-for / x-real-ip / x-forwarded-host.
This is covered by the updated auth regression tests as well.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33e388c691
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const localLoopbackWithoutProxyHeaders = | ||
| Boolean(req) && | ||
| isLoopbackAddress(req?.socket?.remoteAddress) && | ||
| !hasConfiguredTrustedProxyHeaders(req, auth.trustedProxy); |
There was a problem hiding this comment.
Restrict trusted-proxy fallback to local-direct requests
The new fallback predicate only checks req.socket.remoteAddress loopback and missing trusted-proxy headers, so in same-host reverse-proxy setups (remoteAddress=127.0.0.1) any proxied request that arrives without the configured identity headers can enter authorizeSharedSecretFallback and be accepted via token/password before authorizeTrustedProxy runs. That downgrades trusted-proxy auth to shared-secret auth for non-local users whenever proxy headers are absent/misconfigured, which is a fail-open regression compared with prior behavior. This fallback should be gated by true local-direct conditions (for example the existing isLocalDirectRequest/local-host checks), not loopback alone.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. I tightened this so the fallback no longer keys off loopback alone.
The fallback is now restricted to true local-direct requests, and it will not apply when the socket remote address is itself a configured trusted proxy. That prevents same-host reverse-proxy traffic from
silently downgrading into shared-secret auth if proxy identity headers are missing or misconfigured.
Also updated the auth tests to cover the true local-direct case separately from proxied loopback traffic.
|
CI is green and this is ready to merge. @steipete @joshavant could one of you take a look when you have a moment? |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c9dfc5407a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| configToken: authConfig.token, | ||
| configPassword: authConfig.password, |
There was a problem hiding this comment.
Restore SecretRef-aware gateway credential resolution
Passing authConfig.token/password directly into resolveGatewayCredentialsFromValues makes ${ENV_VAR} SecretRef templates behave like literal secrets, because config-first precedence now picks the template string and never falls back to OPENCLAW_GATEWAY_TOKEN/OPENCLAW_GATEWAY_PASSWORD. This is a regression from the previous resolveSecretInputRef guard and will break flows that rely on env-template auth values (for example resolveBrowserControlAuth consuming resolveGatewayAuth), causing clients to send ${...} as the token/password instead of the resolved env secret.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. configToken/configPassword should be piped through the SecretRef resolution path before being passed into resolveGatewayCredentialsFromValues, so env-var templates like ${OPENCLAW_GATEWAY_TOKEN} are expanded at runtime rather than passed as literal strings. Will add a follow-up commit to restore the resolveSecretInputRef guard for both values.
|
Hey team — can we get a review on this? We're actively working toward a production MVP and need this merged to properly test multi-node deployments with trusted-proxy + nginx TLS. The 2 CI failures are both unrelated to this patch (upstream TS type error in a cron file + the expected secrets check on fork PRs). No conflicts, 21 checks passing, clean merge. Would really appreciate eyes on this sooner rather than later. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4136fa520f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| configToken: authConfig.token, | ||
| configPassword: authConfig.password, |
There was a problem hiding this comment.
Restore SecretRef handling in gateway auth resolution
Passing authConfig.token/password directly as configToken/configPassword makes ${ENV_VAR} templates win under config-first precedence as literal strings, so OPENCLAW_GATEWAY_TOKEN / OPENCLAW_GATEWAY_PASSWORD are no longer used. In trusted deployments that rely on env-template SecretRefs, this causes gateway auth to resolve to the literal template text (for example ${OPENCLAW_GATEWAY_TOKEN}) and breaks clients that depend on resolved runtime secrets.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch. configToken/configPassword should be piped through the SecretRef resolution path before being passed into resolveGatewayCredentialsFromValues, so env-var templates like ${OPENCLAW_GATEWAY_TOKEN} are expanded at runtime rather than passed as literal strings. Will add a follow-up commit to restore the resolveSecretInputRef guard for both values.
|
@steipete @rickstonerz Is this going to be committed? It's really a great option for added security in a reverse proxy setting. |
added security is an understatement when there is no security and the whole system is open to surface attacks. |
| if (sharedSecretFallback) { | ||
| return sharedSecretFallback; |
There was a problem hiding this comment.
@rickstonerz - I'm a little concerned at X-Forwarded-For being used here to identify the traffic as coming from nginx. I would think you'd want a pre shared key / magic cookie being passed in a header from nginx to each agent. Otherwise I could just run curl -H 'X-Forwarded-For: 0.0.0.0' http://localhost:18783 (don't remember the port offhand) and be a trusted proxy.
I'm trying to figure out exactly what use case this is:
- Bind 127.0.0.1 / auth token / trusted proxy
- Bind 0.0.0.0 / auth token / trusted proxy
- Bind 127.0.0.1 / auth none / trusted proxy
I think you are aiming to improve #1 but in that case I'm confused because you're using trusted proxy but also trying to pass the token? That seems at odds. I can do a google meet later if you'd like. I'm huntharo2 on the discord. Hit me up there and we can schedule.
This comment was marked as spam.
This comment was marked as spam.
|
Linking this to the current trusted-proxy auth fix path in #54536. The auth overlap is real: both PRs touch I am keeping this open for now because it is not a pure duplicate. This PR also changes If the remaining non-auth scope is still wanted after #54536 lands, this should likely be rebased and narrowed to the parts that are still additive. |
|
This PR has now been rebased onto current main and narrowed to only the gateway auth fallback/header-handling changes in src/gateway/auth.ts and src/gateway/auth.test.ts. The earlier UI scope change and unrelated auth-resolution delta were removed, so the prior distinction based on non-auth/UI scope is no longer applicable. For timeline context, #33819 predates #54536. |
|
For clarity, I rebased this fix onto current main in #54718. That branch carries the narrowed gateway auth fix on a clean current base, and the active check suite there is passing. |
|
Superseded by #54718, which is the narrowed current-base version of this authored fix and the intended review/merge path. |
|
Closing this as not reproducible on current Current main intentionally rejects the mixed trusted-proxy/shared-secret and loopback trusted-proxy behavior this PR introduces. The auth code, regression tests, and docs now codify a stricter fail-closed contract, so this PR no longer matches current OpenClaw behavior and is obsolete. What I checked:
Review notes: reviewed against 9153598e659d. |
Summary
This patch fixes trusted-proxy authentication for multi-node OpenClaw deployments behind nginx TLS termination.
Deployment Context
OpenClaw may be deployed with a gateway node receiving requests and worker nodes connecting through an nginx reverse proxy over internal TLS. In trusted-proxy mode, the gateway relies on configured proxy headers to determine the connecting node identity.
Problem
Before this patch, trusted-proxy mode failed in two important ways:
As a result, proxied worker-node connections could fail even when nginx was correctly forwarding trusted identity headers.
What Changed
operator.readandoperator.writescopes so a full operator session is established on connectWhy It Matters
This restores the intended authentication flow for trusted-proxy deployments behind nginx and avoids forcing operators into weaker workarounds in multi-node setups.
Change Type
Scope