# [RFC] Separate internal service identity from user auth in the OpenClaw gateway

Labels suggestion: `rfc`, `area:gateway`, `area:auth`, `discussion`, `trusted-proxy`

Keywords (for search): gateway auth, trusted-proxy, service identity, service account, loopback, localDirect, internal RPC, subagent auth, pairing, trusted_proxy_loopback_source

## Summary
`gateway.auth.mode` is doing three jobs at once:

1. **User auth**: who is the human on the other end (`none` / `token` / `trusted-proxy` / Tailscale).
2. **Internal service auth**: who is the gateway's own child process (subagents, browser tool, cron, exec-approvals, CLI on the same host).
3. **Local operator auth**: who is the person at `127.0.0.1` running `openclaw status`, `openclaw cron list`, etc.

All three flow through the same branch in `authorizeGatewayConnect`, gated by `isLocalDirectRequest`, the loopback guard in `authorizeTrustedProxy`, `sharedAuthOk`, `controlUi.allowInsecureAuth`, and pairing/device-identity checks, all layered over time as exceptions. Every pass that tightens one of these knobs has ended up loosening or breaking a different one. The March–April 2026 trusted-proxy regressions (#59167, #43300, #26007, #59045, #59702, #60265, #62767, #63381, #63548, #63344, #67703, #67524, #67799) are all the same underlying coupling surfacing in different deployment shapes.

This RFC proposes making internal service identity a first-class, orthogonal axis, leaving existing user-auth modes untouched, and outlines a phased path that ships value in week one and does not require a single large PR.
Happy to drive the implementation — looking for agreement on shape and phasing first so the five in-flight PRs can be sequenced instead of racing each other.
## The problem, in one picture
```mermaid
flowchart TD
    REQ[Incoming WS connection] --> MODE{auth.mode?}
    MODE -->|none| N[accept]
    MODE -->|token| T{token valid?}
    MODE -->|trusted-proxy| TP{trusted source?<br/>loopback guard?<br/>required headers?<br/>allowed user?}
    MODE -->|tailscale| TS[tailscale flow]
    T -->|no| TFX{localDirect?<br/>sharedAuthOk?<br/>allowInsecureAuth?<br/>device paired?}
    TP -->|fail| TPFX{localDirect?<br/>non-proxy failure?<br/>password set?<br/>loopback allowed?<br/>loopbackUser set?}
    subgraph PATCH[same decision tree reached by 3 different concerns]
        direction LR
        U[external user<br/>browser / remote CLI]
        S[internal service<br/>subagent / browser tool / cron]
        L[local operator<br/>same-host CLI]
    end
    U -.-> REQ
    S -.-> REQ
    L -.-> REQ
    style PATCH fill:#fff3cd,stroke:#856404
    style TFX fill:#f8d7da,stroke:#721c24
    style TPFX fill:#f8d7da,stroke:#721c24
```

Every red box is a patch added in response to one concern breaking, and every patch has at least once caused a different concern to break. #45264 → #54536 → #59167 is the cleanest example: each PR was correct in isolation for the case it targeted, and the composition shipped a real outage for trusted-proxy users.
## Why the current hotfix wave isn't enough on its own
Five PRs are in flight, each extending the same decision tree with a different fallback path:

- Password fallback for local-direct clients when trusted-proxy identity is absent (non-proxy failures only). Helps existing deployments with a password already configured.
- On a `localDirect` trusted-proxy failure, fall through with `method: "none"`.
- A variant conditioned on whether `X-Forwarded-For` resolves to a non-loopback client.
- `trustedProxy.allowLoopback` + `loopbackUser` + skip `requiredHeaders` on loopback.

Each is internally coherent. Merged together they interfere: different PRs put the fallback in different places, with different trust assumptions guarded by different invariants. Merging any one of them makes the others harder to review, because the trust model has to be re-derived from scratch each time.
The short-term user pain is real and needs Phase 0 relief. The medium-term cost of continuing this pattern — more patches, more exceptions, more combinations nobody has tested — is what this RFC is trying to stop.
## Proposal

### Three axes, evaluated in order
User-facing modes keep their current semantics. The new check runs first, is configured independently, and is the only path the gateway's own children use. Tightening either axis stops leaking into the other.
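A minimal TypeScript sketch of that ordering, assuming a request object that carries an optional service token and a self-declared caller class. `authorizeConnect`, `ServiceIdentityConfig`, the deny-reason strings, and the exact scope sets are hypothetical illustrations, not OpenClaw code; the surface and scope rules follow the "Service identity, concretely" list. What it shows is structural: the service check depends only on its own config, and anything without a service token reaches the existing user-auth path unchanged.

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

// All names below are hypothetical stand-ins, not OpenClaw's actual types.
type CallerClass = "subagent" | "cli" | "browser-tool";
type Scope = "operator.read" | "operator.write";

interface ServiceIdentityConfig {
  enabled: boolean; // stand-in for gateway.service.enabled
  allowRemote: boolean; // stand-in for gateway.service.allowRemote
  token: string; // stand-in for the persisted gateway.service.token
}

interface ConnectRequest {
  remoteAddress: string;
  serviceToken?: string;
  callerClass?: CallerClass;
}

type AuthResult =
  | { kind: "service"; callerClass: CallerClass; scopes: readonly Scope[] }
  | { kind: "user"; detail: string }
  | { kind: "deny"; reason: string };

// Scope sets per caller class under one backend role; no admin scope exists at all.
const SERVICE_SCOPES: Record<CallerClass, readonly Scope[]> = {
  subagent: ["operator.read", "operator.write"],
  cli: ["operator.read", "operator.write"],
  "browser-tool": ["operator.read"], // assumed shape of the "narrower" set
};

function isLoopback(addr: string): boolean {
  return addr === "::1" || addr.startsWith("127.") || addr.startsWith("::ffff:127.");
}

function constantTimeEqual(a: string, b: string): boolean {
  const ba = Buffer.from(a, "utf8");
  const bb = Buffer.from(b, "utf8");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return ba.length === bb.length && timingSafeEqual(ba, bb);
}

function authorizeConnect(
  req: ConnectRequest,
  svc: ServiceIdentityConfig,
  userAuth: (req: ConnectRequest) => AuthResult,
): AuthResult {
  // Axis 1: service identity. Runs first and reads only svc, never auth.mode.
  if (svc.enabled && req.serviceToken !== undefined) {
    if (!isLoopback(req.remoteAddress) && !svc.allowRemote) {
      return { kind: "deny", reason: "service_off_host" }; // off-host default deny
    }
    if (!constantTimeEqual(req.serviceToken, svc.token)) {
      // Deny outright rather than retrying user auth, so no new fallback chain appears.
      return { kind: "deny", reason: "service_token_invalid" };
    }
    if (req.callerClass === undefined) {
      return { kind: "deny", reason: "service_caller_class_missing" };
    }
    return { kind: "service", callerClass: req.callerClass, scopes: SERVICE_SCOPES[req.callerClass] };
  }
  // Everything else: existing user/operator auth (none / token / trusted-proxy / Tailscale), untouched.
  return userAuth(req);
}
```

Two choices in the sketch are deliberate but open to debate. A presented-but-invalid service token is denied instead of being retried against user auth. And the caller class is self-declared next to a single shared token, so the scope sets only separate well-behaved callers; actually enforcing them would need per-class tokens or a class bound into the token.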
### Service identity, concretely
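- **Token material.** Gateway generates `gateway.service.token` at first start, stored alongside the existing auto-generated password with owner-only permissions. Rotatable via `openclaw gateway rotate-service-token`. No operator action required.
- **Injection.** The gateway already spawns its own children and already injects `OPENCLAW_GATEWAY_PORT`. Extend that with `OPENCLAW_SERVICE_TOKEN`. `GatewayClient` picks it up from env.
- **Client class.** Children authenticate with the service token regardless of upstream `auth.mode`. External CLI callers still follow the existing resolution chain.
- **Scopes.** `backend` role, with scope sets per caller class (`operator.read`/`write` for subagents and CLI, narrower for browser tool). No unbounded admin.
- **Surface.** Loopback accepted unconditionally. Off-host accepted only with explicit opt-in (`gateway.service.allowRemote: true`) for multi-node deployments. Off-host default deny.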
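The token-material bullet could look roughly like this. It is a sketch under stated assumptions: the file name `service-token`, the state-directory argument, and the 32-byte base64url format are placeholders, not the real storage layout, and `rotateServiceToken` is only what a `rotate-service-token` command might call. Writes go through a temp file, an explicit `chmod 0o600`, and a rename, which is atomic on POSIX filesystems. On Windows, Node's `chmod` can only toggle the write bit, so owner-only protection there would need ACLs, which overlaps with the Windows token-persistence issues flagged in the Phase 1 plan.

```typescript
import { randomBytes } from "node:crypto";
import { chmodSync, existsSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical file name inside the gateway's state directory.
const SERVICE_TOKEN_FILE = "service-token";

function generateServiceToken(): string {
  // 32 random bytes, URL-safe so the token travels in env vars and headers unescaped.
  return randomBytes(32).toString("base64url");
}

// Write to a temp file, force owner-only mode, then rename over the target,
// so readers never observe a partially written token (on POSIX).
function persistServiceToken(stateDir: string, token: string): void {
  const target = join(stateDir, SERVICE_TOKEN_FILE);
  const tmp = `${target}.tmp`;
  writeFileSync(tmp, token, { encoding: "utf8", mode: 0o600 });
  chmodSync(tmp, 0o600); // `mode` above is ignored if tmp already existed
  renameSync(tmp, target);
}

// First start: create and persist. Later starts: reuse what is on disk.
function loadOrCreateServiceToken(stateDir: string): string {
  const target = join(stateDir, SERVICE_TOKEN_FILE);
  if (existsSync(target)) {
    return readFileSync(target, "utf8").trim();
  }
  const token = generateServiceToken();
  persistServiceToken(stateDir, token);
  return token;
}

// Replace the token in place; children pick up the new value on their next spawn.
function rotateServiceToken(stateDir: string): string {
  const token = generateServiceToken();
  persistServiceToken(stateDir, token);
  return token;
}
```

Keeping generation separate from persistence means `loadOrCreateServiceToken` and `rotateServiceToken` share one write path, so the permission and atomicity handling cannot drift between first start and rotation.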
### What this removes

With service identity in place, the `localDirect` special-casing inside trusted-proxy, the `sharedAuthOk` shortcut, and the password-fallback path all stop being necessary for internal callers. User auth stays focused on users. Service auth stays focused on the gateway's own children. The loopback guard in trusted-proxy can stay strict without breaking anything, because internal callers no longer depend on it.

### What this does not change

- `gateway.auth.mode` values, semantics, or on-the-wire behavior for user traffic.
- Trusted-proxy header handling for actual reverse-proxied user traffic.
- Device-pairing flow for operator browser sessions.
- Tailscale header auth.

## Phased plan
Ship value early. Keep each phase independently reviewable. Avoid a single large PR in this area — that pattern has a bad track record here.
### Phase 0: Hotfix triage (now)
Goal: unblock the people reporting breakage today without waiting for Phase 1.
Of the five in-flight PRs, two cover the two most-reported shapes with the clearest trust stories:
- `isNonProxyFailure && localDirect` gating plus rate-limiting. Zero config change required for the existing broken shape (deployment already has a password). Smallest diff.
- `allowLoopback` + `loopbackUser`, explicit opt-in, closes the original feature request (#26007, #43300). Right shape for K8s / Docker sidecar deployments where no password exists.

Recommendation: merge both as complementary opt-ins. #59190 can slot in alongside for same-host reverse proxies that forward real external traffic. #51070 and #54426 are superseded by Phase 1 and can close with a pointer to this RFC.
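To make the password-fallback shape concrete, here is a hedged sketch of an `isNonProxyFailure && localDirect` gate with per-client rate limiting of failed attempts. It is not the code from the in-flight PR: `FailureLimiter`, `FallbackAttempt`, and `passwordFallbackAllowed` are hypothetical, the two booleans are assumed to be computed upstream (for example from `isLocalDirectRequest`), and counting failures in a fixed window is one limiting choice among several.

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

interface FailureWindow {
  windowStart: number;
  count: number;
}

// Counts failed attempts per client key inside a fixed time window.
class FailureLimiter {
  private readonly maxFailures: number;
  private readonly windowMs: number;
  private readonly failures = new Map<string, FailureWindow>();

  constructor(maxFailures: number, windowMs: number) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
  }

  // Returns the live window for a key, or undefined if none or expired.
  private current(key: string, now: number): FailureWindow | undefined {
    const entry = this.failures.get(key);
    return entry !== undefined && now - entry.windowStart < this.windowMs ? entry : undefined;
  }

  isBlocked(key: string, now: number): boolean {
    const entry = this.current(key, now);
    return entry !== undefined && entry.count >= this.maxFailures;
  }

  recordFailure(key: string, now: number): void {
    const entry = this.current(key, now);
    if (entry !== undefined) entry.count += 1;
    else this.failures.set(key, { windowStart: now, count: 1 });
  }
}

interface FallbackAttempt {
  isNonProxyFailure: boolean; // assumed to be classified upstream
  localDirect: boolean; // stand-in for the isLocalDirectRequest result
  clientKey: string; // e.g. the remote address
  presentedPassword?: string;
  configuredPassword?: string;
}

function passwordFallbackAllowed(a: FallbackAttempt, limiter: FailureLimiter, now: number): boolean {
  // Narrow gate: only local, direct callers whose trusted-proxy check failed for a non-proxy reason.
  if (!a.isNonProxyFailure || !a.localDirect) return false;
  // Zero-config path: only deployments that already have a password get the fallback.
  if (a.configuredPassword === undefined || a.presentedPassword === undefined) return false;
  if (limiter.isBlocked(a.clientKey, now)) return false;
  const pa = Buffer.from(a.presentedPassword, "utf8");
  const pb = Buffer.from(a.configuredPassword, "utf8");
  const ok = pa.length === pb.length && timingSafeEqual(pa, pb);
  if (!ok) limiter.recordFailure(a.clientKey, now); // only failures consume the budget
  return ok;
}
```

Limiting failures rather than all attempts keeps frequently reconnecting internal callers with the right password from being throttled, while still bounding password guessing from the local host.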
Phase 0 is explicitly tactical. Nothing in it constrains the Phase 1 design — all of it is removed cleanly in Phase 2.
### Phase 1: Internal service identity (first release after Phase 0)
- Auto-generate and persist `gateway.service.token` at first start.
- `GatewayClient` picks up the token from env when spawned by the gateway.
- `authorizeGatewayConnect` gains a service-identity check that runs before user auth. Loopback-only in this phase.
- Ship behind a flag (`gateway.service.enabled`, default `false` → `true` after one release) to catch platform surprises before defaulting on. Windows in particular has live issues around token persistence (#53742, #66038, #61340, #67595) that we should not inherit.

Outcome: the whole class of "subagent/internal RPC fails with `pairing required` / `trusted_proxy_loopback_source`" reports disappears regardless of `auth.mode`.
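The injection and client-pickup steps, sketched in TypeScript. `buildChildEnv`, `spawnGatewayChild`, `resolveClientCredential`, and the settings shape are hypothetical names; only the env vars `OPENCLAW_GATEWAY_PORT` and `OPENCLAW_SERVICE_TOKEN` and the `gateway.service.enabled` flag come from this RFC. With the flag off, the child env carries no service token, so the child falls back to the existing resolution chain.

```typescript
import { spawn } from "node:child_process";

// Hypothetical settings; field names mirror the RFC's config keys, not real code.
interface ServiceInjectionSettings {
  enabled: boolean; // gateway.service.enabled
  port: number;
  token: string; // gateway.service.token, loaded at startup
}

type Env = Record<string, string | undefined>;

// Gateway side: the env handed to a child the gateway spawns itself.
function buildChildEnv(base: Env, s: ServiceInjectionSettings): Env {
  const env: Env = { ...base, OPENCLAW_GATEWAY_PORT: String(s.port) };
  if (s.enabled) {
    env.OPENCLAW_SERVICE_TOKEN = s.token;
  } else {
    // Flag off: keep today's behavior and don't forward a token inherited from our own env.
    delete env.OPENCLAW_SERVICE_TOKEN;
  }
  return env;
}

function spawnGatewayChild(command: string, args: string[], s: ServiceInjectionSettings) {
  // The token travels only through the child's environment; nothing here writes it to disk.
  return spawn(command, args, { env: buildChildEnv(process.env, s), stdio: "inherit" });
}

type ClientCredential =
  | { kind: "service"; token: string }
  | { kind: "user"; credential: string | undefined };

// Client side: an injected service token wins; otherwise the existing chain runs unchanged.
function resolveClientCredential(env: Env, existingChain: () => string | undefined): ClientCredential {
  const token = env.OPENCLAW_SERVICE_TOKEN;
  if (token !== undefined && token !== "") {
    return { kind: "service", token };
  }
  return { kind: "user", credential: existingChain() };
}
```

Because the token is handed over at spawn time, it never needs to appear in a LaunchAgent plist or systemd unit, which is the failure mode described in #53742, #61340, and #67595. Environment variables are still readable by same-user processes on most platforms, so this sketch assumes the same-user trust boundary that owner-only file permissions already imply.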
### Phase 2: Retire legacy fallbacks

- Remove `localDirect` fallback inside the trusted-proxy branch.
- `insecureAllowNoUpstreamAuth: true` opt-in plus runtime warnings, now possible without breaking internal callers.

Gated on Phase 1 having been the default for at least one release. Deprecation notes in the changelog, with `openclaw doctor` flagging affected configs.

### Phase 3 (optional, future)
Listed so Phase 1's design doesn't foreclose this, not committed to in this RFC.
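## Compatibility and migration

- Existing `auth.mode` configs work unchanged in every phase.
- `gateway.service.enabled: false` reproduces today's behavior exactly.

## Non-goals

- Redesigning device pairing. Pairing stays as-is for operator browser sessions; service identity routes around it for internal callers, which is the minimal change.

## Open questions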
- Windows token location (`%LOCALAPPDATA%`): want a second opinion from people who hit those bugs.
- `openclaw gateway install --force` behavior: regenerate or preserve the service token? Current token-handling behavior (#67595) suggests preserve, with an explicit `--rotate-service-token` opt-in.
- `service-account` carries K8s / GCP baggage; `service-identity` is more neutral but less discoverable. Bikeshed welcome.

## References
### Reports attributable to the current coupling

#26007 · #43300 · #59167 · #59045 · #59702 · #60265 · #62767 · #63381 · #63548 · #63344 · #67703 · #67524 · #67799 · #55218 · #46897 · #48847 · #49201 · #52647 · #57434 · #59882

### Complementary concerns (same axis, different direction)

#57087 (external-side guardrails) · #63344 (local backend client class) · #43786 (auth.mode=none still required by some deployments) · #50751 (CLI host resolution) · #56982 (doctor output for trusted-proxy)

### PRs this RFC would supersede or subsume (Phase 2 onward)

#51070 · #54426 · #59190 · #63379 · #64122

### Context

#45264 · #54536 (regression boundary) · #44044 · #49107 · #33819 · #54718 · #9271

## Ask
cc @vincentkoc · @nickytonline · @mrosmarin