Fix local gateway detail probes on loopback#46556
Conversation
Greptile SummaryThis PR fixes two related issues with local gateway probes in token-auth mode: device identity was being suppressed for loopback calls (causing
Confidence Score: 4/5
|
There was a problem hiding this comment.
💡 Codex Review
Lines 55 to 59 in 7bbaad2
By dropping the loopback deviceIdentity: null override in probeGateway, even includeDetails: false probes now instantiate GatewayClient with its default identity loading path. That makes the lightweight reachability path used by src/cli/daemon-cli/restart-health.ts depend on creating/reading identity files, so on hosts with unwritable state dirs the probe throws before handshake and the restart health check can incorrectly treat a healthy local gateway as unhealthy. Since this mode intentionally skips detail RPCs, it should not require device-identity setup.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Validated locally: removing the loopback What we confirmedBefore local patch:
After local patch:
Additional observation from post-update automationAfter gateway restart, probe can still be a little cold-start-sensitive for a short window. In our post-update hook, the most reliable behavior was:
So this PR direction looks right, and I’d strongly support also keeping an eye on the local-loopback probe budget / cold-start behavior. In short:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5207d1fc4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| mode: GATEWAY_CLIENT_MODES.PROBE, | ||
| instanceId, | ||
| deviceIdentity: disableDeviceIdentity ? null : undefined, | ||
| deviceIdentity: opts.includeDetails === false ? null : undefined, |
There was a problem hiding this comment.
Fall back when local probe identity cannot be persisted
probeGateway now sends deviceIdentity: undefined for detail probes, which makes GatewayClient eagerly call loadOrCreateDeviceIdentity() (src/gateway/client.ts:146-149). On first run in environments where the state dir is not writable (for example read-only HOME in CI/containers), that helper throws while writing identity/device.json (src/infra/device-identity.ts:108-117), so openclaw gateway status/detail probes fail before any RPC even when token/password auth is valid. Previously loopback probes passed null and avoided this failure path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d2e8a493a3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| function shouldAttachDeviceIdentityForGatewayCall(): boolean { | ||
| // Even when local CLI calls authenticate with a shared gateway token/password, | ||
| // we still want to attach device identity so paired operator scopes remain | ||
| // available for detail RPCs such as status / system-presence / last-heartbeat. | ||
| return true; |
There was a problem hiding this comment.
Add fallback for device identity write failures
shouldAttachDeviceIdentityForGatewayCall() now always returns true, so local loopback calls that use shared token/password always execute loadOrCreateDeviceIdentity(). In read-only or restricted state directories (for example CI containers with unwritable HOME), that helper throws while writing identity/device.json, causing callGateway to fail before any RPC even though token/password auth is valid; this regresses local token-auth CLI calls compared to the prior loopback bypass and unlike probeGateway, which now explicitly falls back to null on this failure path.
Useful? React with 👍 / 👎.
|
I hit this locally on High-signal findings from runtime reproduction:
That exposed the second bug: the CLI status/probe wrapper was masking the real failure with an artificial timeout because of the per-target budget caps. On my box, the detail probe took ~2.6s, so the existing 800ms/1500ms caps could turn the real scope-limited failure into a fake I applied a local dist patch equivalent to the intent of this PR:
After that:
So from a live installed-runtime perspective: yes, this is the right fix direction. One practical note: on this machine, ~3000ms would likely pass, but only barely. I saw ~2596ms on a healthy localhost detailed probe, so any extra noise could make that threshold a little tight. |
|
One more thing after reading the actual diff: I think there are still two gaps here. 1) The budget bump only fixes
|
|
@jeffrey4341 Thanks — I agreed with both gaps and pushed a verified follow-up here:
This follow-up:
Verified with:
I couldn't push directly to |
Cherry-picked-by-hand from heavenlost/openclaw@d2b0ea9 after review and local test verification.
|
Picked this up and integrated the verified follow-up from @heavenlost into this PR branch. What I updated:
Local verification after the update:
All passed locally. CI is rerunning on the updated branch now. |
|
I dug through the PR checks against the base The red CI on this PR does not look specific to the gateway change itself. This branch appears to be sitting on top of a red baseline. Evidence:
That same March 16, 2026 As of March 20, 2026,
Also, the loopback probe / device-identity fix still does not appear to be present in current My suggestion: instead of reasoning from the March 16 red run, reapply or rebase the final gateway-only diff onto current green References:
|
|
Opened a clean replacement PR based on current green Link: #51087 This carries only the gateway-related diff from this branch and avoids the stale red CI baseline from March 16, 2026. Targeted local verification on the replacement branch passed:
|
|
Closing this as implemented after Codex review. Current What I checked:
So I’m closing this as already implemented rather than keeping a duplicate issue open. Review notes: reviewed against 56fe2aab9c87. |
Summary
status,system-presence,last-heartbeat) working in local token-auth modeCloses #46557.
Problem
On a local gateway configured with
gateway.auth.mode=token, the CLI currently suppressesdeviceIdentityon loopback calls whenever a shared token/password is present. That means simple health checks work, but detail RPCs that rely on paired operator scopes can fail withmissing scope: operator.read.Separately,
openclaw gateway probeonly budgets 800ms for thelocalLoopbacktarget, which can be too short for a full detail probe even when the gateway is healthy.Fix
deviceIdentityfor local token/password gateway callslocalLoopbackprobe budget to 3000msTests
corepack pnpm exec vitest run --config vitest.gateway.config.ts src/gateway/call.test.ts src/gateway/probe.test.tscorepack pnpm exec vitest run src/commands/gateway-status/helpers.test.ts