fix(gateway): restore loopback detail probes and identity fallback#51087
Conversation
Greptile SummaryThis PR fixes two related local-gateway probe regressions in token-auth mode:
The implementation is clean and the new/updated tests cover the main scenarios well. One minor note: the single Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/probe.ts
Line: 50-63
Comment:
**Catch block silently absorbs unrelated errors alongside expected ones**
The single `catch` swallows two distinct error classes:
1. `new URL(opts.url)` throwing — a genuinely unexpected input/programming error (the caller presumably validated the URL).
2. `loadOrCreateDeviceIdentity()` throwing — the documented, expected "read-only dir" case.
Collapsing them into the same `return null` means a future bug inside `loadOrCreateDeviceIdentity` (e.g. a corrupt identity file that throws a JSON parse error) will silently degrade to no-identity instead of surfacing the problem. Consider splitting the two error sites so only the identity-persistence path is intentionally caught:
```
try {
const hostname = new URL(opts.url).hostname;
if (isLoopbackHost(hostname) && !(opts.auth?.token || opts.auth?.password)) {
return null;
}
} catch {
return null;
}
try {
return loadOrCreateDeviceIdentity();
} catch {
// Read-only or restricted environments should still be able to run
// token/password-auth detail probes without crashing on identity persistence.
return null;
}
```
This is low priority since the impact is the same (`null` either way and the probe itself would likely fail for a malformed URL), but it makes the intent clearer and prevents programming errors from being silently absorbed.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: "fix(gateway): restor..." |
| try { | ||
| const hostname = new URL(opts.url).hostname; | ||
| // Local authenticated probes should stay device-bound so read/detail RPCs | ||
| // are not scope-limited by the shared-auth scope stripping hardening. | ||
| return isLoopbackHost(hostname) && !(opts.auth?.token || opts.auth?.password); | ||
| if (isLoopbackHost(hostname) && !(opts.auth?.token || opts.auth?.password)) { | ||
| return null; | ||
| } | ||
| return loadOrCreateDeviceIdentity(); | ||
| } catch { | ||
| return false; | ||
| // Read-only or restricted environments should still be able to run | ||
| // token/password-auth detail probes without crashing on identity persistence. | ||
| return null; | ||
| } | ||
| })(); |
There was a problem hiding this comment.
Catch block silently absorbs unrelated errors alongside expected ones
The single catch swallows two distinct error classes:
new URL(opts.url)throwing — a genuinely unexpected input/programming error (the caller presumably validated the URL).loadOrCreateDeviceIdentity()throwing — the documented, expected "read-only dir" case.
Collapsing them into the same return null means a future bug inside loadOrCreateDeviceIdentity (e.g. a corrupt identity file that throws a JSON parse error) will silently degrade to no-identity instead of surfacing the problem. Consider splitting the two error sites so only the identity-persistence path is intentionally caught:
try {
const hostname = new URL(opts.url).hostname;
if (isLoopbackHost(hostname) && !(opts.auth?.token || opts.auth?.password)) {
return null;
}
} catch {
return null;
}
try {
return loadOrCreateDeviceIdentity();
} catch {
// Read-only or restricted environments should still be able to run
// token/password-auth detail probes without crashing on identity persistence.
return null;
}
This is low priority since the impact is the same (null either way and the probe itself would likely fail for a malformed URL), but it makes the intent clearer and prevents programming errors from being silently absorbed.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/probe.ts
Line: 50-63
Comment:
**Catch block silently absorbs unrelated errors alongside expected ones**
The single `catch` swallows two distinct error classes:
1. `new URL(opts.url)` throwing — a genuinely unexpected input/programming error (the caller presumably validated the URL).
2. `loadOrCreateDeviceIdentity()` throwing — the documented, expected "read-only dir" case.
Collapsing them into the same `return null` means a future bug inside `loadOrCreateDeviceIdentity` (e.g. a corrupt identity file that throws a JSON parse error) will silently degrade to no-identity instead of surfacing the problem. Consider splitting the two error sites so only the identity-persistence path is intentionally caught:
```
try {
const hostname = new URL(opts.url).hostname;
if (isLoopbackHost(hostname) && !(opts.auth?.token || opts.auth?.password)) {
return null;
}
} catch {
return null;
}
try {
return loadOrCreateDeviceIdentity();
} catch {
// Read-only or restricted environments should still be able to run
// token/password-auth detail probes without crashing on identity persistence.
return null;
}
```
This is low priority since the impact is the same (`null` either way and the probe itself would likely fail for a malformed URL), but it makes the intent clearer and prevents programming errors from being silently absorbed.
How can I resolve this? If you propose a fix, please make it concise.|
Quick CI follow-up:
I also tried to rerun the failed job from my side, but the current token does not have repository-admin rerun permission. So at this point this looks like an unrelated baseline / merge-ref formatting issue, not a gateway regression in this PR. |
|
Refreshed this PR on top of current What changed in this update:
Local verification on the refreshed branch:
Both passed locally after syncing dependencies to the current lockfile. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8da0328815
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (detailLevel === "none") { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Keep device identity for authenticated lightweight probes
probeGateway now forces deviceIdentity to null whenever includeDetails disables detail RPCs, even if token/password auth is provided. That removes the stored-device-token retry path in GatewayClient.selectConnectAuth (it only loads cached tokens when opts.deviceIdentity exists), so a loopback health probe can regress from success to failure when OPENCLAW_GATEWAY_TOKEN is stale but a valid paired device token is cached. This affects restart/lifecycle reachability checks that call probeGateway(..., includeDetails: false) and can report a healthy gateway as unreachable.
Useful? React with 👍 / 👎.
|
Addressing the latest review note: lightweight probes now keep device identity when auth is present. Why:
What changed in
Local verification after the fix:
|
|
Quick CI note after the latest follow-up (
So the latest lightweight-probe/device-identity follow-up did not introduce a new CI regression; it left the same pre-existing |
|
Refreshed this PR on top of the current Why this refresh:
Local verification on the refreshed branch:
Both passed locally before pushing this update. |
|
Quick CI follow-up after the latest refresh:
I also checked current So at this point the gateway-specific CI problem is cleared, and the remaining red looks aligned with current main rather than introduced by this PR. |
31057a9 to
37405ac
Compare
37405ac to
b15d27c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b15d27c18b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| clientName: GATEWAY_CLIENT_NAMES.CLI, | ||
| mode: GATEWAY_CLIENT_MODES.CLI, | ||
| ...(opts.configPath ? { configPath: opts.configPath } : {}), | ||
| includeDetails: false, |
There was a problem hiding this comment.
Keep daemon RPC probe performing an actual RPC
Switching this path to probeGateway(..., includeDetails: false) makes daemon status --require-rpc a connect-only check rather than an RPC check. In token/password mode, operator connects can succeed while scopes are stripped when device identity is unavailable (see the compatibility baseline around shared-token connects without device identity), so this now returns ok: true even though a real status call would fail with missing scope. That can produce false-green health checks/exits in read-only or restricted environments where identity persistence falls back to null.
Useful? React with 👍 / 👎.
b15d27c to
accc86d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: accc86daa9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| await probeGateway({ | ||
| url: opts.url, | ||
| token: opts.token, | ||
| password: opts.password, | ||
| auth: { | ||
| token: opts.token, | ||
| password: opts.password, |
There was a problem hiding this comment.
Reapply ws:// non-loopback guard before daemon probing
This switched daemon status probing from callGateway to a direct probeGateway connect, which bypasses the URL safety check in call.ts that rejects plaintext ws:// on non-loopback hosts. In practice, daemon status can now send configured token/password auth to an insecure remote URL (for example via --rpc.url ws://...) instead of failing fast, which is a credential-exposure regression on this command path.
Useful? React with 👍 / 👎.
|
Addressing the
I left the default lightweight probe path intact otherwise. The separate Local validation:
|
5ad2345 to
7b04cd9
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b04cd9748
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return await probeGateway({ | ||
| url: opts.url, | ||
| token: opts.token, | ||
| password: opts.password, | ||
| auth: { | ||
| token: opts.token, | ||
| password: opts.password, | ||
| }, | ||
| tlsFingerprint: opts.tlsFingerprint, | ||
| method: "status", | ||
| timeoutMs: opts.timeoutMs, | ||
| clientName: GATEWAY_CLIENT_NAMES.CLI, | ||
| mode: GATEWAY_CLIENT_MODES.CLI, | ||
| ...(opts.configPath ? { configPath: opts.configPath } : {}), | ||
| }), | ||
| includeDetails: false, |
There was a problem hiding this comment.
Report immediate probe close errors instead of generic timeouts
This lightweight path now relies on probeGateway(..., includeDetails: false), but that probe only settles on hello or timer expiry. When the gateway rejects auth/pairing during connect (common with stale token or unpaired device), the socket closes quickly yet this code still waits the full timeoutMs and later surfaces a generic timeout rather than the actionable close reason. That is a regression from the previous callGateway("status") behavior, which failed immediately with the underlying RPC/connect error, and it can make daemon status appear slow and unhelpful in broken-auth environments.
Useful? React with 👍 / 👎.
2a82459 to
f8a66ff
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8a66ffde2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| : null; | ||
| return { | ||
| ok: false, | ||
| error: result.error ?? closeHint ?? "gateway probe failed", |
There was a problem hiding this comment.
Prefer close hint before generic probe timeout
probeGatewayStatus still returns the generic timeout string when the lightweight probe records both error: "timeout" and a close frame, because result.error is prioritized over closeHint. In the connect-failure path used by probeGateway, close-only failures can carry actionable reasons (for example auth/pairing close codes), but this branch masks them and keeps daemon status output unhelpful in exactly the failure cases this change intended to improve.
Useful? React with 👍 / 👎.
|
Merged via squash in #51087. Thanks @heavenlost. |
* main: (344 commits) test(cleanup): isolate session lock queue coverage fix(test): stabilize windows lock and cache paths fix: honor test planner cache paths by target platform test: remove duplicate voice-call stdout assertion test: fix voice-call cli stdout assertions test(voice-call): capture cli stdout fix: route ollama helpers through plugin sdk fix(agents): restore ollama public seam Reduce script logging suppressions and Feishu any casts Reduce lint suppressions in core tests and runtime test: fix feishu batch insert test syntax test: fix feishu test typings fix(gateway): restore loopback detail probes and identity fallback (openclaw#51087) fix(test-planner): shrink local extension batches on constrained hosts fix(feishu): restore tsgo test typing refactor: remove ollama legacy shims refactor(mattermost): type config seams build: refresh plugin sdk api baseline test(feishu): type bot interaction fixtures test(feishu): type reply lifecycle fixtures ...
|
Thanks @mbelinky and @mariano for the quick review and merge!
Really happy to see the loopback probes and identity fallback restored. Glad I could make a small contribution to the project. 😊
Best regards,
heavenlost
发自我的iPhone
…------------------ Original ------------------
From: Mariano ***@***.***>
Date: Fri,Mar 27,2026 3:11 PM
To: openclaw/openclaw ***@***.***>
Cc: heavenlost ***@***.***>, Mention ***@***.***>
Subject: Re: [openclaw/openclaw] fix(gateway): restore loopback detail probesand identity fallback (PR #51087)
mbelinky left a comment (openclaw/openclaw#51087)
Merged via squash in #51087.
Prepared head SHA: f8a66ff
Merge commit: 3cbd4de
Thanks @heavenlost.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
…penclaw#51087) Merged via squash. Prepared head SHA: f8a66ff Co-authored-by: heavenlost <70937055+heavenlost@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky
|
Following up on the remaining live Codex comments around daemon-status probe reporting in #51087. Opened #56282 to tighten that path so lightweight probes surface immediate close reasons instead of waiting for a generic timeout, and so |
|
Follow-up merged. The daemon-status close-reason cleanup landed in #56282 as commit 0afd73c975af8dafee2f201210ca76c523af4307, from prepared head c356980aa4d8bba5d8c6d92220676ca2c21b38ce. |
|
Thanks for opening #56282 and getting the daemon-status probe cleanup merged!
Appreciate you handling the remaining Codex comments and making the close reasons much clearer. Looks good!
…----------回复的邮件信息----------
Mariano ***@***.***> 在 Sat,Mar 28,2026 4:28 PM写道:
mbelinky left a comment (openclaw/openclaw#51087)
Following up on the remaining live Codex comments around daemon-status probe reporting in #51087.
Opened #56282 to tighten that path so lightweight probes surface immediate close reasons instead of waiting for a generic timeout, and so openclaw daemon status prefers a concrete close reason over timeout when both are present.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
…penclaw#51087) Merged via squash. Prepared head SHA: f8a66ff Co-authored-by: heavenlost <70937055+heavenlost@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky
…penclaw#51087) Merged via squash. Prepared head SHA: f8a66ff Co-authored-by: heavenlost <70937055+heavenlost@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky
…penclaw#51087) Merged via squash. Prepared head SHA: f8a66ff Co-authored-by: heavenlost <70937055+heavenlost@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky
…penclaw#51087) Merged via squash. Prepared head SHA: f8a66ff Co-authored-by: heavenlost <70937055+heavenlost@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky
…penclaw#51087) Merged via squash. Prepared head SHA: f8a66ff Co-authored-by: heavenlost <70937055+heavenlost@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky
…penclaw#51087) Merged via squash. Prepared head SHA: f8a66ff Co-authored-by: heavenlost <70937055+heavenlost@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky
Summary
deviceIdentity: nullwhen identity persistence is unavailable instead of failing pre-RPCProblem
Local gateway detail probes could fail in token-auth mode for two related reasons:
ws://127.0.0.1:18789could still time out under the tighter non-local budget even when the gateway was healthyThere was also an inconsistency between the probe path and the call path: probe could tolerate device-identity persistence failure, while call could fail before RPC.
Fix
call.tswithprobe.tsby falling back todeviceIdentity: nullwhen identity persistence failsTests
node node_modules/vitest/dist/cli.js run src/gateway/call.test.ts src/gateway/probe.test.ts src/commands/gateway-status/helpers.test.tsNotes
main.