(fix) gateway restart health timeout#48472
Conversation
c382b9c to
950c33e
Compare
Greptile SummaryThis PR fixes gateway restart/status health checks by switching the daemon probe path from the heavier Key changes:
Confidence Score: 4/5
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc03cd3e25
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
🔒 Aisle Security AnalysisWe found 2 potential security issue(s) in this PR:
1. 🟠
|
| Property | Value |
|---|---|
| Severity | High |
| CWE | CWE-319 |
| Location | src/gateway/net.ts:468-479 |
Description
isSecureWebSocketUrl() is intended to allow plaintext ws:// only for loopback by default, with an opt-in “break-glass” mode (allowPrivateWs) for private networks only.
However, when allowPrivateWs is enabled, any non-IP hostname is accepted, including public domains (e.g. ws://evil.com) and deceptive userinfo URLs (e.g. ws://127.0.0.1@evil.com). This undermines the new CLI-side enforcement (and other call sites) by permitting cleartext WebSocket connections to non-private remote hosts, exposing credentials and chat data to interception.
Why this happens:
- For
ws://URLs, if the host is not loopback and not a private IP literal, the code returnsnet.isIP(hostForIpCheck) === 0. net.isIP(x) === 0means “x is not an IP address” (i.e., it is a hostname), so the function returnstruefor essentially all hostnames.
Vulnerable code:
if (opts?.allowPrivateWs) {
if (isPrivateOrLoopbackHost(parsed.hostname)) {
return true;
}
const hostForIpCheck =
parsed.hostname.startsWith("[") && parsed.hostname.endsWith("]")
? parsed.hostname.slice(1, -1)
: parsed.hostname;
return net.isIP(hostForIpCheck) === 0; // <-- hostnames always pass
}Recommendation
Tighten the break-glass behavior so it does not implicitly allow all hostnames.
Minimum safe fix (recommended): when allowPrivateWs is enabled, only allow private/loopback IP literals (and optionally a small, explicit hostname allowlist if you truly need it). For everything else, require wss://.
Example:
if (opts?.allowPrivateWs) {
// Allow private/loopback *IP literals* only
if (isPrivateOrLoopbackHost(parsed.hostname)) {
return true;
}
return false; // hostnames must use wss://
}If hostname support is required for private DNS (VPN/tailnet), introduce an async validator that resolves DNS and verifies that all returned A/AAAA records are private/loopback and ensure the WebSocket connection uses the resolved IP (to mitigate DNS rebinding / TOCTOU).
2. 🟠 Implicit device-token leakage to non-loopback URLs during unauthenticated gateway probes
| Property | Value |
|---|---|
| Severity | High |
| CWE | CWE-522 |
| Location | src/gateway/probe.ts:47-89 |
Description
probeGateway() only disables device identity for unauthenticated loopback URLs. For any non-loopback URL (including attacker-controlled wss://...) when opts.auth is omitted/empty, it leaves deviceIdentity enabled (passed as undefined).
Because GatewayClient treats deviceIdentity: undefined as “load or create a real device identity”, the connect handshake can:
- Send a stable device identifier/public key (tracking) to the remote endpoint
- Load a cached device token from disk and send it in the connect auth fields (implicit credential disclosure)
This violates the security invariant “don’t send device identity / cached device token to untrusted remote endpoints unless explicitly intended”.
Vulnerable logic (probe keeps identity enabled for non-loopback, unauthenticated URLs):
return (
isLoopbackHost(hostname) &&
!(opts.auth?.token || opts.auth?.password) &&
opts.allowLoopbackDeviceIdentity !== true
);
...
deviceIdentity: disableDeviceIdentity ? null : undefined,Why this leaks credentials (data flow):
- Input:
opts.urlis supplied by callers such as gateway discovery/status commands and can be user-controlled. - Condition: when no explicit token/password is provided for a non-loopback URL,
disableDeviceIdentitybecomesfalse. - Sink:
GatewayClientloads device identity and may load a cached device token and include it in the WebSocketconnecthandshake auth payload sent to that URL.
Impact:
- Attacker-operated
wss://endpoint can collect the client’s device identity and cached device token. - That token can enable authentication attempts or real-time credential forwarding attacks against the legitimate gateway, and at minimum discloses a sensitive credential to an untrusted party.
Recommendation
Make probes opt-out of device identity (and thus device-token fallback) by default, and only opt-in for explicitly trusted cases.
Recommended change in probeGateway():
- Disable device identity whenever explicit shared credentials are not provided, unless the caller explicitly opts in and the endpoint is trusted (e.g., loopback, or
wss://with a pinnedtlsFingerprint).
Example:
const parsed = new URL(opts.url);
const isLoopback = isLoopbackHost(parsed.hostname);
const hasExplicitSharedAuth = Boolean(opts.auth?.token || opts.auth?.password);
// Only allow device identity when explicitly intended.
const allowDeviceIdentity =
hasExplicitSharedAuth ||
(isLoopback && opts.allowLoopbackDeviceIdentity === true);
const client = new GatewayClient({
...,
// Passing null disables identity; undefined loads/creates identity.
deviceIdentity: allowDeviceIdentity ? undefined : null,
});Optionally (defense-in-depth): update GatewayClient.selectConnectAuth() to never use stored device-token fallback unless an explicit allowDeviceTokenFallback flag is set and the URL is trusted.
Analyzed PR: #48472 at commit cbbec1b
Last updated on: 2026-03-17T15:28:11Z
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cab1a1697b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@joshavant @visionik is this something that follows under your radar? Thanks |
|
@greptileai please review |
|
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded. This PR should stay open as maintainer review material, not as a cleanup close: current main has later gateway auth/status hardening that the stale branch conflicts with, while the distinct timeout/health-probe idea still needs a narrow rebase and real behavior proof before it can land. Canonical path: Close this PR as superseded by #72405. So I’m closing this here and keeping the remaining discussion on #72405. Review detailsBest possible solution: Close this PR as superseded by #72405. Do we have a high-confidence way to reproduce the issue? Yes for the blocking PR regressions: source comparison shows the PR head uses health-only status probing, env-only restart auth, and default probe identity behavior where current main has stronger contracts. I did not live-reproduce the original hot-start timeout symptom. Is this the best way to solve the issue? No, not as currently submitted. The maintainable path is a narrow rebase or replacement that keeps current-main gateway status/auth/security behavior while proving any remaining timeout improvement in a real restart flow. Security review: Security review needs attention: The diff is security-sensitive and would weaken current probe identity and URL-error handling if merged mechanically.
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against f799da09479e. |
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
ClawSweeper applied the proposed close for this PR.
|
Summary
Describe the problem and fix in 2–5 bullets:
statusRPC through the generic gateway call path, which is a poor health signal during hot startup and interacts badly with derived probe URLs.healthprobe; trusted daemon-derived loopback probes can stay device-bound without weakening explicit-auth requirements for user-supplied URL overrides; override-auth helpers were isolated into a small module so the daemon probe path does not need the heavier gateway call module.statuscommand behavior outside the daemon probe/restart-health path.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
None.
User-visible / Behavior Changes
openclaw gateway status/ daemon restart health checks use the lightweighthealthRPC instead of the heavierstatusRPC.Security Impact (required)
Yes/No) NoYes/No) YesYes/No) YesYes/No) NoYes/No) NoYes, explain risk + mitigation:The probe path now distinguishes explicit CLI credentials from daemon-derived probe auth so user-supplied URL overrides cannot silently reuse daemon auth. The network call changes are limited to switching the daemon reachability probe from
statustohealthand keeping trusted derived loopback probes device-bound when appropriate.Repro + Verification
Environment
Steps
Expected
Actual
healthfor the daemon probe path, and passes the focused daemon/gateway tests pluspnpm build,pnpm tsgo, andpnpm checklocally.Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
pnpm test -- src/cli/daemon-cli/probe.test.ts src/gateway/probe.test.ts src/cli/daemon-cli/status.gather.test.ts src/cli/daemon-cli.coverage.test.ts src/cli/daemon-cli/restart-health.test.tspnpm buildpnpm tsgopnpm checkReview Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoFailure Recovery (if this breaks)
Revert the four commits in this PR branch or restore the previous daemon probe path to
callGateway(... method: "status").src/cli/daemon-cli/probe.ts,src/cli/daemon-cli/status.gather.ts,src/gateway/probe.ts,src/gateway/call.ts,src/gateway/explicit-auth.tsRisks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write
None.src/gateway/explicit-auth.tsto keep the daemon probe path off the heavier gateway call module.