Skip to content

fix(gateway): increase local loopback probe budget to prevent false negative timeouts#45984

Closed
caicongyang wants to merge 1 commit intoopenclaw:mainfrom
caicongyang:fix-probe-v2
Closed

fix(gateway): increase local loopback probe budget to prevent false negative timeouts#45984
caicongyang wants to merge 1 commit intoopenclaw:mainfrom
caicongyang:fix-probe-v2

Conversation

@caicongyang
Copy link
Copy Markdown

Description

The gateway probe was giving local loopback connections only 800ms to connect and authenticate, but the gateway client has a default connectChallengeTimeoutMs of 4000ms. This caused false negative timeouts on Windows where the WebSocket handshake can take longer than expected.

Changes

Changed the local loopback probe budget from Math.min(800, overallMs) to Math.max(4000, Math.min(overallMs, 8000)) to ensure enough time for the client's internal authentication to complete.

Related Issue

Fixes: #45940

Testing

This fix ensures that local loopback probe connections have at least 4 seconds to complete the WebSocket handshake, matching the gateway client's internal timeout.

…egative timeouts

The gateway probe was giving local loopback connections only 800ms
to connect and authenticate, but the gateway client has a default
connectChallengeTimeoutMs of 4000ms. This caused false negative
timeouts on Windows where the WebSocket handshake can take longer.

Changed the local loopback probe budget from 'Math.min(800, overallMs)'
to 'Math.max(4000, Math.min(overallMs, 8000))' to ensure enough
time for the client's internal authentication to complete.

Fixes: openclaw#45940
@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations size: XS labels Mar 14, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 455329d8e4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// too short and caused false negative timeouts on Windows.
if (kind === "localLoopback") {
return Math.min(800, overallMs);
return Math.max(4000, Math.min(overallMs, 8000));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep local probe within requested overall timeout

This change makes localLoopback probes ignore small --timeout values by forcing at least 4000ms (Math.max(4000, ...)), so openclaw gateway probe --timeout 1000 can still wait ~4s when localhost is unreachable or slow. Since the CLI advertises this flag as an "Overall probe budget in ms" (src/cli/gateway-cli/register.ts), this is a behavioral regression that can break automation expecting the command to honor the requested upper bound.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 14, 2026

Greptile Summary

This PR increases the local loopback probe budget from Math.min(800, overallMs) to Math.max(4000, Math.min(overallMs, 8000)) to prevent false-negative timeouts on Windows, where the WebSocket handshake can take longer than 800ms to complete.

Changes:

  • src/commands/gateway-status/helpers.ts: Updated resolveProbeBudgetMs for the localLoopback kind, along with an explanatory comment.

Issues found:

  • The new formula allows the probe budget to exceed overallMs when overallMs < 4000. The default overall command timeout is 3000ms (set in gateway-status.ts line 42 via parseTimeoutMs(opts.timeout, 3000)), which means the probe will routinely be given 4000ms — longer than the overall command budget. Since probes run in Promise.all, the command will take at least 4s by default, exceeding the user-visible 3s timeout. Users explicitly passing --timeout values below 4000 will also see their setting silently overridden. The root fix should include raising the default overallTimeoutMs from 3000 to at least 5000ms in gateway-status.ts.

Confidence Score: 2/5

  • This PR partially addresses the Windows timeout issue but introduces a regression where the probe budget can exceed the overall command timeout (3000ms default vs. 4000ms probe minimum), causing the command to silently run longer than expected.
  • The intent is correct — the 800ms budget was indeed too tight relative to the gateway client's 4000ms connectChallengeTimeoutMs. However, the formula Math.max(4000, Math.min(overallMs, 8000)) can return a value greater than overallMs (e.g., 4000ms when the default overall timeout is 3000ms). This breaks the invariant that the probe budget should be bounded by the overall timeout and causes the command to run longer than the configured timeout. A complete fix requires also raising the default overallTimeoutMs in gateway-status.ts.
  • src/commands/gateway-status/helpers.ts (the changed file) and src/commands/gateway-status.ts (where the default overallTimeoutMs of 3000ms needs to be raised to match the new probe minimum).
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/commands/gateway-status/helpers.ts
Line: 124

Comment:
**Probe budget can exceed `overallMs`, ignoring the command timeout**

The new formula `Math.max(4000, Math.min(overallMs, 8000))` allows the probe budget to **exceed** the `overallMs` argument whenever `overallMs < 4000`. The default `overallTimeoutMs` in `gateway-status.ts` is **3000ms** (line 42: `parseTimeoutMs(opts.timeout, 3000)`), so in the typical invocation this returns 4000ms — a probe budget 33% larger than the overall command timeout.

Since all targets are probed in parallel via `Promise.all` (line 155 of `gateway-status.ts`), this means:
- The command will routinely take ~4s to complete instead of the expected ~3s default.
- Users who explicitly pass `--timeout=2000` (or any value < 4000) will see their setting silently overridden and have to wait at least 4 seconds.

The old formula `Math.min(800, overallMs)` always respected `overallMs`. The fix should keep that invariant. One approach is to also raise the default `overallTimeoutMs` alongside the probe budget increase — e.g., change the default from 3000 to 5000ms in `gateway-status.ts` — so there is headroom for the 4000ms probe without violating the overall budget:

```typescript
// In gateway-status.ts, line 42:
const overallTimeoutMs = parseTimeoutMs(opts.timeout, 5000);

// In helpers.ts, keep the probe bounded by overall:
return Math.min(Math.max(4000, overallMs), 8000);
```

Alternatively, clamp the probe budget to `overallMs` while preserving the intent:
```suggestion
    return Math.min(Math.max(4000, overallMs), 8000);
```
(This still gives 4000ms when `overallMs >= 4000`, and for `overallMs < 4000` it returns exactly `overallMs` — which preserves the user timeout while the root issue is better addressed by raising the default overall timeout.)

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 455329d

// too short and caused false negative timeouts on Windows.
if (kind === "localLoopback") {
return Math.min(800, overallMs);
return Math.max(4000, Math.min(overallMs, 8000));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probe budget can exceed overallMs, ignoring the command timeout

The new formula Math.max(4000, Math.min(overallMs, 8000)) allows the probe budget to exceed the overallMs argument whenever overallMs < 4000. The default overallTimeoutMs in gateway-status.ts is 3000ms (line 42: parseTimeoutMs(opts.timeout, 3000)), so in the typical invocation this returns 4000ms — a probe budget 33% larger than the overall command timeout.

Since all targets are probed in parallel via Promise.all (line 155 of gateway-status.ts), this means:

  • The command will routinely take ~4s to complete instead of the expected ~3s default.
  • Users who explicitly pass --timeout=2000 (or any value < 4000) will see their setting silently overridden and have to wait at least 4 seconds.

The old formula Math.min(800, overallMs) always respected overallMs. The fix should keep that invariant. One approach is to also raise the default overallTimeoutMs alongside the probe budget increase — e.g., change the default from 3000 to 5000ms in gateway-status.ts — so there is headroom for the 4000ms probe without violating the overall budget:

// In gateway-status.ts, line 42:
const overallTimeoutMs = parseTimeoutMs(opts.timeout, 5000);

// In helpers.ts, keep the probe bounded by overall:
return Math.min(Math.max(4000, overallMs), 8000);

Alternatively, clamp the probe budget to overallMs while preserving the intent:

Suggested change
return Math.max(4000, Math.min(overallMs, 8000));
return Math.min(Math.max(4000, overallMs), 8000);

(This still gives 4000ms when overallMs >= 4000, and for overallMs < 4000 it returns exactly overallMs — which preserves the user timeout while the root issue is better addressed by raising the default overall timeout.)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/commands/gateway-status/helpers.ts
Line: 124

Comment:
**Probe budget can exceed `overallMs`, ignoring the command timeout**

The new formula `Math.max(4000, Math.min(overallMs, 8000))` allows the probe budget to **exceed** the `overallMs` argument whenever `overallMs < 4000`. The default `overallTimeoutMs` in `gateway-status.ts` is **3000ms** (line 42: `parseTimeoutMs(opts.timeout, 3000)`), so in the typical invocation this returns 4000ms — a probe budget 33% larger than the overall command timeout.

Since all targets are probed in parallel via `Promise.all` (line 155 of `gateway-status.ts`), this means:
- The command will routinely take ~4s to complete instead of the expected ~3s default.
- Users who explicitly pass `--timeout=2000` (or any value < 4000) will see their setting silently overridden and have to wait at least 4 seconds.

The old formula `Math.min(800, overallMs)` always respected `overallMs`. The fix should keep that invariant. One approach is to also raise the default `overallTimeoutMs` alongside the probe budget increase — e.g., change the default from 3000 to 5000ms in `gateway-status.ts` — so there is headroom for the 4000ms probe without violating the overall budget:

```typescript
// In gateway-status.ts, line 42:
const overallTimeoutMs = parseTimeoutMs(opts.timeout, 5000);

// In helpers.ts, keep the probe bounded by overall:
return Math.min(Math.max(4000, overallMs), 8000);
```

Alternatively, clamp the probe budget to `overallMs` while preserving the intent:
```suggestion
    return Math.min(Math.max(4000, overallMs), 8000);
```
(This still gives 4000ms when `overallMs >= 4000`, and for `overallMs < 4000` it returns exactly `overallMs` — which preserves the user timeout while the root issue is better addressed by raising the default overall timeout.)

How can I resolve this? If you propose a fix, please make it concise.

@BunsDev
Copy link
Copy Markdown
Member

BunsDev commented Apr 25, 2026

Closing this as already fixed/superseded by #47533.

Current main implements the safer version of this fix: active local loopback and explicit loopback probes honor the caller timeout, inactive local loopback probes stay fast, and the probe budget remains bounded by the caller-visible budget. That avoids the regression in this branch where Math.max(4000, Math.min(overallMs, 8000)) can exceed overallMs for the default 3000ms timeout or for an explicit low --timeout.

Maintainer verification on current main:

  • pnpm test src/commands/gateway-status/helpers.test.ts
  • pnpm test src/commands/gateway-status.test.ts src/gateway/handshake-timeouts.test.ts src/gateway/client.watchdog.test.ts

@BunsDev BunsDev added the close:already-fixed claw-marshal label label Apr 25, 2026
@BunsDev BunsDev closed this Apr 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

close:already-fixed claw-marshal label commands Command implementations size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

False negative from openclaw gateway probe on Windows

2 participants