fix(gateway): prevent probe timeout from deferred ESM module evaluation #845
Open
BingqingLyu wants to merge 5 commits into main
On Windows (and potentially other platforms with slower module evaluation), the auth-profiles ESM bundle triggers deferred synchronous work (primarily AJV schema compilation) that blocks the event loop for ~7 seconds *after* the top-level import promise resolves. The probe's 800ms loopback budget expires during this window because WebSocket data callbacks cannot fire, causing `gateway probe` to always report "timeout" on 2026.3.13.

Add `waitForEventLoopReady()` that schedules short timers and watches for abnormal drift, resolving only after two consecutive on-time callbacks. This guarantees deferred module evaluation has finished before opening any network connections. On unaffected systems this adds ~40ms overhead.

Fixes: probe timeout regression on Windows after upgrading to 2026.3.13
Related: openclaw#47640, openclaw#47307
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move `waitForEventLoopReady` into a shared module (`event-loop-ready.ts`) and call it in `executeGatewayRequestWithScopes` in addition to `probeGateway`. This fixes commands like `cron list`, `devices list`, and any other CLI path that goes through `callGateway`: they hit the same deferred ESM module evaluation stall that was causing probe timeouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Addresses review feedback: if the event loop remains starved beyond the deadline (default 10 s), resolve anyway so that callers' own timeout logic can take over rather than hanging indefinitely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move the `event-loop-ready` import before `method-scopes` to satisfy the alphabetical import ordering enforced by the formatter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pass the caller-supplied `timeoutMs` to `waitForEventLoopReady` so the readiness preflight respects the probe/call timeout budget instead of using the 10 s default. This prevents commands with tight budgets (e.g. the 800 ms loopback probe) from exceeding their timeout contract.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
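
Taken together, the commits above describe a drift-based readiness check with a deadline. The following is a minimal sketch of that idea, not the contents of `event-loop-ready.ts`. The 20 ms interval and 200 ms drift threshold follow the PR description; the option names, the boolean return value, and the use of `Date.now()` are assumptions made for illustration.

```typescript
// Sketch only: option names and the boolean result are assumptions,
// not the actual API of event-loop-ready.ts.
interface EventLoopReadyOptions {
  intervalMs?: number; // timer period (PR text: 20 ms)
  driftThresholdMs?: number; // lateness treated as a stall (PR text: > 200 ms)
  consecutiveReady?: number; // on-time callbacks required (PR text: 2)
  maxWaitMs?: number; // deadline after which we resolve anyway (PR text: 10 s default)
}

// Resolves true once the loop looks responsive, false if the deadline passed first.
function waitForEventLoopReady(options: EventLoopReadyOptions = {}): Promise<boolean> {
  const {
    intervalMs = 20,
    driftThresholdMs = 200,
    consecutiveReady = 2,
    maxWaitMs = 10_000,
  } = options;

  return new Promise<boolean>((resolve) => {
    const startedAt = Date.now();
    let onTimeCount = 0;

    const schedule = (): void => {
      const scheduledAt = Date.now();
      setTimeout(() => {
        const now = Date.now();
        // Drift is how much later than requested the callback actually ran.
        const drift = now - scheduledAt - intervalMs;
        // A late callback resets the streak; an on-time one extends it.
        onTimeCount = drift > driftThresholdMs ? 0 : onTimeCount + 1;
        if (onTimeCount >= consecutiveReady) {
          resolve(true);
        } else if (now - startedAt >= maxWaitMs) {
          // Still starved past the deadline: hand control back to the caller's timeout logic.
          resolve(false);
        } else {
          schedule();
        }
      }, intervalMs);
    };

    schedule();
  });
}
```

Under these assumptions, a call site with a tight budget would pass its own budget through, for example `await waitForEventLoopReady({ maxWaitMs: timeoutMs })`. Note that the deadline can only be observed when a timer callback gets to run, so a single long synchronous block is detected after it ends rather than interrupted.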
Summary
- Fix `gateway probe` always reporting timeout on Windows after upgrading to 2026.3.13
- Add `waitForEventLoopReady()` before opening the probe WebSocket to ensure deferred ESM module evaluation has completed

Root cause
The `auth-profiles` ESM bundle triggers deferred synchronous work (primarily AJV schema compilation) that blocks the Node.js event loop for ~7 seconds after the top-level `import()` promise resolves. This blocking starts after the first event loop cycle completes: `setTimeout(0)` fires on time, but `setTimeout(100)` is delayed by 7+ seconds.

The probe's `resolveProbeBudgetMs` caps the local loopback budget at 800ms, and the overall default is 3000ms. Both expire while the event loop is blocked, because the WebSocket's `open`/`message` callbacks cannot fire until the synchronous work finishes.

Evidence from debugging on a Windows 10 machine with Node 24.14:

- `net.connect` after import
- `http.request` after import
- `ws` WebSocket after import
- `ws` WebSocket without import
- `setInterval(100)`

The `gateway status` command (which uses `callGateway` with a 10s timeout) was unaffected because its budget outlasts the stall.

Fix
`waitForEventLoopReady()` schedules 20ms timers and checks for abnormal drift (> 200ms). It resolves only after two consecutive on-time callbacks, guaranteeing the deferred evaluation has finished. On systems without the blocking issue, this adds only ~40ms overhead.

A longer-term fix would be to lazy-compile AJV schemas instead of evaluating them at module scope, which would eliminate the event loop stall entirely.
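
The longer-term direction mentioned above could look roughly like the sketch below. It does not use AJV itself: `compileSchema` is a hypothetical stand-in for an expensive compile step, and `lazy`, `getProfileValidator`, and `isValidProfile` are illustrative names that do not come from the codebase. The point is only that module scope creates a cheap closure and the expensive work runs on first use.

```typescript
// Sketch only: `compileSchema` is a hypothetical stand-in for an expensive
// schema compile step (such as AJV's); it is not the project's real code.
type Validator = (value: unknown) => boolean;

let compileCount = 0;

function compileSchema(requiredKeys: string[]): Validator {
  compileCount += 1; // lets callers observe when compilation actually happens
  return (value: unknown): boolean => {
    if (typeof value !== "object" || value === null) return false;
    const record = value as Record<string, unknown>;
    return requiredKeys.every((key) => key in record);
  };
}

// Wraps a factory so that it runs on first use and is cached afterwards.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// Module scope only creates a closure here; nothing is compiled at import time.
const getProfileValidator = lazy(() => compileSchema(["id", "provider"]));

function isValidProfile(value: unknown): boolean {
  return getProfileValidator()(value);
}
```

With this shape, the cost of compilation moves from import time to the first call of `isValidProfile`, so commands that never validate a profile would not pay it at all.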
Test plan
- `openclaw gateway probe` returns `Reachable: yes` (21ms latency) on the affected Windows machine after patch
- `probe.test.ts` uses a mocked `GatewayClient`, so `waitForEventLoopReady` completes instantly; no test breakage expected

Related issues
- Fixes openclaw#45940: False negative from `openclaw gateway probe` on Windows
- Fixes openclaw#46226: Gateway probe shows 3000ms budget but uses 800ms internally - false timeout on healthy local loopback
- Related openclaw#46316: `devices list` / `nodes status` timeout while `gateway status` shows `RPC probe: ok` (regression in 2026.3.12/2026.3.13)
- Related openclaw#46000: Windows local gateway reissues operator device token without operator.read on 2026.3.13, breaking status/probe/health
- Related openclaw#47640, openclaw#47307
https://www.answeroverflow.com/m/1482583046749163692
🤖 Generated with Claude Code