Bug type
Regression (worked before, now fails)
Summary
After upgrading from `v2026.4.22` to `v2026.4.26` (also reproduces on current `main` at `9bb1e59`, which package.json reports as `2026.4.27`), the gateway sits at ~100% CPU on its single main thread and stops responding to local probes. Same host, same `~/.openclaw`, no config changes — only the `git checkout` differs.
I've seen #74209 and the regression range overlaps, but on my machine the dominant signal in a CPU sample is an `fs.stat` storm in the JS microtask queue rather than bonjour. Filing separately in case the maintainers want to triage it as the same root cause or as a sibling regression.
Versions
- macOS 26.4 (Mac Studio, ARM64), Node 22 via Homebrew
- Bad: `v2026.4.26` (`be8c246`), and current `main` at `9bb1e59` (reports as 2026.4.27)
- Good: `v2026.4.22` (`00bd2cf`)
Steps to reproduce
Same ~/.openclaw, just switch versions:
```
git checkout v2026.4.26 && pnpm install && pnpm build
openclaw gateway restart
sleep 30
ps -o stat,%cpu,etime $(pgrep -f 'dist/index.js gateway')
# → R 95-100 sustained

git checkout v2026.4.22 && pnpm install && pnpm build
openclaw gateway restart
sleep 30
ps -o stat,%cpu,etime $(pgrep -f 'dist/index.js gateway')
# → S 2.6
```
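For longer runs, a scripted equivalent of the `curl -m 3` probe in the table below can log responsiveness continuously. This is only a sketch (TypeScript, run e.g. with `npx tsx probe.ts`); the file name is mine, and 18789 is simply the local gateway port from this setup.

```ts
// probe.ts -- minimal external liveness probe; a sketch, not an OpenClaw tool.
// 18789 is the local gateway port used elsewhere in this report.
const url = 'http://127.0.0.1:18789/';

setInterval(async () => {
  const started = Date.now();
  try {
    // Same 3 s budget as `curl -m 3`; on the bad builds every attempt times out.
    const res = await fetch(url, { signal: AbortSignal.timeout(3000) });
    console.log(`${new Date().toISOString()} HTTP ${res.status} in ${Date.now() - started} ms`);
  } catch (err) {
    console.log(`${new Date().toISOString()} failed after ${Date.now() - started} ms (${(err as Error).name})`);
  }
}, 5000);
```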
Side-by-side, same host
|  | 4.26 / main | 4.22 |
| --- | --- | --- |
| ps STAT %CPU | R 95-100 | S 2.6 |
| `eventLoopDelayMaxMs` (liveness warning) | up to 314,866 ms | none reported |
| `eventLoopUtilization` | 0.95–1.00 | <0.10 |
| `curl -m 3 http://127.0.0.1:18789/` | 3 s timeout | 3-9 ms |
| Discord WS lifetime | closes 1000/zombie every 60-90 s | stable |
| `openclaw gateway status` | "Connectivity probe: failed (timeout)" while runtime is "active" | OK, admin-capable |
A representative liveness warning on `main`:

```
[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization
interval=320s eventLoopDelayP99Ms=314874.8 eventLoopDelayMaxMs=314874.8
eventLoopUtilization=0.999 cpuCoreRatio=0.585 active=0 waiting=0 queued=0
```
i.e. the loop blocked for over 5 minutes with nothing in the active queue.
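For reference, the two metrics in that warning can be reproduced with Node's own `perf_hooks`. The sketch below only illustrates how such numbers are typically collected (it is not OpenClaw's diagnostic code), and the preload invocation is an assumption on my part.

```ts
// elu-monitor.ts -- sketch of collecting eventLoopDelay / ELU figures with perf_hooks.
// Not OpenClaw's diagnostic; compile to JS and preload into the process under test,
// e.g. `node --import ./elu-monitor.mjs dist/index.js gateway` (assumed invocation).
import { monitorEventLoopDelay, performance } from 'node:perf_hooks';

const delay = monitorEventLoopDelay({ resolution: 20 }); // histogram values are in ns
delay.enable();
let baseline = performance.eventLoopUtilization();

// While the loop is blocked this timer cannot fire, so the first report after a
// stall covers the whole blocked span (cf. interval=320s in the warning above).
setInterval(() => {
  const now = performance.eventLoopUtilization();
  const sinceLast = performance.eventLoopUtilization(now, baseline); // delta ELU
  baseline = now;
  console.log({
    eventLoopDelayP99Ms: delay.percentile(99) / 1e6,
    eventLoopDelayMaxMs: delay.max / 1e6,
    eventLoopUtilization: sinceLast.utilization,
  });
  delay.reset();
}, 30_000).unref();
```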
CPU sample on main (5 s, idle, no incoming traffic)
`/usr/bin/sample <gateway-pid> 5` — all 4074 stacks collapse to this single path:

```
4074 uv__io_poll
+ 4074 uv__async_io
+ 4074 uv__work_done
+ 4074 MakeLibuvRequestCallback<uv_fs_s>::Wrapper
+ 4074 node::fs::AfterStat
+ 4074 MicrotaskQueue::PerformCheckpointInternal
+ 4074 MicrotaskQueue::RunMicrotasks
+ 4074 Builtins_PromiseFulfillReactionJob
+ 4074 AsyncFunctionAwaitResolveClosure
+ 4074 <JIT JS, unsymbolicated>
```
Every sample lands in `node::fs::AfterStat`: the main thread is consumed resolving promises from a flood of `fs.stat` calls. The same sample on 4.22, with the same config and data, spends almost the entire 5 s waiting in `kqueue`.
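To narrow down where the stat flood originates, a preload wrapper around `fs.promises.stat` can log the busiest targets. The sketch below is hypothetical (file name, interval and invocation are mine); it only intercepts calls routed through the `fs.promises` object at runtime, since ESM named imports bind the original function, so the counts are a lower bound.

```ts
// stat-probe.ts -- hypothetical preload for counting fs.promises.stat calls;
// not part of OpenClaw. Compile to JS and load with something like:
//   node --import ./stat-probe.mjs dist/index.js gateway
// Caveat: `import { stat } from 'node:fs/promises'` binds the original function,
// so only calls made through the fs.promises object are counted here.
import fs from 'node:fs';

const counts = new Map<string, number>();
const fsAny = fs as any;
const originalStat = fsAny.promises.stat.bind(fsAny.promises);

fsAny.promises.stat = (path: unknown, ...rest: unknown[]) => {
  const key = String(path);
  counts.set(key, (counts.get(key) ?? 0) + 1);
  return originalStat(path, ...rest);
};

// Every 10 s, dump the ten most frequently stat'ed paths and reset the counters.
setInterval(() => {
  const top = [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 10);
  console.error('[stat-probe] busiest stat targets (last 10 s):', top);
  counts.clear();
}, 10_000).unref();
```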
Effect on real usage
Channel messages arrive (Discord WS frames are received), the session enters state=processing and ages indefinitely without a reply — I saw agent:main:discord:direct:oliver reach age=1376s queueDepth=1 before the gateway watchdog killed and restarted the process. No subprocess is involved (no docker, no acpx wrapper, no model API call) — the wedge is purely in-JS work on the main thread.
Workaround
Pin to `v2026.4.22`. Disable the OpenClaw Auto-Update cron as well so the upgrade isn't reapplied.
Related
- #74209 (overlapping regression range; possibly the same root cause, see Summary)