fix #79380: [Bug]: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64), regression from 4.23 to 4.25+ #79418
Codex review: found issues before merge. Reviewed May 31, 2026, 12:04 AM ET / 04:04 UTC.
Summary: PR surface: Source +7, Tests +116, Other -12. Total +111 across 25 files.
Reproducibility: yes. The source path is clear in current main, and the PR discussion includes Raspberry Pi ARM64 before/after evidence plus terminal output showing the …
Review metrics: 2 noteworthy metrics.
Merge readiness: the overall rating follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review details
Best possible solution: Land one scheduler-state fix that advances stale due slots while preserving retryable busy-skip behavior, reusing this PR's Raspberry Pi proof where it applies.
Do we have a high-confidence way to reproduce the issue? Yes. The source path is clear in current main, and the PR discussion includes Raspberry Pi ARM64 before/after evidence plus terminal output showing the …
Is this the best way to solve the issue? No. The timer floor is a proven mitigation, but the narrower, more maintainable fix is to advance stale scheduler state after terminal skips and non-retry deferrals while preserving retryable busy-skip behavior.
Full review comments:
Overall correctness: patch is incorrect.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against e1a98171417c.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +7, Tests +116, Other -12. Total +111 across 25 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Force-pushed from ec08918 to dc73fd8.
Force-pushed from e3dcc5f to d803e7b.
Real behavior proof
@clawsweeper re-review
Force-pushed from d803e7b to bc08c28.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
Real behavior proof: Raspberry Pi 4 ARM64
Environment: Raspberry Pi 4, 4 GB RAM, ARM64, Raspberry Pi OS (Debian Bookworm), Docker, OpenClaw 2026.5.7
Before patch (unpatched 2026.5.7):
After patch
Startup (2 min after boot):
Stabilized (5 min after boot):
Gateway logs after patch:
Summary
The …
@steipete The real behavior proof from a physical Raspberry Pi 4 (ARM64) is provided above.
Force-pushed from bc08c28 to 5df29c7.
strictFunctionTypes prevents a function taking { peer?: ... } from being assignable to a function type whose parameter is unknown.
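The commit message above refers to TypeScript's `strictFunctionTypes` check. A minimal sketch of how it behaves, assuming the affected value is a callback whose parameter was typed `{ peer?: ... }` and that it had to fit a slot taking `unknown`. The names `PeerPayload`, `AnyHandler`, and `isPeerPayload` are hypothetical, and `peer` is typed as `string` purely for illustration.

```typescript
// Hypothetical payload type: `peer` is a string only for illustration;
// the real field type is elided ("...") in the commit message.
type PeerPayload = { peer?: string };

// A handler type that must accept any payload at all.
type AnyHandler = (payload: unknown) => string;

// With strictFunctionTypes enabled, the following line fails to compile:
// parameters are checked contravariantly, and `unknown` is not assignable
// to PeerPayload, so this handler cannot stand in for AnyHandler.
// const bad: AnyHandler = (payload: PeerPayload) => payload.peer ?? "none";

// One common fix: accept `unknown` and narrow with a runtime type guard.
function isPeerPayload(value: unknown): value is PeerPayload {
  if (typeof value !== "object" || value === null) return false;
  const peer = (value as { peer?: unknown }).peer;
  return peer === undefined || typeof peer === "string";
}

const handler: AnyHandler = (payload) =>
  isPeerPayload(payload) ? (payload.peer ?? "none") : "invalid";

console.log(handler({ peer: "alice" })); // alice
console.log(handler({})); // none
console.log(handler(42)); // invalid
```

Without `strictFunctionTypes`, parameters are compared bivariantly and the commented-out assignment would be accepted, which is why enabling the flag surfaces this kind of error.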
Tighten the future-delay preservation test so it verifies the post-run timer keeps the natural heartbeat interval instead of the minimum refire floor.
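A minimal sketch of what the tightened check can look like, using a hypothetical simplified scheduler. `makeScheduler`, `HEARTBEAT_INTERVAL_MS`, and the 30 s interval are assumptions, not the PR's test code. The fake timer records each requested delay. Asserting exact equality with the interval, rather than merely a non-zero delay, is what separates the natural heartbeat interval from the 2 s floor.

```typescript
// Hypothetical constants for this sketch.
const HEARTBEAT_MIN_REFIRE_GAP_MS = 2_000;
const HEARTBEAT_INTERVAL_MS = 30_000; // assumed heartbeat interval

type TimerFn = (cb: () => void, delayMs: number) => void;

// Simplified scheduler: a run advances nextDue by one interval, then reschedules.
function makeScheduler(startNow: number, setTimer: TimerFn) {
  let nextDue = startNow;
  const scheduleNext = (now: number) => {
    const rawDelay = Math.max(0, nextDue - now);
    // Floor only applies when the due time has already passed.
    setTimer(() => {}, rawDelay === 0 ? HEARTBEAT_MIN_REFIRE_GAP_MS : rawDelay);
  };
  const run = (now: number) => {
    nextDue = now + HEARTBEAT_INTERVAL_MS;
    scheduleNext(now);
  };
  return { run };
}

// Fake timer: record delays instead of actually waiting.
const requested: number[] = [];
const sched = makeScheduler(0, (_cb, delayMs) => {
  requested.push(delayMs);
});
sched.run(0);

// The post-run timer should use the natural interval, not the 2 s floor.
console.log(requested[requested.length - 1]); // 30000
```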
Thanks for chasing this and for tying it back to #79380. I landed the semantic scheduler-state fix in #88462 via commit bbc4bee, so this PR is superseded. The landed change advances stale heartbeat due slots after non-retry terminal skips and flood deferrals rather than adding a timer-floor mitigation, and it keeps the wake-layer retry path for busy skips intact. Proof on the landed PR:
Closing this one as replaced by the landed fix.
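The landed approach described in the closing comment can be sketched roughly as follows: advance stale due slots after non-retry terminal skips and flood deferrals, but leave busy skips to the wake-layer retry path. This is an illustrative model under assumed names (`RunOutcome`, `AgentSlot`, `advanceStaleDue`, `afterRun`), not the code from #88462 or commit bbc4bee.

```typescript
// Hypothetical outcome labels; the real scheduler's categories may differ.
type RunOutcome = "ran" | "terminal-skip" | "flood-deferral" | "busy-skip";

interface AgentSlot {
  nextDueMs: number;
  intervalMs: number;
}

// Move nextDueMs forward by whole intervals until it is strictly in the future.
function advanceStaleDue(slot: AgentSlot, now: number): void {
  if (slot.nextDueMs > now) return;
  const missed = Math.floor((now - slot.nextDueMs) / slot.intervalMs) + 1;
  slot.nextDueMs += missed * slot.intervalMs;
}

function afterRun(slot: AgentSlot, outcome: RunOutcome, now: number): void {
  // Busy skips keep their stale slot so the (assumed) wake-layer retry can fire soon.
  if (outcome === "busy-skip") return;
  advanceStaleDue(slot, now);
}

const slot: AgentSlot = { nextDueMs: 1_000, intervalMs: 30_000 };
afterRun(slot, "terminal-skip", 65_000);
console.log(slot.nextDueMs); // 91000
```

Advancing by whole intervals keeps the slot aligned with its original cadence, and because the new due time is strictly in the future, the next computed delay is positive without needing any timer floor.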
Summary
Fixes #79380
Issue
[Bug]: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64), regression from 4.23 to 4.25+
Root Cause
The heartbeat scheduler's `scheduleNext` function uses `rawDelay = Math.max(0, nextDue - now)`. When all agents are deferred by cooldown or flood guards but their `nextDueMs` values are not advanced, `rawDelay` becomes 0. The `finally` block in `run()` re-invokes `scheduleNext` with delay 0, creating an infinite `setTimeout(0)` hot loop that saturates the Node.js event loop and starves all I/O (Telegram polling, model requests, session locks).
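The failure mode can be reproduced with pure functions, without real timers. In this hypothetical model, `computeRawDelay` mirrors the `Math.max(0, nextDue - now)` expression quoted above, and `simulateStaleCycles` stands in for repeated `scheduleNext` calls while nothing advances `nextDue`. In real Node.js a `setTimeout` delay below 1 ms is clamped to 1 ms, so the actual loop refires about every millisecond rather than instantly, which still amounts to a near-continuous refire loop.

```typescript
// Hypothetical, simplified model of the delay computation described above.
function computeRawDelay(nextDue: number, now: number): number {
  return Math.max(0, nextDue - now);
}

// Simulate consecutive scheduling cycles with a due time that is never advanced.
function simulateStaleCycles(cycles: number, startNow: number, staleDue: number): number[] {
  const delays: number[] = [];
  let now = startNow;
  for (let i = 0; i < cycles; i++) {
    const delay = computeRawDelay(staleDue, now);
    delays.push(delay);
    // The timer fires after `delay` ms; with delay 0 the model clock does not move.
    now += delay;
  }
  return delays;
}

// Due time 10 s in the past: every cycle computes a zero delay.
const delays = simulateStaleCycles(20, 1_000_000, 990_000);
console.log(delays.every((d) => d === 0)); // true
```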
Changes
src/infra/heartbeat-runner.ts: Added a `HEARTBEAT_MIN_REFIRE_GAP_MS` (2 s) floor when `rawDelay === 0` to break the tight `setTimeout(0)` loop. Because the floor applies only when `rawDelay` is exactly 0, it leaves every non-zero delay unchanged, yet it always breaks an infinite re-trigger cycle.
src/infra/heartbeat-runner.scheduler.test.ts: Two new tests: one verifying the 2 s floor when the next due time is in the past, and one confirming that natural delays are preserved when the next due time is in the future.
CHANGELOG.md: Single-line fix entry under `### Fixes`.
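A minimal sketch of the floor described in the change list, assuming a simplified standalone delay function (`computeNextDelay` is a hypothetical name; the real `scheduleNext` may differ). Only an exact zero delay is raised to `HEARTBEAT_MIN_REFIRE_GAP_MS`, so any positive delay, even 1 ms, passes through unchanged.

```typescript
// Constant name from the PR description; value 2 s as stated there.
const HEARTBEAT_MIN_REFIRE_GAP_MS = 2_000;

// Hypothetical helper: compute the timer delay with the zero-delay floor.
function computeNextDelay(nextDue: number, now: number): number {
  const rawDelay = Math.max(0, nextDue - now);
  // Past-due (or exactly due) slots are floored; future delays are preserved.
  return rawDelay === 0 ? HEARTBEAT_MIN_REFIRE_GAP_MS : rawDelay;
}

const now = Date.now();
console.log(computeNextDelay(now - 5_000, now)); // 2000
console.log(computeNextDelay(now + 30_000, now)); // 30000
```

The floor trades up to 2 s of extra latency for past-due agents against breaking the hot loop; as the review notes, it mitigates the symptom without advancing the stale due time itself.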
Verification
`pnpm check:changed` passed: typecheck, lint, and format.
… origin/main, TypeScript check passed.
Real behavior proof
Behavior or issue addressed: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64) caused by an infinite `setTimeout(0)` loop in the heartbeat scheduler when all agents are deferred but their schedules are not advanced.
Real environment tested: Linux x86_64 workstation running Node.js v22.22.0. The fix is pure JavaScript (a `setTimeout` delay floor) and platform-independent: the same scheduler logic runs on ARM64 and x86_64.
Exact steps or command run after the patch:
cd openclaw-79380
node proof_repro.mjs
The script inlines the core `scheduleNext` logic from `src/infra/heartbeat-runner.ts` and compares 20 consecutive scheduling cycles with a past-due agent (simulating the deferred-by-cooldown condition from the bug report).
Evidence after fix:
Observed result after fix:
Before fix: all 20 consecutive `setTimeout` delays are 0 ms, reproducing the zero-delay refire loop that matches the event-loop saturation reported on ARM64. After fix: all delays are floored to 2000 ms (`HEARTBEAT_MIN_REFIRE_GAP_MS`), giving the event loop time to process I/O. Future-due agents are unaffected: a 30000 ms delay is preserved exactly.
What was not tested: physical ARM64 hardware (the fix is pure JS and platform-independent), and a long-running gateway soak under load with multiple deferred agents.