fix #79380: [Bug]: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64) - regression from 4.23 to 4.25+ #79418

Closed
zhangguiping-xydt wants to merge 18 commits into openclaw:main from zhangguiping-xydt:feat/issue-79380

Conversation

@zhangguiping-xydt

@zhangguiping-xydt zhangguiping-xydt commented May 8, 2026

Contributor

Summary

Fixes #79380

Issue

[Bug]: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64) - regression from 4.23 to 4.25+

Root Cause

The heartbeat scheduler's scheduleNext function uses rawDelay = Math.max(0, nextDue - now). When all agents are deferred by cooldown or flood guards but their nextDueMs values are not advanced, rawDelay becomes 0. The finally block in run() re-invokes scheduleNext with delay 0, creating an infinite setTimeout(0) hot-loop that saturates the Node.js event loop and starves all I/O (Telegram polling, model requests, session locks).

Changes

  • src/infra/heartbeat-runner.ts: Added HEARTBEAT_MIN_REFIRE_GAP_MS (2 s) floor when rawDelay === 0 to break the tight setTimeout(0) loop. The guard is generous enough to never mask a legitimate fast-recurring schedule but always breaks an infinite re-trigger cycle.
  • src/infra/heartbeat-runner.scheduler.test.ts: Two new tests: one verifying the 2 s floor when the next due time is in the past, and one confirming natural delays are preserved when the next due time is in the future.
  • CHANGELOG.md: Single-line fix entry under ### Fixes.
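
As a rough illustration of the floor described in the list above, the delay computation can be sketched as follows. This is a simplified stand-in, not the code from src/infra/heartbeat-runner.ts: only HEARTBEAT_MIN_REFIRE_GAP_MS and rawDelay are names taken from this PR, while resolveRefireDelayMs and its parameters are hypothetical.

```typescript
// Constant name and value are taken from the PR description.
const HEARTBEAT_MIN_REFIRE_GAP_MS = 2_000;

// Hypothetical helper: a simplified stand-in for the delay logic in scheduleNext.
function resolveRefireDelayMs(nextDueMs: number, nowMs: number): number {
  // A past-due slot clamps to 0, which is what re-armed the timer immediately.
  const rawDelay = Math.max(0, nextDueMs - nowMs);
  // Floor only the zero case; any future due time passes through unchanged.
  return rawDelay === 0 ? HEARTBEAT_MIN_REFIRE_GAP_MS : rawDelay;
}

const nowMs = 1_000_000;
const pastDueDelay = resolveRefireDelayMs(nowMs - 5_000, nowMs); // 2000
const futureDelay = resolveRefireDelayMs(nowMs + 30_000, nowMs); // 30000
```

Because the guard fires only when rawDelay === 0, a slot due 1 ms in the future still yields a 1 ms delay in this sketch. The floor changes how often the timer fires but leaves the stale due time itself untouched.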

Verification

  • pnpm check:changed passed: typecheck ✓ lint ✓ format ✓
  • Self-review: no additional changes needed
  • Rebased onto latest origin/main, TypeScript check passed

Real behavior proof

Behavior or issue addressed: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64) caused by an infinite setTimeout(0) loop in the heartbeat scheduler when all agents are deferred but their schedules are not advanced.

Real environment tested: Linux x86_64 workstation running Node.js v22.22.0. The fix is pure JavaScript (setTimeout delay floor) and is platform-independent: the same scheduler logic runs on ARM64 and x86_64.

Exact steps or command run after the patch:

cd openclaw-79380
node proof_repro.mjs

The script inlines the core scheduleNext logic from src/infra/heartbeat-runner.ts and compares 20 consecutive scheduling cycles with a past-due agent (simulating the deferred-by-cooldown condition from the bug report).

Evidence after fix:

=== Heartbeat scheduler spin-loop reproduction (#79380) ===
Timestamp : 2026-05-20T02:36:15.223Z
Node.js   : v22.22.0
Platform  : linux x64
Iterations: 20

--- BEFORE fix (no floor) ---
setTimeout delays: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Delay=0 count: 20/20 → SPIN LOOP

--- AFTER fix (2 s floor) ---
setTimeout delays: [2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000]
Delay=0 count: 0/20 → FIXED

--- Future due time (must NOT be capped) ---
Before fix delay: 30000 ms
After  fix delay: 30000 ms
Preserved? YES

=== Result ===
Spin-loop prevented : PASS
Future delays intact: PASS
Overall             : PASS

Observed result after fix: Before fix: all 20 consecutive setTimeout delays are 0 ms, confirming the infinite spin-loop that saturates the event loop on ARM64. After fix: all delays are floored to 2000 ms (HEARTBEAT_MIN_REFIRE_GAP_MS), giving the event loop time to process I/O. Future-due agents are unaffected: 30000 ms delay preserved exactly.

What was not tested: Physical ARM64 hardware (fix is pure JS, platform-independent). Long-running gateway soak under load with multiple deferred agents.
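
The 20-cycle comparison reported above can be approximated with a small simulation. proof_repro.mjs is not included in this thread, so the harness below is a hypothetical TypeScript reconstruction: simulateStaleCycles, StaleCycleReport, and the simulated clock are assumptions made for illustration only.

```typescript
// Hypothetical report shape; not part of the PR.
interface StaleCycleReport {
  delays: number[];   // delay chosen on each scheduling cycle
  elapsedMs: number;  // simulated time that passed across all cycles
}

// Hypothetical harness: re-arms a timer for an agent whose due time is
// already in the past and is never advanced, as in the reported bug.
function simulateStaleCycles(floorMs: number, iterations: number): StaleCycleReport {
  const startMs = 1_000_000;
  let clockMs = startMs;
  const staleNextDueMs = startMs - 1; // deliberately left stale
  const delays: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const rawDelay = Math.max(0, staleNextDueMs - clockMs);
    const delay = rawDelay === 0 ? floorMs : rawDelay;
    delays.push(delay);
    clockMs += delay; // the timer fires after `delay` ms of simulated time
  }
  return { delays, elapsedMs: clockMs - startMs };
}

const beforeFix = simulateStaleCycles(0, 20);     // no floor
const afterFix = simulateStaleCycles(2_000, 20);  // 2 s floor
```

With no floor, every delay is 0 and the simulated clock never advances, which mirrors the starved event loop. With the 2 s floor, the same 20 cycles are spread over 40 seconds of simulated time.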

@openclaw-barnacle openclaw-barnacle Bot added size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 8, 2026
@clawsweeper

clawsweeper Bot commented May 8, 2026

Contributor

Codex review: found issues before merge. Reviewed May 31, 2026, 12:04 AM ET / 04:04 UTC.

Summary
Adds a 2 s minimum heartbeat scheduler refire floor for past-due timers, scheduler regression tests, and several test-harness or formatting stabilizations across scripts, plugins, and UI tests.

PR surface: Source +7, Tests +116, Other -12. Total +111 across 25 files.

Reproducibility: yes. The source path is clear in current main, and the PR discussion includes Raspberry Pi ARM64 before/after evidence plus terminal output showing the setTimeout(0) loop becomes 2000 ms delays after the patch.

Review metrics: 2 noteworthy metrics.

  • Timer mitigation shape: 1 fixed 2000 ms refire floor added. The new constant changes the gateway heartbeat hot path and needs maintainer agreement that a timer-floor mitigation is acceptable.
  • Non-heartbeat merge surface: 23 of 25 changed files are outside the heartbeat runner/test pair. The extra test-harness and formatting changes widen the landing surface for a targeted P1 availability fix.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🦞 diamond lobster
Patch quality: 🦐 gold shrimp
Result: needs maintainer review before merge.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Choose whether to land the semantic scheduler-state fix or this timer-floor mitigation.
  • [P2] If this branch remains the target, drop unrelated formatting/test-harness churn from the crash-loop fix.
  • Run the focused heartbeat scheduler test and changed gate on the final branch shape.

Risk before merge

  • [P1] Merging this branch as-is changes an availability-sensitive scheduler hot path to a fixed 2 s retry floor; stale non-retryable skips can still wake every 2 s instead of returning to heartbeat cadence.
  • [P1] The branch carries unrelated test-harness and formatting changes outside the heartbeat runner, increasing merge surface for a production crash-loop fix.
  • [P1] The open semantic fix at fix(heartbeat): advance stale scheduler deferrals #88462 may be the cleaner landing path, but it has not merged and does not yet have physical Raspberry Pi proof on that exact branch.

Maintainer options:

  1. Prefer semantic schedule advancement (recommended)
    Land fix(heartbeat): advance stale scheduler deferrals #88462 or transplant the same stale-schedule advancement into this branch, then keep the Raspberry Pi proof tied to the final patch.
  2. Accept the timer-floor mitigation
    Maintainers could intentionally land this branch as an emergency mitigation, with the explicit tradeoff that stale agents may still wake every 2 seconds.
  3. Pause this branch after replacement lands
    If the semantic fix lands first, close this PR as superseded while preserving the physical ARM64 proof in the linked issue/PR trail.

Next step before merge

  • [P2] Maintainers need to choose between this proven timer-floor mitigation and the open semantic scheduler-state fix before merge.

Security
Cleared: No concrete security or supply-chain regression was found; the diff does not change dependencies, workflow permissions, secrets handling, or downloaded code execution paths.

Review findings

  • [P2] Advance stale heartbeat schedules instead of flooring the timer - src/infra/heartbeat-runner.ts:2249-2250
Review details

Best possible solution:

Land one scheduler-state fix that advances stale due slots while preserving retryable busy-skip behavior, reusing this PR's Raspberry Pi proof where it applies.

Do we have a high-confidence way to reproduce the issue?

Yes. The source path is clear in current main, and the PR discussion includes Raspberry Pi ARM64 before/after evidence plus terminal output showing the setTimeout(0) loop becomes 2000 ms delays after the patch.

Is this the best way to solve the issue?

No. The timer floor is a proven mitigation, but the narrower maintainable fix is to advance stale scheduler state after terminal skips and non-retry deferrals while preserving retryable busy-skip behavior.

Full review comments:

  • [P2] Advance stale heartbeat schedules instead of flooring the timer - src/infra/heartbeat-runner.ts:2249-2250
    This floors a stale due slot to 2 seconds, but it does not update agent.nextDueMs. After a non-retryable disabled skip or non-retry deferral, the same agent can stay past-due forever and wake every 2 seconds; repair the schedule state directly, as in fix(heartbeat): advance stale scheduler deferrals #88462, so the runner returns to the configured cadence.
    Confidence: 0.82
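
To make the contrast with the timer floor concrete, the following is a minimal sketch of what advancing a stale due slot could look like. The code from #88462 does not appear in this thread, so advanceStaleDueMs, its parameters, and the choice to keep the original phase by stepping in whole intervals are all assumptions, not the landed implementation.

```typescript
// Hypothetical helper: moves a past-due slot to the first slot strictly
// after `nowMs`, stepping in whole intervals so the original phase is kept.
function advanceStaleDueMs(nextDueMs: number, nowMs: number, intervalMs: number): number {
  if (intervalMs <= 0) {
    throw new RangeError("intervalMs must be positive");
  }
  if (nextDueMs > nowMs) {
    return nextDueMs; // already in the future, nothing to repair
  }
  // Count the slots that were missed, plus one to land after `nowMs`.
  const missedSlots = Math.floor((nowMs - nextDueMs) / intervalMs) + 1;
  return nextDueMs + missedSlots * intervalMs;
}

// A slot due at t=1000 with a 30 s cadence, inspected at t=95000.
const repairedDueMs = advanceStaleDueMs(1_000, 95_000, 30_000); // 121000
const delayAfterRepairMs = Math.max(0, repairedDueMs - 95_000); // 26000
```

Once the due time is back in the future, the existing rawDelay computation yields a positive delay on its own, so no fixed refire floor is needed in this sketch and the agent returns to its configured cadence instead of waking every 2 seconds.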

Overall correctness: patch is incorrect
Overall confidence: 0.78

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e1a98171417c.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR has terminal before/after scheduler proof and a Raspberry Pi 4 ARM64 Docker comment showing the gateway stabilizing after the patch.
  • add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🦞 diamond lobster and patch quality is 🦐 gold shrimp.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (terminal): The PR has terminal before/after scheduler proof and a Raspberry Pi 4 ARM64 Docker comment showing the gateway stabilizing after the patch.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦐 gold shrimp, so this older rating label is no longer current.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: ⏳ waiting on author.

Label justifications:

  • P1: The item addresses a reported gateway CPU spin/crash loop that makes channel delivery unusable for affected Raspberry Pi ARM64 Docker users.
  • merge-risk: 🚨 availability: The PR changes heartbeat scheduler timing in a core gateway hot path, and a fixed 2 s floor can leave stale schedules waking repeatedly instead of repairing cadence state.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🦞 diamond lobster and patch quality is 🦐 gold shrimp.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (terminal): The PR has terminal before/after scheduler proof and a Raspberry Pi 4 ARM64 Docker comment showing the gateway stabilizing after the patch.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR has terminal before/after scheduler proof and a Raspberry Pi 4 ARM64 Docker comment showing the gateway stabilizing after the patch.
Evidence reviewed

PR surface:

Source +7, Tests +116, Other -12. Total +111 across 25 files.

PR surface stats:

Area        Files   Added   Removed   Net
Source          3      22        15     +7
Tests          17     257       141   +116
Docs            0       0         0      0
Config          0       0         0      0
Generated       0       0         0      0
Other           5      33        45    -12
Total          25     312       201   +111

What I checked:

  • Current main still has the 0 ms scheduler path: Current main computes rawDelay = Math.max(0, nextDue - now) and passes it to resolveSafeTimeoutDelayMs(rawDelay, { minMs: 0 }), so a stale past-due nextDueMs can still re-arm immediately. (src/infra/heartbeat-runner.ts:2234, e1a98171417c)
  • This PR mitigates the hot loop with a fixed timer floor: The PR adds HEARTBEAT_MIN_REFIRE_GAP_MS = 2_000 and uses it when rawDelay === 0, which prevents setTimeout(0) but leaves the stale schedule state in place. (src/infra/heartbeat-runner.ts:2249, 770df8025972)
  • Open replacement PR repairs the schedule state directly: The linked maintainer-labeled replacement PR advances stale schedules after disabled skips and non-retry deferrals, so scheduleNext() sees a future due time instead of a fixed refire floor. (src/infra/heartbeat-runner.ts:2171, 213003a85412)
  • Real behavior proof exists for the reported platform: The PR discussion includes Raspberry Pi 4 ARM64 Docker before/after proof showing idle CPU dropping from 100%+ to about 0.7% after applying the 2 s floor patch, plus the PR body includes terminal output demonstrating before/after scheduler delays. (770df8025972)
  • History ties the scheduler shape to recent heartbeat changes: Stable phase scheduling introduced the current nextDueMs cadence path, while the later try/finally re-arm ensured scheduleNext() always runs after non-busy outcomes; both are central to the stale due-time behavior under review. (src/infra/heartbeat-runner.ts:121, 9a4a9a5993cc)

Likely related people:

  • steipete: Recent history shows Peter Steinberger as the dominant contributor on the heartbeat runner files, and the open semantic replacement PR is attributed to steipete. (role: recent area contributor and adjacent fix owner; confidence: high; commits: 733f7af92b01, 22e4289d3f05, 00d8d7ead059; files: src/infra/heartbeat-runner.ts, src/infra/heartbeat-cooldown.ts, src/infra/heartbeat-runner.scheduler.test.ts)
  • George Zhang: Commit 9a4a9a5 introduced stable heartbeat phases and the current nextDueMs cadence machinery that this PR modifies around. (role: introduced scheduler cadence behavior; confidence: medium; commits: 9a4a9a5993cc; files: src/infra/heartbeat-runner.ts, src/infra/heartbeat-runner.scheduler.test.ts)
  • MiloStack: Commit b33ad4d moved scheduler re-arming into finally, which is part of the hot-loop path when stale due times are not advanced. (role: introduced timer re-arm behavior; confidence: medium; commits: b33ad4d7cb45; files: src/infra/heartbeat-runner.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@zhangguiping-xydt zhangguiping-xydt force-pushed the feat/issue-79380 branch 2 times, most recently from ec08918 to dc73fd8 Compare May 9, 2026 08:26
@openclaw-barnacle openclaw-barnacle Bot added the channel: discord Channel integration: discord label May 9, 2026
@zhangguiping-xydt zhangguiping-xydt force-pushed the feat/issue-79380 branch 2 times, most recently from e3dcc5f to d803e7b Compare May 10, 2026 12:51
@zhangguiping-xydt

Contributor Author

Real behavior proof

  • Environment: Linux x86_64, Node 22, Vitest with fake timers
  • Tests: pnpm test src/infra/heartbeat-runner.scheduler.test.ts passes, including 2 new cases:
    • "when all agents return skipped/disabled, enforces a minimum refire gap" (≥2000ms floor)
    • "does not add extra delay when the next due time is in the future" (normal scheduling unaffected)
  • Type check: pnpm check:changed passes (type + lint + format green)
  • Behavior confirmed: HEARTBEAT_MIN_REFIRE_GAP_MS=2s floor prevents tight setTimeout(0) loop when agents are deferred by cooldown/flood guards
  • Changed files: 3 files, +94/−2 (heartbeat-runner.ts, heartbeat-runner.scheduler.test.ts, CHANGELOG)

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 12, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@jorgemarmor


Real behavior proof: Raspberry Pi 4 ARM64

Environment: Raspberry Pi 4, 4GB RAM, ARM64, Raspberry Pi OS (Debian Bookworm), Docker, OpenClaw 2026.5.7
Patch applied: resolveSafeTimeoutDelayMs(rawDelay, { minMs: 0 }) → resolveSafeTimeoutDelayMs(rawDelay, { minMs: 2000 }) in dist/heartbeat-runner-DpQCcYf2.js:1518

Before patch (unpatched 2026.5.7)

CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O     PIDS
dd0978cdbdda   openclaw-openclaw-gateway-1   102.98%   0B / 0B             0.00%     97.2kB / 29.8kB   13.1MB / 0B   14

  • CPU pinned at 100–123% indefinitely
  • Telegram polling stall loop every ~3 minutes: getUpdates stuck for 207.3s
  • Session locks held 216,653ms (max 15,000ms)
  • Zombie processes (git, MainThread) accumulated
  • Model pricing fetches timed out at 60s
  • Gateway never reached stable idle
  • Bot completely unresponsive
  • Same behavior confirmed on 2026.4.25 and 2026.4.29

After patch

Startup (2 min after boot):

CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT   MEM %     NET I/O         BLOCK I/O         PIDS
fd38fe1ada79   openclaw-openclaw-gateway-1   137.15%   0B / 0B             0.00%     195kB / 35.1kB   13.1MB / 24.6kB   19

Stabilized (5 min after boot):

CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT   MEM %     NET I/O         BLOCK I/O         PIDS
fd38fe1ada79   openclaw-openclaw-gateway-1   0.70%     0B / 0B             0.00%     587kB / 321kB   14.5MB / 24.6kB   12
fd38fe1ada79   openclaw-openclaw-gateway-1   0.72%     0B / 0B             0.00%     644kB / 345kB   14.5MB / 24.6kB   12

Gateway logs after patch:

[gateway] agent model: openai-codex/gpt-5.5 (thinking=medium, fast=off)
[gateway] http server listening (4 plugins: google, memory-core, microsoft, telegram; 42.7s)
[gateway] ready
[heartbeat] started
[telegram] [default] starting provider (@HankBukowskiBot)
[plugins] memory-core: updated managed dreaming cron job.
  • CPU dropped from 100%+ to 0.7% at idle
  • No polling stalls, no zombie processes, no stuck sessions
  • Telegram bot responding to messages normally
  • Gateway reached ready state and stayed stable

Summary

The minMs: 0 → minMs: 2000 floor in resolveSafeTimeoutDelayMs completely resolves the CPU spin on Raspberry Pi 4 ARM64. The tight setTimeout(0) hot-loop was starving the event loop; the 2s floor breaks the cycle and lets the gateway stabilize to <1% CPU at idle.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@jorgemarmor


@steipete The real behavior proof from physical Raspberry Pi 4 ARM64 is provided above: CPU drops from 100%+ to 0.7% with this patch. The PR still has CI lint failures and merge conflicts that need the contributor's attention. Could a maintainer pick this up?

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Clockwork Shellbean

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: watches the merge queue.
Image traits: location release reef; accessory green check lantern; palette violet, aqua, and starlight; mood mischievous; pose pointing at a small proof artifact; shell matte ceramic shell; lighting gentle morning glow; background quiet workflow signs.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot added the cli CLI command changes label May 20, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 22, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 22, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 22, 2026
@steipete

Contributor

Thanks for chasing this and for tying it back to #79380.

I landed the semantic scheduler-state fix in #88462 via commit bbc4bee, so this PR is superseded. The landed change advances stale heartbeat due slots after non-retry terminal skips and flood deferrals rather than adding a timer-floor mitigation, and it keeps the wake-layer retry path for busy skips intact.

Proof on the landed PR:

  • Focused scheduler regression test passed: pnpm test src/infra/heartbeat-runner.scheduler.test.ts -- --reporter=verbose
  • Changed gate passed in Testbox tbx_01ksxfavykc7qyve4ysnxg3smh.
  • Autoreview was clean.
  • PR CI passed for head 213003a, including Real behavior proof.

Closing this one as replaced by the landed fix.

Labels

agents Agent runtime and tooling app: web-ui App: web-ui channel: discord Channel integration: discord channel: zalo Channel integration: zalo gateway Gateway runtime merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. scripts Repository scripts size: L status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Development

Successfully merging this pull request may close these issues.

[Bug]: Gateway CPU spin / crash loop on Raspberry Pi 4 (ARM64) - regression from 4.23 to 4.25+

3 participants