Skip to content

fix: guarantee heartbeat timer re-arm with try/finally#52270

Merged
grp06 merged 2 commits into
openclaw:mainfrom
MiloStack:fix/heartbeat-timer-rearm
Mar 27, 2026
Merged

fix: guarantee heartbeat timer re-arm with try/finally#52270
grp06 merged 2 commits into
openclaw:mainfrom
MiloStack:fix/heartbeat-timer-rearm

Conversation

@MiloStack

@MiloStack MiloStack commented Mar 22, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Wraps the heartbeat run() body in try/finally so scheduleNext() is called on every exit path
  • Removes all manual scheduleNext() calls inside run() — the finally block is the single re-arm point
  • Preserves the requests-in-flight exemption (wake layer handles retry for this case)

Root Cause

run() has multiple exit paths that skip scheduleNext(), permanently killing the heartbeat timer. The most common trigger is the requests-in-flight early return at line 1141, but any unhandled rejection in the async chain can also break the re-arm.

The cron system doesn't have this bug because armTimer() is always called in the onTimer() catch handler and uses a single-layer timer design.

Why not remove .unref()?

PRs #31226 and #31161 proposed removing .unref() from the timer. That's the wrong fix — .unref() is the correct Node.js pattern for background schedulers. The timer dies because scheduleNext() is never called, not because .unref() causes the timer to be GC'd. We confirmed this by patching out .unref() and observing the timer still doesn't fire.

requests-in-flight handling

PR #47107 proposed a similar try/finally but was flagged by Greptile for a regression: finally { scheduleNext() } causes a tight spin loop when requests are in flight (delay=0 since nextDueMs already passed).

This PR uses a requestsInFlight flag to skip the re-arm for that specific case. The wake layer (heartbeat-wake.ts) already handles retry scheduling via schedule(DEFAULT_RETRY_MS, "retry") — calling scheduleNext() would race with the wake layer's 1s retry and bypass the cooldown.

The existing tests (heartbeat-runner.scheduler.test.ts) cover both the retry and schedule-preservation behavior for requests-in-flight.

Testing

Tested on OpenClaw 2026.3.14-dev, Node 24.14.0, Docker, 5 agents:

  • Set heartbeat interval to 5m, disabled external watchdog
  • First heartbeat fired at exactly the 5m mark
  • Multiple agents running and communicating autonomously
  • Timer re-armed after each cycle (confirmed via debug logging)
  • Ran for 30+ minutes without timer death

Fixes #31139
Fixes #45772

@greptile-apps

greptile-apps Bot commented Mar 22, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a long-standing heartbeat timer liveness bug by wrapping the run() body in a try/finally block so scheduleNext() is guaranteed to be called on every exit path. Previously, the requests-in-flight early return at line 1141 (and any unhandled rejection in the async chain) would silently skip scheduleNext(), permanently killing the interval timer. The fix is sound and well-scoped.

Key changes:

  • A single finally { if (!requestsInFlight) scheduleNext(); } replaces all prior manual scheduleNext() call-sites inside run()
  • A new requestsInFlight boolean correctly gates the re-arm for the requests-in-flight case, where the wake layer already schedules a 1 s retry — preventing a spin-loop race
  • scheduleNext() itself already calls clearTimeout on any live timer (lines 982–984), so there is no risk of stacking duplicate timers when finally fires
  • Early-exit guards at the top of run() (stopped / heartbeats disabled / no agents) remain outside the try block and do not call scheduleNext(), matching the previous behaviour; the updateConfig path re-arms the timer when conditions change

Confidence Score: 5/5

  • This PR is safe to merge — it is a clean, targeted fix with no new logic paths and correct handling of all edge cases.
  • The fix is mechanically correct. Every prior explicit scheduleNext() site now reaches the finally block instead, the requestsInFlight flag faithfully preserves the one case where re-arming must be skipped, and the pre-existing timer deduplication in scheduleNext() (clearTimeout guard) prevents any double-firing risk. No new state is introduced beyond the single boolean, and the change is well-covered by the existing scheduler tests the PR description references.
  • No files require special attention.

Reviews (1): Last reviewed commit: "fix: wrap heartbeat run() in try/finally..." | Re-trigger Greptile

@MiloStack

Copy link
Copy Markdown
Contributor Author

@steipete this is a legitimate fix apologies for the tag but i feel like resolving these discord / telegram flapping issues are of utmost importance. Especially after a major Node version bump when you jumped from Node version 22 to 24

@grp06 grp06 self-assigned this Mar 27, 2026
@grp06 grp06 force-pushed the fix/heartbeat-timer-rearm branch from d0ef0b7 to 8810bac Compare March 27, 2026 01:18
@grp06 grp06 force-pushed the fix/heartbeat-timer-rearm branch from 8810bac to 42df914 Compare March 27, 2026 01:27
yelnatscoding and others added 2 commits March 26, 2026 18:28
The heartbeat scheduler silently dies after the first batch because
run() has multiple exit paths that skip scheduleNext(). Once any path
returns without re-arming the timer, heartbeats stop permanently.

The most common trigger is the requests-in-flight early return, but
any unhandled rejection in the async chain between runOnce() and the
manual scheduleNext() calls can also kill the timer.

Fix: wrap the run() body in try/finally so scheduleNext() is called
on every exit path. All manual scheduleNext() calls inside are removed
— the finally block is the single re-arm point.

The requests-in-flight case is excluded via a flag because the wake
layer (heartbeat-wake.ts) already handles retry scheduling for this
case with DEFAULT_RETRY_MS. Calling scheduleNext() here would register
a 0ms timer that races with the wake layer's 1s retry.

Fixes openclaw#31139
Fixes openclaw#45772

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@grp06 grp06 force-pushed the fix/heartbeat-timer-rearm branch from 42df914 to cd0bcc2 Compare March 27, 2026 01:28
@grp06 grp06 merged commit b33ad4d into openclaw:main Mar 27, 2026
3 checks passed
@grp06

grp06 commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

Merged via squash.

Thanks @MiloStack!

pxnt pushed a commit to pxnt/openclaw that referenced this pull request Mar 27, 2026
Merged via squash.

Prepared head SHA: cd0bcc2
Co-authored-by: MiloStack <265805734+MiloStack@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
godlin-gh pushed a commit to YouMindInc/openclaw that referenced this pull request Mar 27, 2026
Merged via squash.

Prepared head SHA: cd0bcc2
Co-authored-by: MiloStack <265805734+MiloStack@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
Merged via squash.

Prepared head SHA: cd0bcc2
Co-authored-by: MiloStack <265805734+MiloStack@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
Merged via squash.

Prepared head SHA: cd0bcc2
Co-authored-by: MiloStack <265805734+MiloStack@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Merged via squash.

Prepared head SHA: cd0bcc2
Co-authored-by: MiloStack <265805734+MiloStack@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
Merged via squash.

Prepared head SHA: cd0bcc2
Co-authored-by: MiloStack <265805734+MiloStack@users.noreply.github.com>
Co-authored-by: grp06 <1573959+grp06@users.noreply.github.com>
Reviewed-by: @grp06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

3 participants