fix: exponential backoff for heartbeat requests-in-flight retries #31638
Closed
kami-saia wants to merge 2 commits into
When a heartbeat is skipped due to requests-in-flight, advanceAgentSchedule pushes the next run by the full interval (e.g. 60m) even though the heartbeat never actually executed. This means a transient busy period causes a missed heartbeat with no retry. Replace with exponential backoff: 1m → 2m → 4m → 8m → 16m (5 retries max, ~31m total). If all retries are exhausted, fall back to the normal interval to avoid scheduling a retry right before the next regular heartbeat. Resets on successful run.
Contributor
Greptile Summary
Implements exponential backoff for heartbeat retries when requests are in-flight, replacing the previous behavior of immediately advancing to the next full interval. The implementation adds a …
Key changes:
Issue found:
Confidence Score: 4/5
Last reviewed commit: 134c656
Contributor
Additional Comments (1)
This is a comment left during a code review.
Path: src/infra/heartbeat-runner.ts
Line: 1066-1072
Comment:
`consecutiveSkips` not preserved across config updates - will reset retry counter mid-backoff
```suggestion
nextAgents.set(agent.agentId, {
  agentId: agent.agentId,
  heartbeat: agent.heartbeat,
  intervalMs,
  lastRunMs: prevState?.lastRunMs,
  nextDueMs,
  consecutiveSkips: prevState?.consecutiveSkips,
});
```
How can I resolve this? If you propose a fix, please make it concise.
…r short intervals
- Preserve consecutiveSkips when updateConfig rebuilds agent state (Greptile review)
- Disable exponential backoff for intervals < 1h; the next regular tick is soon enough
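The short-interval rule in this commit could look roughly like the sketch below. It is not the code from `src/infra/heartbeat-runner.ts`: the function name `nextDelayAfterSkip` and the constants are hypothetical, and the sketch assumes that an interval of exactly 1h still uses backoff, since only intervals below 1h are described as exempt.

```typescript
// Hypothetical constants; the PR states 1m base delay, 5 retries, and a 1h threshold.
const ONE_HOUR_MS = 60 * 60 * 1000;
const BASE_RETRY_MS = 60_000;
const MAX_RETRIES = 5;

/**
 * Delay until the next attempt after a heartbeat was skipped.
 * `consecutiveSkips` is the number of skips recorded before this one.
 */
function nextDelayAfterSkip(intervalMs: number, consecutiveSkips: number): number {
  // Assumption: backoff applies only when the interval is at least 1h.
  const backoffEnabled = intervalMs >= ONE_HOUR_MS;
  if (!backoffEnabled || consecutiveSkips >= MAX_RETRIES) {
    // Short interval or retries exhausted: wait for the regular tick.
    return intervalMs;
  }
  // 1m, 2m, 4m, 8m, 16m for skips 0..4.
  return BASE_RETRY_MS * 2 ** consecutiveSkips;
}
```

With a 30m interval the function always returns the full 30m, so a skipped heartbeat simply waits for the next regular tick.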
Author
Thanks for the catch. This is actually already handled in the current code. Line 1072:
Problem
When a heartbeat is skipped due to requests-in-flight, `advanceAgentSchedule` pushes the next run by the full interval (e.g. 60m) even though the heartbeat never actually executed. A transient busy period causes a completely missed heartbeat with no retry.

Fix
Replace the full-interval advance with exponential backoff:
- 1m → 2m → 4m → 8m → 16m (5 retries max, ~31m total)
- If all retries are exhausted, fall back to `advanceAgentSchedule` (normal interval) to avoid scheduling a retry right before the next regular heartbeat
- `consecutiveSkips` resets to 0 on any successful run

Changes
- Add `consecutiveSkips` field to `HeartbeatAgentState`
- `advanceAgentSchedule` resets `consecutiveSkips = 0`
- … `advanceAgentSchedule`

Single file change: `src/infra/heartbeat-runner.ts`

🤖 AI-assisted (Claude Opus/Sonnet via OpenClaw agent). Fully tested in production.
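For readers without the diff open, here is a minimal sketch of the scheduling behavior described in this PR. It is an illustration under stated assumptions, not the actual contents of `src/infra/heartbeat-runner.ts`. The `AgentState` shape is reduced to the fields named in the review thread, and `scheduleAfterSkip`, `scheduleAfterRun`, `BASE_RETRY_MS`, and `MAX_RETRIES` are hypothetical names.

```typescript
// Reduced, hypothetical stand-in for HeartbeatAgentState.
interface AgentState {
  intervalMs: number;
  lastRunMs?: number;
  nextDueMs: number;
  consecutiveSkips?: number;
}

const BASE_RETRY_MS = 60_000; // 1m first retry, per the PR description
const MAX_RETRIES = 5;        // 1m + 2m + 4m + 8m + 16m = 31m in total

/** Called when a heartbeat is skipped because requests are in flight. */
function scheduleAfterSkip(state: AgentState, nowMs: number): void {
  const skips = state.consecutiveSkips ?? 0;
  if (skips < MAX_RETRIES) {
    // Retry soon, doubling the delay on each consecutive skip.
    state.nextDueMs = nowMs + BASE_RETRY_MS * 2 ** skips;
    state.consecutiveSkips = skips + 1;
  } else {
    // Retries exhausted: fall back to the normal interval.
    // Assumption: the counter resets here, as it does on a normal advance.
    state.nextDueMs = nowMs + state.intervalMs;
    state.consecutiveSkips = 0;
  }
}

/** Called after a heartbeat actually ran. */
function scheduleAfterRun(state: AgentState, nowMs: number): void {
  state.lastRunMs = nowMs;
  state.nextDueMs = nowMs + state.intervalMs;
  state.consecutiveSkips = 0; // a successful run resets the backoff
}
```

Keeping the counter on the per-agent state is what makes the Greptile comment relevant: if a config update rebuilds that state without copying `consecutiveSkips`, the next skip starts again at the 1m delay.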