Heartbeat: spread interval runs across stable phases#64560
Conversation
🔒 Aisle Security AnalysisWe found 1 potential security issue(s) in this PR:
1. 🟡 Targeted heartbeat wakes can indefinitely postpone scheduled interval heartbeats (scheduler starvation)
Description
As a result, any targeted wake that successfully runs a heartbeat pushes that agent’s next scheduled interval slot out to Impact:
Vulnerable code: agent.nextDueMs =
reason === "interval"
? computeNextHeartbeatPhaseDueMs({ nowMs: now, intervalMs: agent.intervalMs, phaseMs: agent.phaseMs })
: now + agent.intervalMs;RecommendationKeep the phase-aligned schedule independent from targeted/action-driven runs. One approach:
Example (sketch): // state
type HeartbeatAgentState = {
// ...
phaseMs: number;
nextDueMs: number; // next interval slot (phase aligned)
cooldownUntilMs: number; // min time before any run is allowed
};
function advanceAfterRun(agent: HeartbeatAgentState, now: number) {
agent.cooldownUntilMs = now + agent.intervalMs;
agent.nextDueMs = computeNextHeartbeatPhaseDueMs({
nowMs: now,
intervalMs: agent.intervalMs,
phaseMs: agent.phaseMs,
});
}
// When deciding to run on interval ticks:
if (reason === "interval" && (now < agent.nextDueMs || now < agent.cooldownUntilMs)) continue;
// When deciding to run on targeted wakes:
if (now < agent.cooldownUntilMs) return { status: "skipped", reason: "cooldown" };Additionally, consider rate-limiting/authenticating external wake sources (e.g. hook endpoints) so they cannot be abused to manipulate scheduler timing. Analyzed PR: #64560 at commit Last updated on: 2026-04-11T02:41:05Z |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42d50ed944
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| reason === "interval" | ||
| ? computeNextHeartbeatPhaseDueMs({ | ||
| nowMs: now, | ||
| intervalMs: agent.intervalMs, | ||
| phaseMs: agent.phaseMs, |
There was a problem hiding this comment.
Preserve cooldown after non-interval wakes
After a manual/exec/hook wake, the code still sets nextDueMs = now + interval, but the next interval-triggered run immediately snaps back to the hashed phase via computeNextHeartbeatPhaseDueMs, which can schedule another heartbeat far sooner than a full interval (e.g., minutes or milliseconds later depending on phase). Before this commit, subsequent runs stayed on now + interval, so this introduces a cadence regression for action-driven wakes and can unexpectedly increase heartbeat frequency after user/event-triggered runs.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR replaces startup-relative Confidence Score: 5/5Safe to merge — scheduling math is correct, edge cases are handled, and test coverage is thorough. No P0 or P1 findings. The phase derivation, cycle-position arithmetic, config-reload preservation, and cooldown/phase bifurcation all behave correctly. The only possible nit is that resolveHeartbeatPhaseMs reads only 4 bytes of the SHA-256 digest (minor modulo bias for large intervals, negligible for the spreading goal), but this is firmly P2 and does not affect correctness. No files require special attention. Reviews (1): Last reviewed commit: "Heartbeat: spread interval runs across s..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c67169b4ef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return createHash("sha256") | ||
| .update(process.env.HOME ?? "") | ||
| .update("\0") | ||
| .update(process.cwd()) | ||
| .digest("hex"); |
There was a problem hiding this comment.
Use unique fallback scheduler seed per gateway
When loadOrCreateDeviceIdentity() fails (for example on read-only or permission-limited state dirs), the fallback seed is derived only from HOME and cwd. In many containerized/service deployments those values are identical across replicas, so different gateways end up with the same schedulerSeed and therefore the same per-agent phase, re-synchronizing interval heartbeats and negating this commit’s load-spreading goal under exactly the degraded path this fallback is meant to handle.
Useful? React with 👍 / 👎.
1f3d716 to
774ede6
Compare
|
Merged via squash.
Thanks @odysseus0! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 774ede6408
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| prev && | ||
| prev.intervalMs === intervalMs && | ||
| prev.phaseMs === phaseMs && | ||
| prev.nextDueMs > params.nowMs |
There was a problem hiding this comment.
Keep missed due slot when heartbeat config hot-reloads
resolveNextHeartbeatDueMs only preserves the previous schedule when prev.nextDueMs > nowMs. On hot reloads (src/gateway/server-reload-handlers.ts:87-89 calls heartbeatRunner.updateConfig for heartbeat-relevant config changes), if reload occurs just after a due boundary, this logic recomputes from nowMs and skips the overdue run until the next phase slot, delaying heartbeats by up to a full interval. Before this change, lastRunMs + interval allowed immediate catch-up for overdue slots.
Useful? React with 👍 / 👎.
Merged via squash. Prepared head SHA: 774ede6 Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Reviewed-by: @odysseus0
Merged via squash. Prepared head SHA: 774ede6 Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Reviewed-by: @odysseus0
Merged via squash. Prepared head SHA: 774ede6 Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Reviewed-by: @odysseus0
Merged via squash. Prepared head SHA: 774ede6 Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Reviewed-by: @odysseus0
Merged via squash. Prepared head SHA: 774ede6 Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Reviewed-by: @odysseus0
Merged via squash. Prepared head SHA: 774ede6 Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com> Reviewed-by: @odysseus0
Summary
30m, and targeted/action-driven wakes keep the existing cooldown behavior.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
now + intervalfor initial and follow-up interval heartbeats, so each gateway stayed pinned to startup-relative times instead of being distributed across the interval.Regression Test Plan (if applicable)
src/infra/heartbeat-schedule.test.ts,src/infra/heartbeat-runner.scheduler.test.tssrc/infra/heartbeat-runner.scheduler.test.tscovered the old interval scheduler behavior.User-visible / Behavior Changes
30m.Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation:Repro + Verification
Environment
agents.defaults.heartbeat.every="30m"Steps
Expected
Actual
Evidence
Human Verification (required)
now + intervalbehavior.pnpm buildcurrently fails on an unrelated baseline issue resolvingacpx/runtimefromextensions/acpx/src/runtime.ts.Review Conversations
Compatibility / Migration
Risks and Mitigations