Commit 9a4a9a5

Heartbeat: spread interval runs across stable phases (#64560)
Merged via squash.

Prepared head SHA: 774ede6
Co-authored-by: odysseus0 <8635094+odysseus0@users.noreply.github.com>
Reviewed-by: @odysseus0
1 parent e11d902 commit 9a4a9a5

5 files changed

Lines changed: 270 additions & 44 deletions

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -115,12 +115,20 @@ Docs: https://docs.openclaw.ai
 - Daemon/gateway: prevent systemd restart storms on configuration errors by exiting with `EX_CONFIG` and adding generated unit restart-prevention guards. (#63913) Thanks @neo1027144-creator.
 - Agents/exec: prevent gateway crash ("Agent listener invoked outside active run") when a subagent exec tool produces stdout/stderr after the agent run has ended or been aborted. (#62821) Thanks @openperf.
 - Gateway/OpenAI compat: return real `usage` for non-stream `/v1/chat/completions` responses, emit the final usage chunk when `stream_options.include_usage=true`, and bound usage-gated stream finalization after lifecycle end. (#62986) Thanks @Lellansin.
+- Matrix/migration: keep packaged warning-only crypto migrations from being misclassified as actionable when only helper chunks are present, so startup and doctor stay on the warning-only path instead of creating unnecessary migration snapshots. (#64373) Thanks @gumadeiras.
+- Matrix/ACP thread bindings: preserve canonical room casing and parent conversation routing during ACP session spawn so mixed-case room ids bind correctly from top-level rooms and existing Matrix threads. (#64343) Thanks @gumadeiras.
 - Agents/subagents: deduplicate delivered completion announces so retry or re-entry cleanup does not inject duplicate internal-context completion turns into the parent session. (#61525) Thanks @100yenadmin.
 - Agents/exec: keep sandboxed `tools.exec.host=auto` sessions from honoring per-call `host=node` or `host=gateway` overrides while a sandbox runtime is active, and stop advertising node routing in that state so exec stays on the sandbox host. (#63880)
 - Agents/subagents: preserve archived delete-mode runs until `sessions.delete` succeeds and prevent overlapping archive sweeps from duplicating in-flight cleanup attempts. (#61801) Thanks @100yenadmin.
 - Cron/isolated agent: run scheduled agent turns as non-owner senders so owner-only tools stay unavailable during cron execution. (#63878)
 - Discord/sandbox: include `image` in sandbox media param normalization so Discord event cover images cannot bypass sandbox path rewriting. (#64377) Thanks @mmaps.
 - Agents/exec: extend exec completion detection to cover local background exec formats so the owner-downgrade fires correctly for all exec paths. (#64376) Thanks @mmaps.
+- Security/dependencies: pin axios to 1.15.0 and add a plugin install dependency denylist that blocks known malicious packages before install. (#63891) Thanks @mmaps.
+- Browser/security: apply three-phase interaction navigation guard to pressKey and type(submit) so delayed JS redirects from keypress cannot bypass SSRF policy. (#63889) Thanks @mmaps.
+
+- Browser/security: guard existing-session Chrome MCP interaction routes with SSRF post-checks so delayed navigation from click, type, press, and evaluate cannot bypass the configured policy. (#64370) Thanks @eleqtrizit.
+- Browser/security: default browser SSRF policy to strict mode so unconfigured installs block private-network navigation, and align external-content marker span mapping so ZWS-injected boundary spoofs are fully sanitized. (#63885) Thanks @eleqtrizit.
+- Browser/security: apply SSRF navigation policy to subframe document navigations so iframe-targeted private-network hops are blocked without quarantining the parent page. (#64371) Thanks @eleqtrizit.
 - Hooks/security: mark agent hook system events as untrusted and sanitize hook display names before cron metadata reuse. (#64372) Thanks @eleqtrizit.
 - Daemon/launchd: keep `openclaw gateway stop` persistent without uninstalling the macOS LaunchAgent, re-enable it on explicit restart or repair, and harden launchd label handling. (#64447) Thanks @ngutman.
 - Plugins/context engines: preserve `plugins.slots.contextEngine` through normalization and keep explicitly selected workspace context-engine plugins enabled, so loader diagnostics and plugin activation stop dropping that slot selection. (#64192) Thanks @hclsys.
@@ -131,6 +139,7 @@ Docs: https://docs.openclaw.ai
 - Media/security: honor sender-scoped `toolsBySender` policy for outbound host-media reads so denied senders cannot trigger host file disclosure via attachment hydration. (#64459) Thanks @eleqtrizit.
 - Browser/security: reject strict-policy hostname navigation unless the hostname is an explicit allowlist exception or IP literal, and route CDP HTTP discovery through the pinned SSRF fetch path. (#64367) Thanks @eleqtrizit.
 - Models/vLLM: ignore empty `tool_calls` arrays from reasoning-model OpenAI-compatible replies, reset false `toolUse` stop reasons when no actual tool calls were parsed, and stop sending `tool_choice` unless tools are present so vLLM reasoning responses no longer hang indefinitely. (#61197, #61534) Thanks @balajisiva.
+- Heartbeat/scheduling: spread interval heartbeats across stable per-agent phases derived from gateway identity, so provider traffic is distributed more uniformly across the configured interval instead of clustering around startup-relative times. (#64560) Thanks @odysseus0.

 ## 2026.4.9
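
The Heartbeat/scheduling entry describes the mechanism only in prose, and the diff below shows `resolveHeartbeatPhaseMs` and `computeNextHeartbeatPhaseDueMs` only through the test's calls. The following is a minimal sketch of one way such a scheme could work, assuming a 32-bit FNV-1a hash over a `seed:agentId` key; the hash choice, the key format, and the names `fnv1a32`, `phaseMsFor`, and `nextPhaseDueMs` are hypothetical and are not the contents of `src/infra/heartbeat-schedule.ts`.

```typescript
// Sketch only: the hash, key format, and function names are assumptions,
// not the implementation behind resolveHeartbeatPhaseMs / computeNextHeartbeatPhaseDueMs.

// 32-bit FNV-1a over UTF-16 code units: cheap and deterministic across restarts.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}

// Stable offset in [0, intervalMs) for one agent on one gateway.
// Assumes intervalMs is a positive integer.
function phaseMsFor(schedulerSeed: string, agentId: string, intervalMs: number): number {
  return fnv1a32(`${schedulerSeed}:${agentId}`) % intervalMs;
}

// First slot strictly after nowMs, where slots are phaseMs + k * intervalMs.
function nextPhaseDueMs(nowMs: number, intervalMs: number, phaseMs: number): number {
  const cycleStart = Math.floor((nowMs - phaseMs) / intervalMs) * intervalMs + phaseMs;
  return cycleStart + intervalMs;
}

// Usage: two agents on the same (hypothetical) gateway identity, 30-minute interval.
const intervalMs = 30 * 60_000;
const seed = "gateway-identity-example";
for (const agentId of ["main", "ops"]) {
  const phaseMs = phaseMsFor(seed, agentId, intervalMs);
  const firstDueMs = nextPhaseDueMs(0, intervalMs, phaseMs);
  console.log(agentId, phaseMs, firstDueMs);
}
```

In this sketch the phase depends only on the seed, the agent id, and the interval, so restarting with the same gateway identity reproduces the same slots, while different agents usually land at different offsets inside the interval rather than all firing exactly one interval after startup.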

src/infra/heartbeat-runner.scheduler.test.ts

Lines changed: 73 additions & 25 deletions
@@ -1,10 +1,12 @@
 import { afterEach, describe, expect, it, vi } from "vitest";
 import type { OpenClawConfig } from "../config/config.js";
 import { startHeartbeatRunner } from "./heartbeat-runner.js";
+import { computeNextHeartbeatPhaseDueMs, resolveHeartbeatPhaseMs } from "./heartbeat-schedule.js";
 import { requestHeartbeatNow, resetHeartbeatWakeStateForTests } from "./heartbeat-wake.js";

 describe("startHeartbeatRunner", () => {
   type RunOnce = Parameters<typeof startHeartbeatRunner>[0]["runOnce"];
+  const TEST_SCHEDULER_SEED = "heartbeat-runner-test-seed";

   function useFakeHeartbeatTime() {
     vi.useFakeTimers();
@@ -15,6 +17,7 @@ describe("startHeartbeatRunner", () => {
     return startHeartbeatRunner({
       cfg: heartbeatConfig(),
       runOnce,
+      stableSchedulerSeed: TEST_SCHEDULER_SEED,
     });
   }

@@ -29,6 +32,18 @@ describe("startHeartbeatRunner", () => {
     } as OpenClawConfig;
   }

+  function resolveDueFromNow(nowMs: number, intervalMs: number, agentId: string) {
+    return computeNextHeartbeatPhaseDueMs({
+      nowMs,
+      intervalMs,
+      phaseMs: resolveHeartbeatPhaseMs({
+        schedulerSeed: TEST_SCHEDULER_SEED,
+        agentId,
+        intervalMs,
+      }),
+    });
+  }
+
   function createRequestsInFlightRunSpy(skipCount: number) {
     let callCount = 0;
     return vi.fn().mockImplementation(async () => {
@@ -49,6 +64,7 @@ describe("startHeartbeatRunner", () => {
     const runner = startHeartbeatRunner({
       cfg: params.cfg,
       runOnce: params.runSpy,
+      stableSchedulerSeed: TEST_SCHEDULER_SEED,
     });

     requestHeartbeatNow(params.wake);
@@ -72,8 +88,9 @@ describe("startHeartbeatRunner", () => {
     const runSpy = vi.fn().mockResolvedValue({ status: "ran", durationMs: 1 });

     const runner = startDefaultRunner(runSpy);
+    const firstDueMs = resolveDueFromNow(0, 30 * 60_000, "main");

-    await vi.advanceTimersByTimeAsync(30 * 60_000 + 1_000);
+    await vi.advanceTimersByTimeAsync(firstDueMs + 1);

     expect(runSpy).toHaveBeenCalledTimes(1);
     expect(runSpy.mock.calls[0]?.[0]).toEqual(
@@ -90,19 +107,26 @@ describe("startHeartbeatRunner", () => {
       },
     } as OpenClawConfig);

-    await vi.advanceTimersByTimeAsync(10 * 60_000 + 1_000);
+    const nowAfterReload = Date.now();
+    const nextMainDueMs = resolveDueFromNow(nowAfterReload, 10 * 60_000, "main");
+    const nextOpsDueMs = resolveDueFromNow(nowAfterReload, 15 * 60_000, "ops");
+    const finalDueMs = Math.max(nextMainDueMs, nextOpsDueMs);

-    expect(runSpy).toHaveBeenCalledTimes(2);
-    expect(runSpy.mock.calls[1]?.[0]).toEqual(
-      expect.objectContaining({ agentId: "main", heartbeat: { every: "10m" } }),
-    );
+    await vi.advanceTimersByTimeAsync(finalDueMs - Date.now() + 1);

-    await vi.advanceTimersByTimeAsync(5 * 60_000 + 1_000);
-
-    expect(runSpy).toHaveBeenCalledTimes(3);
-    expect(runSpy.mock.calls[2]?.[0]).toEqual(
-      expect.objectContaining({ agentId: "ops", heartbeat: { every: "15m" } }),
+    expect(runSpy.mock.calls.slice(1).map((call) => call[0]?.agentId)).toEqual(
+      expect.arrayContaining(["main", "ops"]),
     );
+    expect(
+      runSpy.mock.calls.some(
+        (call) => call[0]?.agentId === "main" && call[0]?.heartbeat?.every === "10m",
+      ),
+    ).toBe(true);
+    expect(
+      runSpy.mock.calls.some(
+        (call) => call[0]?.agentId === "ops" && call[0]?.heartbeat?.every === "15m",
+      ),
+    ).toBe(true);

     runner.stop();
   });
@@ -121,13 +145,14 @@ describe("startHeartbeatRunner", () => {
     });

     const runner = startDefaultRunner(runSpy);
+    const firstDueMs = resolveDueFromNow(0, 30 * 60_000, "main");

     // First heartbeat fires and throws
-    await vi.advanceTimersByTimeAsync(30 * 60_000 + 1_000);
+    await vi.advanceTimersByTimeAsync(firstDueMs + 1);
     expect(runSpy).toHaveBeenCalledTimes(1);

     // Second heartbeat should still fire (scheduler must not be dead)
-    await vi.advanceTimersByTimeAsync(30 * 60_000 + 1_000);
+    await vi.advanceTimersByTimeAsync(30 * 60_000);
     expect(runSpy).toHaveBeenCalledTimes(2);

     runner.stop();
@@ -142,18 +167,27 @@ describe("startHeartbeatRunner", () => {
     const cfg = {
       agents: { defaults: { heartbeat: { every: "30m" } } },
     } as OpenClawConfig;
+    const firstDueMs = resolveDueFromNow(0, 30 * 60_000, "main");

     // Start runner A
-    const runnerA = startHeartbeatRunner({ cfg, runOnce: runSpy1 });
+    const runnerA = startHeartbeatRunner({
+      cfg,
+      runOnce: runSpy1,
+      stableSchedulerSeed: TEST_SCHEDULER_SEED,
+    });

     // Start runner B (simulates lifecycle reload)
-    const runnerB = startHeartbeatRunner({ cfg, runOnce: runSpy2 });
+    const runnerB = startHeartbeatRunner({
+      cfg,
+      runOnce: runSpy2,
+      stableSchedulerSeed: TEST_SCHEDULER_SEED,
+    });

     // Stop runner A (stale cleanup) — should NOT kill runner B's handler
     runnerA.stop();

     // Runner B should still fire
-    await vi.advanceTimersByTimeAsync(30 * 60_000 + 1_000);
+    await vi.advanceTimersByTimeAsync(firstDueMs + 1);
     expect(runSpy2).toHaveBeenCalledTimes(1);
     expect(runSpy1).not.toHaveBeenCalled();

@@ -185,10 +219,12 @@ describe("startHeartbeatRunner", () => {
     const runner = startHeartbeatRunner({
       cfg: heartbeatConfig(),
       runOnce: runSpy,
+      stableSchedulerSeed: TEST_SCHEDULER_SEED,
     });
+    const firstDueMs = resolveDueFromNow(0, 30 * 60_000, "main");

     // First heartbeat returns requests-in-flight
-    await vi.advanceTimersByTimeAsync(30 * 60_000 + 1_000);
+    await vi.advanceTimersByTimeAsync(firstDueMs + 1);
     expect(runSpy).toHaveBeenCalledTimes(1);

     // The wake layer retries after DEFAULT_RETRY_MS (1 s). No scheduleNext()
@@ -204,28 +240,40 @@ describe("startHeartbeatRunner", () => {

     // Simulate a long-running heartbeat: the first 5 calls return
     // requests-in-flight (retries from the wake layer), then the 6th succeeds.
-    const runSpy = createRequestsInFlightRunSpy(5);
+    const callTimes: number[] = [];
+    let callCount = 0;
+    const runSpy = vi.fn().mockImplementation(async () => {
+      callTimes.push(Date.now());
+      callCount++;
+      if (callCount <= 5) {
+        return { status: "skipped", reason: "requests-in-flight" } as const;
+      }
+      return { status: "ran", durationMs: 1 } as const;
+    });

     const runner = startHeartbeatRunner({
       cfg: heartbeatConfig(),
       runOnce: runSpy,
+      stableSchedulerSeed: TEST_SCHEDULER_SEED,
     });
+    const intervalMs = 30 * 60_000;
+    const firstDueMs = resolveDueFromNow(0, intervalMs, "main");

-    // Trigger the first heartbeat at t=30m — returns requests-in-flight.
-    await vi.advanceTimersByTimeAsync(30 * 60_000 + 1_000);
+    // Trigger the first heartbeat at the agent's first slot — returns requests-in-flight.
+    await vi.advanceTimersByTimeAsync(firstDueMs + 1);
     expect(runSpy).toHaveBeenCalledTimes(1);

     // Simulate 4 more retries at short intervals (wake layer retries).
     for (let i = 0; i < 4; i++) {
       requestHeartbeatNow({ reason: "retry", coalesceMs: 0 });
       await vi.advanceTimersByTimeAsync(1_000);
     }
-    expect(runSpy).toHaveBeenCalledTimes(5);
+    expect(callTimes.some((time) => time >= firstDueMs + intervalMs)).toBe(false);

-    // The next interval tick at ~t=60m should still fire — the schedule
-    // must not have been pushed to t=30m * 6 = 180m by the 5 retries.
-    await vi.advanceTimersByTimeAsync(30 * 60_000);
-    expect(runSpy).toHaveBeenCalledTimes(6);
+    // The next interval tick at the next scheduled slot should still fire —
+    // the retries must not push the phase out by multiple intervals.
+    await vi.advanceTimersByTimeAsync(firstDueMs + intervalMs - Date.now() + 1);
+    expect(callTimes.some((time) => time >= firstDueMs + intervalMs)).toBe(true);

     runner.stop();
   });
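
The rewritten retry test asserts that five `requests-in-flight` attempts around the first slot do not delay the next run past `firstDueMs + intervalMs`, replacing the old comment's concern that the schedule could be "pushed to t=30m * 6 = 180m by the 5 retries". A minimal sketch of why a phase-anchored rule satisfies that check follows, contrasted with a hypothetical rule that pushes the due time out by one full interval on every attempt. The 7-minute phase, the 1 s retry spacing, and the names `nextPhaseDueMs` and `pushOutPerAttempt` are assumptions for illustration, not the logic in `heartbeat-runner.ts`.

```typescript
// Sketch only: both scheduling rules are assumptions used to illustrate the
// invariant the test checks, not the runner's actual implementation.

// Phase-anchored rule: snap to the first slot strictly after nowMs,
// where slots are phaseMs + k * intervalMs.
function nextPhaseDueMs(nowMs: number, intervalMs: number, phaseMs: number): number {
  const cycleStart = Math.floor((nowMs - phaseMs) / intervalMs) * intervalMs + phaseMs;
  return cycleStart + intervalMs;
}

// Hypothetical drifting rule: every attempt, including a skipped retry,
// pushes the due time out by one full interval.
function pushOutPerAttempt(dueMs: number, intervalMs: number): number {
  return dueMs + intervalMs;
}

const intervalMs = 30 * 60_000;
const phaseMs = 7 * 60_000; // hypothetical phase for agent "main"
const firstDueMs = nextPhaseDueMs(0, intervalMs, phaseMs); // 7 minutes

// First attempt just after the slot, then four retries 1 s apart;
// in the test all five return "requests-in-flight".
const attemptTimes = [0, 1, 2, 3, 4].map((i) => firstDueMs + 1 + i * 1_000);

let anchoredDueMs = firstDueMs;
let pushedDueMs = firstDueMs;
for (const attemptMs of attemptTimes) {
  anchoredDueMs = nextPhaseDueMs(attemptMs, intervalMs, phaseMs);
  pushedDueMs = pushOutPerAttempt(pushedDueMs, intervalMs);
}

console.log(anchoredDueMs === firstDueMs + intervalMs); // true
console.log(pushedDueMs === firstDueMs + 5 * intervalMs); // true
```

The same snapping explains the reload test: after `updateConfig`, each agent's next slot is recomputed from `nowAfterReload` with its new interval, so the test advances to the later of the two (`Math.max(nextMainDueMs, nextOpsDueMs)`) and then checks that both `main` and `ops` ran, without assuming which fires first.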
