
Commit 1cd9292

fix(gateway): reuse ws ping for connection health
1 parent 8c49255 commit 1cd9292

14 files changed

Lines changed: 41 additions & 79 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -526,7 +526,7 @@ Docs: https://docs.openclaw.ai
 - macOS app: move recent session context rows into a Context submenu while keeping usage and cost details root-level, so the menu bar companion stays compact with many active sessions. Thanks @guti.
 - Gateway/SDK: add SDK-facing tools.invoke RPC with shared HTTP policy, typed approval/refusal results, and SDK helper support. Refs #74705. Thanks @BunsDev and @ai-hpc.
 - Discord: keep active buttons, selects, and forms working across Gateway restarts until they expire, so multi-step Discord interactions are less likely to break during upgrades or restarts. Thanks @amknight.
-- Gateway/health: add per-client `health.connection` telemetry with WebSocket ping/pong RTT, last-heartbeat, and stale-connection reporting while keeping cached service health shared. (#70230) Thanks @trialanderrorstudios.
+- Gateway/health: add per-client `health.connection` telemetry with authenticated WebSocket keepalive ping/pong RTT, last-heartbeat, and stale-connection reporting while keeping cached service health shared. (#70230) Thanks @trialanderrorstudios.
 - Messages/docs: clarify that `BodyForAgent` is the primary inbound model text while `Body` is the legacy envelope fallback, and add Signal coverage so channel hardening patches target the real prompt path. Refs #66198. Thanks @defonota3box.
 - Slack: publish a safe default App Home tab view on `app_home_opened` and include the Home tab event in setup manifests. Fixes #11655; refs #52020. Thanks @TinyTb.
 - Slack: keep track of bot-participated threads across restarts, so ongoing threaded conversations can continue auto-replying after the Gateway is restarted. Thanks @amknight.
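The changelog entry lists stale-connection reporting alongside RTT and last-heartbeat. As a minimal sketch, assuming per-client state records the connect time and the time of the last keepalive pong, a client could be classified by how long it has been silent. `classifyLiveness`, the status names, and the 90-second threshold are hypothetical, not the gateway's actual rule.

```typescript
// Hypothetical per-client liveness state; not the gateway's real type.
interface ClientLiveness {
  connected: boolean;
  connectedAtMs: number; // when the WebSocket connection was accepted
  lastPongAtMs: number | null; // null until the first keepalive pong arrives
}

type LivenessStatus = "disconnected" | "pending" | "healthy" | "stale";

// Assumed threshold: a few missed keepalive rounds before reporting "stale".
const HYPOTHETICAL_STALE_AFTER_MS = 90_000;

function classifyLiveness(
  state: ClientLiveness,
  nowMs: number,
  staleAfterMs: number = HYPOTHETICAL_STALE_AFTER_MS,
): LivenessStatus {
  if (!state.connected) return "disconnected";
  // Before any pong, measure silence from the connect time instead.
  const lastSeenMs = state.lastPongAtMs ?? state.connectedAtMs;
  if (nowMs - lastSeenMs > staleAfterMs) return "stale";
  // Connected and recently seen: "pending" until the first RTT sample exists.
  return state.lastPongAtMs === null ? "pending" : "healthy";
}
```

Falling back to the connect time means a client that never answers a ping still becomes stale, rather than staying "pending" forever.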

docs/cli/health.md

Lines changed: 2 additions & 1 deletion
@@ -33,7 +33,8 @@ Notes:
 refresh in the background.
 - JSON output from an authenticated WebSocket health response includes a per-client
 `connection` block with `connected`, `rttMs`, and `lastHeartbeatAt`. Those fields
-describe the current Gateway connection, not the cached service snapshot.
+describe the current Gateway connection, not the cached service snapshot. RTT is
+sampled from the authenticated WebSocket keepalive ping/pong.
 - `--verbose` forces a live probe, prints gateway connection details, and expands the
 human-readable output across all configured accounts and agents.
 - Output includes per-agent session stores when multiple agents are configured.
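For scripts that consume `health` JSON, the `connection` block is best read defensively: it may be absent when the response did not come over an authenticated WebSocket, and `rttMs` is null before the first keepalive ping/pong. The sketch below relies only on the three documented fields; `readConnection`, `describeConnection`, and the output wording are hypothetical, and because the encoding of `lastHeartbeatAt` is not stated here, both numbers and strings are accepted.

```typescript
// Shape of the documented `connection` block; lastHeartbeatAt encoding is an open assumption.
interface HealthConnection {
  connected: boolean;
  rttMs: number | null;
  lastHeartbeatAt: number | string | null;
}

const isNumOrNull = (v: unknown): v is number | null => v === null || typeof v === "number";
const isNumStrOrNull = (v: unknown): v is number | string | null =>
  v === null || typeof v === "number" || typeof v === "string";

// Hypothetical helper: returns undefined when the block is absent or malformed.
function readConnection(json: string): HealthConnection | undefined {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const conn = (parsed as { connection?: unknown }).connection;
  if (typeof conn !== "object" || conn === null) return undefined;
  const { connected, rttMs, lastHeartbeatAt } = conn as Record<string, unknown>;
  if (typeof connected !== "boolean") return undefined;
  if (!isNumOrNull(rttMs) || !isNumStrOrNull(lastHeartbeatAt)) return undefined;
  return { connected, rttMs, lastHeartbeatAt };
}

function describeConnection(conn: HealthConnection): string {
  if (!conn.connected) return "disconnected";
  // rttMs stays null until the first keepalive ping/pong completes.
  return conn.rttMs === null ? "connected (no RTT sample yet)" : `connected, rtt ${conn.rttMs}ms`;
}
```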

docs/gateway/health.md

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ The health snapshot includes: `ok` (boolean), `ts` (timestamp), `durationMs` (pr
 }
 ```
 
-`connection` is not stored in the shared cached health snapshot. It is overlaid for the requesting WebSocket client so separate clients keep independent liveness and RTT state. Before the first successful WebSocket ping/pong, `connected` can be `true` while `rttMs` and `lastHeartbeatAt` are `null`.
+`connection` is not stored in the shared cached health snapshot. It is overlaid for the requesting WebSocket client so separate clients keep independent liveness and RTT state. RTT samples come from the authenticated WebSocket keepalive ping/pong. Before the first successful WebSocket ping/pong, `connected` can be `true` while `rttMs` and `lastHeartbeatAt` are `null`.
 
 ## Related
 
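The `docs/gateway/health.md` change describes `connection` as a per-client overlay that is never written into the shared cached snapshot, with `rttMs` and `lastHeartbeatAt` null until the first ping/pong. A minimal sketch of that overlay, assuming per-client state that stores the last RTT and pong time, copies the snapshot instead of mutating it. `overlayConnection`, `ClientHealthState`, and the epoch-millisecond encoding of `lastHeartbeatAt` are assumptions, not the gateway's `withConnectionHealth`.

```typescript
// Hypothetical per-client state; names are illustrative only.
interface ClientHealthState {
  rttMs?: number;
  lastPongAtMs?: number;
}

interface ConnectionBlock {
  connected: boolean;
  rttMs: number | null;
  lastHeartbeatAt: number | null; // epoch ms here, an assumption
}

type WithConnection<T> = T & { connection?: ConnectionBlock };

function overlayConnection<T extends object>(
  snapshot: T,
  client: ClientHealthState | undefined,
): WithConnection<T> {
  // Callers without per-client state get a plain copy and no `connection` block.
  if (!client) return { ...snapshot } as WithConnection<T>;
  const connection: ConnectionBlock = {
    // The requesting client is still connected, since it just sent the request.
    connected: true,
    // Both stay null until this client's first successful ping/pong.
    rttMs: client.rttMs ?? null,
    lastHeartbeatAt: client.lastPongAtMs ?? null,
  };
  // Spread into a new object so the shared cached snapshot is never mutated.
  return { ...snapshot, connection };
}
```

Copying per response keeps one client's RTT from leaking into another client's view of the same cached snapshot.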
docs/gateway/protocol.md

Lines changed: 1 addition & 1 deletion
@@ -320,7 +320,7 @@ enumeration of `src/gateway/server-methods/*.ts`.
 
 <AccordionGroup>
 <Accordion title="System and identity">
-- `health` returns the cached or freshly probed gateway health snapshot. Authenticated WebSocket responses include a per-client `connection` block with `connected`, WebSocket ping/pong `rttMs`, and `lastHeartbeatAt`; this block is overlaid per response and is not part of the shared cached service snapshot.
+- `health` returns the cached or freshly probed gateway health snapshot. Authenticated WebSocket responses include a per-client `connection` block with `connected`, authenticated keepalive ping/pong `rttMs`, and `lastHeartbeatAt`; this block is overlaid per response and is not part of the shared cached service snapshot.
 - `diagnostics.stability` returns the recent bounded diagnostic stability recorder. It keeps operational metadata such as event names, counts, byte sizes, memory readings, queue/session state, channel/plugin names, and session ids. It does not keep chat text, webhook bodies, tool outputs, raw request or response bodies, tokens, cookies, or secret values. Operator read scope is required.
 - `status` returns the `/status`-style gateway summary; sensitive fields are included only for admin-scoped operator clients.
 - `gateway.identity.get` returns the gateway device identity used by relay and pairing flows.

src/gateway/server-close.test.ts

Lines changed: 0 additions & 1 deletion
@@ -85,7 +85,6 @@ function createGatewayCloseTestDeps(
 nodePresenceTimers: new Map(),
 broadcast: vi.fn(),
 tickInterval: setInterval(() => undefined, 60_000),
-connectionPingInterval: setInterval(() => undefined, 60_000),
 healthInterval: setInterval(() => undefined, 60_000),
 dedupeCleanup: setInterval(() => undefined, 60_000),
 mediaCleanup: null,

src/gateway/server-close.ts

Lines changed: 0 additions & 2 deletions
@@ -186,7 +186,6 @@ export function createGatewayCloseHandler(params: {
 nodePresenceTimers: Map<string, ReturnType<typeof setInterval>>;
 broadcast: (event: string, payload: unknown, opts?: { dropIfSlow?: boolean }) => void;
 tickInterval: ReturnType<typeof setInterval>;
-connectionPingInterval: ReturnType<typeof setInterval>;
 healthInterval: ReturnType<typeof setInterval>;
 dedupeCleanup: ReturnType<typeof setInterval>;
 mediaCleanup: ReturnType<typeof setInterval> | null;
@@ -311,7 +310,6 @@ export function createGatewayCloseHandler(params: {
 restartExpectedMs,
 });
 clearInterval(params.tickInterval);
-clearInterval(params.connectionPingInterval);
 clearInterval(params.healthInterval);
 clearInterval(params.dedupeCleanup);
 if (params.mediaCleanup) {

src/gateway/server-maintenance.test.ts

Lines changed: 0 additions & 31 deletions
@@ -1,8 +1,6 @@
 import { afterEach, describe, expect, it, vi } from "vitest";
-import { WebSocket } from "ws";
 import type { HealthSummary } from "../commands/health.js";
 import type { ChatAbortControllerEntry } from "./chat-abort.js";
-import type { GatewayWsClient } from "./server/ws-types.js";
 
 const cleanOldMediaMock = vi.fn(async () => {});
 
@@ -30,7 +28,6 @@ function createActiveRun(sessionKey: string): ChatAbortControllerEntry {
 
 function createMaintenanceTimerDeps() {
 return {
-clients: new Set<GatewayWsClient>(),
 broadcast: () => {},
 nodeSendToAllSubscribed: () => {},
 getPresenceVersion: () => 1,
@@ -51,13 +48,11 @@ function createMaintenanceTimerDeps() {
 
 function stopMaintenanceTimers(timers: {
 tickInterval: NodeJS.Timeout;
-connectionPingInterval: NodeJS.Timeout;
 healthInterval: NodeJS.Timeout;
 dedupeCleanup: NodeJS.Timeout;
 mediaCleanup: NodeJS.Timeout | null;
 }) {
 clearInterval(timers.tickInterval);
-clearInterval(timers.connectionPingInterval);
 clearInterval(timers.healthInterval);
 clearInterval(timers.dedupeCleanup);
 if (timers.mediaCleanup) {
@@ -128,32 +123,6 @@ describe("startGatewayMaintenanceTimers", () => {
 stopMaintenanceTimers(timers);
 });
 
-it("pings authenticated open clients on the connection health cadence", async () => {
-vi.useFakeTimers();
-vi.setSystemTime(new Date("2026-04-12T00:00:00Z"));
-const { CONNECTION_PING_INTERVAL_MS } = await import("./server/connection-health.js");
-const { startGatewayMaintenanceTimers } = await import("./server-maintenance.js");
-const ping = vi.fn();
-const deps = createMaintenanceTimerDeps();
-const client = {
-socket: { readyState: WebSocket.OPEN, ping },
-connectionHealth: { connectedAtMs: Date.now() },
-} as unknown as GatewayWsClient;
-deps.clients.add(client);
-
-const timers = startGatewayMaintenanceTimers(deps);
-
-await vi.advanceTimersByTimeAsync(CONNECTION_PING_INTERVAL_MS);
-
-expect(ping).toHaveBeenCalledWith(String(Date.now()));
-expect(
-(client as { connectionHealth: { lastPingSentAtMs?: number } }).connectionHealth
-.lastPingSentAtMs,
-).toBe(Date.now());
-
-stopMaintenanceTimers(timers);
-});
-
 it("skips overlapping media cleanup runs", async () => {
 vi.useFakeTimers();
 let resolveCleanup = () => {};

src/gateway/server-maintenance.ts

Lines changed: 1 addition & 13 deletions
@@ -12,12 +12,9 @@ import {
 } from "./server-constants.js";
 import type { DedupeEntry } from "./server-shared.js";
 import { formatError } from "./server-utils.js";
-import { CONNECTION_PING_INTERVAL_MS, pingGatewayClient } from "./server/connection-health.js";
 import { setBroadcastHealthUpdate } from "./server/health-state.js";
-import type { GatewayWsClient } from "./server/ws-types.js";
 
 export function startGatewayMaintenanceTimers(params: {
-clients: Set<GatewayWsClient>;
 broadcast: (
 event: string,
 payload: unknown,
@@ -50,7 +47,6 @@ export function startGatewayMaintenanceTimers(params: {
 mediaCleanupTtlMs?: number;
 }): {
 tickInterval: ReturnType<typeof setInterval>;
-connectionPingInterval: ReturnType<typeof setInterval>;
 healthInterval: ReturnType<typeof setInterval>;
 dedupeCleanup: ReturnType<typeof setInterval>;
 mediaCleanup: ReturnType<typeof setInterval> | null;
@@ -72,13 +68,6 @@ export function startGatewayMaintenanceTimers(params: {
 params.nodeSendToAllSubscribed("tick", payload);
 }, TICK_INTERVAL_MS);
 
-const connectionPingInterval = setInterval(() => {
-const now = Date.now();
-for (const client of params.clients) {
-pingGatewayClient(client, now);
-}
-}, CONNECTION_PING_INTERVAL_MS);
-
 // periodic health refresh to keep cached snapshot warm
 const healthInterval = setInterval(() => {
 void params
@@ -178,7 +167,6 @@ export function startGatewayMaintenanceTimers(params: {
 if (typeof params.mediaCleanupTtlMs !== "number") {
 return {
 tickInterval,
-connectionPingInterval,
 healthInterval,
 dedupeCleanup,
 mediaCleanup: null,
@@ -209,5 +197,5 @@ export function startGatewayMaintenanceTimers(params: {
 
 void runMediaCleanup();
 
-return { tickInterval, connectionPingInterval, healthInterval, dedupeCleanup, mediaCleanup };
+return { tickInterval, healthInterval, dedupeCleanup, mediaCleanup };
 }
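The hunks above delete the dedicated `connectionPingInterval` loop that called `pingGatewayClient` for every client; the removed test shows that loop pinged with `String(Date.now())` and recorded `lastPingSentAtMs`. A minimal sketch of getting the same RTT sample from a single keepalive ping, assuming the existing keepalive timer already visits each open client, might look like this. `KeepaliveSocket`, `sendKeepalivePing`, `recordPong`, and `lastPongAtMs` are hypothetical names, not the gateway's actual code.

```typescript
// Hypothetical minimal socket surface. The `ws` package's WebSocket offers
// ping(data) and emits "pong" with the echoed payload, but this is not its type.
interface KeepaliveSocket {
  readonly isOpen: boolean;
  ping(data: string): void;
}

// Hypothetical per-client state, loosely modeled on the removed test's
// `connectionHealth` object (`connectedAtMs`, `lastPingSentAtMs`).
interface ClientConnectionHealth {
  connectedAtMs: number;
  lastPingSentAtMs?: number;
  lastPongAtMs?: number;
  rttMs?: number;
}

// Called from the existing keepalive timer, so no second interval is needed.
function sendKeepalivePing(socket: KeepaliveSocket, health: ClientConnectionHealth, nowMs: number): void {
  if (!socket.isOpen) return;
  // Stamp the payload with the send time so the echoed pong carries it back.
  socket.ping(String(nowMs));
  health.lastPingSentAtMs = nowMs;
}

// Wired to the socket's pong event; `payload` is the echoed ping data as text.
function recordPong(health: ClientConnectionHealth, payload: string, nowMs: number): void {
  const sentAtMs = Number(payload);
  // Skip pongs that cannot be attributed, e.g. unsolicited ones with an empty payload.
  if (!Number.isFinite(sentAtMs) || sentAtMs <= 0 || sentAtMs > nowMs) return;
  health.rttMs = nowMs - sentAtMs;
  health.lastPongAtMs = nowMs;
}
```

With this arrangement each client receives one keepalive frame per interval, and the RTT sample is a by-product of traffic the connection already needed, in line with the commit's goal of reusing the WebSocket ping for connection health.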

src/gateway/server-methods/health.ts

Lines changed: 1 addition & 4 deletions
@@ -119,10 +119,7 @@ export const healthHandlers: GatewayRequestHandlers = {
 }
 try {
 const snap = await refreshHealthSnapshot({ probe: wantsProbe, includeSensitive });
-const payload = context.getEventLoopHealth
-? { ...snap, eventLoop: context.getEventLoopHealth() }
-: snap;
-respond(true, withConnectionHealth(payload, client), undefined);
+respond(true, withConnectionHealth(snap, client), undefined);
 } catch (err) {
 respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
 }

src/gateway/server-runtime-handles.ts

Lines changed: 0 additions & 2 deletions
@@ -9,7 +9,6 @@ type GatewayConfigReloaderHandle = {
 export type GatewayServerMutableState = {
 bonjourStop: (() => Promise<void>) | null;
 tickInterval: ReturnType<typeof setInterval>;
-connectionPingInterval: ReturnType<typeof setInterval>;
 healthInterval: ReturnType<typeof setInterval>;
 dedupeCleanup: ReturnType<typeof setInterval>;
 mediaCleanup: ReturnType<typeof setInterval> | null;
@@ -38,7 +37,6 @@ export function createGatewayServerMutableState(): GatewayServerMutableState {
 return {
 bonjourStop: null as (() => Promise<void>) | null,
 tickInterval: noopInterval(),
-connectionPingInterval: noopInterval(),
 healthInterval: noopInterval(),
 dedupeCleanup: noopInterval(),
 mediaCleanup: null as ReturnType<typeof setInterval> | null,
