Skip to content

Commit f7c32fc

Browse files
amittellYibei Ouclaudeobviyus
authored
fix(telegram): lower polling keepalive delay (#83304)
* fix(telegram): enable TCP keepalive on getUpdates connections to prevent NAT timeout stalls Long-polling connections to api.telegram.org stay idle for up to the getUpdates timeout (~900 s). Most home/office NAT tables expire idle TCP entries after 60–1800 s (commonly ~1000 s). When the NAT entry is silently dropped the connection hangs rather than returning an error, leaving the grammY runner stuck until the 90 s stall watchdog fires and forces a restart cycle. Fix: unconditionally set `keepAlive: true` and `keepAliveInitialDelay: 30_000` (30 s) on the undici Agent `connect` options built in `buildTelegramConnectOptions`. OS-level TCP keepalive probes sent every ~75 s (OS default) will: 1. Refresh the NAT table entry before it expires. 2. Surface dead connections immediately with ETIMEDOUT instead of hanging forever. The `return Object.keys(connect).length > 0 ? connect : null` guard is also removed; `connect` is now always non-empty so it always returns the object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 92e454c) * fix(telegram): stop self-flagging disconnected on poll-cycle start; widen channel connect grace to 300s (cherry picked from commit 1ca963a) * fix(telegram): catch hung polling startups that preserve inherited connected:true The widened 300s channel connect grace and the removal of connected:false from notePollingStart left a path where a polling restart could hang forever looking healthy. notePollingStart clears lastConnectedAt, lastEventAt, and lastTransportActivityAt but deliberately omits connected, so server-channels' patch-merge inherits a connected:true from the previous lifecycle. After grace, evaluateChannelHealth's stale-socket branch requires lastTransportActivityAt to be non-null and the connected:false branch is masked, so the channel sits healthy with no first getUpdates. Add a post-grace branch to evaluateChannelHealth that flags polling channels as stale-socket when connected:true is paired with null lastConnectedAt and null lastTransportActivityAt and a non-null lastStartAt. Scoped to mode:polling so webhook channels and channels without continuous transport tracking are not falsely flagged. Align TELEGRAM_POLLING_CONNECT_GRACE_MS in the Telegram status diagnostic with DEFAULT_CHANNEL_CONNECT_GRACE_MS so openclaw channels status agrees with the shared health monitor on the grace window. Refresh the notePollingStart comment to point at the new evaluateChannelHealth branch. Addresses clawsweeper review on #83304 (P1 connect-grace startup-hang, P2 diagnostic grace drift). Tests cover the new flagged path, the in-grace happy path, and the prior-successful-connect happy path. * fix(telegram): clear polling connected state on startup * fix(gateway): add defense-in-depth health-policy branch for hung polling startups Defense in depth on top of 87db46c's notePollingStart connected:false fix. The primary path (notePollingStart writes connected:false explicitly so evaluateChannelHealth's existing connected===false branch catches a hung restart) is unchanged. This adds a defensive post-grace branch that catches the same hang via a different signature -- inherited connected:true paired with null lastConnectedAt and null lastTransportActivityAt -- in case a future code path forgets to clear the inherited connected flag on lifecycle start. Scoped to mode:polling so webhook channels and channels without continuous transport tracking are not falsely flagged. Also bump lastStartAt: Date.now() - 121_000 to 301_000 in the spool-handler timeout test added by upstream #83505 so it falls past the widened 300s TELEGRAM_POLLING_CONNECT_GRACE_MS suppression window (mirroring the same fixup already applied to the two adjacent polling-startup tests). * revert(telegram,gateway): keep connect grace at 120s Drop the 120s -> 300s widening from this PR after maintainer feedback that the extra grace masks real startup bugs. The defense-in-depth checks added in earlier commits (notePollingStart clearing inherited connected state, the stale-socket policy branch, the per-snapshot startup grace test) all work fine at 120s and remain valuable on their own. Reverts in: - src/gateway/channel-health-policy.ts: DEFAULT_CHANNEL_CONNECT_GRACE_MS 300 -> 120 - extensions/telegram/src/status-issues.ts: TELEGRAM_POLLING_CONNECT_GRACE_MS 300 -> 120 - extensions/telegram/src/status.test.ts: lastStartAt 301_000 -> 121_000 (3 cases) The new channel-health-policy.test.ts cases use explicit channelConnectGraceMs: 10_000 in the policy, so they are unaffected by the default constant change. * fix(telegram): narrow polling keepalive fix --------- Co-authored-by: Yibei Ou <yibeiou@Yibeis-Mac-mini.local> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Ayaan Zaidi <hi@obviy.us>
1 parent 51d7f3c commit f7c32fc

2 files changed

Lines changed: 29 additions & 17 deletions

File tree

extensions/telegram/src/fetch.test.ts

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -281,6 +281,11 @@ function expectStickyAutoSelectDispatcher(
281281
expect(options?.autoSelectFamilyAttemptTimeout).toBe(300);
282282
}
283283

284+
function expectTelegramKeepAliveOptions(options: Record<string, unknown> | undefined): void {
285+
expect(options?.keepAlive).toBe(true);
286+
expect(options?.keepAliveInitialDelay).toBe(30_000);
287+
}
288+
284289
function expectHttp1OnlyDispatcher(
285290
dispatcher:
286291
| {
@@ -424,6 +429,7 @@ describe("resolveTelegramFetch", () => {
424429
expectHttp1OnlyDispatcher(dispatcher);
425430
expect(dispatcher?.options?.connect?.autoSelectFamily).toBe(true);
426431
expect(dispatcher?.options?.connect?.autoSelectFamilyAttemptTimeout).toBe(300);
432+
expectTelegramKeepAliveOptions(dispatcher?.options?.connect);
427433
expect(typeof dispatcher?.options?.connect?.lookup).toBe("function");
428434
});
429435

@@ -460,8 +466,10 @@ describe("resolveTelegramFetch", () => {
460466
expectHttp1OnlyDispatcher(dispatcher);
461467
expect(dispatcher?.options?.connect?.autoSelectFamily).toBe(false);
462468
expect(dispatcher?.options?.connect?.autoSelectFamilyAttemptTimeout).toBe(300);
469+
expectTelegramKeepAliveOptions(dispatcher?.options?.connect);
463470
expect(dispatcher?.options?.proxyTls?.autoSelectFamily).toBe(false);
464471
expect(dispatcher?.options?.proxyTls?.autoSelectFamilyAttemptTimeout).toBe(300);
472+
expectTelegramKeepAliveOptions(dispatcher?.options?.proxyTls);
465473
});
466474

467475
it("adds managed proxy CA trust to Telegram env proxy dispatchers", async () => {

extensions/telegram/src/fetch.ts

Lines changed: 21 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -159,6 +159,8 @@ function createDnsResultOrderLookup(
159159
};
160160
}
161161

162+
const TELEGRAM_KEEPALIVE_INITIAL_DELAY_MS = 30_000;
163+
162164
function buildTelegramConnectOptions(params: {
163165
autoSelectFamily: boolean | null;
164166
dnsResultOrder: TelegramDnsResultOrder | null;
@@ -167,14 +169,21 @@ function buildTelegramConnectOptions(params: {
167169
autoSelectFamily?: boolean;
168170
autoSelectFamilyAttemptTimeout?: number;
169171
family?: number;
172+
keepAlive?: boolean;
173+
keepAliveInitialDelay?: number;
170174
lookup?: LookupFunction;
171-
} | null {
175+
} {
172176
const connect: {
173177
autoSelectFamily?: boolean;
174178
autoSelectFamilyAttemptTimeout?: number;
175179
family?: number;
180+
keepAlive?: boolean;
181+
keepAliveInitialDelay?: number;
176182
lookup?: LookupFunction;
177-
} = {};
183+
} = {
184+
keepAlive: true,
185+
keepAliveInitialDelay: TELEGRAM_KEEPALIVE_INITIAL_DELAY_MS,
186+
};
178187

179188
if (params.forceIpv4) {
180189
connect.family = 4;
@@ -189,7 +198,7 @@ function buildTelegramConnectOptions(params: {
189198
connect.lookup = lookup;
190199
}
191200

192-
return Object.keys(connect).length > 0 ? connect : null;
201+
return connect;
193202
}
194203

195204
function shouldBypassEnvProxyForTelegramApi(env: NodeJS.ProcessEnv = process.env): boolean {
@@ -252,34 +261,29 @@ function resolveTelegramDispatcherPolicy(params: {
252261
const explicitProxyUrl = params.proxyUrl?.trim();
253262
if (explicitProxyUrl) {
254263
return {
255-
policy: connect
256-
? {
257-
mode: "explicit-proxy",
258-
proxyUrl: explicitProxyUrl,
259-
allowPrivateProxy: true,
260-
proxyTls: { ...connect },
261-
}
262-
: {
263-
mode: "explicit-proxy",
264-
proxyUrl: explicitProxyUrl,
265-
allowPrivateProxy: true,
266-
},
264+
policy: {
265+
mode: "explicit-proxy",
266+
proxyUrl: explicitProxyUrl,
267+
allowPrivateProxy: true,
268+
proxyTls: { ...connect },
269+
},
267270
mode: "explicit-proxy",
268271
};
269272
}
270273
if (params.useEnvProxy) {
271274
return {
272275
policy: {
273276
mode: "env-proxy",
274-
...(connect ? { connect: { ...connect }, proxyTls: { ...connect } } : {}),
277+
connect: { ...connect },
278+
proxyTls: { ...connect },
275279
},
276280
mode: "env-proxy",
277281
};
278282
}
279283
return {
280284
policy: {
281285
mode: "direct",
282-
...(connect ? { connect: { ...connect } } : {}),
286+
connect: { ...connect },
283287
},
284288
mode: "direct",
285289
};

0 commit comments

Comments
 (0)