Commit f7c32fc
fix(telegram): lower polling keepalive delay (#83304)
* fix(telegram): enable TCP keepalive on getUpdates connections to prevent NAT timeout stalls
Long-polling connections to api.telegram.org stay idle for up to the
getUpdates timeout (~900 s). Most home/office NAT tables expire idle TCP
entries after 60–1800 s (commonly ~1000 s). When the NAT entry is
silently dropped the connection hangs rather than returning an error,
leaving the grammY runner stuck until the 90 s stall watchdog fires and
forces a restart cycle.
Fix: unconditionally set `keepAlive: true` and
`keepAliveInitialDelay: 30_000` (30 s) on the undici Agent `connect`
options built in `buildTelegramConnectOptions`. OS-level TCP keepalive
probes sent every ~75 s (OS default) will:
1. Refresh the NAT table entry before it expires.
2. Surface dead connections immediately with ETIMEDOUT instead of
hanging forever.
The `return Object.keys(connect).length > 0 ? connect : null` guard is
also removed; `connect` is now always non-empty so it always returns the
object.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 92e454c)
* fix(telegram): stop self-flagging disconnected on poll-cycle start; widen channel connect grace to 300s
(cherry picked from commit 1ca963a)
* fix(telegram): catch hung polling startups that preserve inherited connected:true
The widened 300s channel connect grace and the removal of connected:false from
notePollingStart left a path where a polling restart could hang forever
looking healthy. notePollingStart clears lastConnectedAt, lastEventAt, and
lastTransportActivityAt but deliberately omits connected, so server-channels'
patch-merge inherits a connected:true from the previous lifecycle. After grace,
evaluateChannelHealth's stale-socket branch requires lastTransportActivityAt
to be non-null and the connected:false branch is masked, so the channel sits
healthy with no first getUpdates.
Add a post-grace branch to evaluateChannelHealth that flags polling channels
as stale-socket when connected:true is paired with null lastConnectedAt and
null lastTransportActivityAt and a non-null lastStartAt. Scoped to mode:polling
so webhook channels and channels without continuous transport tracking are
not falsely flagged. Align TELEGRAM_POLLING_CONNECT_GRACE_MS in the Telegram
status diagnostic with DEFAULT_CHANNEL_CONNECT_GRACE_MS so openclaw channels
status agrees with the shared health monitor on the grace window. Refresh
the notePollingStart comment to point at the new evaluateChannelHealth branch.
Addresses clawsweeper review on #83304 (P1 connect-grace startup-hang, P2
diagnostic grace drift). Tests cover the new flagged path, the in-grace happy
path, and the prior-successful-connect happy path.
* fix(telegram): clear polling connected state on startup
* fix(gateway): add defense-in-depth health-policy branch for hung polling startups
Defense in depth on top of 87db46c's notePollingStart connected:false fix.
The primary path (notePollingStart writes connected:false explicitly so
evaluateChannelHealth's existing connected===false branch catches a hung
restart) is unchanged. This adds a defensive post-grace branch that catches
the same hang via a different signature -- inherited connected:true paired
with null lastConnectedAt and null lastTransportActivityAt -- in case a
future code path forgets to clear the inherited connected flag on lifecycle
start. Scoped to mode:polling so webhook channels and channels without
continuous transport tracking are not falsely flagged.
Also bump lastStartAt: Date.now() - 121_000 to 301_000 in the spool-handler
timeout test added by upstream #83505 so it falls past the widened 300s
TELEGRAM_POLLING_CONNECT_GRACE_MS suppression window (mirroring the same
fixup already applied to the two adjacent polling-startup tests).
* revert(telegram,gateway): keep connect grace at 120s
Drop the 120s -> 300s widening from this PR after maintainer feedback that
the extra grace masks real startup bugs. The defense-in-depth checks added
in earlier commits (notePollingStart clearing inherited connected state,
the stale-socket policy branch, the per-snapshot startup grace test) all
work fine at 120s and remain valuable on their own.
Reverts in:
- src/gateway/channel-health-policy.ts: DEFAULT_CHANNEL_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status-issues.ts: TELEGRAM_POLLING_CONNECT_GRACE_MS 300 -> 120
- extensions/telegram/src/status.test.ts: lastStartAt 301_000 -> 121_000 (3 cases)
The new channel-health-policy.test.ts cases use explicit channelConnectGraceMs:
10_000 in the policy, so they are unaffected by the default constant change.
* fix(telegram): narrow polling keepalive fix
---------
Co-authored-by: Yibei Ou <yibeiou@Yibeis-Mac-mini.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ayaan Zaidi <hi@obviy.us>1 parent 51d7f3c commit f7c32fc
2 files changed
Lines changed: 29 additions & 17 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
281 | 281 | | |
282 | 282 | | |
283 | 283 | | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
284 | 289 | | |
285 | 290 | | |
286 | 291 | | |
| |||
424 | 429 | | |
425 | 430 | | |
426 | 431 | | |
| 432 | + | |
427 | 433 | | |
428 | 434 | | |
429 | 435 | | |
| |||
460 | 466 | | |
461 | 467 | | |
462 | 468 | | |
| 469 | + | |
463 | 470 | | |
464 | 471 | | |
| 472 | + | |
465 | 473 | | |
466 | 474 | | |
467 | 475 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
159 | 159 | | |
160 | 160 | | |
161 | 161 | | |
| 162 | + | |
| 163 | + | |
162 | 164 | | |
163 | 165 | | |
164 | 166 | | |
| |||
167 | 169 | | |
168 | 170 | | |
169 | 171 | | |
| 172 | + | |
| 173 | + | |
170 | 174 | | |
171 | | - | |
| 175 | + | |
172 | 176 | | |
173 | 177 | | |
174 | 178 | | |
175 | 179 | | |
| 180 | + | |
| 181 | + | |
176 | 182 | | |
177 | | - | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
178 | 187 | | |
179 | 188 | | |
180 | 189 | | |
| |||
189 | 198 | | |
190 | 199 | | |
191 | 200 | | |
192 | | - | |
| 201 | + | |
193 | 202 | | |
194 | 203 | | |
195 | 204 | | |
| |||
252 | 261 | | |
253 | 262 | | |
254 | 263 | | |
255 | | - | |
256 | | - | |
257 | | - | |
258 | | - | |
259 | | - | |
260 | | - | |
261 | | - | |
262 | | - | |
263 | | - | |
264 | | - | |
265 | | - | |
266 | | - | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
267 | 270 | | |
268 | 271 | | |
269 | 272 | | |
270 | 273 | | |
271 | 274 | | |
272 | 275 | | |
273 | 276 | | |
274 | | - | |
| 277 | + | |
| 278 | + | |
275 | 279 | | |
276 | 280 | | |
277 | 281 | | |
278 | 282 | | |
279 | 283 | | |
280 | 284 | | |
281 | 285 | | |
282 | | - | |
| 286 | + | |
283 | 287 | | |
284 | 288 | | |
285 | 289 | | |
| |||
0 commit comments