fix(telegram): keep polling watchdog on getUpdates liveness #78646
steipete merged 1 commit into openclaw:main
Codex review: needs maintainer review before merge.
Summary
Reproducibility: yes. A high-confidence source reproduction exists on current main: make [...]
Real behavior proof
Next step before merge
Security
Review details
Best possible solution: Land this narrow watchdog fix after exact-head required checks finish green, while leaving broader transport-rebuild behavior tracked in the related open issue.
Do we have a high-confidence way to reproduce the issue? Yes. A high-confidence source reproduction exists on current main: make [...]
Is this the best way to solve the issue? Yes. Removing generic Bot API liveness from the restart gate while retaining it as diagnostic output is the narrowest maintainable fix for the documented [...]
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 120eb3426a14.
cce6e7a to de0da45
de0da45 to e301533
e301533 to 1caf97e
(cherry picked from commit 440111f)
Summary
- [...] `getUpdates` polling.
- `sendMessage` traffic could mask a wedged inbound polling loop, leaving Telegram replies silent until a manual restart.
- [...] `getUpdates` liveness only, while keeping unrelated API elapsed time in diagnostics.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
- `TelegramPollingLivenessTracker.detectStall()` returned no stall when either `getUpdates` elapsed time or generic Bot API elapsed time was still within the threshold.
- [...] `getUpdates` coincides with recent or in-flight non-polling API traffic.
Regression Test Plan (if applicable)
- `extensions/telegram/src/polling-liveness.test.ts`, `extensions/telegram/src/polling-session.test.ts`
- [...] `getUpdates` still triggers watchdog restart even when `sendMessage` recently succeeded or a non-`getUpdates` API call is in flight.
User-visible / Behavior Changes
Telegram polling recovery now restarts stale inbound polling even if unrelated outbound Telegram API calls are active or recently succeeded.
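The root cause and the behavior change above can be illustrated with a minimal sketch. This is not the code from the PR: the `LivenessSnapshot` and `StallResult` shapes, the two function names, and the 90-second threshold are assumptions made for illustration. Only `detectStall()` and `apiElapsedMs` are named in this description, and their real signatures may differ.

```typescript
// Hypothetical shapes; the real TelegramPollingLivenessTracker may differ.
interface LivenessSnapshot {
  getUpdatesElapsedMs: number; // time since getUpdates last showed progress
  apiElapsedMs: number; // time since any Bot API call showed progress
}

interface StallResult {
  stalled: boolean;
  getUpdatesElapsedMs: number;
  apiElapsedMs: number; // kept for diagnostics only
}

// Pre-fix behavior as described under Root Cause: a stall is reported only
// when BOTH signals are past the threshold, so fresh sendMessage traffic
// hides a wedged polling loop.
function detectStallBefore(s: LivenessSnapshot, thresholdMs: number): StallResult {
  const stalled = s.getUpdatesElapsedMs > thresholdMs && s.apiElapsedMs > thresholdMs;
  return { stalled, ...s };
}

// Post-fix behavior: only getUpdates liveness gates the restart decision;
// apiElapsedMs is still returned so it can be logged.
function detectStallAfter(s: LivenessSnapshot, thresholdMs: number): StallResult {
  const stalled = s.getUpdatesElapsedMs > thresholdMs;
  return { stalled, ...s };
}

// Stale polling (120 s) with a recent outbound call (2 s), arbitrary 90 s threshold.
const example: LivenessSnapshot = { getUpdatesElapsedMs: 120_000, apiElapsedMs: 2_000 };
console.log(detectStallBefore(example, 90_000).stalled); // false: stall is masked
console.log(detectStallAfter(example, 90_000).stalled); // true: restart can proceed
```

Returning `apiElapsedMs` in both variants mirrors the stated intent of keeping unrelated API elapsed time available for debugging without letting it suppress recovery.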
Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation: N/A
Repro + Verification
Environment
- `TELEGRAM_BOT_TOKEN` env fallback from a local redacted token file
Steps
- [...] `getUpdates` liveness state.
- [...] `sendMessage` success or an in-flight non-`getUpdates` API call.
Expected
Actual
- [...] `getUpdates` liveness controls the watchdog and restart proceeds.
Evidence
Validation on the rebased branch:
Real behavior proof
- [...] `getUpdates` liveness even when unrelated outbound Bot API calls are active.
- [...] `fix/telegram-polling-watchdog-getupdates`, commit `e301533582`, Node `v22.22.1`, pnpm `10.33.2`, real Telegram Bot API token from a local redacted token file, and a private DM chat with the bot.
- [...] `getMe`, read a recent private DM via `getUpdates`, sent a disabled-notification proof message with `sendMessage`, exercised the PR liveness code to verify stale `getUpdates` returns `STALL` after outbound Telegram activity, then ran an isolated source-mode Gateway on port `19986` across the watchdog window with `TELEGRAM_BOT_TOKEN` supplied via env fallback.
Telegram client also showed the real round trip:
Additional isolated Gateway live proof after the same patch:
Before-fix long-lived reproduction on parent commit `d05415d603`:
- [...] `getMe`, `getUpdates`, and `sendMessage`; the watchdog returned `STALL` after real outbound Telegram activity; the isolated Gateway stayed live/ready across the watchdog window with one Telegram provider start, zero false `Polling stall detected` logs, and zero `getUpdates conflict` logs.
Human Verification (required)
- [...] `getUpdates` with recent non-polling API success, stale `getUpdates` with recent in-flight non-polling API activity, stale `getUpdates` with newer in-flight non-polling activity, existing stale polling restart paths, pre-fix long-lived suppression reproduction, real Telegram Bot API `getMe`/`getUpdates`/`sendMessage`, and an isolated live Gateway Telegram polling run across the watchdog window.
- [...] `apiElapsedMs` for debugging while not using generic API liveness to suppress stale polling recovery.
Review Conversations
Compatibility / Migration
Risks and Mitigations
- [...] `getUpdates` health, and the watchdog threshold/throttling still bounds restarts.
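As a rough illustration of how throttling can bound restarts, the sketch below gates restarts on a minimum interval. The `RestartThrottle` class, its method name, and the interval are hypothetical and are not taken from this PR; the actual watchdog may throttle differently.

```typescript
// Hypothetical restart gate; names and the interval are assumptions.
class RestartThrottle {
  private readonly minIntervalMs: number;
  private lastRestartAt: number | undefined = undefined;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true (and records the restart time) only when at least
  // minIntervalMs has passed since the last permitted restart.
  tryRestart(nowMs: number): boolean {
    if (this.lastRestartAt !== undefined && nowMs - this.lastRestartAt < this.minIntervalMs) {
      return false;
    }
    this.lastRestartAt = nowMs;
    return true;
  }
}

// Even if every watchdog tick reports a stall, restarts stay bounded.
const throttle = new RestartThrottle(60_000);
console.log(throttle.tryRestart(0)); // true: first restart is allowed
console.log(throttle.tryRestart(30_000)); // false: still inside the interval
console.log(throttle.tryRestart(60_000)); // true: interval has elapsed
```

Under this assumption, a stricter stall condition can trigger restarts more often, but never more than once per interval.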