
Fix Telegram isolated polling stall watchdog#84861

Merged
joshavant merged 2 commits into main from fix/telegram-isolated-polling-stall-watchdog on May 21, 2026

Conversation

@joshavant joshavant commented May 21, 2026


Summary

  • Honor pollingStallThresholdMs in Telegram isolated polling by wiring isolated worker activity into the liveness tracker and restarting silent workers.
  • Add regression coverage for silent-worker restart and spooled-message activity.
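The liveness idea in the summary can be sketched in TypeScript. This is a minimal, hypothetical model and not the real TelegramPollingLivenessTracker. The class name LivenessTrackerSketch, the note() method, and the manual clock are all assumptions made for illustration. The sketch shows why every worker message should reset the silence window that detectStall compares against pollingStallThresholdMs. That includes spooled updates arriving during an open long-poll.

```typescript
// Hypothetical activity kinds, mirroring the events the review says the
// isolated worker reports (poll-start, poll-success, poll-error, spooled).
type WorkerActivity = "poll-start" | "poll-success" | "poll-error" | "spooled";

class LivenessTrackerSketch {
  private readonly now: () => number;
  private lastActivityAt: number;
  readonly counts: Record<WorkerActivity, number> = {
    "poll-start": 0,
    "poll-success": 0,
    "poll-error": 0,
    spooled: 0,
  };

  constructor(now: () => number) {
    this.now = now;
    this.lastActivityAt = now();
  }

  // Any worker message counts as activity and resets the silence window.
  note(kind: WorkerActivity): void {
    this.counts[kind] += 1;
    this.lastActivityAt = this.now();
  }

  // True once the worker has been silent for at least thresholdMs.
  detectStall(thresholdMs: number): boolean {
    return this.now() - this.lastActivityAt >= thresholdMs;
  }
}

// Drive the tracker with a manual clock instead of real time.
let fakeNow = 0;
const tracker = new LivenessTrackerSketch(() => fakeNow);
tracker.note("poll-start");
fakeNow = 20_000;
tracker.note("spooled"); // an open long-poll that is still spooling is not silent
fakeNow = 45_000;
console.log(tracker.detectStall(30_000)); // false: last activity was 25 s ago
fakeNow = 51_000;
console.log(tracker.detectStall(30_000)); // true: 31 s of silence
```

Injecting the clock keeps the example deterministic. A production tracker would presumably read Date.now() or a monotonic timer instead.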

Fixes #83950

Verification

  • node scripts/run-vitest.mjs extensions/telegram/src/polling-session.test.ts -- --reporter=verbose
  • node scripts/run-vitest.mjs extensions/telegram/src/polling-liveness.test.ts -- --reporter=verbose
  • AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode local
  • AWS Crabbox fix proof: provider=aws, leaseId=cbx_a81230ba78bf, run=run_0d6294cf4a6e
  • Credentialed AWS Crabbox Telegram proof: provider=aws, leaseId=cbx_d0af75929598, run=run_bc0c4ba2da67, Convex credential kind telegram

Real behavior proof

Behavior addressed: Telegram isolated polling now honors pollingStallThresholdMs and restarts a silent isolated ingress worker instead of wedging indefinitely.

Real environment tested: Direct AWS Crabbox Linux. Runtime-level fix proof used provider=aws, leaseId=cbx_a81230ba78bf, run=run_0d6294cf4a6e. Credentialed Telegram proof used provider=aws, leaseId=cbx_d0af75929598, run=run_bc0c4ba2da67, with a Convex-leased telegram bot credential.

Exact steps or command run after this patch: First, ran a no-credential TelegramPollingSession harness on AWS with isolated ingress enabled, pollingStallThresholdMs=30000, a fake Telegram API fetch, and a silent worker task, and observed it for 65s before an explicit abort. Then ran a credentialed AWS harness that leased a telegram credential from Convex and validated the SUT bot token against the real Telegram Bot API with getMe, deleteWebhook, and getUpdates. The harness forwarded one live getUpdates call through a local API proxy, induced a silently hung getUpdates, and observed the isolated polling watchdog restart the ingress.
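The first harness can be modeled roughly as follows. Everything here is a hypothetical simplification, including SilentIngressHarness, tick(), and the 1 s watchdog interval. It only illustrates how counters like stopCount and workerFactoryCalls move when a silent worker crosses pollingStallThresholdMs. It is not meant to reproduce the exact timing of the AWS run.

```typescript
// Hypothetical harness model; class and field names are illustrative only.
interface IngressWorker {
  stop(): void;
}

class SilentIngressHarness {
  workerFactoryCalls = 0;
  stopCount = 0;
  readonly logs: string[] = [];
  private readonly now: () => number;
  private readonly thresholdMs: number;
  private lastActivityAt: number;
  private worker: IngressWorker;

  constructor(now: () => number, thresholdMs: number) {
    this.now = now;
    this.thresholdMs = thresholdMs;
    this.lastActivityAt = now();
    this.worker = this.spawnWorker();
  }

  // Fake worker factory: the worker never reports activity (it is "silent").
  private spawnWorker(): IngressWorker {
    this.workerFactoryCalls += 1;
    return {
      stop: () => {
        this.stopCount += 1;
      },
    };
  }

  // One watchdog tick: restart the worker if it has been silent too long.
  tick(): void {
    const silentFor = this.now() - this.lastActivityAt;
    if (silentFor < this.thresholdMs) return;
    this.logs.push(`Polling stall detected (${silentFor}ms); restarting isolated ingress`);
    this.worker.stop();
    this.worker = this.spawnWorker();
    this.lastActivityAt = this.now(); // the new worker gets a fresh window
  }
}

// Simulate 45 s with a 1 s watchdog interval and a 30 s threshold.
let simNow = 0;
const harness = new SilentIngressHarness(() => simNow, 30_000);
for (simNow = 1_000; simNow <= 45_000; simNow += 1_000) harness.tick();
console.log(harness.stopCount, harness.workerFactoryCalls); // 1 2
```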

Evidence after fix: OPENCLAW_ISSUE_83950_FIX_PROOF beforeAbort showed stopCount=1, workerFactoryCalls=2, and stall log lines including Polling stall detected... plus the isolated ingress restart log. OPENCLAW_ISSUE_83950_CREDENTIALED_LIVE_FIX_PROOF showed tokenValidated=true, real API preflight methods getMe, deleteWebhook, and getUpdates, forwardedGetUpdates=1, hungGetUpdates=2, workerFactoryCalls=2, stopCount=1 before abort, and stall/restart logs.

Observed result after fix: In both proofs, the silent worker was stopped and restarted before the explicit abort. The credentialed live proof exited 0 after the watchdog detected that the active getUpdates call was stuck and restarted isolated ingress.

What was not tested: A naturally occurring Telegram production outage was not waited on, and no telegram-user human/Desktop credential was used. The credentialed proof used the available Convex telegram bot credential and a controlled local proxy-induced getUpdates hang after validating the real bot token against Telegram.

@openclaw-barnacle (bot) added the labels channel: telegram, size: M, and maintainer on May 21, 2026
clawsweeper Bot commented May 21, 2026


Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
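The command rules above can be summarized as a lookup table. This is a hypothetical sketch. The Role values, the parsing regex, and the assumption that maintainers also hold write access do not come from ClawSweeper itself. The main point is that fresh-review commands never map to a repair or merge outcome.

```typescript
// Hypothetical roles and outcomes; not ClawSweeper's real permission model.
type Role = "author" | "writer" | "maintainer" | "other";
type Outcome = "fresh-review" | "repair-or-merge" | "explain" | "stop" | "rejected";

interface Rule {
  outcome: Outcome;
  roles: readonly Role[];
}

const AUTHOR_OR_WRITER: readonly Role[] = ["author", "writer", "maintainer"];
const MAINTAINER_ONLY: readonly Role[] = ["maintainer"];

const RULES = new Map<string, Rule>([
  ["re-review", { outcome: "fresh-review", roles: AUTHOR_OR_WRITER }],
  ["re-run", { outcome: "fresh-review", roles: AUTHOR_OR_WRITER }],
  ["review", { outcome: "fresh-review", roles: MAINTAINER_ONLY }],
  ["autofix", { outcome: "repair-or-merge", roles: MAINTAINER_ONLY }],
  ["automerge", { outcome: "repair-or-merge", roles: MAINTAINER_ONLY }],
  ["fix ci", { outcome: "repair-or-merge", roles: MAINTAINER_ONLY }],
  ["address review", { outcome: "repair-or-merge", roles: MAINTAINER_ONLY }],
  ["explain", { outcome: "explain", roles: MAINTAINER_ONLY }],
  ["stop", { outcome: "stop", roles: MAINTAINER_ONLY }],
]);

function classifyCommand(comment: string, role: Role): Outcome {
  const match = /^@clawsweeper\s+(.+)$/i.exec(comment.trim());
  if (!match) return "rejected";
  // Normalize case and internal whitespace, e.g. "fix  CI" -> "fix ci".
  const verb = (match[1] ?? "").trim().toLowerCase().replace(/\s+/g, " ");
  const rule = RULES.get(verb);
  return rule !== undefined && rule.roles.includes(role) ? rule.outcome : "rejected";
}

console.log(classifyCommand("@clawsweeper re-review", "author")); // fresh-review
console.log(classifyCommand("@clawsweeper autofix", "author")); // rejected
```

A Map is used instead of a plain object so that inherited keys such as "constructor" cannot match a rule. The sketch covers only the commands in this list.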

Summary
The PR wires Telegram isolated polling worker activity into the liveness watchdog, restarts silent isolated workers, adds polling-session regression coverage, and adds an Unreleased changelog entry.

Reproducibility: yes. Current main clearly passes pollingStallThresholdMs while defaulting isolated ingress on, but the threshold is only consumed by the non-isolated polling watchdog path.

PR rating
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Summary: Focused implementation and regression coverage with credible session-level proof; live Telegram proof would raise confidence for the transport-risk edge.

Rank-up moves:

  • Add or request a redacted live Telegram polling smoke only if maintainers require transport proof beyond the AWS session harness.

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix AWS Crabbox live output from a runtime TelegramPollingSession harness showing the silent worker stopped and restarted, with the caveat that it did not use a real Telegram bot token/network stall.

Mantis proof suggestion
A live Telegram transport smoke would materially reduce the remaining merge risk around polling restart behavior. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram live QA: verify Telegram polling can still receive and reply after an induced isolated ingress restart from polling stall liveness.

Risk before merge

  • The patch changes live Telegram isolated ingress restart behavior; if liveness accounting is wrong, active workers could be restarted and inbound messages delayed while the cycle recovers.
  • The available real behavior proof is an AWS session-level harness with a fake Telegram API fetch, not a live Telegram bot-token/network stall, so maintainers may still want a live Telegram probe before merge.

Maintainer options:

  1. Ask for live Telegram proof (recommended)
    Before merge, a maintainer can request a redacted live Telegram polling smoke or equivalent Mantis run to prove the transport still receives messages after an induced isolated ingress restart.
  2. Accept the session-level proof
    Maintainers can accept the AWS Crabbox harness as sufficient because the bug is source-level liveness wiring and the proof exercises the actual TelegramPollingSession restart path without credentials.
  3. Pause if live-stall proof is required
    If reviewers require a real bot-token/network stall rather than a synthetic silent worker, hold the PR until that proof is available instead of merging on tests alone.

Next step before merge
No automated repair is needed; the remaining action is maintainer review of the Telegram transport proof and merge-risk tradeoff.

Security
Cleared: The diff touches Telegram runtime/test/changelog code only and does not add dependencies, scripts, permissions, credential handling, or supply-chain execution paths.

Review details

Best possible solution:

Land the worker-liveness watchdog with the focused regression coverage once maintainers accept the session-level proof or obtain a live Telegram smoke for the transport path.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main clearly passes pollingStallThresholdMs while defaulting isolated ingress on, but the threshold is only consumed by the non-isolated polling watchdog path.

Is this the best way to solve the issue?

Yes. Reusing the existing liveness tracker for isolated worker messages is a narrow fix for the documented config contract and avoids adding a new opt-out setting or changing the public config surface.

Label changes:

  • add P1: The PR fixes a default Telegram polling stall path that can wedge inbound updates for real channel users.
  • add merge-risk: 🚨 message-delivery: The diff changes when the Telegram isolated ingress worker is stopped and restarted in the inbound polling path.
  • add merge-risk: 🚨 availability: A watchdog false positive could churn the Telegram isolated polling loop and temporarily reduce channel availability.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix AWS Crabbox live output from a runtime TelegramPollingSession harness showing the silent worker stopped and restarted, with the caveat that it did not use a real Telegram bot token/network stall.
  • add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit. Focused implementation and regression coverage with credible session-level proof; live Telegram proof would raise confidence for the transport-risk edge.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes after-fix AWS Crabbox live output from a runtime TelegramPollingSession harness showing the silent worker stopped and restarted, with the caveat that it did not use a real Telegram bot token/network stall.


What I checked:

  • current default path makes the bug source-reproducible: Current main passes account.config.pollingStallThresholdMs into TelegramPollingSession while defaulting isolatedIngress.enabled to true, so the configured watchdog value is present on the default path. (extensions/telegram/src/monitor.ts:292, e42726204490)
  • current main only applies the stall threshold in non-isolated polling: Current main stores #stallThresholdMs and chooses the isolated cycle by default, but the visible watchdog detectStall call using #stallThresholdMs is in #runPollingCycle rather than #runIsolatedIngressCycle. (extensions/telegram/src/polling-session.ts:280, e42726204490)
  • PR wires isolated worker messages into liveness: The head revision creates a TelegramPollingLivenessTracker in #runIsolatedIngressCycle and records poll-start, poll-success, poll-error, and spooled activity from the isolated worker before watchdog evaluation. (extensions/telegram/src/polling-session.ts:708, 84b887d820f3)
  • PR adds an isolated ingress watchdog restart: The head revision adds an interval in the isolated cycle that calls liveness.detectStall with this.#stallThresholdMs, marks the transport dirty, logs the stall and updates status, stops the worker, and forces the cycle forward after the stop grace timer expires. (extensions/telegram/src/polling-session.ts:808, 84b887d820f3)
  • PR extends liveness tracking for worker activity: The head revision adds lastGetUpdatesActivityAt, noteGetUpdatesSuccessCount, and noteGetUpdatesActivity so spooled updates can keep an in-flight isolated poll from being treated as silent. (extensions/telegram/src/polling-liveness.ts:15, 84b887d820f3)
  • regression coverage covers restart and active spooling: The PR adds tests for restarting a stalled isolated ingress worker and for keeping the worker alive when spooled messages show activity before the threshold expires. (extensions/telegram/src/polling-session.test.ts:782, 84b887d820f3)
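The stop-then-advance sequence described in these checks can be sketched as a small state machine. The names are hypothetical: IsolatedWatchdogSketch, tick(), stopGraceMs, and the action strings. Explicit ticks stand in for the interval and grace timers that the real cycle presumably uses.

```typescript
// Hypothetical sketch; the real isolated cycle uses timers and private fields.
type WatchdogAction = "none" | "stop-worker" | "advance-cycle";

class IsolatedWatchdogSketch {
  transportDirty = false;
  readonly statusLog: string[] = [];
  private stopRequestedAt: number | null = null;
  private readonly stallThresholdMs: number;
  private readonly stopGraceMs: number;

  constructor(stallThresholdMs: number, stopGraceMs: number) {
    this.stallThresholdMs = stallThresholdMs;
    this.stopGraceMs = stopGraceMs;
  }

  // Evaluate once per watchdog interval.
  tick(nowMs: number, lastActivityAt: number, workerExited: boolean): WatchdogAction {
    if (this.stopRequestedAt === null) {
      const silentFor = nowMs - lastActivityAt;
      if (silentFor < this.stallThresholdMs) return "none";
      this.transportDirty = true; // rebuild the transport on the next cycle
      this.statusLog.push(`stall: no worker activity for ${silentFor}ms`);
      this.stopRequestedAt = nowMs;
      return "stop-worker";
    }
    // A stop is pending: advance when the worker exits or the grace period ends.
    if (workerExited || nowMs - this.stopRequestedAt >= this.stopGraceMs) {
      this.stopRequestedAt = null;
      return "advance-cycle";
    }
    return "none";
  }
}

const wd = new IsolatedWatchdogSketch(30_000, 5_000);
console.log(wd.tick(10_000, 0, false)); // none
console.log(wd.tick(30_000, 0, false)); // stop-worker
console.log(wd.tick(32_000, 0, false)); // none (still inside the grace period)
console.log(wd.tick(35_000, 0, false)); // advance-cycle (worker ignored the stop)
```

The grace path matters for a truly wedged worker. If stop() never completes, the cycle still moves forward instead of waiting indefinitely.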

Likely related people:

  • Peter Steinberger: Relevant history shows Peter split Telegram polling/session surfaces and deduped the polling-session harness, which are central to reviewing this PR's implementation and tests. (role: major refactor and test-harness contributor; confidence: medium; commits: 0c0f1e34cba0, 53f90af990a6, a185ca283a74; files: extensions/telegram/src/polling-session.ts, extensions/telegram/src/polling-session.test.ts, extensions/telegram/src/polling-liveness.ts)
  • wangchunyue: The closest prior watchdog-specific history changed polling-session.ts and polling-session.test.ts to prevent the Telegram polling watchdog from dropping replies. (role: prior polling watchdog contributor; confidence: medium; commits: dd61171f5b3e; files: extensions/telegram/src/polling-session.ts, extensions/telegram/src/polling-session.test.ts)
  • pkuGeo: A prior stalled polling fix rebuilt Telegram transport after stalled polling cycles in the same polling-session code path. (role: stalled polling restart contributor; confidence: medium; commits: e035a0d98c91; files: extensions/telegram/src/polling-session.ts)
  • Dallin Romney: Current-main blame for the isolated polling cycle and liveness tracker points to a recent commit touching these files, although the commit subject is broader than this specific Telegram behavior. (role: recent current-file contributor; confidence: low; commits: 447a3643c69b; files: extensions/telegram/src/polling-session.ts, extensions/telegram/src/polling-liveness.ts, extensions/telegram/src/telegram-ingress-worker.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against e42726204490.

@clawsweeper (bot) added the labels proof: sufficient, rating: 🐚 platinum hermit, status: 👀 ready for maintainer look, P1, merge-risk: 🚨 message-delivery, and merge-risk: 🚨 availability on May 21, 2026
clawsweeper Bot commented May 21, 2026


ClawSweeper PR egg

✨ Hatched: ✨ glimmer Gilded Shellbean

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
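The hatchability rules reduce to a short predicate. The PrRecord shape is a hypothetical stand-in. Its labels field represents whatever durable label record ClawSweeper actually keeps.

```typescript
// Hypothetical record shape; `labels` stands in for the durable label record.
interface PrRecord {
  state: "open" | "merged" | "closed";
  labels: readonly string[];
}

const HATCHABLE_LABELS: readonly string[] = [
  "status: 👀 ready for maintainer look",
  "status: 🚀 automerge armed",
  "clawsweeper:automerge",
];

function isHatchable(pr: PrRecord): boolean {
  // Merged PRs are always hatchable.
  if (pr.state === "merged") return true;
  // Open and closed-unmerged PRs both need one of the hatchable labels.
  return pr.labels.some((label) => HATCHABLE_LABELS.includes(label));
}

console.log(isHatchable({ state: "closed", labels: ["P1"] })); // false
```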

Rarity: ✨ glimmer.
Trait: sparkles near resolved comments.
Image traits: location review cove; accessory little merge flag; palette cobalt, lime, and pearl; mood patient; pose stepping out of a freshly hatched shell; shell woven fiber shell; lighting tiny status-light glow; background gentle dashboard dots.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.
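Deterministic seeding of this kind can be sketched with a simple string hash. This illustrates the general technique only and is not ClawSweeper's actual algorithm. The FNV-1a hash, the uniform rarity pick, and the placeholder palette names are all illustrative assumptions. The repository and PR number alone choose the rarity, while the head SHA only picks a visual palette.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: a simple, deterministic string hash.
// Assumption: chosen for illustration only, not ClawSweeper's real seeding.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied mod 2^32
  }
  return hash >>> 0; // normalize to an unsigned 32-bit value
}

const RARITIES = ["🥚 common", "🌱 uncommon", "💎 rare", "✨ glimmer", "🌈 legendary"] as const;
// Placeholder palette names (hypothetical).
const PALETTES = ["palette-a", "palette-b", "palette-c", "palette-d"] as const;

interface HatchTraits {
  rarity: string;
  palette: string;
}

function hatchSketch(repo: string, prNumber: number, headSha: string): HatchTraits {
  // Identity: repository + PR number only, so the creature never changes.
  const identitySeed = fnv1a32(`${repo}#${prNumber}`);
  // Cosmetic: the reviewed head SHA may only influence visual details.
  const visualSeed = fnv1a32(headSha);
  return {
    rarity: RARITIES[identitySeed % RARITIES.length] ?? RARITIES[0],
    palette: PALETTES[visualSeed % PALETTES.length] ?? PALETTES[0],
  };
}

console.log(hatchSketch("openclaw/openclaw", 84861, "84b887d820f3").rarity);
```

A real collectible system would likely weight rarities rather than pick uniformly. The sketch keeps a uniform pick for brevity.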

@joshavant joshavant merged commit 40db92f into main May 21, 2026
143 of 151 checks passed
@joshavant joshavant deleted the fix/telegram-isolated-polling-stall-watchdog branch May 21, 2026 07:19
@joshavant joshavant restored the fix/telegram-isolated-polling-stall-watchdog branch May 21, 2026 07:20
@joshavant joshavant deleted the fix/telegram-isolated-polling-stall-watchdog branch May 21, 2026 07:20
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
* fix(telegram): watch isolated polling stalls

* docs(changelog): note telegram polling watchdog fix
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Labels

  • channel: telegram (Channel integration: telegram)
  • maintainer (Maintainer-authored PR)
  • merge-risk: 🚨 availability (🚨 May cause crashes, hangs, restart loops, stalls, or process outages.)
  • merge-risk: 🚨 message-delivery (🚨 May drop, duplicate, misroute, suppress, or wrongly target messages.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: M
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)


Development

Successfully merging this pull request may close these issues.

[Bug]: channels.telegram.pollingStallThresholdMs silently ignored when isolated polling ingress is enabled (default)
