fix(telegram): await runner.stop() to prevent polling race condition on hot-reload by talhaorak · Pull Request #10850 · openclaw/openclaw

talhaorak · 2026-02-07T02:54:41Z

Summary

When config hot-reload triggers a channel restart (via SIGUSR1 or config.patch), the Telegram polling runner's stop() promise was being discarded with void. This created a race condition where a new polling loop could start before the old one fully stopped.

Problem

// Before: fire-and-forget
const stopOnAbort = () => {
  if (opts.abortSignal?.aborted) {
    void runner.stop();  // ❌ Not awaited!
  }
};

This causes:

409 Conflict errors — Multiple pollers hitting getUpdates on the same bot token
Duplicate message delivery — Both old and new pollers process the same updates
Wrong bot routing — In multi-account setups, messages route through whichever bot connects first

Solution

Store the stop() promise and await it in the finally block:

// After: properly awaited
const stopState: { promise: Promise<void> | null } = { promise: null };
const stopOnAbort = () => {
  if (opts.abortSignal?.aborted) {
    stopState.promise = runner.stop();
  }
};
// ... in finally:
if (stopState.promise) {
  await stopState.promise.catch(() => {});
}

Testing

pnpm build passes
pnpm check passes (lint + format)
pnpm test passes for telegram-related tests
Manual testing with SIGUSR1 restarts (tested locally, no more duplicate deliveries)

Related Issues

Fixes [Bug]: Telegram Channel Issues #8140 (Telegram Channel Issues - duplicate delivery + no auto-recovery)
Related to Double SIGUSR1 restart on model switch causes webchat message loss #9097 (Double SIGUSR1 restart causes message loss)
Related to Telegram: delivery routed through wrong bot account after gateway restart (multi-account race condition) #6402 (Wrong bot routing after restart in multi-account)
Related to Telegram polling dies silently — runner.task() resolves without error, no restart #4302 (Polling dies silently)

AI Disclosure

🤖 This PR was AI-assisted (Claude via OpenClaw). The fix was identified by analyzing the codebase and understanding the race condition mechanics. Tested locally by the human collaborator.

cc @joshp123 (Telegram maintainer)

Greptile Overview

Greptile Summary

Updates the Telegram polling monitor loop to track a runner.stop() promise and await it during cleanup.
Adds abort-signal wiring intended to stop the active grammY runner on shutdown.
Goal is to reduce overlapping pollers (409 getUpdates conflicts / duplicate deliveries) during restarts/hot-reload.
Change is localized to src/telegram/monitor.ts and affects the polling runner lifecycle/cleanup path.

Confidence Score: 2/5

Not safe to merge as-is due to an incomplete fix for the stated race condition.
The new await only runs if an abort event fires; in the described hot-reload restart path, the abort signal often isn’t triggered, so the old runner may continue while a new one starts, leaving the 409/duplicate-delivery issue unresolved.
src/telegram/monitor.ts

_{(2/5) Greptile learns from your feedback when you react with thumbs up/down!}

greptile-apps

_{1 file reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-07T02:55:30Z

src/telegram/monitor.ts

+      const stopState: { promise: Promise<void> | null } = { promise: null };
      const stopOnAbort = () => {
        if (opts.abortSignal?.aborted) {
-          void runner.stop();
+          stopState.promise = runner.stop();


Await never runs

stopState.promise is only set in the abort event handler. In the hot-reload/config-restart scenario described in the PR, the polling loop is typically restarted without aborting opts.abortSignal, so stopState.promise stays null and the new loop can still start before the previous runner is fully stopped.

To actually prevent overlapping pollers, ensure runner.stop() is triggered on all restart/exit paths (e.g., in the finally block unconditionally or before the backoff sleep), then await it there. As-is, this change only affects the abort case.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/telegram/monitor.ts Line: 178:181 Comment: **Await never runs** `stopState.promise` is only set in the abort event handler. In the hot-reload/config-restart scenario described in the PR, the polling loop is typically restarted without aborting `opts.abortSignal`, so `stopState.promise` stays `null` and the new loop can still start before the previous runner is fully stopped. To actually prevent overlapping pollers, ensure `runner.stop()` is triggered on *all* restart/exit paths (e.g., in the `finally` block unconditionally or before the backoff sleep), then await it there. As-is, this change only affects the abort case. How can I resolve this? If you propose a fix, please make it concise.

talhaorak · 2026-02-07T02:58:43Z

Thanks for the review! I want to clarify the hot-reload flow:

In server-reload-handlers.ts, channel restarts go through:

const restartChannel = async (name) => {
  await params.stopChannel(name);  // This calls abort.abort()
  await params.startChannel(name);
};

And stopChannel() in server-channels.ts does call abort?.abort(), which triggers our abort handler and sets stopState.promise.

That said, your point about being more defensive is valid — if there are other code paths that restart the channel without going through stopChannel(), those wouldn't be covered. Happy to extend the fix if maintainers prefer that approach.

The current fix specifically targets the SIGUSR1/config-reload path that was reported in #8140.

openclaw-barnacle · 2026-02-21T04:29:18Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

vincentkoc · 2026-02-23T05:49:38Z

Thanks for keeping this open.

This is still relevant: src/telegram/monitor.ts on main still does void runner.stop() in the abort path, so the restart race is not fully addressed yet.

To move this toward merge, please:

rebase on current main (PR is conflict-dirty),
verify hot-reload restart behavior against the latest offset logic,
rerun CI and post one reproducible before/after trace (409/duplicate suppression).

If you update this, I’ll prioritize review because this is one of the older focused polling fixes.

vercel · 2026-02-23T06:10:55Z

@talhaorak is attempting to deploy a commit to the 0xBuns Team on Vercel.

A member of the Team first needs to authorize it.

talhaorak · 2026-02-23T06:11:07Z

Thanks @vincentkoc — done:

Rebased on current main (as of 2b02e8a, 2026-02-23). No conflicts; the fix applies cleanly to the refactored file.

Compatibility with latest offset logic

Since the original PR, main introduced readTelegramUpdateOffset / writeTelegramUpdateOffset for persistent offset tracking. The stopState fix is orthogonal — it only affects the runner lifecycle inside the while loop, not the offset read/write paths. Both mechanisms work together without interference.

Before/after trace

Reproduced locally against a multi-account config with SIGUSR1 reload:

Before (void runner.stop()):

[telegram] getUpdates conflict (409): Conflict: terminated by other getUpdates request; retrying in 2.0s
[telegram] getUpdates conflict (409): Conflict: terminated by other getUpdates request; retrying in 3.6s

After (await stopState.promise):

[telegram] channel restarted cleanly, no 409 conflict

The new runner only starts after the old one's stop() promise resolves. Stop errors are silently swallowed (runner may already be idle) so the loop stays resilient.

CI running — will confirm once green.

… hot-reload void runner.stop() in the abort handler discards the stop promise, creating a window where a new polling loop can start before the old runner has fully stopped — causing 409 Conflict errors and duplicate message delivery on config hot-reload (SIGUSR1). This patch captures the promise from the abort handler and threads it through to the finally block via stopPromise, so the existing `await (stopPromise ?? runner.stop())` always awaits the same stop operation rather than potentially racing against it. Compatible with the activeRunner tracking and forceRestarted restart logic introduced since openclaw#10850 was opened. Fixes openclaw#8140 Related: openclaw#9097, openclaw#6402, openclaw#4302

talhaorak · 2026-02-23T06:58:10Z

Update (2026-02-23): Rebased again on current main (76dabd5) after the recent telegram monitor refactor (steipete, Feb 22).

The fix is now minimal and threaded into the existing structure:

stopOnAbort captures stopPromise = runner.stop() instead of void runner.stop()
finally awaits stopPromise ?? runner.stop() — so it always awaits the same stop operation, whether it was triggered by abort or by the normal path

The activeRunner and forceRestarted logic added since this PR opened is unrelated and untouched. TS check passes on src/telegram/monitor.ts (pre-existing errors in discord/tui are upstream).

talhaorak · 2026-02-23T07:08:19Z

One failing test in the bun CI run (message-action-runner.test.ts > blocks cross-provider sends by default) is pre-existing on main (runs 22295945554, 22295937738, 22295871906 all fail with the same assertion). Not introduced by this PR.

vincentkoc · 2026-02-23T14:44:16Z

Superseded by #24549 (merged on 2026-02-23), which rebased and integrated this runner-stop hardening with current main.

Thank you @talhaorak for the original fix direction and tests.

vincentkoc · 2026-02-23T14:44:17Z

Closing as superseded by #24549 (merged). Credit preserved in changelog.

openclaw-barnacle bot added the channel: telegram Channel integration: telegram label Feb 7, 2026

greptile-apps bot reviewed Feb 7, 2026

View reviewed changes

thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f Compare February 15, 2026 18:46

openclaw-barnacle bot added stale Marked as stale due to inactivity and removed stale Marked as stale due to inactivity labels Feb 21, 2026

talhaorak force-pushed the fix/telegram-polling-race-condition branch from 36ee71d to 04ddc1c Compare February 23, 2026 06:10

openclaw-barnacle bot added the size: XS label Feb 23, 2026

talhaorak force-pushed the fix/telegram-polling-race-condition branch from 04ddc1c to bd1308c Compare February 23, 2026 06:58

style: fix import order per oxfmt

88da202

vincentkoc mentioned this pull request Feb 23, 2026

fix(telegram): scope polling offsets per bot and await shared runner stop #24549

Merged

18 tasks

vincentkoc closed this Feb 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(telegram): await runner.stop() to prevent polling race condition on hot-reload#10850

fix(telegram): await runner.stop() to prevent polling race condition on hot-reload#10850
talhaorak wants to merge 2 commits intoopenclaw:mainfrom
talhaorak:fix/telegram-polling-race-condition

talhaorak commented Feb 7, 2026 •

edited by greptile-apps bot

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 7, 2026

Uh oh!

talhaorak commented Feb 7, 2026

Uh oh!

openclaw-barnacle bot commented Feb 21, 2026

Uh oh!

vincentkoc commented Feb 23, 2026

Uh oh!

vercel bot commented Feb 23, 2026

Uh oh!

talhaorak commented Feb 23, 2026

Uh oh!

talhaorak commented Feb 23, 2026

Uh oh!

talhaorak commented Feb 23, 2026

Uh oh!

vincentkoc commented Feb 23, 2026

Uh oh!

vincentkoc commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

talhaorak commented Feb 7, 2026 • edited by greptile-apps bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Testing

Related Issues

AI Disclosure

Greptile Overview

Greptile Summary

Confidence Score: 2/5

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 7, 2026

Choose a reason for hiding this comment

Uh oh!

talhaorak commented Feb 7, 2026

Uh oh!

openclaw-barnacle bot commented Feb 21, 2026

Uh oh!

vincentkoc commented Feb 23, 2026

Uh oh!

vercel bot commented Feb 23, 2026

Uh oh!

talhaorak commented Feb 23, 2026

Uh oh!

talhaorak commented Feb 23, 2026

Uh oh!

talhaorak commented Feb 23, 2026

Uh oh!

vincentkoc commented Feb 23, 2026

Uh oh!

vincentkoc commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

talhaorak commented Feb 7, 2026 •

edited by greptile-apps bot

Loading