fix(telegram): await runner.stop() to prevent polling race condition on hot-reload#10850
fix(telegram): await runner.stop() to prevent polling race condition on hot-reload#10850talhaorak wants to merge 2 commits intoopenclaw:mainfrom
Conversation
src/telegram/monitor.ts
Outdated
| const stopState: { promise: Promise<void> | null } = { promise: null }; | ||
| const stopOnAbort = () => { | ||
| if (opts.abortSignal?.aborted) { | ||
| void runner.stop(); | ||
| stopState.promise = runner.stop(); |
There was a problem hiding this comment.
Await never runs
stopState.promise is only set in the abort event handler. In the hot-reload/config-restart scenario described in the PR, the polling loop is typically restarted without aborting opts.abortSignal, so stopState.promise stays null and the new loop can still start before the previous runner is fully stopped.
To actually prevent overlapping pollers, ensure runner.stop() is triggered on all restart/exit paths (e.g., in the finally block unconditionally or before the backoff sleep), then await it there. As-is, this change only affects the abort case.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/telegram/monitor.ts
Line: 178:181
Comment:
**Await never runs**
`stopState.promise` is only set in the abort event handler. In the hot-reload/config-restart scenario described in the PR, the polling loop is typically restarted without aborting `opts.abortSignal`, so `stopState.promise` stays `null` and the new loop can still start before the previous runner is fully stopped.
To actually prevent overlapping pollers, ensure `runner.stop()` is triggered on *all* restart/exit paths (e.g., in the `finally` block unconditionally or before the backoff sleep), then await it there. As-is, this change only affects the abort case.
How can I resolve this? If you propose a fix, please make it concise.|
Thanks for the review! I want to clarify the hot-reload flow: In const restartChannel = async (name) => {
await params.stopChannel(name); // This calls abort.abort()
await params.startChannel(name);
};And That said, your point about being more defensive is valid — if there are other code paths that restart the channel without going through The current fix specifically targets the SIGUSR1/config-reload path that was reported in #8140. |
bfc1ccb to
f92900f
Compare
|
This pull request has been automatically marked as stale due to inactivity. |
|
Thanks for keeping this open. This is still relevant: To move this toward merge, please:
If you update this, I’ll prioritize review because this is one of the older focused polling fixes. |
36ee71d to
04ddc1c
Compare
|
@talhaorak is attempting to deploy a commit to the 0xBuns Team on Vercel. A member of the Team first needs to authorize it. |
|
Thanks @vincentkoc — done: Rebased on current main (as of 2b02e8a, 2026-02-23). No conflicts; the fix applies cleanly to the refactored file. Compatibility with latest offset logic Since the original PR, main introduced Before/after trace Reproduced locally against a multi-account config with SIGUSR1 reload: Before (void runner.stop()): After (await stopState.promise): The new runner only starts after the old one's CI running — will confirm once green. |
… hot-reload void runner.stop() in the abort handler discards the stop promise, creating a window where a new polling loop can start before the old runner has fully stopped — causing 409 Conflict errors and duplicate message delivery on config hot-reload (SIGUSR1). This patch captures the promise from the abort handler and threads it through to the finally block via stopPromise, so the existing `await (stopPromise ?? runner.stop())` always awaits the same stop operation rather than potentially racing against it. Compatible with the activeRunner tracking and forceRestarted restart logic introduced since openclaw#10850 was opened. Fixes openclaw#8140 Related: openclaw#9097, openclaw#6402, openclaw#4302
04ddc1c to
bd1308c
Compare
|
Update (2026-02-23): Rebased again on current main (76dabd5) after the recent telegram monitor refactor (steipete, Feb 22). The fix is now minimal and threaded into the existing structure:
The |
|
One failing test in the bun CI run ( |
|
Superseded by #24549 (merged on 2026-02-23), which rebased and integrated this runner-stop hardening with current Thank you @talhaorak for the original fix direction and tests. |
|
Closing as superseded by #24549 (merged). Credit preserved in changelog. |
Summary
When config hot-reload triggers a channel restart (via SIGUSR1 or
config.patch), the Telegram polling runner'sstop()promise was being discarded withvoid. This created a race condition where a new polling loop could start before the old one fully stopped.Problem
This causes:
getUpdateson the same bot tokenSolution
Store the
stop()promise and await it in the finally block:Testing
pnpm buildpassespnpm checkpasses (lint + format)pnpm testpasses for telegram-related testsRelated Issues
AI Disclosure
🤖 This PR was AI-assisted (Claude via OpenClaw). The fix was identified by analyzing the codebase and understanding the race condition mechanics. Tested locally by the human collaborator.
cc @joshp123 (Telegram maintainer)
Greptile Overview
Greptile Summary
runner.stop()promise and await it during cleanup.src/telegram/monitor.tsand affects the polling runner lifecycle/cleanup path.Confidence Score: 2/5
(2/5) Greptile learns from your feedback when you react with thumbs up/down!