fix(telegram): sanitize update offset + lock polling#3186
fix(telegram): sanitize update offset + lock polling#3186daxiong888 wants to merge 1 commit intoopenclaw:mainfrom
Conversation
- Ignore/backup invalid persisted lastUpdateId instead of getting stuck\n- Add a per-bot poll lock to avoid multiple getUpdates loops\n- Wire lock + offset sanitize into telegram monitor
| const args = readLinuxCmdline(pid); | ||
| if (!args) return "unknown"; | ||
| // Best-effort: still running, but not necessarily a poller; treat unknown as alive unless stale. | ||
| return args.length > 0 ? "alive" : "unknown"; |
There was a problem hiding this comment.
[P2] On Linux, the fallback cmdline check treats any process with a non-empty /proc/<pid>/cmdline as "alive" (args.length > 0). If payload.startTime is missing/corrupted or /proc/<pid>/stat is unreadable, a reused PID from an unrelated process can keep the lock looking alive and prevent reclaim until the acquire loop times out.
Also appears in: stale cleanup only running when ownerStatus !== "alive" at src/telegram/poll-lock.ts:215-233.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/telegram/poll-lock.ts
Line: 94:97
Comment:
[P2] On Linux, the fallback `cmdline` check treats any process with a non-empty `/proc/<pid>/cmdline` as "alive" (`args.length > 0`). If `payload.startTime` is missing/corrupted or `/proc/<pid>/stat` is unreadable, a reused PID from an unrelated process can keep the lock looking alive and prevent reclaim until the acquire loop times out.
Also appears in: stale cleanup only running when `ownerStatus !== "alive"` at src/telegram/poll-lock.ts:215-233.
How can I resolve this? If you propose a fix, please make it concise.| it("prevents multiple concurrent pollers for same account+token", async () => { | ||
| await withTempStateDir(async () => { | ||
| const lock1 = await acquireTelegramPollLock({ | ||
| token: "123:abc", | ||
| accountId: "default", | ||
| timeoutMs: 50, | ||
| pollIntervalMs: 5, | ||
| staleMs: 10_000, | ||
| }); | ||
|
|
||
| await expect( | ||
| acquireTelegramPollLock({ | ||
| token: "123:abc", | ||
| accountId: "default", | ||
| timeoutMs: 30, | ||
| pollIntervalMs: 5, | ||
| staleMs: 10_000, | ||
| }), | ||
| ).rejects.toBeInstanceOf(TelegramPollLockError); |
There was a problem hiding this comment.
[P3] This test uses very tight timing (timeoutMs: 30–50, pollIntervalMs: 5). On slower/busy CI runners or some filesystems (notably Windows), this can become flaky (e.g., second acquire not observing the lock in time, or timing out differently). Consider loosening the timeouts / using fake timers to make the assertion deterministic.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/telegram/poll-lock.test.ts
Line: 23:41
Comment:
[P3] This test uses very tight timing (`timeoutMs: 30–50`, `pollIntervalMs: 5`). On slower/busy CI runners or some filesystems (notably Windows), this can become flaky (e.g., second acquire not observing the lock in time, or timing out differently). Consider loosening the timeouts / using fake timers to make the assertion deterministic.
How can I resolve this? If you propose a fix, please make it concise.bfc1ccb to
f92900f
Compare
|
This is still directionally useful, but it now overlaps with later merged offset work and current monitor changes. To make this mergeable, please rebase and split into two pieces:
That split will let us land low-risk pieces first and preserve author credit cleanly. |
|
Additional code-level blockers after deeper review:
Recommendation: close this as-is and reopen as two focused PRs (lock-only, sanitize-only) rebased on current main. |
|
Appreciate the effort on this hardening pass. I’m closing this as not mergeable in current form: it overlaps with newer merged/active offset lineage and uses stale If you want to continue this work, please open focused follow-up PRs (lock-only and sanitize-only) rebased on current main. |
This prevents Telegram polling from silently getting stuck when persisted state is corrupted and avoids multi-poller conflicts.
Changes:
lastUpdateId(must be a safe integer within the 32-bit Bot API range). If invalid, back up the file and start from a null offset.getUpdatesloops.Notes:
CLAWDBOT_ALLOW_MULTI_TELEGRAM_POLL=1to disable the poll lock.Tests:
pnpm vitest run src/telegram/update-offset-store.test.ts src/telegram/poll-lock.test.tsGreptile Overview
Greptile Summary
This PR hardens Telegram long-polling by (1) validating and backing up corrupted persisted
lastUpdateIdstate before starting, and (2) adding a per-bot polling lock (account + token hash) to prevent multiple concurrentgetUpdatesloops. The monitor now acquires the lock before starting polling, logs invalid offset recovery viaonInvalid, and ensures the lock is released on exit (including webhook mode).Confidence Score: 4/5
(3/5) Reply to the agent's comments like "Can you suggest a fix for this @greptileai?" or ask follow-up questions!