Telegram polling lacks per-token dedup guard — external scripts and hot-reload can silently create duplicate pollers #56230

@Co-Messi

Description

The Problem

There's no mechanism in monitorTelegramProvider() to detect or prevent duplicate polling sessions for the same bot token. If anything starts a second getUpdates call on the same token — whether it's a hot-reload race, an external script, or a health monitor probe — the gateway enters a permanent 409 Conflict loop with no recovery path short of a full restart.

I ran into this when debugging persistent 409s on a 10-agent setup running macOS + LaunchAgent. Two full debug sessions, hundreds of 409s logged over days. The root cause turned out to be two things happening at once:

  1. An external script (launched via a separate LaunchAgent) was calling getUpdates on the same bot token every 60 seconds — the gateway had no idea this was happening
  2. The hot-reload channel restart path (applyHotReload → stopChannel → startChannel) relies on waitForGracefulStop, which has a 15-second timeout (POLL_STOP_GRACE_MS). If the grammY runner doesn't stop within that window, the function returns anyway, and the new poller starts while the old one is still holding a connection

Both scenarios result in two concurrent getUpdates calls on the same token → Telegram returns 409 → the poller enters a retry loop → retries create overlapping connections → self-sustaining conflict.
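The timeout race in point 2 can be reproduced in miniature. The sketch below is illustrative only — `waitForGracefulStop` here is a stand-in for the gateway's function, not its real implementation, and the durations are scaled down:

```javascript
// Minimal simulation of the graceful-stop race: if the runner's stop
// promise doesn't settle within the grace window, the wait returns
// anyway and the caller proceeds to start a new poller.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function waitForGracefulStop(stopPromise, graceMs) {
  // Resolves "stopped" if the runner stopped in time, "timed-out" otherwise.
  return Promise.race([
    stopPromise.then(() => "stopped"),
    sleep(graceMs).then(() => "timed-out"),
  ]);
}

async function demo() {
  // Runner takes 50ms to stop, but the grace window is only 10ms —
  // mirrors a grammY runner that outlives POLL_STOP_GRACE_MS.
  const slowStop = sleep(50);
  return waitForGracefulStop(slowStop, 10); // "timed-out"
}
```

When the result is "timed-out", the caller has no way to know the old connection is still alive — which is exactly the state that produces the overlapping getUpdates calls.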

The Fix (what worked for me)

Added a global Map keyed by bot token in monitorTelegramProvider():

const __activePollers = new Map();

async function monitorTelegramProvider(opts = {}) {
  const { token, accountId } = opts; // assuming both arrive via opts

  // Before starting, check for an existing session on this token
  const existingEntry = __activePollers.get(token);
  if (existingEntry) {
    // Wait for the old session to release, bounded by a 5s grace window;
    // after that we proceed anyway and overwrite the stale entry
    await Promise.race([
      existingEntry.done,
      new Promise(r => setTimeout(r, 5000))
    ]);
  }

  // Register this session
  let resolvePollerDone;
  const pollerDone = new Promise(r => { resolvePollerDone = r; });
  __activePollers.set(token, { accountId, startedAt: Date.now(), done: pollerDone });

  try {
    // ... existing polling logic ...
  } finally {
    // Only delete our own entry — a newer session that proceeded after
    // the grace window may have already re-registered this token
    if (__activePollers.get(token)?.done === pollerDone) {
      __activePollers.delete(token);
    }
    resolvePollerDone();
  }
}
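A quick way to sanity-check the guard's ordering — a condensed, self-contained version of the function above, with a 200ms wait cap instead of 5000ms and a sleep standing in for the polling loop:

```javascript
const __activePollers = new Map();
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function runPoller(token, workMs, waitCapMs = 200) {
  // Same guard as the full function: wait for any existing session on
  // this token to release (bounded by waitCapMs) before registering.
  const existing = __activePollers.get(token);
  if (existing) await Promise.race([existing.done, sleep(waitCapMs)]);

  let release;
  const done = new Promise((r) => { release = r; });
  __activePollers.set(token, { startedAt: Date.now(), done });
  try {
    await sleep(workMs); // stand-in for the polling loop
    return Date.now();   // when this session's work finished
  } finally {
    __activePollers.delete(token);
    release();
  }
}

async function demo() {
  // Two sessions race on the same token; the second should not start
  // its work until the first releases. Returns true when serialized.
  const [t1, t2] = await Promise.all([
    runPoller("bot:123", 30),
    runPoller("bot:123", 5),
  ]);
  return t2 >= t1;
}
```

The second session finishes after the first despite having far less work to do, because it parks on the first session's `done` promise instead of opening a competing connection.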

Also added a 500ms drain pause in the hot-reload handler between stopChannel and startChannel:

const restartChannel = async (name) => {
  await params.stopChannel(name);
  await new Promise(r => setTimeout(r, 500)); // let long-poll drain
  await params.startChannel(name);
};
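The fixed 500ms pause has worked in practice, but if the gateway ever exposes a way to observe the old long-poll connection, a bounded drain check would be less timing-sensitive. A sketch — the predicate hook is hypothetical, nothing in the gateway provides it today:

```javascript
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Poll a predicate until it returns true or the deadline passes.
// Returns true if the condition was met in time, false on deadline.
async function waitUntil(predicate, { timeoutMs = 2000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await predicate()) return true;
    await sleep(intervalMs);
  }
  return Boolean(await predicate()); // one final check at the deadline
}
```

restartChannel would then do `await waitUntil(() => noOpenLongPoll(name))` between stop and start (where `noOpenLongPoll` is a hypothetical hook into the connection state), falling back to proceeding on timeout just like the fixed pause does.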

Both patches are currently applied to the bundled dist files (hacky, I know), but they've been running clean for hours with zero 409s on a 4-bot setup.

Why the existing fix (#20930) doesn't cover this

PR #20930 fixed the SIGUSR1 + config.patch race condition where providers started twice during a signal-triggered restart. But the file-watcher hot-reload path (chokidar → applySnapshot → applyHotReload → channel restart) still has no dedup guard. And nothing in the gateway prevents external processes from polling the same token.

The 90-second POLL_STALL_THRESHOLD_MS also creates a predictable failure window — after exactly 90 seconds of clean operation, the watchdog can trigger a stalled restart that overlaps with the existing poller if waitForGracefulStop times out.
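One way to close that window, sketched under the assumption that the watchdog can consult the per-token registry proposed above — the function name and parameters are illustrative, not existing gateway API:

```javascript
// Hypothetical watchdog guard: only restart a "stalled" poller if no
// session on this token was registered after the last received update.
function shouldRestartStalledPoller(registryEntry, lastUpdateAt, stallThresholdMs, now = Date.now()) {
  const stalled = now - lastUpdateAt >= stallThresholdMs;
  if (!stalled) return false;
  // A session registered after the last update means a restart is already
  // in flight (or a fresh poller just started); restarting again would
  // create exactly the overlapping-getUpdates conflict described above.
  if (registryEntry && registryEntry.startedAt > lastUpdateAt) return false;
  return true;
}
```

This keeps the watchdog useful for genuine stalls while making it a no-op during the waitForGracefulStop handoff.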

Related Issues

These all share the same underlying gap: monitorTelegramProvider has no awareness of whether another session is already polling the same token.

Environment

  • OpenClaw v2026.3.24
  • macOS (Apple Silicon, LaunchAgent-managed gateway)
  • 4 active Telegram bots (was 10, reduced during debugging)
  • Models: kimi-coding/k2p5, minimax-portal/MiniMax-M2.7, openai-codex/gpt-5.4

Suggested Approach

A per-token registry in monitorTelegramProvider (as implemented above) is the simplest fix. It's defense-in-depth — even if the channel manager's stop/start logic is perfect, the Telegram provider itself should refuse to create a second poller for a token that's already being polled. The registry lives in the same module, requires no changes to the channel manager, and adds maybe 20 lines of code.
