Telegram polling dies silently — runner.task() resolves without error, no restart

## Summary

The Telegram long-polling loop can **silently stop** while the gateway process remains alive. The grammY runner's `task()` promise resolves without error when `maxRetryTime` is exceeded, causing the polling `while` loop to exit via `return`. No restart logic exists, so the bot becomes permanently unresponsive until manual restart.

A related issue: when a channel's `startAccount` task crashes or exits, the `.catch()` logs the error and `.finally()` sets `running: false` — but **never restarts the channel**.

## Root Cause

### Bug 1: `src/telegram/monitor.ts` — polling exits on silent resolve

```typescript
// BEFORE (broken): runner.task() resolves → return exits the while loop forever
await runner.task();
return; // ← this kills the polling loop silently
```

When grammY's internal retry window (`maxRetryTime: 5min`) is exhausted (e.g., Telegram API unreachable for >5 min), `runner.task()` resolves cleanly (no error). The `return` statement exits `monitorTelegramProvider()`, and the channel marks `running: false`. The bot never polls again.
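
A minimal reproduction of the loop shape, under stated assumptions: `FakeRunner`, `createFakeRunner`, and `brokenPollingLoop` are hypothetical names, and the fake `task()` simply resolves the way this issue describes grammY's runner doing once its retry window is spent. Because the loop body ends in `return`, one clean resolve ends polling even though nothing was aborted.

```typescript
// Hypothetical stand-in for a runner handle: only task() matters here.
interface FakeRunner {
  task(): Promise<void>;
}

// Simulates the runner giving up after its retry window: task() resolves, no error.
function createFakeRunner(): FakeRunner {
  return { task: async () => {} };
}

// Mirrors the broken loop shape: any resolve of task() exits for good.
async function brokenPollingLoop(
  createRunner: () => FakeRunner,
  signal: AbortSignal,
): Promise<number> {
  let runnersCreated = 0;
  while (!signal.aborted) {
    const runner = createRunner();
    runnersCreated += 1;
    await runner.task();
    return runnersCreated; // BUG: a silent resolve ends polling permanently
  }
  return runnersCreated;
}
```

Called with a signal that is never aborted, `brokenPollingLoop` still returns after creating exactly one runner.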

### Bug 2: `src/gateway/server-channels.ts` — no channel restart on crash

When `startAccount()` throws or resolves unexpectedly, the promise chain logs the error but never restarts the channel. The channel is permanently dead until gateway restart.
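
The same dead end on the channel side, sketched under assumptions: `ChannelStatus`, `launchChannel`, and this `startAccount` stub are hypothetical, since the real `server-channels.ts` code is not quoted here. The chain records the failure and flips `running` to `false`, but nothing ever calls `startAccount()` a second time. A silent resolve ends the same way, because `.finally()` runs on either outcome.

```typescript
// Hypothetical per-channel status record.
interface ChannelStatus {
  running: boolean;
  lastError: string | null;
  starts: number;
}

// Hypothetical channel task that exits on its own, as in the observed log line.
async function startAccount(status: ChannelStatus): Promise<void> {
  status.starts += 1;
  throw new Error("Request to 'getUpdates' timed out after 500 seconds");
}

// Pre-fix pattern: record the error, mark the channel stopped, and do nothing else.
function launchChannel(status: ChannelStatus): Promise<void> {
  status.running = true;
  return startAccount(status)
    .catch((err: unknown) => {
      status.lastError = err instanceof Error ? err.message : String(err);
    })
    .finally(() => {
      status.running = false; // nothing schedules a restart after this point
    });
}
```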

## Observed Behavior

- Gateway process alive for 9+ hours (PID visible, port listening)
- **Zero** outbound TCP connections to Telegram API (`lsof -i` shows no `149.154.x.x`)
- Last log entry: `"[default] channel exited: Request to 'getUpdates' timed out after 500 seconds"`
- Bot completely unresponsive to Telegram messages
- Only fix: manual `launchctl stop/start`

## Proposed Fix (implemented locally, tested)

### Fix 1: Auto-restart polling on unexpected stop

Replace `return` with backoff + `continue` in the polling loop. Only exit when `abortSignal` is aborted:

```typescript
// AFTER: treat silent resolve as recoverable, restart with backoff
await runner.task();
if (opts.abortSignal?.aborted) return; // intentional stop — exit cleanly
restartAttempts += 1;
const delayMs = computeBackoff(TELEGRAM_POLL_RESTART_POLICY, restartAttempts);
log(`Telegram polling stopped unexpectedly; restarting in ${formatDurationMs(delayMs)}.`);
await sleepWithAbort(delayMs, opts.abortSignal);
// continue → loops back to create a new runner
```
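
For readers who want to exercise the restart logic in isolation, here is a self-contained sketch of the same loop. `PollRunner`, `pollWithRestart`, and this `sleepWithAbort` are hypothetical, not the project's helpers, and plain doubling with a cap stands in for `computeBackoff(TELEGRAM_POLL_RESTART_POLICY, ...)`, whose parameters are not given above. The behavior carried over from the fix is: exit only on abort, otherwise back off and build a new runner.

```typescript
// Hypothetical runner shape: a promise that settles when polling stops.
interface PollRunner {
  task(): Promise<void>;
}

// Resolves after delayMs, or early (without throwing) if the signal aborts.
function sleepWithAbort(delayMs: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal?.aborted) return resolve();
    const onAbort = () => {
      clearTimeout(timer);
      resolve();
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, delayMs);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Restarts polling whenever task() resolves, unless the caller aborted.
// Returns the number of restarts performed.
async function pollWithRestart(
  createRunner: () => PollRunner,
  signal: AbortSignal,
  baseDelayMs = 3_000,
  maxDelayMs = 60_000,
): Promise<number> {
  let restartAttempts = 0;
  while (!signal.aborted) {
    const runner = createRunner();
    await runner.task();
    if (signal.aborted) break; // intentional stop: exit cleanly
    restartAttempts += 1;
    const delayMs = Math.min(maxDelayMs, baseDelayMs * 2 ** (restartAttempts - 1));
    console.log(`polling stopped unexpectedly; restarting in ${delayMs} ms`);
    await sleepWithAbort(delayMs, signal);
  }
  return restartAttempts;
}
```

Because this `sleepWithAbort` resolves early on abort instead of throwing, a shutdown during a backoff wait falls straight through to the `while` check and the function returns.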

### Fix 2: Auto-restart channels with exponential backoff

Wrap `startAccount()` in `runAccountWithRestart()` — a while loop with exponential backoff (3s initial, 60s max, factor 2, jitter 0.2) that auto-restarts crashed channels up to 20 times:

```typescript
const runAccountWithRestart = async () => {
  let restartAttempts = 0;
  while (!abort.signal.aborted) {
    // A crash and a silent resolve are handled the same way: back off, then restart.
    let reason = "stopped unexpectedly";
    try {
      await startAccount({ ... });
    } catch (err) {
      reason = `crashed: ${String(err)}`;
    }
    if (abort.signal.aborted) return; // intentional stop: exit cleanly
    restartAttempts += 1;
    if (restartAttempts > MAX_CHANNEL_RESTART_ATTEMPTS) {
      log(`channel ${reason}; giving up after ${MAX_CHANNEL_RESTART_ATTEMPTS} restarts`);
      return;
    }
    const delayMs = computeBackoff(CHANNEL_RESTART_POLICY, restartAttempts);
    log(`channel ${reason}; restarting in ${formatDurationMs(delayMs)}`);
    await sleepWithAbort(delayMs, abort.signal);
  }
};
```
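
One way a policy with those parameters (3s initial, 60s max, factor 2, jitter 0.2) could be computed is sketched below. `BackoffPolicy` and this `computeBackoff` are hypothetical stand-ins for the project's helpers; the 1-based attempt numbering and the symmetric ±20% jitter are assumptions.

```typescript
// Hypothetical policy shape mirroring the parameters described above.
interface BackoffPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  jitter: number; // fraction of the delay to randomize, e.g. 0.2 = ±20%
}

const CHANNEL_RESTART_POLICY: BackoffPolicy = {
  initialMs: 3_000,
  maxMs: 60_000,
  factor: 2,
  jitter: 0.2,
};

// attempt is 1-based: attempt 1 waits roughly initialMs.
// random is injectable so the schedule can be tested deterministically.
function computeBackoff(
  policy: BackoffPolicy,
  attempt: number,
  random: () => number = Math.random,
): number {
  const exp = policy.initialMs * policy.factor ** Math.max(0, attempt - 1);
  const base = Math.min(policy.maxMs, exp);
  // Spread the delay uniformly over [base * (1 - jitter), base * (1 + jitter)).
  const jittered = base * (1 + policy.jitter * (2 * random() - 1));
  // Clamp again so jitter never pushes the delay past the cap.
  return Math.round(Math.min(policy.maxMs, Math.max(0, jittered)));
}
```

With jitter neutralized (`random = () => 0.5`), attempts 1 through 7 wait 3000, 6000, 12000, 24000, 48000, 60000, and 60000 ms. Summed over the 20 attempts allowed above, that is 3 + 6 + 12 + 24 + 48 + 15 × 60 = 993 s of sleep. If the counter is never reset, a channel that keeps failing therefore gives up after roughly 17 minutes of backoff plus the time spent inside each failed `startAccount()` call.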

### Fix 3: Active Telegram health check (new)

Periodically call `bot.api.getMe()` (every 60s) to verify Telegram API connectivity. After 3 consecutive failures, force-stop the runner: `runner.task()` then resolves, and the restart loop from Fix 1 treats that as an unexpected stop and starts a fresh runner:

```typescript
const stopHealthCheck = startPollingHealthCheck({
  bot,
  intervalMs: 60_000,
  maxFailures: 3,
  timeoutMs: 10_000,
  onUnhealthy: () => void runner.stop(),
  signal: opts.abortSignal,
});
```
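
A minimal sketch of what such a health check could look like. The option names follow the call above, but the implementation is an assumption: it takes a generic `probe` function instead of `bot` (in the gateway the probe would be `() => bot.api.getMe()`), races each probe against `timeoutMs`, skips overlapping probes, and resets the failure count after firing `onUnhealthy` so the restarted runner gets a fresh budget. `HealthCheckOptions` and `withTimeout` are hypothetical names.

```typescript
interface HealthCheckOptions {
  probe: () => Promise<unknown>; // e.g. () => bot.api.getMe() in the gateway
  intervalMs: number;
  maxFailures: number;
  timeoutMs: number;
  onUnhealthy: () => void;
  signal?: AbortSignal;
}

// Rejects if the wrapped promise does not settle within timeoutMs.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`probe timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Probes on an interval; after maxFailures consecutive failures, calls onUnhealthy.
function startPollingHealthCheck(opts: HealthCheckOptions): () => void {
  let consecutiveFailures = 0;
  let inFlight = false;
  const timer = setInterval(async () => {
    if (inFlight || opts.signal?.aborted) return; // skip overlapping probes
    inFlight = true;
    try {
      await withTimeout(opts.probe(), opts.timeoutMs);
      consecutiveFailures = 0; // any success resets the streak
    } catch {
      consecutiveFailures += 1;
      if (consecutiveFailures >= opts.maxFailures) {
        consecutiveFailures = 0; // give the restarted runner a fresh budget
        opts.onUnhealthy();
      }
    } finally {
      inFlight = false;
    }
  }, opts.intervalMs);
  const stop = () => clearInterval(timer);
  opts.signal?.addEventListener("abort", stop, { once: true });
  return stop;
}
```

Counting only consecutive failures means a single slow or dropped `getMe()` call does not bounce an otherwise healthy runner.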

### Fix 4: Process-level self-watchdog (new)

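Detect event loop stalls via `setInterval` drift detection. If a tick arrives more than 30s late, force `process.exit(1)` so launchd (`KeepAlive: true`) auto-restarts the gateway. The check runs on the same event loop, so it fires only once a stall ends: a loop that never yields starves the timer as well, and an idle process whose async work has quietly stopped shows no drift at all: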

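```typescript
export interface WatchdogOptions {
  intervalMs?: number; // how often the timer ticks (illustrative default below)
  thresholdMs?: number; // max tolerated gap between ticks
}

export function startWatchdog(opts?: WatchdogOptions, signal?: AbortSignal): () => void {
  const intervalMs = opts?.intervalMs ?? 5_000; // illustrative default
  const thresholdMs = opts?.thresholdMs ?? 30_000;
  let lastTick = Date.now();
  const timer = setInterval(() => {
    // A tick that lands far later than scheduled means the event loop was blocked.
    const delta = Date.now() - lastTick;
    if (delta > thresholdMs) process.exit(1); // launchd KeepAlive restarts the gateway
    lastTick = Date.now();
  }, intervalMs);
  timer.unref(); // the watchdog alone must not keep the process alive
  const stop = () => clearInterval(timer);
  signal?.addEventListener("abort", stop, { once: true });
  return stop;
}
```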
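
One way to unit-test the drift logic without real timers or `process.exit` is to split it into a pure detector with an injectable clock. `DriftDetector` and `createDriftDetector` are hypothetical names for this sketch; the watchdog's `setInterval` callback would call `tick()` and exit when it returns `true`.

```typescript
// Hypothetical helper: decides whether the gap since the previous tick is a stall.
interface DriftDetector {
  tick(): boolean; // true when the gap since the last tick exceeded the threshold
}

function createDriftDetector(thresholdMs: number, now: () => number = Date.now): DriftDetector {
  let lastTick = now();
  return {
    tick() {
      const current = now();
      const stalled = current - lastTick > thresholdMs;
      lastTick = current; // re-baseline so one stall is reported once
      return stalled;
    },
  };
}

// Wiring sketch (not executed here):
// const detector = createDriftDetector(30_000);
// setInterval(() => { if (detector.tick()) process.exit(1); }, 5_000).unref();
```

The measured gap includes the normal interval, so with a 5s interval and a 30s threshold the process exits after roughly 25s of actual blocking.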

## Environment

- macOS (Mac mini, LaunchAgent with KeepAlive)
- Node.js 22
- grammY + @grammyjs/runner
- Long-polling mode (not webhook)

## Related Issues

- #4248 — Gateway crashes on unhandled fetch rejection
- #3815 — Clawdbot Gateway Crashes Repeatedly
- #3646 — Multiple Critical Channel Bugs (Telegram, AbortError)
- #2935 — Heartbeat stops after context compression
- #1964 — Telegram channel goes silent periodically

## Files Changed

1. `src/telegram/monitor.ts` — polling restart loop + health check integration
2. `src/gateway/server-channels.ts` — `runAccountWithRestart()` with backoff
3. `src/infra/watchdog.ts` — new process self-watchdog
4. `src/gateway/server.impl.ts` — watchdog lifecycle integration
5. Tests: `monitor.test.ts` (9 tests), `watchdog.test.ts` (4 tests)

All tests pass. Build clean. Lint clean.
