Summary
The Telegram webhook handler in extensions/telegram/src/webhook.ts:280-378 only sends the 200 ack after the bot middleware (bot.on('message') → user-defined handlers) completes. For installations where the message handler does substantial work (agent dispatch, async task scheduling, multi-step routing), this can push the response time past Telegram's tolerance window, causing Telegram to mark the delivery failed and queue the message for retry.
The TELEGRAM_WEBHOOK_CALLBACK_TIMEOUT_MS = 10_000 ceiling combined with grammy's onTimeout: "return" does prevent unbounded blocking, but 10s is on the edge of Telegram's webhook delivery tolerance and adds unnecessary latency on every request.
The canonical webhook pattern (Telegram's own docs, Stripe webhooks, all major webhook receivers) is: ack 200 immediately, process the body asynchronously.
Reproduction
- Configure a telegram channel via the bundled telegram extension with webhookUrl pointing at a publicly reachable endpoint
- Wire a bot handler whose work routinely takes >10s (e.g. an agent-dispatch handler that submits to a slower backend like an LLM via openclaw agent --json --message ...)
- Send a Telegram message
- Observe getWebhookInfo pending_update_count accumulates over time even when the system is otherwise healthy
- Time a direct POST to the local webhook endpoint:
time curl -m 15 "http://127.0.0.1:18801/webhooks/telegram" \
-X POST \
-H "Content-Type: application/json" \
-H "X-Telegram-Bot-Api-Secret-Token: <your-secret>" \
-d '{"update_id":99999996,"message":{...}}'
Observed: ~10.0s response time on every request. Expected for an ack-first pattern: <100ms.
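To watch pending_update_count during the reproduction, something like the following sketch could be polled. It assumes only the public Bot API getWebhookInfo method; the function name pendingUpdateCount and the FetchLike type are illustrative and not part of openclaw.

```typescript
// Minimal fetch-like type so the sketch can be exercised without network access.
type FetchLike = (url: string) => Promise<{ json(): Promise<unknown> }>;

// Subset of the getWebhookInfo response that this sketch reads.
interface WebhookInfoResponse {
  ok: boolean;
  result?: { url: string; pending_update_count: number; last_error_message?: string };
}

// Hypothetical helper: returns pending_update_count for the given bot token.
async function pendingUpdateCount(botToken: string, fetchImpl: FetchLike): Promise<number> {
  const res = await fetchImpl(`https://api.telegram.org/bot${botToken}/getWebhookInfo`);
  const body = (await res.json()) as WebhookInfoResponse;
  if (!body.ok || !body.result) {
    throw new Error("getWebhookInfo did not return a result");
  }
  return body.result.pending_update_count;
}
```

On Node 18+ the global fetch can be passed as fetchImpl. A count that keeps rising while the handler is otherwise healthy matches the symptom described above.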
Diagnosis
In extensions/telegram/src/webhook.ts:
// line 280
const handler = grammy.webhookCallback(bot, "callback", {
secretToken: secret,
onTimeout: "return",
timeoutMilliseconds: TELEGRAM_WEBHOOK_CALLBACK_TIMEOUT_MS,
});
// line 375
await handler(body.value, reply, secretHeader, unauthorized);
if (!replied) {
respondText(200);
}
The await handler(...) blocks until middleware finishes (or 10s onTimeout fires). The reply callback isn't called until the middleware decides to call it — which for a heavy handler means at end-of-processing.
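The effect of the awaited path can be modelled with a small race between the handler and the timeout. This is a simplified sketch, not grammy's implementation: simulateAwaitedAck and its parameters are hypothetical, and it only illustrates that the ack is delayed by roughly the shorter of the handler duration and the timeout.

```typescript
// Resolves with `value` after `ms` milliseconds.
const sleep = <T>(ms: number, value: T): Promise<T> =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

type AckCause = "handler finished" | "timeout fired";

// Hypothetical model of `await handler(...)` with onTimeout: "return":
// the 200 can only be sent once the handler or the timeout settles first.
async function simulateAwaitedAck(
  handlerMs: number,
  timeoutMs: number,
): Promise<{ cause: AckCause; elapsedMs: number }> {
  const start = Date.now();
  const cause = await Promise.race<AckCause>([
    sleep<AckCause>(handlerMs, "handler finished"),
    sleep<AckCause>(timeoutMs, "timeout fired"),
  ]);
  // elapsedMs approximates how long the HTTP response was held back.
  return { cause, elapsedMs: Date.now() - start };
}
```

For example, simulateAwaitedAck(300, 100) stands in for a 30s handler against the 10s timeout at 1/100 scale: the timeout wins the race, which corresponds to the ~10.0s response observed in the reproduction.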
Proposed fix
Two options, ranked by impact:
Option 1 (recommended): Convert to fire-and-forget pattern.
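respondText(200);
// Catch rejections so handler errors are logged rather than left unhandled.
void handler(body.value, async () => {}, secretHeader, async () => {})
  .catch((err) => console.error("telegram webhook handler failed:", err));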
Trade-off: middleware errors no longer surface to Telegram as 4xx/5xx responses. For Telegram webhook semantics this is fine: per the Bot API docs, a non-2xx status only makes Telegram retry the delivery, which rarely helps when the handler itself failed. Logging is the right surface for handler errors, not HTTP status.
Option 2 (smaller change, less effective): Lower TELEGRAM_WEBHOOK_CALLBACK_TIMEOUT_MS to 2-3 seconds. grammy still stops waiting once the timeout elapses, so the ack goes out sooner, and the middleware keeps running in the background after onTimeout returns. This is less ideal because any handler slower than the timeout still delays the response by 2-3s.
I'd offer a PR for option 1 if there's interest. The change is ~10 lines in webhook.ts. Tests in webhook.test.ts would need updating to assert ack-before-handler-completion.
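A test for ack-before-handler-completion could look roughly like the sketch below. It does not use the real webhook.ts or webhook.test.ts: ackFirst, deferred, and ackBeforeCompletion are hypothetical names, and the wrapper only assumes a handler that returns a promise.

```typescript
type Respond = (status: number) => void;
type Handler = (update: unknown) => Promise<void>;

// Hypothetical ack-first wrapper: respond immediately, then run the handler
// in the background and route any rejection to onError.
function ackFirst(
  update: unknown,
  respond: Respond,
  handler: Handler,
  onError: (err: unknown) => void,
): Promise<void> {
  respond(200);
  return handler(update).catch(onError);
}

// A deferred promise lets the test decide when the handler finishes.
function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve!: () => void;
  const promise = new Promise<void>((r) => {
    resolve = r;
  });
  return { promise, resolve };
}

// Records the order of events so a test can assert the ack came first.
async function ackBeforeCompletion(): Promise<string[]> {
  const events: string[] = [];
  const gate = deferred();
  const done = ackFirst(
    { update_id: 1 },
    (status) => events.push(`ack ${status}`),
    async () => {
      await gate.promise; // handler is still "working" here
      events.push("handler done");
    },
    (err) => events.push(`error ${String(err)}`),
  );
  events.push("after dispatch");
  gate.resolve(); // let the handler finish
  await done;
  return events;
}
```

The returned event list should show the ack recorded before the handler completes. The same wrapper can be reused with a rejecting handler to check that failures reach onError instead of becoming unhandled rejections.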
Environment
- openclaw 2026.4.15
- Tailscale Funnel exposing local port 18801 to a public HTTPS endpoint
- macOS 25.2.0
- Telegram webhook with secret_token
Why this matters
Without the ack-first pattern, every install whose bot handler does meaningful work hits a window where Telegram's queue grows during normal operation. Operators end up writing watchdog scripts to detect a stuck pending_update_count and force re-delivery via deleteWebhook → getUpdates → replay, which is a band-aid for what should be a clean handler pattern.