The gateway is experiencing severe Node.js event loop starvation. Agents appear frozen because gateway operations are taking 3-7 minutes instead of milliseconds. This is the same symptom pattern as issue #74404 (Gateway CPU-Saturated, Agents Stop Responding), but the system is running a beta version after the supposedly fixed stable release.
Steps to reproduce
Reset a few agents; at least one will hit this issue. In my case all of my agents were affected and I had to downgrade.
Expected behavior
Do I need to explain?
Actual behavior
The gateway is experiencing severe Node.js event loop starvation. Agents appear frozen because gateway operations are taking 3-7 minutes instead of milliseconds. This is the same symptom pattern as issue #74404 (Gateway CPU-Saturated, Agents Stop Responding), but the system is running a beta version after the supposedly fixed stable release.
## Evidence
### Issue #74404: Gateway CPU-Saturated, Agents Stop Responding
- **Severity:** HIGH
- **Status:** CLOSED in 2026.5.2 stable (supposedly fixed)
- **Current version:** 2026.5.12-beta.1 (AFTER the stable fix release)
- **Key metric - Event Loop Delays:**
  - 21:06: 28,145ms delay
  - 21:10: 52,126ms delay
  - 21:17: 120,807ms delay (2 minutes)
  - 21:22: 145,221ms delay
  - 21:25: 196,270ms delay
  - 21:29: 211,730ms delay (3.5 minutes)
  - 21:34: 277,594ms delay (4.6 minutes)
  - 21:40: 346,821ms delay (5.8 minutes)
  - **21:47: 420,420ms delay (7 minutes)** ← User reported Marcus stuck here
  - 21:53: stability check FAILED with 10s timeout
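
Delays like the ones listed above can be sampled from inside the process with Node's built-in `perf_hooks.monitorEventLoopDelay`. The following is a minimal sketch, not OpenClaw's actual instrumentation; `sampleEventLoopDelay` is a hypothetical helper name introduced only for illustration.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Hypothetical helper (not part of OpenClaw): samples event loop delay for
// `durationMs` and reports the result in milliseconds.
function sampleEventLoopDelay(durationMs, resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return new Promise((resolve) => {
    setTimeout(() => {
      histogram.disable();
      // Histogram values are reported in nanoseconds.
      resolve({
        maxMs: histogram.max / 1e6,
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
      });
    }, durationMs);
  });
}
```

Calling `sampleEventLoopDelay(5000).then(console.log)` while reproducing the stall should show whether the loop itself is delayed, as opposed to a single request being slow.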
### Gateway Response Times (sessions.list)
- 21:38: **1,816,340ms** (30 minutes) for a single sessions.list call
- 21:53: **2,731,729ms** (45 minutes) for sessions.list
### Telegram Polling Stalls (multiple)
21:19:50 - Polling stall detected (142.88s stuck)
21:25:51 - Polling stall detected (206.45s stuck)
21:34:21 - Polling stall detected (287.77s stuck)
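
For context, a stall detector of this kind can be sketched as a small watchdog that tracks the time since the last successful poll. This is an illustration under assumptions, not the gateway's real detector; `createPollWatchdog` and its options are hypothetical names.

```javascript
// Hypothetical watchdog (names are illustrative): reports when no poll has
// completed within `stallThresholdMs`. `now` is injectable for testing.
function createPollWatchdog({ stallThresholdMs, onStall, now = Date.now }) {
  let lastSuccessAt = now();
  return {
    // Call after every successful getUpdates round trip.
    markSuccess() {
      lastSuccessAt = now();
    },
    // Call periodically; returns the stuck time in ms, or 0 if healthy.
    check() {
      const stuckMs = now() - lastSuccessAt;
      if (stuckMs >= stallThresholdMs) {
        onStall(stuckMs);
        return stuckMs;
      }
      return 0;
    },
  };
}
```

Note that a watchdog driven by timers on the same thread cannot fire while the event loop is blocked, so stall reports would arrive late during starvation.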
### Gateway Stability Check
21:53:35 - Gateway stability failed: GatewayTransportError: gateway timeout after 10000ms
---
## System Info
| Component | Value |
|-----------|-------|
| OS | Linux, NVMe disk |
| Node.js | v24.14.1 |
| Memory Total | 31 GiB |
| Memory Available | 26 GiB |
| Disk Usage | 70% used (68 GiB available) |
| Gateway | loopback (127.0.0.1:18789) |
| Service | systemd user (pid 1582, state active) |
---
## Analysis
The blocklist entry for #74404 states it was "fixed in 2026.5.2 stable". However:
1. **User is running 2026.5.12-beta.1** - a beta version released AFTER the stable fix
2. **The fix appears incomplete or regressed** - event loop delays are back to 400+ seconds
3. **Telegram polling is a major contributor** - getUpdates calls are timing out and blocking

This suggests either:
- The #74404 fix was incomplete
- A regression was introduced after 2026.5.2 stable
- The Telegram polling issue (#73432 - QMD Embed Timer) is exacerbating the problem

---
## Relevant Blocklist Entries
| Issue | Status | Relevance |
|-------|--------|-----------|
| #74404 | CLOSED (2026.5.2 stable) | SAME SYMPTOM - but fix regressed? |
| #73432 | OPEN | Telegram polling stalls |
| #75501 | OPEN | Too many open files (v4.29 regression) |
Impact and severity
This version is useless for me; I needed to downgrade.
Additional information
Marcus Stalled Session - Root Cause Analysis
Date: 2026-05-12 23:47 UTC
Agent: Marcus (Developer)
Version: OpenClaw 2026.5.12-beta.1
Executive Summary
Marcus appeared "stuck" with a writing indicator but no output because the entire gateway event loop was blocked by a long-running Telegram channel startup operation. The Telegram startAccount phase took 7,332,155ms (122 minutes) due to blocking HTTP calls to the Telegram Bot API.
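
When narrowing this down, it helps to check whether a given call truly blocks the loop or is merely slow. One cheap check is to count how many timer ticks fire while the call runs. This is a self-contained sketch; `countTicksDuring`, `slowAsync`, and `busySync` are hypothetical helpers and do not touch OpenClaw code.

```javascript
// Counts how many timer ticks fire while `work` runs.
// Many ticks: the loop stayed responsive. Zero or few: the loop was blocked.
async function countTicksDuring(work, tickMs = 10) {
  let ticks = 0;
  const timer = setInterval(() => { ticks += 1; }, tickMs);
  try {
    await work();
  } finally {
    clearInterval(timer);
  }
  return ticks;
}

// A slow but asynchronous operation (stands in for a slow network call).
const slowAsync = (ms) => () =>
  new Promise((resolve) => setTimeout(resolve, ms));

// A synchronous busy loop (stands in for CPU-bound or synchronous I/O work).
const busySync = (ms) => () => {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
};
```

Wrapping a suspect startup step in `countTicksDuring` and seeing close to zero ticks would support the blocking hypothesis; a healthy tick count would point at slowness elsewhere.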
Fix 1: Non-Blocking Telegram Startup (Recommended)
File: `server-channels-CrJ7hZRA.js`
Move Telegram's `startAccount` to run in a separate task queue or worker thread:

```javascript
// Current (BLOCKING):
const trackedPromise = Promise.resolve().then(() =>
  measureStartup(`channels.${channelId}.start-account`, () =>
    startAccount({ cfg, accountId: id, account, runtime, /* ... */ })
  )
);

// Fixed (NON-BLOCKING):
// Defer Telegram's startAccount to a later event loop turn (setImmediate)
// so the gateway does not wait on it.
const trackedPromise = Promise.resolve().then(async () => {
  const start = () =>
    measureStartup(`channels.${channelId}.start-account`, () =>
      startAccount({ cfg, accountId: id, account, runtime, /* ... */ })
    );
  // Don't await blocking operations in the gateway task queue
  if (channelId === 'telegram') {
    // Spawn as detached task; log failures so they are not silently dropped
    setImmediate(() => {
      Promise.resolve()
        .then(start)
        .catch((err) => console.error('telegram start-account failed', err));
    });
    return;
  }
  return start();
});
```
Fix 2: Implement Circuit Breaker for Telegram API
File: `probe-DuPRVUmp.js`
Add a circuit breaker that fails fast when Telegram is unresponsive:

```javascript
const circuitBreaker = {
  failures: 0,
  maxFailures: 3,
  resetTimeout: 30000, // 30 seconds
  async call(fn) {
    if (this.failures >= this.maxFailures) {
      throw new Error('Circuit breaker open - Telegram API unavailable');
    }
    try {
      const result = await fn();
      this.failures = 0; // count consecutive failures only
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.maxFailures) {
        // Close the breaker again after the cool-down period
        setTimeout(() => { this.failures = 0; }, this.resetTimeout);
      }
      throw err;
    }
  },
};

// Usage:
meRes = await circuitBreaker.call(() =>
  fetchWithTimeout(`${base}/getMe`, {}, timeoutBudgetMs, fetcher)
);
```
Fix 3: Use Worker Threads for Blocking HTTP
For Node.js, wrap blocking HTTP calls in a Worker thread:
```javascript
const { Worker } = require('worker_threads');

// Runs the Telegram startup in a worker and resolves with its first message.
async function startAccountInWorker(params) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./telegram-start-worker.js', { workerData: params });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```
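
The worker script referenced above (`./telegram-start-worker.js`) is not shown in this report. As a rough sketch of what the worker side could look like, the example below inlines a placeholder worker body with `eval: true` so it runs as a single file. The body and the `runStartInWorker` name are assumptions for illustration; the real startup logic would replace the placeholder.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body, inlined as a string so the sketch is a single file. In a real
// setup this would live in its own file; the body is a hypothetical stand-in.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  // Placeholder for the real startup work: only echoes the account id back.
  parentPort.postMessage({ ok: true, accountId: workerData.accountId });
`;

// Hypothetical helper: starts the worker and resolves with its first message.
function runStartInWorker(params) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: params });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

One design caveat: `workerData` is copied with the structured clone algorithm, so objects that carry functions (which `cfg` or `runtime` may well do) cannot be passed as-is and would need a serializable subset.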
Fix 4: Timeout with Aggressive Retry Limits
File: `probe-DuPRVUmp.js:571`
Reduce the timeout budget and fail faster:

```javascript
// Current: timeoutBudgetMs can be very large
meRes = await fetchWithTimeout(`${base}/getMe`, {}, timeoutBudgetMs, fetcher);

// Fixed: Cap at 5 seconds, fail fast
const TELEGRAM_START_TIMEOUT = 5000; // 5 seconds max
meRes = await fetchWithTimeout(`${base}/getMe`, {}, TELEGRAM_START_TIMEOUT, fetcher);
```
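
To make the "retry limits" part concrete, here is a minimal sketch of a capped timeout combined with a bounded retry count. `fetchWithTimeoutSketch` is a hypothetical stand-in whose argument order mirrors the calls quoted above; the real `fetchWithTimeout` may differ. `probeWithRetryLimit` and its options are likewise assumptions.

```javascript
// Hypothetical stand-in for fetchWithTimeout; the real implementation may differ.
async function fetchWithTimeoutSketch(url, init, timeoutMs, fetcher = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetcher(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // always clear, on success or failure
  }
}

// Bounded retries: at most `maxAttempts` tries, each capped at `capMs`,
// so the worst-case wait is maxAttempts * capMs.
async function probeWithRetryLimit(url, fetcher, { capMs = 5000, maxAttempts = 2 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await fetchWithTimeoutSketch(url, {}, capMs, fetcher);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

With the defaults shown, a dead endpoint costs at most about 10 seconds before startup gives up, instead of an unbounded budget.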
Bug type
Regression (worked before, now fails)
Beta release blocker
Yes
OpenClaw version
v2026.5.12-beta.1
Operating system
Ubuntu
Install method
NPM
Model
Minimax
Provider / routing chain
Minimax
Additional provider/model setup details
openclaw-2026-05-13.log
Logs, screenshots, and evidence
Timeline of Events (Using Marcus as Example)
21:53:xx - Marcus's Session Queued
21:56:55 - Gateway Liveness Warning
`channels.telegram.start-account` spent 7,332,155ms (122 minutes!) in start-account phase
21:47:28 - Event Loop Starvation Intensifies
Root Cause Analysis
Primary Cause: Blocking HTTP Calls in Telegram startAccount
The Telegram channel `startAccount` phase makes synchronous blocking HTTP calls to:
- `getMe` - Bot API probe (line 571 in `probe-DuPRVUmp.js`)
- `deleteWebhook` - Cleanup before polling
- `getWebhookInfo` - Check webhook state

Code location: `server-channels-CrJ7hZRA.js:402`
The Problem
When Telegram's Bot API is slow or experiencing issues:
- `fetchWithTimeout()` calls block the Node.js event loop

Why This Is Critical
The `fetchWithTimeout()` function in `fetch-timeout-BsLaC-cZ.js`:
Even though it uses `AbortController`, the underlying `fetch` can still block for the full timeout duration while holding the event loop.
Evidence from Logs
Why #74404 Fix Appears Regressed
The blocklist states #74404 was "fixed in 2026.5.2 stable". However:
`sessions.list` performance)
Fix Suggestions
Fix 1: Non-Blocking Telegram Startup (Recommended)
File: `server-channels-CrJ7hZRA.js`
Move Telegram's `startAccount` to run in a separate task queue or worker thread.
Fix 2: Implement Circuit Breaker for Telegram API
File: `probe-DuPRVUmp.js`
Add a circuit breaker that fails fast when Telegram is unresponsive.
Fix 3: Use Worker Threads for Blocking HTTP
For Node.js, wrap blocking HTTP calls in a Worker thread.
Fix 4: Timeout with Aggressive Retry Limits
File: `probe-DuPRVUmp.js:571`
Reduce the timeout budget and fail faster.
Files Involved
- `fetch-timeout-BsLaC-cZ.js`
- `probe-DuPRVUmp.js:571`
- `server-channels-CrJ7hZRA.js:402`
- `extensions/telegram/*`