Gateway intermittently stalls its Node event loop for tens to hundreds of seconds, causing cross-channel latency/failures. This is not limited to WhatsApp: during the same periods Telegram polling/send actions stall or fail, Slack socket pings/pongs time out, and WhatsApp Web repeatedly disconnects/exits. WhatsApp additionally hits recurring 408/428 reconnect/session-expiry failures, tracked separately in #75736.
User-visible impact
WhatsApp messages sometimes get an automatic reaction, but no assistant reply is sent or the reply is delayed by minutes.
Telegram is also occasionally slow and has send/dispatch failures.
Gateway status probes can show channels as connected briefly, then stopped/disconnected shortly after.
Why the WhatsApp reaction happens but no answer follows
Logs show WhatsApp inbound/reaction handling can complete before the assistant run/delivery path finishes. After the reaction, the gateway/agent path can stall on event-loop delay, session/lane waits, file lock timeouts, LLM timeout, or WhatsApp listener flapping. This creates the visible pattern: ✅ reaction arrives, but no final answer is delivered.
Environment
OpenClaw: 2026.4.29 (a448042)
OS: Linux 6.8.0-100-generic x64
Node: 22.22.0
Gateway: systemd user service
Install: npm/pnpm global CLI
Channels enabled: WhatsApp, Telegram, Slack
Host resources during investigation: RAM available ~1.5–1.6GiB, disk ~88% full, gateway process around 35–40% RSS and CPU spikes during stalls.
Current channel state example
Gateway reachable.
- Slack default: enabled, configured, stopped, disconnected, error: channel stop timed out after 5000ms
- Telegram default: enabled, configured, running, connected, mode: polling, works
- WhatsApp default: enabled, configured, linked, stopped, disconnected, error: channel exited without an error
Sanitized evidence
Slack affected too
[WARN] socket-mode:SlackWebSocket A pong wasn't received from the server before the timeout of 15000ms!
[slack] socket disconnected (disconnect). retry 1/12 in 2s
[health-monitor] [slack:default] health-monitor: restarting (reason: disconnected)
[slack] [default] channel stop exceeded 5000ms after abort; continuing shutdown
WhatsApp flapping / delivery failures
[whatsapp] Web connection closed (status 408). Retry 1/12 in 2.2s… (status=408 Request Time-out Connection was lost)
[whatsapp] Web connection closed (status 428: session expired or precondition required). Relink with `openclaw channels login --channel whatsapp`. Stopping web monitoring.
[whatsapp] [default] channel exited without an error
[whatsapp] [default] auto-restart attempt 3/10 in 22s
[tools] message failed: Error: No active WhatsApp Web listener (account: default).
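To turn log excerpts like the ones above into per-channel counts, a small tally helper may be useful. This is only a sketch: `tallyLogLines` and `LogTally` are hypothetical names, not part of OpenClaw, and the parsing assumes each line starts with a bracketed tag such as `[whatsapp]` and that WhatsApp close events use the `Web connection closed (status NNN` wording shown in the sanitized logs.

```typescript
// Hypothetical helper (not an OpenClaw API): count log lines per leading
// [tag] and count WhatsApp "Web connection closed" events per status code.
interface LogTally {
  byTag: Record<string, number>;
  whatsappCloseStatus: Record<string, number>;
}

function tallyLogLines(lines: string[]): LogTally {
  const tally: LogTally = { byTag: {}, whatsappCloseStatus: {} };
  for (const line of lines) {
    // The first bracketed token is treated as the subsystem tag.
    const tag = /^\[([^\]]+)\]/.exec(line.trim());
    if (!tag) continue;
    const name = tag[1].toLowerCase();
    tally.byTag[name] = (tally.byTag[name] ?? 0) + 1;

    // Assumed wording, taken from the sanitized WhatsApp log lines.
    const status = /Web connection closed \(status (\d{3})/.exec(line);
    if (name === "whatsapp" && status) {
      const code = status[1];
      tally.whatsappCloseStatus[code] = (tally.whatsappCloseStatus[code] ?? 0) + 1;
    }
  }
  return tally;
}
```

Feeding the function the lines of a journal export would give a quick view of how often each channel logs and how the 408 and 428 closes are distributed over the sample window.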
Hypotheses
The gateway event loop is being blocked by one or more synchronous, CPU-heavy, or file-lock-heavy operations, causing all channel transports to miss heartbeats and hit timeouts.
Session persistence / trajectory flushing may be contributing: a sessions.json.lock timeout and a pi-trajectory-flush cleanup timeout appear near stalls.
LLM/tool-loop timeouts and context-overflow diagnostics may be leaving sessions in long processing_without_queue states, causing lane waits and downstream delivery delays.
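One way to test the event-loop hypothesis is to sample loop delay directly with Node's built-in `monitorEventLoopDelay` from `node:perf_hooks`, which reports delays in nanoseconds. The sketch below is an assumption about how such a probe could look: `summarizeDelay`, `startEventLoopMonitor`, and the 1000 ms threshold are hypothetical and not taken from the gateway.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Minimal shape of the histogram fields used here (values in nanoseconds).
interface DelayHistogram {
  max: number;
  mean: number;
  percentile(p: number): number;
}

interface StallReport {
  maxMs: number;
  meanMs: number;
  p99Ms: number;
  stalled: boolean;
}

const NS_PER_MS = 1e6;

// Convert a histogram snapshot to milliseconds and flag a stall when the
// worst observed delay reaches the (hypothetical) threshold.
function summarizeDelay(h: DelayHistogram, stallThresholdMs: number): StallReport {
  const maxMs = h.max / NS_PER_MS;
  return {
    maxMs,
    meanMs: h.mean / NS_PER_MS,
    p99Ms: h.percentile(99) / NS_PER_MS,
    stalled: maxMs >= stallThresholdMs,
  };
}

// Start sampling. Call the returned function periodically to read the
// current window and reset the histogram for the next one.
function startEventLoopMonitor(stallThresholdMs = 1000): () => StallReport {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  return () => {
    const report = summarizeDelay(histogram, stallThresholdMs);
    histogram.reset();
    return report;
  };
}
```

Because the sampler runs on the same loop, a long block is only recorded once the loop resumes, so the report describes a stall after the fact. Logging it next to the lock-timeout and flush-timeout messages could help show which operation precedes each stall.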
Expected behavior
A stuck agent run or trajectory flush should not block channel polling/websocket heartbeats for 10–170s.
Inbound ack/reaction and assistant reply delivery should not diverge silently; if a reply cannot be delivered, the failure should be recoverable/observable.
Telegram/Slack/WhatsApp transports should remain responsive even when one session or cron is stuck.
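To make the gap between an ack/reaction and a missing reply observable, the gateway could track every acked inbound message until its reply is delivered and surface the ones that pass a deadline. The sketch below illustrates that idea only: `ReplyTracker`, `PendingReply`, and the deadline value are hypothetical and do not reflect OpenClaw internals.

```typescript
// Hypothetical tracker (not an OpenClaw API): remembers which inbound
// messages were acked and reports those still waiting for a reply.
interface PendingReply {
  messageId: string;
  channel: string;
  ackedAtMs: number;
}

class ReplyTracker {
  private pending = new Map<string, PendingReply>();

  // Record that the inbound ack/reaction was sent for this message.
  acked(messageId: string, channel: string, nowMs: number): void {
    this.pending.set(messageId, { messageId, channel, ackedAtMs: nowMs });
  }

  // Record that the assistant reply was delivered.
  replied(messageId: string): void {
    this.pending.delete(messageId);
  }

  // Messages acked at least deadlineMs ago that still have no reply.
  overdue(nowMs: number, deadlineMs: number): PendingReply[] {
    return Array.from(this.pending.values()).filter(
      (p) => nowMs - p.ackedAtMs >= deadlineMs,
    );
  }
}
```

A periodic check of `overdue(Date.now(), deadlineMs)` could log or retry each entry, so a reaction without a reply shows up as an explicit event instead of a silent gap. The check itself still depends on a responsive event loop, so it complements rather than replaces fixing the stalls.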
Actual behavior
Gateway event-loop stalls correlate with Telegram polling stalls, Slack pings timing out, WhatsApp disconnects/exits, lane waits, session lock failures, and missed/delayed user replies.
Bug type
Performance / reliability regression
Sanitized evidence
Event-loop / liveness stalls
Telegram affected too
Slack affected too
WhatsApp flapping / delivery failures
Reaction without timely answer pattern
Agent/session/lane symptoms
Counts from a 12h log sample
Hypotheses
Related