On openclaw 2026.4.23, single-account, stable wired network, the time between Discord delivering a DM and the agent runtime starting to process it is regularly 100–400 seconds. Agent processing + LLM call after that point is healthy at 5–7 s, so the delay is entirely in the message-ingest path. This persists after the recent Discord-lifecycle hardening that closed #56492 (PR #68159, commit 5adf9d2…081f45), #53132, and #51116.
Filing as a separate, narrow issue per @steipete's instruction in the #53132 close: "If a similar startup hang is still reproducible on a current build, please open a fresh issue with current logs." This is a different failure surface than #38596 (which is about the health-monitor restart loop) — here, the bot does NOT visibly restart; the lag happens inside Carbon's reconnect/RESUME flow on a connection that the gateway still considers up.
Environment
OpenClaw: 2026.4.23 (a979721 per --help and status --json runtimeVersion)
@buape/carbon: 0.16.0 (latest on npm; pinned exactly by the @openclaw/discord plugin)
Dependencies: discord-api-types ^0.38.47, ws ^8.20.0
Network: stable wired; gateway.discord.gg reachable, 0% packet loss
Config: Telegram disabled (channels.telegram.enabled: false), mDNS off (discovery.mdns.mode: "off"), TTS preflight off (messages.tts.enabled: false), gateway.controlUi.allowInsecureAuth: true (separate concern)
Reproduction
1. Configure a single Discord bot, restart the gateway, and wait for [gateway] ready (6 plugins …) then [discord] logged in to discord as <id> (OpenClaw).
2. Leave the bot idle.
3. Observe gateway.log: [discord] gateway: Gateway websocket closed: 1000 followed by Gateway reconnect scheduled in <~1000ms> (close, resume=true) — this happens every 3–5 minutes on idle.
4. Send a DM to the bot at any point.
5. Most of the time the bot replies in 5–10 s. Periodically (when the inbound message lands during a reconnect window or right around a did not reach READY within 30000ms event), the reply takes 100–400 s.
Evidence — agent trajectory log
From ~/.openclaw/agents/main/sessions/<sid>.trajectory.jsonl, with the user message_id decoded from the embedded Discord snowflake (Discord epoch 1420070400000):
seq= 1 2026-04-25T07:25:22.028Z session.started
seq= 4 2026-04-25T07:25:23.309Z prompt.submitted
seq= 5 2026-04-25T07:25:26.783Z model.completed reply="Hi."
seq= 7 2026-04-25T07:25:26.898Z session.ended
user message_id 1497497072411218070 → Discord-snowflake-time 2026-04-25T07:18:44.213Z
=> Discord→openclaw INGEST lag = 397.8 s
Agent processing total = 4.9 s
Model latency only = 3.5 s
END-TO-END = 402.6 s
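The snowflake→timestamp conversion used above is the standard Discord decoding: the top 42 bits of the snowflake are milliseconds since the Discord epoch (1420070400000, i.e. 2015-01-01T00:00:00Z). A minimal TypeScript sketch:

```typescript
// Decode a Discord snowflake into its creation timestamp.
// The top 42 bits are milliseconds since the Discord epoch.
const DISCORD_EPOCH_MS = 1420070400000n; // 2015-01-01T00:00:00Z

function snowflakeToDate(id: string): Date {
  const ms = (BigInt(id) >> 22n) + DISCORD_EPOCH_MS;
  return new Date(Number(ms));
}

console.log(snowflakeToDate("1497497072411218070").toISOString());
// → 2026-04-25T07:18:44.213Z
```

This is how the "Discord-snowflake-time" column in the table below was derived.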
Three consecutive idle-bot tests on the same session, same wired network:

| User msg | Discord ts (UTC) | session.started | Δ ingest | Δ model | Δ end-to-end |
| --- | --- | --- | --- | --- | --- |
| hi | 07:18:44.213 | 07:25:22.028 | 397.8 s | 3.5 s | 402.6 s |
| Pingping | 07:42:45.606 | 07:44:25.532 | 99.9 s | 5.7 s | 106.5 s |
| Double ping | 07:51:03.549 | 07:53:13.547 | 130.0 s | 6.7 s | 137.3 s |
The agent + LLM segment is consistently 5–10 s. The 100–400 s headline number is entirely in the segment between Discord's gateway and openclaw's DiscordMessageListener enqueuing into the agent runtime.
Correlated stuck session warning while the inbound message sits in queue:
queueDepth=1 with a pending message that does eventually get answered → this is buffering / replay during reconnect, not permanent loss (so distinct from #51116's user-facing claim that messages are "lost" — they're delayed, not dropped).
WS-close cadence (4-hour window, single account, otherwise idle)
Every WS close triggers Carbon's RESUME flow. Inbound messages received during the reconnect window get buffered. RESUME usually succeeds within ~1 s, but when it fails — discord gateway opened but did not reach READY within 30000ms (defined in dist/extensions/discord/provider-Bc1Lm79N.js:5897 as DISCORD_GATEWAY_RUNTIME_READY_TIMEOUT_MS = 3e4) — the channel exits and the outer auto-restart attempt 1/10 in 5s cycle kicks in, adding 30–60 s of unavailability per failed RESUME.
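For concreteness, the two constants quoted above already bound the extra delay from a single failed RESUME (a rough lower bound; reconnect and IDENTIFY time come on top of it):

```typescript
// Lower-bound unavailability from ONE failed RESUME, using only the two
// constants visible in the logs above; actual reconnect time is extra.
const READY_TIMEOUT_MS = 30_000; // DISCORD_GATEWAY_RUNTIME_READY_TIMEOUT_MS = 3e4
const RESTART_DELAY_MS = 5_000;  // "auto-restart attempt 1/10 in 5s"

const minUnavailableMs = READY_TIMEOUT_MS + RESTART_DELAY_MS;
console.log(minUnavailableMs / 1000); // → 35 (seconds), before the next READY wait even starts
```

Two such cycles back to back land squarely in the observed 30–60 s-per-failure range.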
Network is not the cause
$ ping -c 3 gateway.discord.gg
9.319/10.380/11.440 ms (0% loss)
Discord-side reachability is fine. CPU on the gateway process is 0.0% per ps -o %cpu during these episodes; RSS 57 MB; no event-loop stall.
What I tried — A/B downgrade to Carbon 0.15.0 (does not work)
Hypothesis: Carbon 0.16.0 (released 2026-04-16) regressed RESUME vs 0.15.0 (2026-04-10). Tested by:
1. npm pack @buape/carbon@0.15.0 → atomic swap into dist/extensions/discord/node_modules/@buape/carbon.
2. Patched the @buape/carbon pin in dist/extensions/discord/package.json to "0.15.0".
3. SIGUSR1 restart.
Result: the bot reaches [discord] client initialized as <id> (OpenClaw); awaiting gateway readiness and never proceeds to logged in to discord. openclaw status --json shows channelSummary: [] for the entire test window. No errors in gateway.err.log. Reverted cleanly to 0.16.0 via backup; the symptom described in this issue resumed exactly as before.
So 0.15.0 is incompatible with the current @openclaw/discord plugin code path and is not a viable downgrade.
Related issues
#56492 — Client constructor IDENTIFY race. Fixed by PR #68159 ("fix(discord): prevent Identify silent-drop race in gateway startup"). The login race no longer happens for me; this issue is about post-login churn, which #68159 doesn't touch.
#53132 — awaiting gateway readiness hang. Different class (multi-account); fixed in 2026.3.24+. Single-account doesn't hang on login on current main.
#51116 — "Discord WebSocket disconnects every ~10 minutes, messages lost during reconnect window". Closed 2026-04-25 as obsolete on main. The "messages lost" framing was too strong — this issue documents that messages are delayed, not lost, on current main, but the user-facing latency is still in the hundreds of seconds.
Asks
Concrete and narrow:
Is the 3–5 min close-1000 cadence on idle considered baseline behavior on current main, or a regression worth investigating? A reproducer on a clean install would settle this — happy to provide more environment detail.
What's the right place in the code path for a buffered-message catch-up after RESUME? The Carbon Client.events after RESUMED should re-deliver missed events per Discord's gateway spec (SESSION op resume + RESUMED event with replayed dispatches). If openclaw's DiscordMessageListener is being recreated during the inner reconnect, those replayed events would never reach the agent runtime — that would explain the 100–400 s lag matching the reconnect window exactly. Worth checking whether the listener is preserved across Carbon's internal reconnects.
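The listener-recreation hypothesis above can be illustrated with a toy EventEmitter sketch (hypothetical names only; this is not openclaw's or Carbon's actual code):

```typescript
// Sketch of the suspected bug: if the message listener is attached
// per-connection, dispatches replayed after RESUMED on a *new* internal
// connection never reach the previously attached handler.
import { EventEmitter } from "node:events";

const received: string[] = [];

function attachListener(gw: EventEmitter) {
  gw.on("messageCreate", (content: string) => received.push(content));
}

// Connection 1: listener attached, message delivered normally.
let gateway = new EventEmitter();
attachListener(gateway);
gateway.emit("messageCreate", "hi");

// Internal reconnect swaps the emitter WITHOUT re-attaching the listener:
gateway = new EventEmitter();
// A dispatch replayed after RESUMED now goes nowhere:
gateway.emit("messageCreate", "replayed after RESUMED");

console.log(received); // → [ 'hi' ] — the replayed message was silently dropped
```

If this is what happens inside the inner reconnect, the replayed events only surface once the outer restart cycle rebuilds the listener, which matches the observed lag.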
Would a chat.history-style catch-up on every successful RESUME (re-fetching the last N messages on each affected channel since last_message_id) be in scope as a defensive backstop, even if the underlying buffering issue is fixed? It's the same ask #51116 made, and it'd be a small, opt-in change behind a config flag.
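As a sketch of what that backstop could look like — the endpoint and the after/limit query params are Discord's documented REST API; the function name and token plumbing are hypothetical:

```typescript
// Hedged sketch of the proposed opt-in backstop: after each successful
// RESUME, re-fetch any channel messages newer than the last one processed.
async function catchUpChannel(
  token: string,
  channelId: string,
  lastMessageId: string,
  limit = 50,
): Promise<unknown[]> {
  const url =
    `https://discord.com/api/v10/channels/${channelId}/messages` +
    `?after=${lastMessageId}&limit=${limit}`;
  const res = await fetch(url, {
    headers: { Authorization: `Bot ${token}` },
  });
  if (!res.ok) throw new Error(`catch-up fetch failed: ${res.status}`);
  return (await res.json()) as unknown[];
}
```

Each fetched message would then be de-duplicated against already-processed IDs before being enqueued, so a successful buffered replay doesn't double-deliver.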
Logs / artifacts available on request
/tmp/openclaw/openclaw-2026-04-25.log (full ndjson trace, ~250 KB at time of writing)
~/.openclaw/agents/main/sessions/<sid>.trajectory.jsonl (the trajectory excerpts above came from here)
~/.openclaw/logs/gateway.log, gateway.err.log
The Carbon 0.15.0 A/B test result (post-restart awaiting gateway readiness hang) is reproducible on demand.
Happy to upload these as gist links if you'd prefer — let me know what format you'd like.