[Bug]: Discord gateway hang at 'awaiting gateway readiness' still reproduces on 2026.5.3-1 (macOS) — six closed dups, raw-ws test isolates failure to Carbon Client lifecycle #77668

@RyanSandoval

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

On macOS, the Discord gateway plugin (Carbon) silently hangs at "client initialized as <id>; awaiting gateway readiness" after restart. It never reaches READY, and no timeout or error event is ever fired. A raw ws connection from the same Node binary, same node_modules/ws, same machine, same bot token completes the WS handshake to gateway.discord.gg in <1 second, which isolates the failure to Carbon's Client constructor → registerClient async flow rather than the network, the token, or Discord. This still reproduces on 2026.5.3-1 despite #56492 / #57075 / #58290 / #59820 / #70841 / #63223 having been closed as duplicates over the past two months.

Steps to reproduce

  1. Install OpenClaw 2026.5.3-1 on macOS via npm global, run as a ~/Library/LaunchAgents/ai.openclaw.gateway.plist LaunchAgent.
  2. Configure a single Discord bot account, single guild, with commands.native: false and commands.nativeSkills: false (rules out the slash-deploy CPU spike from #60559, the 'awaiting gateway readiness' report where the --verbose flag was a workaround on v2026.4.2).
  3. launchctl bootout "gui/$(id -u)/ai.openclaw.gateway" → wait 60s → launchctl bootstrap …
  4. tail -f ~/.openclaw/logs/gateway.log | grep '\[discord\]'

Observed (every restart over the past 36+ hours, ~10 attempts):

[discord] [default] starting provider (@K2)
[discord] channels resolved: <guild_id> (guild:K2; aliases:guild:<guild_id>)
[discord] users resolved: <user_id>
[discord] channel users resolved: <user_id>
[discord] client initialized as <bot_id>; awaiting gateway readiness
                                                ^^^ silent forever — no further [discord] events
  5. Smoking-gun isolation test (same machine, same token, same ws library bundled with OpenClaw):
node -e '
const WebSocket = require("/path/to/openclaw/node_modules/ws");
const ws = new WebSocket("wss://gateway.discord.gg/?v=10&encoding=json");
ws.on("open",    () => console.log("OPEN: WebSocket opened to Discord"));
ws.on("message", (d) => { console.log("MSG:", d.toString().slice(0, 200)); ws.close(); process.exit(0); });
ws.on("error",   (e) => { console.log("ERROR:", e.message); process.exit(2); });
'

Output (every time, in <1s):

OPEN: WebSocket opened to Discord
MSG: {"t":null,"s":null,"op":10,"d":{"heartbeat_interval":41250,"_trace":[...]}}

So Discord is sending HELLO, and Carbon's GatewayPlugin never processes it. The likely explanation is that the 'open' and 'message' handlers are not registered before the frame arrives, so the frame is dropped or left buffered. This matches the un-awaited registerClient race documented in the root-cause analysis in #56492.
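
To make the suspected race concrete, here is a minimal sketch. SketchGatewayPlugin, SketchClient, and AwaitableSketchClient are hypothetical stand-ins, not Carbon's real classes; only the shape of the un-awaited loop is taken from #56492. It shows that a caller inspecting the plugin right after construction sees it before registration has finished, and one possible way to expose the promise so callers can wait on it.

```javascript
// Hypothetical stand-in for a gateway plugin whose registration is async.
class SketchGatewayPlugin {
  constructor() {
    this.connected = false;
  }

  async registerClient(_client) {
    // Stand-in for async work such as resolving the gateway URL and opening the socket.
    await new Promise((resolve) => setTimeout(resolve, 10));
    this.connected = true;
  }
}

// Mirrors the loop quoted from #56492: the returned promise is dropped.
class SketchClient {
  constructor(plugins) {
    this.plugins = plugins;
    for (const p of plugins) { p.registerClient?.(this); }
  }
}

// One possible alternative: a constructor cannot await, so keep the promises
// and let callers await `ready` before polling connection state.
class AwaitableSketchClient {
  constructor(plugins) {
    this.plugins = plugins;
    this.ready = Promise.all(plugins.map((p) => p.registerClient?.(this)));
  }
}

async function demo() {
  const a = new SketchGatewayPlugin();
  new SketchClient([a]);
  const seenWithoutAwait = a.connected; // read before registration finishes

  const b = new SketchGatewayPlugin();
  const client = new AwaitableSketchClient([b]);
  await client.ready;
  const seenAfterAwait = b.connected; // read after registration finishes

  return { seenWithoutAwait, seenAfterAwait };
}
```

Under these assumptions, demo() resolves to { seenWithoutAwait: false, seenAfterAwait: true }. Whether Carbon's real plugin drops the HELLO frame in the same window is not something this sketch can confirm.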

Expected behavior

After [discord] client initialized as ..., within ~5 seconds:

[discord] logged in to discord as <bot_id> (<username>)

And inbound MESSAGE_CREATE events should be delivered to the agent.
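
Until the hang is fixed, the expected log sequence above can be checked mechanically. The following is a minimal watchdog sketch, assuming each log line starts with an ISO 8601 timestamp as in gateway.log; findReadinessHangs and its parameters are hypothetical names, not part of OpenClaw.

```javascript
// Returns the timestamps (ms since epoch) of "awaiting gateway readiness" lines
// that were never followed by a "logged in to discord as" line.
// `graceMs` is how long the last attempt may stay pending before it counts as hung.
function findReadinessHangs(lines, graceMs, nowMs) {
  const hangs = [];
  let pendingSince = null;

  for (const line of lines) {
    const ts = Date.parse(line.split(" ", 1)[0]);
    if (Number.isNaN(ts)) continue; // skip lines without a leading timestamp

    if (line.includes("awaiting gateway readiness")) {
      // A new attempt while one is still pending means the previous one never logged in.
      if (pendingSince !== null) hangs.push(pendingSince);
      pendingSince = ts;
    } else if (line.includes("logged in to discord as")) {
      pendingSince = null;
    }
  }

  // The final attempt is only a hang once the grace period has elapsed.
  if (pendingSince !== null && nowMs - pendingSince > graceMs) hangs.push(pendingSince);
  return hangs;
}
```

A caller could read the tail of the log, pass Date.now() as nowMs, and alert or restart when the returned array is non-empty. The 5-second expectation stated above would translate to graceMs = 5000, though a larger value may be safer in practice.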

Actual behavior

The log stops at "[discord] client initialized as <bot_id>; awaiting gateway readiness". READY is never reached, the "logged in to discord" line never appears, and no timeout or error event fires.

OpenClaw version

2026.5.3-1

Operating system

macOS 26.3.1 (build 25D771280a, Apple Silicon)

Install method

npm global (npm install -g openclaw), launched by ~/Library/LaunchAgents/ai.openclaw.gateway.plist

Model

anthropic/claude-opus-4-7 (via agentRuntime.id: "claude-cli" subprocess to local Claude Code CLI)

Provider / routing chain

discord ← openclaw-gateway → claude-cli subprocess → claude.ai OAuth (overage disabled at org level, irrelevant to this bug)

Additional provider/model setup details

Logs, screenshots, and evidence

2026-05-04T20:09:42.638-07:00 [discord] users resolved: <user_id>
2026-05-04T20:09:42.639-07:00 [discord] channel users resolved: <user_id>
2026-05-04T20:09:43.106-07:00 [discord] client initialized as <bot_id>; awaiting gateway readiness
[silence — no further discord events for 30+ minutes]
2026-05-04T20:12:20.690-07:00 [gateway] http server listening (3 plugins: anthropic, discord, memory-core; 3.8s)
2026-05-04T20:12:22.636-07:00 [discord] [default] starting provider (@K2)
2026-05-04T20:12:23.497-07:00 [discord] channels resolved: <guild_id>
2026-05-04T20:12:23.672-07:00 [discord] client initialized as <bot_id>; awaiting gateway readiness
[silence — same hang on second bootout/bootstrap cycle]

Last successful login over the past 36 hours: 2026-05-03 09:57:36 (one success across ~10 restart attempts on this host).

Discord session_start_limit on the affected bot at the time of writing: remaining: 960/1000, resets in 588 min — confirming Discord-side identify quota is not the cause.
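
For anyone reproducing this, the quota above comes from Discord's GET /gateway/bot endpoint. A minimal sketch of reading it from Node 18+ (global fetch) follows; the DISCORD_BOT_TOKEN environment variable and both function names are assumptions for illustration.

```javascript
// Formats the `session_start_limit` object from GET /gateway/bot.
// `reset_after` is reported by Discord in milliseconds.
function describeSessionStartLimit({ total, remaining, reset_after }) {
  const minutes = Math.round(reset_after / 60000);
  return `remaining: ${remaining}/${total}, resets in ${minutes} min`;
}

// Fetches the current limit for a bot token.
async function fetchSessionStartLimit(token) {
  const res = await fetch("https://discord.com/api/v10/gateway/bot", {
    headers: { Authorization: `Bot ${token}` },
  });
  if (!res.ok) throw new Error(`GET /gateway/bot failed: HTTP ${res.status}`);
  const body = await res.json();
  return body.session_start_limit;
}

// Only touches the network when a token is provided (assumed env var name).
if (process.env.DISCORD_BOT_TOKEN) {
  fetchSessionStartLimit(process.env.DISCORD_BOT_TOKEN)
    .then((limit) => console.log(describeSessionStartLimit(limit)))
    .catch((err) => { console.error(err.message); process.exitCode = 1; });
}
```

A successful response here also confirms that the token itself is accepted by Discord, which the raw WebSocket test alone does not exercise because it never sends IDENTIFY.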

Impact and severity

Additional information

Regression timeline (all closed as duplicates of root-cause issue #56492 which itself is closed):

Workarounds that work some of the time:

Workarounds that do NOT work on 2026.5.3-1:

  • kill -9 $GATEWAY_PID followed by launchd respawn
  • launchctl kickstart -k (leaves stale TCP sockets attached)
  • openclaw doctor --fix (does not touch the discord plugin lifecycle)

Suggested fix priority order:

  1. Make Client.registerClient(plugin) async-aware in @buape/carbon so the constructor's plugin loop awaits the promise. Per #56492 ("Discord gateway never connects: Carbon Client constructor doesn't await async GatewayPlugin.registerClient"), the constructor currently does for (const p of plugins) { p.registerClient?.(this); } with no await. This is the upstream fix.
  2. In OpenClaw's discord provider, after new Client(...), explicitly await client.getPlugin('gateway')?.registerClient?.(client) so the WebSocket connection is guaranteed to be established before lifecycleGateway.isConnected is polled. This is the OpenClaw-side fix that doesn't require waiting on Carbon upstream.
  3. Restore the [discord] logged in to discord as ... log line even when the underlying event flow has been refactored. Watchdogs and operators rely on this for liveness. Currently it is silently dropped, per #63223 ("Gateway becomes zombie after system CA rotation; internal reconnect loop cannot recover; Discord READY log line also missing in 2026.4.5").
  4. Replace the ?? true nullish-default in the waitForGatewayReady guard with ?? false (or surface the actual hung-promise state) so the documented 15-second timeout actually fires. Per #70841 ("[Bug]: Discord plugin: post-restart gateway wedges at 'awaiting gateway readiness' on macOS (no 15s timeout fires)"), this is currently masking the failure.
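
A minimal sketch of why the nullish default in item 4 matters, plus a generic timeout wrapper of the kind that would surface a hung promise. The status shape and both guard functions are hypothetical; the real waitForGatewayReady code is not reproduced here.

```javascript
// Hypothetical guards: `status.connected` is undefined when the plugin never reported a state.
const treatUnknownAsConnected = (status) => status.connected ?? true;     // unknown looks healthy
const treatUnknownAsDisconnected = (status) => status.connected ?? false; // unknown looks unhealthy

// Rejects if `promise` has not settled within `ms`; the timer is cleared either way.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Example: a registration promise that never settles, as in the hang described in this issue.
function demoHungRegistration() {
  const neverSettles = new Promise(() => {});
  return withTimeout(neverSettles, 50, "gateway readiness");
}
```

With ?? true, a plugin that never reported any state is indistinguishable from a connected one, so a timeout path keyed on that value would not run. Wrapping the registration promise with something like withTimeout makes the failure explicit regardless of which default the guard uses.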

I am happy to test patches on this affected macOS host — the bug reproduces deterministically here.


Cross-references: #56492 (root cause, closed dup) · #57075 (single-account, closed dup) · #58290 (closed dup) · #59820 (closed dup) · #70841 (closed dup, macOS, ~80% rate) · #63223 (open, missing READY log) · #60559 (closed, --verbose workaround)
