
Discord gateway hangs at 'awaiting gateway readiness' on single-account setup (v2026.3.22–v2026.3.28) #57075

@wlcarden

Summary

A single-account Discord bot hangs indefinitely at "discord client initialized as <id> (<name>); awaiting gateway readiness" on every start, on every version from v2026.3.22 onward that was tested (.22, .24, .28). The gateway WebSocket never completes the IDENTIFY → READY handshake. v2026.3.13 works reliably with an identical configuration.

Related to #53132 (multi-account variant), but this reproduces with a single bot account and is not fixed by v2026.3.24 or v2026.3.28, even though the multi-account variant was reported resolved there.

Environment

  • OpenClaw versions tested: v2026.3.22, v2026.3.24, v2026.3.28 (all hang); v2026.3.13 (works)
  • OS: Linux Mint (6.17.0-19-generic, x86_64)
  • Node: 25.8.0 (linuxbrew)
  • Service: systemd user unit
  • Discord accounts: 1 bot ("Kit"), 2 guilds, 63 native slash commands
  • Gateway config: single-account, loopback bind, voice enabled

Reproduction

  1. Install openclaw >= v2026.3.22 (tested .22, .24, .28)
  2. Configure a single Discord bot account with native commands enabled
  3. Start the gateway: systemctl --user start openclaw-gateway
  4. Observe logs:
[discord] native commands using Carbon reconcile path
[discord] client initialized as 1479251097339166872 (Kit); awaiting gateway readiness
  5. The bot never transitions to "logged in to discord as <id> (<name>)". No timeout error is logged, and it hangs indefinitely (observed for 60+ seconds).
  6. Downgrade to v2026.3.13: the bot logs in immediately with gatewayConnected=true.

100% reproducible across 6+ clean starts on each version (stop → install → start with no rapid restarts).

Expected

The bot should reach "logged in to discord" within ~15 seconds, as it does on v2026.3.13.

Actual

The bot hangs at "awaiting gateway readiness" forever. The 15-second DISCORD_GATEWAY_READY_TIMEOUT_MS poll (provider.lifecycle.ts; line 6778 of the bundled provider-CmA0Hwes.js) never takes its timeout branch: the "gateway was not ready after 15000ms" error is never logged.
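A poll of this kind (check gateway.isConnected every 250 ms, give up after 15 s) can be sketched as follows. This is a hypothetical stand-in written for illustration, not OpenClaw's waitForDiscordGatewayReady; the name waitForGatewayReadySketch and its parameters are assumptions. Under those assumptions the loop must return "timeout" whenever isConnected stays false and timers keep firing, so a timeout that is never logged points to the wait never being entered, never returning, or its timer callbacks never running.

```javascript
// Hypothetical sketch of a "poll until connected or time out" loop.
// gateway only needs an isConnected boolean; beforePoll is optional.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function waitForGatewayReadySketch({ gateway, timeoutMs = 15000, pollMs = 250, beforePoll }) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    beforePoll?.(); // e.g. surface queued gateway errors before each check
    if (gateway.isConnected) return "ready";
    await sleep(pollMs); // relies on timers firing; a blocked event loop stalls here
  }
  return "timeout";
}
```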

Diagnostic Evidence

Discord API is healthy

REST API works — bot identity, command deployment, and gateway info all succeed:

$ curl -s https://discord.com/api/v10/gateway/bot -H "Authorization: Bot <token>"
{"url":"wss://gateway.discord.gg","session_start_limit":{"remaining":727,"total":1000,...},"shards":1}
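The response also rules out the daily session-start limit (727 of 1000 remaining). As a sketch of how a client might turn this response into a connect URL, assuming it appends the v and encoding query parameters described in Discord's gateway documentation (the helper name gatewayUrlFromBotInfo is hypothetical):

```javascript
// Hypothetical helper: build a gateway connect URL from a GET /gateway/bot response.
function gatewayUrlFromBotInfo(info, version = 10) {
  const remaining = info.session_start_limit?.remaining ?? 0;
  if (remaining <= 0) {
    // No session starts left; a new IDENTIFY should wait for the limit to reset.
    throw new Error("session start limit exhausted");
  }
  const url = new URL(info.url);
  url.searchParams.set("v", String(version)); // gateway API version
  url.searchParams.set("encoding", "json"); // payload encoding
  return url.toString();
}
```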

Raw WebSocket test succeeds instantly

Using the same token and the Node.js ws module from OpenClaw's own node_modules:

WS OPEN
OP: 10 t: null d: {"heartbeat_interval":41250,...}
IDENTIFY sent
OP: 0 t: READY d: {"v":10,"user":{"username":"Kit",...},"session_type":"normal",...}
OP: 0 t: GUILD_CREATE ...

HELLO → IDENTIFY → READY completes in under 2 seconds. The token, intents, and network path are all valid.
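The exchange above can be reduced to a small decision function: on HELLO (op 10) record the heartbeat interval and send IDENTIFY (op 2), and treat a DISPATCH (op 0) with t "READY" as success. The sketch below is hypothetical (createHandshake is not part of OpenClaw or the ws package) and leaves out the socket itself and the heartbeat timer; the properties values and the intents used in the test are placeholders.

```javascript
// Hypothetical sketch of the client side of HELLO -> IDENTIFY -> READY.
// It only decides what to send for each incoming payload.
const OP = { DISPATCH: 0, HEARTBEAT: 1, IDENTIFY: 2, HELLO: 10, HEARTBEAT_ACK: 11 };

function createHandshake({ token, intents }) {
  const state = { seq: null, heartbeatIntervalMs: null, ready: false, user: null };

  function onPayload(raw) {
    const msg = JSON.parse(raw);
    if (msg.s != null) state.seq = msg.s; // last sequence number, echoed in heartbeats
    const out = [];
    if (msg.op === OP.HELLO) {
      state.heartbeatIntervalMs = msg.d.heartbeat_interval;
      out.push({
        op: OP.IDENTIFY,
        d: { token, intents, properties: { os: process.platform, browser: "sketch", device: "sketch" } },
      });
    } else if (msg.op === OP.DISPATCH && msg.t === "READY") {
      state.ready = true;
      state.user = msg.d.user;
    } else if (msg.op === OP.HEARTBEAT) {
      out.push({ op: OP.HEARTBEAT, d: state.seq }); // server requested an immediate heartbeat
    }
    return out.map((payload) => JSON.stringify(payload));
  }

  // Periodic heartbeat payload, to be sent every state.heartbeatIntervalMs.
  function heartbeat() {
    return JSON.stringify({ op: OP.HEARTBEAT, d: state.seq });
  }

  return { state, onPayload, heartbeat };
}
```

To drive it from the ws package, one would call onPayload(data.toString()) from the socket's "message" handler, pass each returned string to socket.send(), and send heartbeat() on the recorded interval.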

Socket buffer shows unread data

During the hang, ss -tp showed 106 bytes sitting unread in the gateway process's receive buffer. Discord had sent data, but the Node.js event loop never consumed it:

ESTAB 106  0  10.2.0.2:55566  162.159.137.232:https  users:(("openclaw-gatewa",...))

The connection was later silently dropped without processing.
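For an established TCP socket, ss reports in Recv-Q the bytes the kernel has received but the owning process has not yet read, which is what makes the 106 significant. A hypothetical helper (unreadSockets is an illustrative name) that picks such sockets out of ss -tp output, assuming the default column order State, Recv-Q, Send-Q, local address, peer address, process:

```javascript
// Hypothetical helper: list ESTAB sockets with a non-zero Recv-Q in ss -tp output.
function unreadSockets(ssOutput) {
  const results = [];
  for (const line of ssOutput.split("\n")) {
    const cols = line.trim().split(/\s+/);
    // Expected columns: State Recv-Q Send-Q Local:Port Peer:Port [Process]
    if (cols[0] !== "ESTAB" || cols.length < 5) continue; // skips the header and blank lines
    const recvQ = Number(cols[1]);
    if (recvQ > 0) {
      results.push({ recvQ, local: cols[3], peer: cols[4], process: cols.slice(5).join(" ") });
    }
  }
  return results;
}
```

On Linux it could be fed with require("node:child_process").execFileSync("ss", ["-tp"], { encoding: "utf8" }) while the gateway is hung.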

Root Cause Analysis

Traced through the minified source in provider-CmA0Hwes.js.

The race condition (same as #53132 comment 3, but single-account)

  1. Client constructor (line 151127 in pi-embedded-CzQCqSlH.js) calls plugin.registerClient?.(this) without awaiting the returned promise
  2. SafeGatewayPlugin.registerClient (line 6247) is async — it awaits fetchDiscordGatewayInfoWithTimeout() before calling super.registerClient(client) (which calls this.connect())
  3. The constructor returns immediately. The gateway's registerClient is a fire-and-forget async call
  4. OpenClaw proceeds through command deployment, identity fetch, and enters runDiscordGatewayLifecycle()
  5. The lifecycle polls gateway.isConnected every 250ms for 15 seconds
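Steps 1–3 can be reproduced in a reduced model. The classes below are hypothetical stand-ins (SlowRegisterPlugin and RacyClient are not OpenClaw or Carbon types), with a timer standing in for fetchDiscordGatewayInfoWithTimeout() and a flag standing in for the WebSocket handshake. When the constructor returns, connect() has not run and client is still undefined, so a forced connect() at that point is a no-op. In this model the connection still arrives once the awaited step settles; a permanent hang would additionally require that step, or the event loop, to stall.

```javascript
// Reduced model of the suspected race; hypothetical stand-ins only.
class SlowRegisterPlugin {
  constructor(lookupDelayMs) {
    this.lookupDelayMs = lookupDelayMs;
    this.client = undefined;
    this.isConnected = false;
  }

  // Like the async registerClient described above: await a lookup, then connect.
  async registerClient(client) {
    await new Promise((resolve) => setTimeout(resolve, this.lookupDelayMs)); // stands in for the gateway-info fetch
    this.client = client;
    this.connect();
  }

  connect() {
    if (!this.client) return; // silently does nothing before registration completes
    this.isConnected = true; // stands in for a completed HELLO/IDENTIFY/READY
  }
}

class RacyClient {
  constructor(plugins) {
    for (const plugin of plugins) {
      plugin.registerClient?.(this); // returned promise is dropped: fire-and-forget
    }
  }
}
```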

Why the timeout never fires

The 15-second timeout in waitForDiscordGatewayReady (line 6778) should fire and trigger a forced reconnect at lines 6796–6797. In practice, however, no timeout error is ever logged. Two possible explanations:

Key code paths

// provider-CmA0Hwes.js:6777-6797
if (gateway && !gateway.isConnected && !lifecycleStopping) {
    const initialReady = await waitForDiscordGatewayReady({
        gateway,
        timeoutMs: 15000,  // DISCORD_GATEWAY_READY_TIMEOUT_MS
        beforePoll: drainPendingGatewayErrors
    });
    if (initialReady === "timeout" && !lifecycleStopping) {
        // This branch is NEVER reached in practice
        runtime.error?.("discord: gateway was not ready after 15000ms; forcing a fresh reconnect");
        gateway?.disconnect();
        gateway?.connect(false);  // connect() silently no-ops if this.client is undefined
    }
}
// pi-embedded-CzQCqSlH.js:151126-151128 (Client constructor)
for (const plugin of plugins) {
    plugin.registerClient?.(this);  // NOT awaited — async registerClient is fire-and-forget
    plugin.registerRoutes?.(this);
}
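One possible mitigation, sketched under the assumption that the client can keep the promises its plugins return: store them, and let the lifecycle await registration with an upper bound before it starts polling. AwaitableClient, whenRegistered, and withTimeout are hypothetical names, not existing OpenClaw or Carbon APIs. The bound matters: an unbounded await on a stalled gateway-info fetch would reproduce the same silent hang.

```javascript
// Hypothetical helper: reject if promise has not settled within ms.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} did not settle within ${ms}ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical client that keeps each plugin's registerClient() promise instead of dropping it.
class AwaitableClient {
  constructor(plugins) {
    this.registrations = plugins.map((plugin) => {
      const p = Promise.resolve(plugin.registerClient?.(this));
      p.catch(() => {}); // mark as handled now; whenRegistered() still sees the rejection
      return p;
    });
  }

  // Resolves when every registration succeeds; rejects on the first failure or when the bound expires.
  whenRegistered(timeoutMs) {
    return withTimeout(Promise.all(this.registrations), timeoutMs, "plugin registration");
  }
}
```

A lifecycle would then call await client.whenRegistered(15000) and log any rejection, instead of polling isConnected for a connection that may not have been started yet.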

Why v2026.3.13 works

v2026.3.13 uses the older deploy-commands flow (a REST PUT to /applications/{id}/commands) instead of the "Carbon reconcile path". Its GatewayPlugin.registerClient appears to complete synchronously, or fast enough that the gateway connects before the lifecycle check. The gatewayConnected=true log line consistently appears 1–2 seconds after "WebSocket connection opened".
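For comparison, the older flow's single REST call (Discord's bulk overwrite of global application commands) can be sketched as a request builder. buildBulkOverwriteRequest is a hypothetical helper; it only assembles fetch() arguments, which keeps the example offline.

```javascript
// Hypothetical builder for PUT /applications/{application.id}/commands (bulk overwrite).
function buildBulkOverwriteRequest({ applicationId, token, commands, apiVersion = 10 }) {
  const url = `https://discord.com/api/v${apiVersion}/applications/${encodeURIComponent(applicationId)}/commands`;
  return [
    url,
    {
      method: "PUT",
      headers: { Authorization: `Bot ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify(commands), // the full list replaces every registered global command
    },
  ];
}
```

Usage would be const [url, init] = buildBulkOverwriteRequest({ applicationId, token, commands }); await fetch(url, init);. Because the body replaces the full command list, all 63 commands would go out in one request.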

Difference from #53132

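| Aspect       | #53132                                 | This issue          |
|--------------|----------------------------------------|---------------------|
| Accounts     | 4 bots, non-deterministic subset hangs | 1 bot, always hangs |
| v2026.3.24   | Fixed                                  | Still broken        |
| v2026.3.28   | Not tested                             | Still broken        |
| Reproduction | Non-deterministic (0–2 of 4 succeed)   | 100% deterministic  |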

The single-account reproduction suggests the race condition is more fundamental than concurrent IDENTIFY contention. It may be environment-dependent (Node 25.8.0, Linux, specific gateway latency).

Workaround

Pinned to v2026.3.13. Gateway connects immediately and reliably on every start.

Labels: duplicate (this issue or pull request already exists)
