Discord can go silent after 2026.5.5 upgrade/doctor model-runtime migration despite connected bot and working send path #78609

@ramitrkar-hash

Summary

After upgrading a live macOS OpenClaw install to 2026.5.5 (Discord plugin 2026.5.6), Discord appeared healthy and inbound events had previously been observed, but Jarvis/main stopped producing normal Discord replies. Manual Discord send through the gateway still worked, so the Discord token/send permissions were not the root cause.

The failure appears to be a model/runtime migration + stale session/task recovery issue:

  • upgrade/doctor rewrote working openai-codex/* model refs toward openai/* and stamped some agents/sessions with Codex runtime fields
  • this install did not have a usable openai provider route for that rewritten config
  • gateway logs showed model-provider resolution errors and Codex app-server fallback errors
  • existing Discord channel sessions could remain stalled/queued, and old background tasks remained stale_running
  • externally, the user-visible symptom was simply: Discord bot is connected, but no agent reply appears

Environment

  • OpenClaw: 2026.5.5
  • Discord plugin: 2026.5.6
  • OS: macOS Darwin arm64
  • Gateway: LaunchAgent, local loopback 127.0.0.1:18789
  • Channel: Discord guild channels
  • Main bot: Jarvis/main
  • Install/config paths are sanitized; this was an npm/Homebrew global install with the OpenClaw config under the user's OpenClaw home

Observed symptoms

Before cleanup:

  • openclaw gateway status reported gateway running and connectivity probe OK
  • Discord accounts reported running/connected
  • lastInboundAt had advanced during the failure window
  • lastOutboundAt stayed null
  • user saw no replies in Discord
  • the direct Discord app/channel send path had not yet been tested in isolation
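
The combination above (connected, inbound advancing, outbound still null) can be turned into a simple check. This is a minimal sketch, not OpenClaw code: the function name and its arguments are hypothetical, and only the lastInboundAt / lastOutboundAt field names come from the status output described in this report.

```python
from typing import Optional


def looks_silent(connected: bool,
                 last_inbound_at: Optional[str],
                 last_outbound_at: Optional[str]) -> bool:
    """Return True for the pattern seen in this report: the account is
    connected and has received events, but has never sent anything."""
    return connected and last_inbound_at is not None and last_outbound_at is None


# Placeholder values, not real timestamps from the affected install.
print(looks_silent(True, "<inbound-timestamp>", None))
print(looks_silent(True, "<inbound-timestamp>", "<outbound-timestamp>"))
```

A check like this only flags the symptom; it says nothing about whether the cause is the model route, a stalled session, or a stale task.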

Task audit showed three stale running tasks from earlier repair/subagent/heartbeat work:

openclaw tasks list --status running --json
count: 3

openclaw tasks audit --json classified them as:

stale_running: 3
errors: 3

Gateway logs around the failure window included errors of this shape:

CodexAppServerRpcError: failed to load configuration: Model provider `anthropic` not found
CodexAppServerRpcError: failed to load configuration: Model provider `crofai` not found
CodexAppServerRpcError: failed to load configuration: Model provider `minimax-direct` not found
Error: Codex app-server auth profile "zai:default" must belong to provider "openai-codex" or a supported alias.
FailoverError: LLM request timed out.
stalled session: sessionKey=agent:main:discord:channel:<channel-id> state=processing queueDepth=1

The important user-visible gap: none of these were surfaced as a clear Discord error/reply. The bot simply appeared silent.
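
To make the log shapes above easier to triage, they can be grouped by error kind. The following is a rough sketch that works on the sanitized lines quoted in this report; the category labels and the classify helper are my own naming and are not part of OpenClaw.

```python
import re
from collections import Counter

# Sanitized log lines copied from this report.
LOG_LINES = [
    "CodexAppServerRpcError: failed to load configuration: Model provider `anthropic` not found",
    "CodexAppServerRpcError: failed to load configuration: Model provider `crofai` not found",
    "CodexAppServerRpcError: failed to load configuration: Model provider `minimax-direct` not found",
    'Error: Codex app-server auth profile "zai:default" must belong to provider "openai-codex" or a supported alias.',
    "FailoverError: LLM request timed out.",
    "stalled session: sessionKey=agent:main:discord:channel:<channel-id> state=processing queueDepth=1",
]

# Hypothetical category labels, each paired with a pattern for one log shape.
PATTERNS = [
    ("missing_provider", re.compile(r"Model provider `([^`]+)` not found")),
    ("auth_profile_mismatch", re.compile(r'auth profile "([^"]+)" must belong to provider')),
    ("llm_timeout", re.compile(r"LLM request timed out")),
    ("stalled_session", re.compile(r"stalled session: sessionKey=(\S+)")),
]


def classify(line: str):
    """Return (category, detail) for the first pattern that matches."""
    for label, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            detail = match.group(1) if match.groups() else None
            return label, detail
    return "other", None


counts = Counter(classify(line)[0] for line in LOG_LINES)
for label, count in counts.items():
    print(f"{label}: {count}")
```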

What fixed this install locally

  1. Restored the config to a working openai-codex/gpt-* model route for the affected agents instead of the upgrade/doctor-rewritten openai/gpt-* + Codex runtime path.
  2. Removed stale per-session Codex runtime/harness overrides from session stores.
  3. Cancelled the three stale running tasks with openclaw tasks cancel <taskId>.
  4. Restarted the gateway.
  5. Verified gateway and Discord accounts came back healthy.
  6. Verified direct Discord outbound send path:
openclaw message send --channel discord --account main --target channel:<channel-id> --message "OpenClaw upgrade diagnostic: Jarvis outbound Discord send path is live. Testing only." --json

Result:

{
  "ok": true,
  "result": {
    "messageId": "<discord-message-id>",
    "channelId": "<channel-id>"
  }
}

After cleanup:

openclaw tasks list --status running --json
count: 0

openclaw health --json showed:

ok: true
Discord main connected: true
Discord specialists connected: true
lastError: null
restartPending: false

Expected behavior

Upgrade/doctor migration should not leave a Discord install in a state where:

  • Discord is connected
  • inbound messages are accepted or were recently accepted
  • manual Discord send path works
  • but normal agent replies silently fail or stall due to model/runtime/session state

If model/runtime migration cannot be made safely, OpenClaw should do at least one of the following:

  • preserve the known-good model/runtime route
  • emit a hard validation warning before restart
  • surface a clear channel-visible or status-visible error
  • provide a targeted repair that does not also rewrite working model refs
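
As an illustration of the "preserve the known-good route" option, here is a minimal sketch of a guarded rewrite. It assumes the migration can be given the set of usable providers before it writes anything; plan_model_ref, usable_providers, and the gpt-example model name are all hypothetical and do not reflect how doctor is actually implemented.

```python
def plan_model_ref(current_ref: str, usable_providers: set) -> str:
    """Rewrite openai-codex/<model> to openai/<model> only when the target
    provider is usable; otherwise keep the existing ref unchanged."""
    provider, sep, model = current_ref.partition("/")
    if provider != "openai-codex" or not sep:
        return current_ref  # not a ref this migration would touch
    if "openai" not in usable_providers:
        return current_ref  # target route unusable: keep the known-good ref
    return f"openai/{model}"


# "gpt-example" is a placeholder model name, not one from the affected install.
print(plan_model_ref("openai-codex/gpt-example", {"openai-codex"}))
print(plan_model_ref("openai-codex/gpt-example", {"openai-codex", "openai"}))
```

The point of the sketch is the ordering: the provider check happens before the rewrite, so an install without a usable openai route is left as it was.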

Actual behavior

The install became Discord-silent. Gateway/channel status looked broadly healthy, but the agent reply path failed/stalled. The actionable errors were only discoverable by combining gateway logs, model status, task audit, and session-state inspection.

Why this is hard to debug

Several signals point in different directions:

  • openclaw gateway status: OK
  • Discord accounts: connected
  • Discord token/send path: OK after manual send test
  • channel security: no warnings
  • doctor --fix: recommended, but in this install it was exactly the risky path because it would rewrite openai-codex/* back toward the route that broke the install
  • stale task/session state: only obvious through tasks audit and session inspection

Suggested fixes

  • Make doctor/upgrade model-runtime migration transactional and validate the post-migration provider route before writing config/session runtime overrides.
  • If openai/* provider is not usable, do not rewrite a working openai-codex/* route into it.
  • Add a targeted command to clear stale session runtime overrides without running all of doctor --fix.
  • Include tasks audit stale-running findings in doctor/status when channel replies are stalled.
  • When Discord inbound-to-agent fails before producing a reply, send a visible fallback error or at least update channel health with a clear lastReplyError.
  • Consider a diagnostic that differentiates:
    • Discord gateway connected
    • Discord outbound API send works
    • inbound message routed to agent session
    • model turn completed
    • Discord reply emitted
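
The staged diagnostic in the last bullet could report the first stage that fails instead of a single overall status. This is a sketch under the assumption that each stage can be probed independently and yields a boolean; the first_failure helper and the results dictionary are hypothetical.

```python
from typing import Dict, Optional

# Stages in pipeline order, taken from the list above.
STAGES = [
    "Discord gateway connected",
    "Discord outbound API send works",
    "inbound message routed to agent session",
    "model turn completed",
    "Discord reply emitted",
]


def first_failure(results: Dict[str, bool]) -> Optional[str]:
    """Return the earliest stage that did not pass, or None if all passed.
    A stage with no recorded result is treated as failed."""
    for stage in STAGES:
        if not results.get(stage, False):
            return stage
    return None


# Roughly the situation in this report: transport was fine and a session
# existed, but the model turn never completed.
incident = {
    "Discord gateway connected": True,
    "Discord outbound API send works": True,
    "inbound message routed to agent session": True,
    "model turn completed": False,
    "Discord reply emitted": False,
}
print(first_failure(incident))
```

Reporting the first failing stage would have pointed directly at the model/runtime layer here, rather than at Discord connectivity.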

Impact

High for live Discord installs: the bot appears present and healthy, but users get no replies. The repair requires knowing to inspect model-runtime migration state, stale tasks, session runtime overrides, and Discord send path separately.
