Summary
I'm seeing Discord replies fail with:
Discord inbound worker timed out.
In the service logs this appears as:
2026-04-26T10:04:19+08:00 [discord] inbound worker timed out after 1800 seconds (channelId=1497687704866000906, messageId=1497772780744347771)
2026-04-26T12:06:09+08:00 [discord] inbound worker timed out after 1800 seconds (channelId=1489624037758861415, messageId=1497803443287625800)
The gateway process itself remains active, but Discord inbound work appears to get stuck long enough to hit the 30 minute default timeout. Around the same periods I also see gateway/local worker health symptoms like websocket handshake timeouts, subagent announce timeouts, session locks, qmd timeouts, and context overflow recovery.
What I expected
A Discord inbound message should either complete, fail with the underlying agent/model error, or surface enough diagnostic context to tell which internal worker/session is stuck.
If this timeout is expected behavior, it would help if the user-facing Discord reply included the agent/session/run id, or a clearer reason than the bare "Discord inbound worker timed out" message.
What actually happens
The Discord channel gets a generic timeout reply after 1800 seconds.
Nearby logs show related pressure/errors:
[ws] handshake timeout ... peer=127.0.0.1:...->127.0.0.1:18789
Subagent announce failed: Error: gateway timeout after 10000ms
[session-write-lock] releasing lock held for 66843ms / 71211ms / 97029ms
[memory] qmd embed failed ... timed out after 600000ms
[agent/embedded] [context-overflow-diag] ... Context overflow: estimated context size exceeds safe threshold during tool loop.
There were also Discord gateway reconnect/session churn events in the same general window:
[discord] gateway error: Error: socket hang up
[discord] gateway: Gateway websocket closed: 1006
[discord] gateway: Gateway reconnect scheduled ... (invalid-session, resume=false)
System/config context
- OpenClaw: 2026.4.24 (46d2415)
- OS: LMDE 6 / Debian kernel 6.1.0-44-amd64
- Node: v24.15.0
- npm: 11.12.1
- Codex CLI: 0.120.0
- Running as user systemd service: openclaw-gateway.service
- Gateway mode: local loopback, port 18789
- Discord enabled with multiple accounts/bots
- Discord: healthMonitor.enabled=false
- Discord: threadBindings.enabled=true
- Discord: threadBindings.spawnSubagentSessions=true
- Discord: threadBindings.spawnAcpSessions=true
- Agent defaults: contextTokens=120000, timeoutSeconds=3600
- Primary model: openai-codex/gpt-5.5
- Subagents: subagents.maxConcurrent=5, subagents.maxChildrenPerAgent=5, subagents.announceTimeoutMs=300000
- Compaction reserve floor: 24000
- Discord inbound worker timeout appears to be using the default 1800000 ms (1800 s); I did not find an explicit per-account channels.discord.accounts.<id>.inboundWorker.runTimeoutMs override in my config.
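For reference, an explicit per-account override at the key path above would presumably look something like this. The account id "main", the exact nesting, and the 600000 ms value are illustrative assumptions on my part; only the key path channels.discord.accounts.<id>.inboundWorker.runTimeoutMs is taken from the setting name itself:

```json
{
  "channels": {
    "discord": {
      "accounts": {
        "main": {
          "inboundWorker": {
            "runTimeoutMs": 600000
          }
        }
      }
    }
  }
}
```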
At the time of inspection the service was still running but using substantial resources:
Tasks: 80
Memory: 6.5G
CPU: 3d+ accumulated
Why I think this might be an OpenClaw issue
The timeout itself is documented/configured, but in practice it seems to be acting as the only visible failure mode for several possible internal stalls:
- queued Discord inbound run stuck behind session locks
- qmd embed/search/update timeouts
- subagent announce timeouts
- local gateway websocket handshake timeouts
- context overflow recovery taking a long time or looping
It would be useful if the inbound worker timeout carried the underlying run/session state into the Discord error reply and logs, or if the queue could cancel/unblock the stuck worker more cleanly before the full 1800s elapses.
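One cheaper mitigation than waiting out the full 1800 s could be a stall watchdog that aborts the run when no progress event (model token, tool result, lock acquired) arrives within a shorter window. A minimal sketch, where runWithStallWatchdog, STALL_MS, and the progress-event model are my own assumptions and not OpenClaw internals:

```typescript
// Hypothetical stall watchdog: abort a run if it makes no progress for
// STALL_MS, instead of waiting for the full inbound worker timeout.
const STALL_MS = 120_000; // assumed 2-minute no-progress window

async function runWithStallWatchdog<T>(
  run: (signal: AbortSignal, onProgress: () => void) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer = setTimeout(() => controller.abort(), STALL_MS);
  // Each progress event resets the clock, so only a genuinely stuck
  // run (e.g. blocked on a session lock) gets aborted early.
  const onProgress = () => {
    clearTimeout(timer);
    timer = setTimeout(() => controller.abort(), STALL_MS);
  };
  try {
    return await run(controller.signal, onProgress);
  } finally {
    clearTimeout(timer); // never leave the watchdog timer pending
  }
}
```

A run blocked behind one of the stalls listed above would then fail within minutes with an abort reason, rather than surfacing only as the generic 1800 s timeout.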
Possible improvement
When the inbound worker times out, include something like:
- account id / agent id
- session key
- run id
- queue depth
- whether the agent was waiting on model, tool, qmd, session lock, or gateway connect
- whether the timeout came from default
inboundWorker.runTimeoutMs or an explicit account override
That would make this much easier to diagnose from Discord without digging through journal logs.
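As a concrete sketch of the fields above, the timeout message could carry a small context object. Every field name here (accountId, sessionKey, waitingOn, and so on) is an illustrative assumption about what the worker could expose, not OpenClaw's actual internal shape:

```typescript
// Hypothetical diagnostic context attached to an inbound worker timeout.
interface InboundTimeoutContext {
  accountId: string;
  agentId: string;
  sessionKey: string;
  runId: string;
  queueDepth: number;
  waitingOn: "model" | "tool" | "qmd" | "session-lock" | "gateway-connect" | "unknown";
  timeoutSource: "default" | "account-override"; // which runTimeoutMs applied
  timeoutMs: number;
}

// Render the context into a single line usable both in the Discord reply
// and in the service log.
function formatInboundTimeout(ctx: InboundTimeoutContext): string {
  const seconds = Math.round(ctx.timeoutMs / 1000);
  return (
    `Discord inbound worker timed out after ${seconds}s ` +
    `(account=${ctx.accountId}, agent=${ctx.agentId}, session=${ctx.sessionKey}, ` +
    `run=${ctx.runId}, queueDepth=${ctx.queueDepth}, waitingOn=${ctx.waitingOn}, ` +
    `timeout=${ctx.timeoutSource})`
  );
}
```

With something like this, a reply reading "waitingOn=session-lock" would immediately distinguish a lock stall from a model or qmd stall.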