Summary
A failed delivery-recovery entry with a lowercased Matrix room ID permanently kills the Matrix sync loop on gateway restart. The gateway process stays running, but Matrix is completely dead: no inbound messages are processed. The poisoned delivery entry persists across restarts, so every restart hits the same failure.
Related: #19278 (same root cause — room ID case normalization)
Root cause
Session keys normalize Matrix room IDs to lowercase (e.g. !IEjZDNPucuFvKLrAQC:server → !iejzdnpucufvklraqc:server). During normal operation, the Matrix SDK sends messages using the correct-case room ID from sync state, so the normalization is invisible. However, when a message is queued for delivery recovery, the lowercased room ID from the session key is stored as the "to" target. Matrix room IDs are case-sensitive, so on retry Synapse treats the lowercased ID as a different room that the bot has not joined and returns 403.
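A minimal sketch of the mismatch, and of one possible way to map a lowercased target back to its canonical form before retry. The helper names (buildCanonicalRoomIndex, resolveRoomId) are hypothetical and exist in neither OpenClaw nor the Matrix SDK. The list of joined room IDs is assumed to come from sync state.

```typescript
// Hypothetical helpers for illustration only; not OpenClaw or Matrix SDK APIs.

// Build a lookup from lowercased room ID to the canonical (original-case) ID.
// Assumption: joinedRoomIds comes from the client's sync state.
function buildCanonicalRoomIndex(joinedRoomIds: string[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const roomId of joinedRoomIds) {
    index.set(roomId.toLowerCase(), roomId);
  }
  return index;
}

// Resolve a possibly lowercased target to the canonical room ID.
// Returns undefined when no joined room matches.
function resolveRoomId(
  target: string,
  index: Map<string, string>,
): string | undefined {
  return index.get(target.toLowerCase());
}

const canonical = "!IEjZDNPucuFvKLrAQC:server";
const fromSessionKey = canonical.toLowerCase();

// A plain comparison fails, which is why the homeserver rejects the retry.
console.log(fromSessionKey === canonical); // false

// Looking the target up in the index recovers the original-case ID.
const roomIndex = buildCanonicalRoomIndex([canonical]);
console.log(resolveRoomId(fromSessionKey, roomIndex)); // !IEjZDNPucuFvKLrAQC:server
```

Because room IDs are case-sensitive, two rooms whose IDs differ only by case would collide in this index. Storing the original-case ID in the delivery entry avoids that ambiguity entirely.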
Steps to reproduce
- Configure Matrix channel with a room that has mixed-case characters in its room ID
- Trigger a gateway crash while a message send is in-flight (e.g. via Anthropic API overload → websocket reconnect failure)
- A delivery entry is persisted with the lowercased room ID
- Restart the gateway
- delivery-recovery retries the send with the lowercased room ID
- Synapse returns 403 M_FORBIDDEN: User @bot:server not in room !lowercase...
Observed behavior
Queue 'message' giving up on event ~!iejzdnpucufvklraqc:matrix.lucidpacket.com:m1774818664725.0
[delivery-recovery] Retry failed for delivery 9b8f01a2: MatrixError: [403] User @dax:matrix.lucidpacket.com not in room !iejzdnpucufvklraqc:matrix.lucidpacket.com
[MatrixClient.sync] Sync no longer running: exiting.
[MatrixClient] FetchHttpApi: <-- GET .../sync [76ms AbortError: This operation was aborted]
After this, the gateway is running but Matrix sync is permanently dead. The delivery entry persists in delivery-queue/, so every subsequent restart triggers the same crash.
Expected behavior
- Room IDs should preserve original case in delivery entries (or be mapped back to the canonical form before retry)
- A failed delivery should not crash the Matrix sync loop; it should be moved to failed/ and sync should continue
- Delivery entries should expire after N retries instead of retrying indefinitely across restarts
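The last two expectations could be sketched roughly as follows. This is an illustration under stated assumptions, not OpenClaw's actual recovery code: the DeliveryEntry shape, the retryCount field, the function names, and the default retry limit are all hypothetical, and the send callback stands in for whatever performs the Matrix send.

```typescript
// Hypothetical entry shape; the real delivery JSON format is not shown in this report.
interface DeliveryEntry {
  id: string;
  to: string;
  retryCount: number;
}

type Disposition = "requeue" | "move-to-failed";

// Record one failed attempt and decide what happens to the entry.
// retryCount is assumed to be persisted with the entry, so the limit
// also holds across gateway restarts.
function recordFailure(entry: DeliveryEntry, maxRetries: number): Disposition {
  entry.retryCount += 1;
  return entry.retryCount >= maxRetries ? "move-to-failed" : "requeue";
}

// Attempt a retry. Any send error is contained here, so it cannot
// propagate into the caller (for example, the sync loop).
async function retryDelivery(
  entry: DeliveryEntry,
  send: (to: string) => Promise<void>,
  maxRetries = 3, // assumed value for "N"
): Promise<"delivered" | Disposition> {
  try {
    await send(entry.to);
    return "delivered";
  } catch {
    return recordFailure(entry, maxRetries);
  }
}
```

The caller would move the entry's file to failed/ on "move-to-failed" and leave it queued on "requeue". In both cases sync continues.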
Workaround
Manually move the poisoned delivery JSON from ~/.openclaw/delivery-queue/ to delivery-queue/failed/ and restart the gateway.
Environment
- OpenClaw: 2026.3.24
- Homeserver: Synapse (self-hosted)
- OS: Manjaro Linux (x64)
- Install method: npm