Bug type
Crash (process/app exits or hangs)
Beta release blocker
No
Summary
OpenClaw lowercases Matrix event IDs when constructing thread session keys (sessionKey = ...thread:$<lowercased_event_id>), but also creates a second session with the original mixed-case event ID. This causes two compounding failures:
- Duplicate stuck sessions: Every Matrix thread spawns two sessions, one with the original event ID and one with the lowercased version. They deadlock each other (one reports active_embedded_run, the other no_active_work), and neither recovers.
- Thread reply delivery failures: When constructing m.relates_to relations for thread replies, the lowercased event ID is used. Synapse rejects these with [400] Can't send relation to unknown event because Matrix event IDs are case-sensitive per the spec.
Gateway restarts temporarily clear the stuck sessions, but any new thread reply immediately fails again because the case mismatch persists.
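A minimal sketch of the mismatch described above, assuming a session key is built by string concatenation. `buildThreadSessionKey` and the `matrix:room` prefix are hypothetical names for illustration, not OpenClaw's actual code; the real key prefix is elided in the logs.

```typescript
// Hypothetical helper (assumption): not OpenClaw's real implementation.
// Matrix event IDs are opaque, case-sensitive strings, so the ID is used verbatim.
function buildThreadSessionKey(baseKey: string, threadRootEventId: string): string {
  return `${baseKey}:thread:${threadRootEventId}`;
}

// Event ID taken from the diagnostic logs in this report.
const original = "$lSTsAlYrc_KOmteNbX6zqQxY5ZKMlYa79A7EArC4Jrg";

// "matrix:room" is a placeholder prefix (assumption).
const preserved = buildThreadSessionKey("matrix:room", original);
const lowercased = buildThreadSessionKey("matrix:room", original.toLowerCase());

// The two keys differ, so a lookup by one never finds a session stored under the other.
console.log(preserved === lowercased);
```

If any code path normalizes the ID while another does not, the same thread ends up under two keys, which matches the duplicate sessions reported here.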
Steps to reproduce
- Configure OpenClaw with a Matrix channel and threadReplies: "always"
- Have a user send a message in a Matrix room thread
- Observe two session keys created for the same thread (e.g., thread:$lSTsAlY... and thread:$lstsaly...)
- Thread delivery fails with MatrixError: [400] Can't send relation to unknown event
- Both sessions enter state=processing and never recover
Expected behavior
- Matrix event IDs should be treated as case-sensitive throughout the pipeline (per the Matrix spec)
- Only one session should be created per thread, using the original event ID
- Thread replies should use the original event ID in m.relates_to relations
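The expected relation can be sketched as follows. The `m.relates_to` shape (`rel_type: "m.thread"`, `event_id`, `is_falling_back`, `m.in_reply_to`) follows the Matrix threading spec; `buildThreadReplyContent` is a hypothetical helper for illustration, not an OpenClaw or SDK function.

```typescript
interface ThreadReplyContent {
  msgtype: "m.text";
  body: string;
  "m.relates_to": {
    rel_type: "m.thread";
    event_id: string; // thread root, exactly as received from the homeserver
    is_falling_back: boolean;
    "m.in_reply_to": { event_id: string };
  };
}

// Hypothetical helper (assumption): builds message content for a thread reply.
// The event IDs are passed through untouched; no case folding, trimming, or re-encoding.
function buildThreadReplyContent(
  body: string,
  threadRootEventId: string,
  latestEventId: string = threadRootEventId
): ThreadReplyContent {
  return {
    msgtype: "m.text",
    body,
    "m.relates_to": {
      rel_type: "m.thread",
      event_id: threadRootEventId,
      is_falling_back: true,
      "m.in_reply_to": { event_id: latestEventId },
    },
  };
}

const root = "$lSTsAlYrc_KOmteNbX6zqQxY5ZKMlYa79A7EArC4Jrg";
const content = buildThreadReplyContent("reply text", root);
console.log(content["m.relates_to"].event_id === root);
```

A relation built from the lowercased ID would reference an event the homeserver has never seen, which is consistent with the [400] response reported here.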
Actual behavior
- Two sessions created per thread: one with original case, one lowercased
- Both sessions deadlock (diagnostic logs show alternating active_embedded_run / no_active_work)
- Thread replies fail: MatrixError: [400] Can't send relation to unknown event
- 443 delivery failures logged across 3 rooms over 10 days
- 490 unique thread event IDs (compared case-sensitively) collapse to 249 when compared case-insensitively, so nearly every thread is affected
Example stuck session pair from logs:
[diagnostic] stuck session: sessionId=unknown sessionKey=...thread:$lSTsAlYrc_KOmteNbX6zqQxY5ZKMlYa79A7EArC4Jrg state=processing age=144s
[diagnostic] stuck session: sessionId=main sessionKey=...thread:$lstsalyrc_komtenbx6zqqxy5zkmlya79a7earc4jrg state=processing age=134s
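Pairs like the one above can be found by grouping event IDs by their lowercased form and keeping the groups with more than one distinct spelling. This is a rough sketch of that check, not the script used to produce the counts in this report; `findCaseCollisions` is a hypothetical name, and the third ID in the sample is made up for contrast.

```typescript
// Groups IDs by lowercased form and returns only the groups that contain
// more than one distinct original spelling (i.e. case collisions).
function findCaseCollisions(ids: string[]): Map<string, string[]> {
  const groups = new Map<string, Set<string>>();
  for (const id of ids) {
    const folded = id.toLowerCase();
    const variants = groups.get(folded) ?? new Set<string>();
    variants.add(id);
    groups.set(folded, variants);
  }

  const collisions = new Map<string, string[]>();
  groups.forEach((variants, folded) => {
    if (variants.size > 1) {
      collisions.set(folded, Array.from(variants));
    }
  });
  return collisions;
}

const sampleIds = [
  "$lSTsAlYrc_KOmteNbX6zqQxY5ZKMlYa79A7EArC4Jrg", // from the logs
  "$lstsalyrc_komtenbx6zqqxy5zkmlya79a7earc4jrg", // from the logs
  "$exampleUnrelatedEventId", // made-up ID with no lowercased twin
];

const collisions = findCaseCollisions(sampleIds);
console.log(collisions.size); // 1 colliding group in this sample
```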
Workaround
Setting threadReplies: "off" in the Matrix channel config stops both the duplicate sessions and delivery failures. All messages route to the room session instead.
OpenClaw version
2026.4.29 (a448042)
Operating system
macOS 26.2 (Darwin 25.2.0, arm64)
Install method
npm global (/opt/homebrew/lib/node_modules/openclaw), Node v25.8.1, launched via launchd (ai.openclaw.gateway)
Model
anthropic/claude-opus-4-6
Provider / routing chain
openclaw -> anthropic (direct)
Additional provider/model setup details
- Matrix homeserver: Synapse (self-hosted, private network)
- Active plugins: lossless-claw
- The issue affects all Matrix rooms with threads; it is not specific to any room or thread
Logs, screenshots, and evidence
Stuck session diagnostic pairs (case collision visible):
2026-05-01T07:28:49.423 stuck session: sessionKey=...thread:$lSTsAlYrc_KOmteNbX6zqQxY5ZKMlYa79A7EArC4Jrg state=processing age=144s queueDepth=1
2026-05-01T07:28:49.424 stuck session: sessionKey=...thread:$lstsalyrc_komtenbx6zqqxy5zkmlya79a7earc4jrg state=processing age=134s queueDepth=0
Thread delivery failures:
2026-05-01T08:23:30.711 [delivery-recovery] Retry failed: MatrixError: [400] Can't send relation to unknown event
2026-05-01T08:38:07.270 [restart-sentinel] outbound delivery failed: MatrixError: [400] Can't send relation to unknown event
Scale: 443 "unknown event" failures logged from 2026-04-21 to 2026-05-01 across rooms !CtQaaSFRhaLfsgIJFh (196), !ZatHbfixtvTOjbQoYr (196), !dYPXGBxGPiWXDPdUnz (50).
Related issues
Partially related to #71127 (stuck sessions not auto-aborted), but the root cause here is distinct: the case normalization creates the stuck condition in the first place, and thread delivery fails independently of session state.