Summary
A Discord-routed agent turn can complete useful side effects, then remain stuck in processing until the CLI timeout fires. OpenClaw then surfaces the generic user-facing failure message even though the work may already have been posted/applied. Later routing can still appear to target the wedged session, which makes verifier/worker state ambiguous and can trigger redundant follow-up dispatches.
This may be related to the claude-cli regression tracked in #72434, but the problematic behavior here is the session-health/routing outcome after a terminal timeout.
Environment
- OpenClaw:
2026.4.24 from npm stable
- Channel: Discord
- Agent model:
anthropic/claude-opus-4-7 via the Claude CLI-backed path
- OS: macOS
Observed behavior
- A Discord agent turn starts and performs useful side effects. In the observed case, a review/verdict message and a local state update were successfully recorded.
- The same session remains in
processing and is reported as stuck for several minutes.
- At the 900s CLI timeout, OpenClaw terminates the candidate and posts/surfaces the generic failure text:
Something went wrong while processing your request. Please try again, or use /new to start a fresh session.
- Follow-up routing is ambiguous: the agent looks like it can still receive work, but the session is effectively dead/wedged. A later verification had to be recovered from a separate route, while a redundant follow-up dispatch was created because the original verifier path looked silent.
Sanitized log shape
[diagnostic] lane task error: lane=session:agent:<agent>:discord:channel:<redacted>:active-memory:<redacted> durationMs=<small> error="Error: Requested agent harness "claude-cli" is not registered and PI fallback is disabled."
[diagnostic] stuck session: sessionId=unknown sessionKey=agent:<agent>:discord:channel:<redacted> state=processing age=<minutes>s queueDepth=1
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=anthropic/claude-opus-4-7 candidate=anthropic/claude-opus-4-7 reason=timeout next=none detail=CLI exceeded timeout (900s) and was terminated.
Embedded agent failed before reply: CLI exceeded timeout (900s) and was terminated.
Expected behavior
After a fatal timeout or pre-reply embedded-agent failure, OpenClaw should make the session health unambiguous. Any of these would be safer than silently continuing to route to the wedged session:
- mark the session failed/dead and require
/new,
- automatically reset/roll the session before accepting more work,
- route the next turn to a fresh session,
- or surface a clear
session timed out; previous side effects may have completed state instead of only the generic failure message.
If side effects completed before the final timeout, the user-facing state should distinguish partial-success/late-failure from total failure.
Impact
- Users cannot tell whether the work failed or succeeded.
- Verifier/worker workflows can create duplicate dispatches because the original route appears silent.
- A watchdog sees
processing for many minutes but the user-facing chat only gets a generic failure at the end.
- The recovery path becomes manual: inspect logs/state, identify whether side effects completed, and route a fresh verifier/session by hand.
Redaction note
This report intentionally redacts Discord IDs, session IDs, dispatch IDs, local paths, project names, internal agent nicknames, and exact local timestamps. The included log snippets preserve only the error shape needed to diagnose the runtime behavior.
Summary
A Discord-routed agent turn can complete useful side effects, then remain stuck in
processinguntil the CLI timeout fires. OpenClaw then surfaces the generic user-facing failure message even though the work may already have been posted/applied. Later routing can still appear to target the wedged session, which makes verifier/worker state ambiguous and can trigger redundant follow-up dispatches.This may be related to the
claude-cliregression tracked in #72434, but the problematic behavior here is the session-health/routing outcome after a terminal timeout.Environment
2026.4.24from npm stableanthropic/claude-opus-4-7via the Claude CLI-backed pathObserved behavior
processingand is reported as stuck for several minutes.Sanitized log shape
Expected behavior
After a fatal timeout or pre-reply embedded-agent failure, OpenClaw should make the session health unambiguous. Any of these would be safer than silently continuing to route to the wedged session:
/new,session timed out; previous side effects may have completedstate instead of only the generic failure message.If side effects completed before the final timeout, the user-facing state should distinguish partial-success/late-failure from total failure.
Impact
processingfor many minutes but the user-facing chat only gets a generic failure at the end.Redaction note
This report intentionally redacts Discord IDs, session IDs, dispatch IDs, local paths, project names, internal agent nicknames, and exact local timestamps. The included log snippets preserve only the error shape needed to diagnose the runtime behavior.