Summary
OpenClaw diagnostics can retain stale native tool-call activity for a session after stuck-session recovery, session reset, or session replacement. Future work on the same sessionKey can then be classified as blocked_tool_call even when the original tool process is no longer running and the affected session transcript has already been reset/archived.
This is narrower than the existing stuck-session umbrella issues: the suspected defect is not just that a session stalls, but that the diagnostic activity tracker can keep an orphaned activeWorkKind=tool_call / activeTool=bash entry long enough to poison later recovery decisions for the same session key.
Environment
- OpenClaw: 2026.5.20 (e510042)
- OS: Linux 7.0.10-arch1-1 x86_64
- Node: v24.11.1
- Gateway: systemd user service, loopback gateway
- Runtime observed: embedded Codex app-server / native tool execution
Observed behavior
A local gateway repeatedly emitted stalled-session diagnostics with this shape, redacted:
    stalled session: sessionId=<redacted> sessionKey=<redacted>
    state=processing
    reason=blocked_tool_call
    classification=blocked_tool_call
    activeWorkKind=tool_call
    activeTool=bash
    activeToolAgeMs=<very large>
    lastProgress=<old tool start>
    recovery=checking
The active tool ages were on the order of many hours. The two stale tool ids found in local state resolved only to reset/trajectory archives, not live active sessions. One stale tool was a long foreground bash command that had started a dev server; another was an rg command. At investigation time, there was no matching live process for the dev-server case, and the current gateway stability view showed no active work after recovery.
Recovery later emitted an abort/drain outcome for an embedded run, for example:
    stuck session recovery outcome: status=aborted action=abort_embedded_run
    activeWorkKind=embedded_run
    aborted=true drained=true forceCleared=false released=0
The concerning part is that stale native tool activity and embedded-run/reply-run recovery appear to be tracked by separate state paths. If the native tool never emits the matching completion event, or if recovery/reset replaces the session before the completion event is reconciled, the diagnostic tracker can continue to report tool_call as active and drive blocked_tool_call classification for future turns.
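The suspected interaction can be modelled in isolation. The sketch below is not the OpenClaw implementation: it is a simplified in-memory tracker whose data shapes and signatures are assumptions, and only the method names echo the functions named under "Source-level suspect". It shows how ending an embedded run without clearing run activity could leave a native tool entry behind when the tool's completion event never arrives.

```typescript
// Simplified model of the suspected leak. Shapes and signatures are
// hypothetical; only the method names echo the OpenClaw source.
type Classification = "blocked_tool_call" | "idle";

class ActivityModel {
  // sessionKey -> ids of native tool calls believed to be running
  private activeTools = new Map<string, Set<string>>();
  // sessionKeys with an embedded run believed to be running
  private embeddedRuns = new Set<string>();

  recordToolStarted(sessionKey: string, toolCallId: string): void {
    const tools = this.activeTools.get(sessionKey) ?? new Set<string>();
    tools.add(toolCallId);
    this.activeTools.set(sessionKey, tools);
  }

  recordToolEnded(sessionKey: string, toolCallId: string): void {
    this.activeTools.get(sessionKey)?.delete(toolCallId);
  }

  markEmbeddedRunStarted(sessionKey: string): void {
    this.embeddedRuns.add(sessionKey);
  }

  // Ending the run only touches tool state when the caller opts in.
  markEmbeddedRunEnded(sessionKey: string, opts: { clearRunActivity: boolean }): void {
    this.embeddedRuns.delete(sessionKey);
    if (opts.clearRunActivity) this.activeTools.delete(sessionKey);
  }

  classify(sessionKey: string): Classification {
    const tools = this.activeTools.get(sessionKey);
    return tools !== undefined && tools.size > 0 ? "blocked_tool_call" : "idle";
  }
}

// One tool starts, never reports completion, and the embedded run is ended.
function simulate(clearRunActivity: boolean): Classification {
  const model = new ActivityModel();
  model.markEmbeddedRunStarted("key-1");
  model.recordToolStarted("key-1", "tool-1"); // recordToolEnded is never called
  model.markEmbeddedRunEnded("key-1", { clearRunActivity });
  return model.classify("key-1");
}

const leaked = simulate(false); // tool entry survives the run ending
const cleaned = simulate(true); // tool entry is cleared with the run
```

In this model the session key stays classified as blocked_tool_call whenever the run is ended with clearRunActivity: false, which matches the shape of the observed diagnostics. Whether the real tracker behaves the same way still needs to be confirmed against the source.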
Expected behavior
Stuck-session recovery, session reset, and session replacement should clear or reconcile diagnostic active-work state for the affected sessionId/sessionKey, including native tool calls.
After a recovery aborts/drains an embedded run or a session is reset/replaced:
- stale activeTools entries for the old session/run should not continue to classify later turns as blocked_tool_call;
- blocked_tool_call should mean there is still an owned active native tool, not just an orphaned diagnostic record;
- if the original tool cannot be cancelled or observed, the diagnostic state should be explicitly evicted/quarantined with a structured recovery event.
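One way to express the ownership requirement is to classify a tool record as blocking only when it belongs to the session that currently owns the key. The sketch below assumes a flat list of tool records and a known live sessionId per key. The function name classifyToolActivity and the orphaned_tool_record label are hypothetical and do not exist in OpenClaw.

```typescript
// Hypothetical record shape: each tool record remembers which session started it.
interface ToolRecord {
  toolCallId: string;
  sessionId: string;
  sessionKey: string;
}

// "orphaned_tool_record" is an illustrative label, not an OpenClaw classification.
type ToolClassification = "blocked_tool_call" | "orphaned_tool_record" | "idle";

function classifyToolActivity(
  records: ToolRecord[],
  sessionKey: string,
  liveSessionId: string | undefined,
): ToolClassification {
  const forKey = records.filter((r) => r.sessionKey === sessionKey);
  if (forKey.length === 0) return "idle";
  // A record is owned only if the session that started it is still the live one.
  const owned = forKey.some((r) => r.sessionId === liveSessionId);
  return owned ? "blocked_tool_call" : "orphaned_tool_record";
}

const records: ToolRecord[] = [
  { toolCallId: "tool-1", sessionId: "session-old", sessionKey: "key-1" },
];

// The key has been taken over by a new session; the old record has no owner.
const afterReplacement = classifyToolActivity(records, "key-1", "session-new");
// The original session is still live, so the tool genuinely blocks it.
const whileOwned = classifyToolActivity(records, "key-1", "session-old");
// A key with no records at all is idle.
const unrelatedKey = classifyToolActivity(records, "key-2", "session-x");
```

Separating the orphaned case from the blocked case lets recovery treat it as eviction work rather than as a tool that still needs aborting.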
Actual behavior
A stale activeTool=bash record can remain associated with a session key for many hours, repeatedly producing session.stalled with classification=blocked_tool_call. Recovery can abort/drain the embedded run but does not clearly guarantee that stale native tool activity for the session key is cleared.
Source-level suspect
Relevant current source paths:
- src/logging/diagnostic-run-activity.ts
  - recordToolStarted adds native tools to activeTools.
  - recordToolEnded removes them.
  - recordRunCompleted clears active tools/model calls/embedded runs.
  - markDiagnosticEmbeddedRunEnded can clear run activity, but callers can opt out.
- src/auto-reply/reply/reply-run-registry.ts
  - markReplyRunDiagnosticWorkEnded calls markDiagnosticEmbeddedRunEnded(..., clearRunActivity: false).
- src/logging/diagnostic-session-attention.ts
  - stale tool_call activity is classified as blocked_tool_call.
- src/logging/diagnostic.ts
  - isBlockedToolCallRecoveryEligible allows recovery once the blocked tool call crosses the abort threshold.
- src/logging/diagnostic-stuck-session-recovery.runtime.ts
  - recovery can abort active embedded work, but the cleanup contract for orphaned native tool activity is not obvious from the observed outcome.
The missing primitive may be something like a targeted diagnostic cleanup/reconciliation path, for example clearDiagnosticSessionActivity({ sessionId, sessionKey, reason }), called when recovery aborts/drains a run, when a session is reset/replaced, and when a diagnostic tool-call owner is no longer present.
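A minimal sketch of what such a primitive could look like follows. The store, the record shape, and the return value are assumptions made for illustration; the real tracker in diagnostic-run-activity.ts may organise its state differently. When both sessionId and sessionKey are supplied, the sketch requires both to match, so a replacement session that reuses the key keeps its own activity.

```typescript
type ActivityKind = "tool_call" | "model_call" | "embedded_run";

// Hypothetical record shape for one unit of active diagnostic work.
interface ActivityRecord {
  kind: ActivityKind;
  id: string;
  sessionId: string;
  sessionKey: string;
  startedAtMs: number;
}

interface ClearRequest {
  sessionId?: string;
  sessionKey?: string;
  reason: string;
}

interface ClearOutcome {
  reason: string;
  evicted: ActivityRecord[];
}

// Hypothetical module-level store, keyed by activity id.
const activeWork = new Map<string, ActivityRecord>();

function clearDiagnosticSessionActivity(req: ClearRequest): ClearOutcome {
  const evicted: ActivityRecord[] = [];
  // Refuse to act as "clear everything" when no selector is given.
  if (req.sessionId === undefined && req.sessionKey === undefined) {
    return { reason: req.reason, evicted };
  }
  activeWork.forEach((rec, id) => {
    const idMatches = req.sessionId === undefined || rec.sessionId === req.sessionId;
    const keyMatches = req.sessionKey === undefined || rec.sessionKey === req.sessionKey;
    if (idMatches && keyMatches) {
      activeWork.delete(id); // deleting the visited entry during forEach is safe
      evicted.push(rec);
    }
  });
  return { reason: req.reason, evicted };
}

// Usage: an old session left a bash tool behind; a new session reuses the key.
activeWork.set("tool-old", {
  kind: "tool_call", id: "tool-old",
  sessionId: "session-old", sessionKey: "key-1", startedAtMs: 0,
});
activeWork.set("tool-new", {
  kind: "tool_call", id: "tool-new",
  sessionId: "session-new", sessionKey: "key-1", startedAtMs: 1000,
});

const outcome = clearDiagnosticSessionActivity({
  sessionId: "session-old",
  sessionKey: "key-1",
  reason: "session_replaced",
});
const noSelector = clearDiagnosticSessionActivity({ reason: "noop" });
```

Returning the evicted records gives the caller enough detail to log or emit a structured event for each one.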
Suggested fix shape
- Add a diagnostic activity cleanup primitive that removes or quarantines all active tool/model/embedded-run state for a given sessionId and/or sessionKey.
- Call it from stuck-session recovery when reason=blocked_tool_call recovery aborts/drains a run or determines the owner is gone.
- Call it from session reset/replacement paths after the old session is archived or superseded.
- Add regression coverage for:
  - native tool_call starts and never emits completion;
  - session is reset/replaced or embedded recovery aborts/drains;
  - later work on the same sessionKey is not classified as blocked_tool_call from the old tool;
  - genuine active native tools still classify as blocked until completion/abort.
- Emit a structured diagnostic event when stale tool activity is evicted, so operators can distinguish a real running tool from recovered stale state.
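The regression scenario and the eviction event can be sketched together against a toy tracker. Everything here is an assumption made for illustration: the Tracker class, the evictSession method, and the event type string diagnostic.tool_activity.evicted are hypothetical, and a real test would exercise the actual diagnostic modules instead.

```typescript
interface ToolRecord {
  toolCallId: string;
  tool: string;
  sessionId: string;
  sessionKey: string;
  startedAtMs: number;
}

// Hypothetical structured event emitted once per evicted tool record.
interface ToolActivityEvictedEvent {
  type: "diagnostic.tool_activity.evicted";
  sessionKey: string;
  sessionId: string;
  toolCallId: string;
  tool: string;
  ageMs: number;
  reason: string;
}

class Tracker {
  readonly events: ToolActivityEvictedEvent[] = [];
  private tools = new Map<string, ToolRecord>();

  recordToolStarted(rec: ToolRecord): void {
    this.tools.set(rec.toolCallId, rec);
  }

  recordToolEnded(toolCallId: string): void {
    this.tools.delete(toolCallId);
  }

  // Called from reset/replacement or recovery in this sketch.
  evictSession(sessionId: string, reason: string, nowMs: number): void {
    this.tools.forEach((rec, id) => {
      if (rec.sessionId !== sessionId) return;
      this.tools.delete(id);
      this.events.push({
        type: "diagnostic.tool_activity.evicted",
        sessionKey: rec.sessionKey,
        sessionId: rec.sessionId,
        toolCallId: rec.toolCallId,
        tool: rec.tool,
        ageMs: nowMs - rec.startedAtMs,
        reason,
      });
    });
  }

  isBlocked(sessionKey: string): boolean {
    let blocked = false;
    this.tools.forEach((rec) => {
      if (rec.sessionKey === sessionKey) blocked = true;
    });
    return blocked;
  }
}

const tracker = new Tracker();

// 1. A native tool starts on the old session and never emits completion.
tracker.recordToolStarted({
  toolCallId: "t1", tool: "bash",
  sessionId: "session-old", sessionKey: "key-1", startedAtMs: 0,
});
// 2. The session is replaced, which evicts the old session's activity.
tracker.evictSession("session-old", "session_replaced", 5000);
// 3. Later work on the same key must not inherit the old tool.
const blockedAfterReplace = tracker.isBlocked("key-1");
// 4. A genuine tool on the new session still blocks until it completes.
tracker.recordToolStarted({
  toolCallId: "t2", tool: "bash",
  sessionId: "session-new", sessionKey: "key-1", startedAtMs: 6000,
});
const blockedWhileRunning = tracker.isBlocked("key-1");
tracker.recordToolEnded("t2");
const blockedAfterCompletion = tracker.isBlocked("key-1");
```

The event carries the tool name, age, and eviction reason, so an operator reading the logs can distinguish recovered stale state from a tool that is still running.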
Related issues
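This is a narrow follow-up to prior stuck-session/recovery work rather than a duplicate. That prior work covers:
- model_call and tool_call recovery gaps;
- blocked_tool_call / hung tool calls contributing to gateway instability.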
Impact
A single orphaned native tool diagnostic record can keep a session lane looking blocked long after the original command is gone. The user-visible effect is delayed or lost replies, repeated stalled-session logs, and confusing recovery outcomes that make the gateway appear to still be blocked by bash when there is no corresponding live tool process.