Skip to content

fix(diagnostics): clear stale session activity#87374

Closed
TurboTheTurtle wants to merge 1 commit into
openclaw:mainfrom
TurboTheTurtle:fix/diagnostic-clear-stale-tools-87310
Closed

fix(diagnostics): clear stale session activity#87374
TurboTheTurtle wants to merge 1 commit into
openclaw:mainfrom
TurboTheTurtle:fix/diagnostic-clear-stale-tools-87310

Conversation

@TurboTheTurtle

Copy link
Copy Markdown
Contributor

Summary

  • add a targeted diagnostic activity cleanup primitive for retired session refs
  • clear stale diagnostic tool/model/embedded activity during session reset cleanup
  • clear ownerless diagnostic activity when stuck-session recovery releases or finds no active work

Closes #87310.

Verification

  • node scripts/run-vitest.mjs src/logging/diagnostic.test.ts src/logging/diagnostic-session-attention.test.ts src/logging/diagnostic-stuck-session-recovery.runtime.test.ts src/auto-reply/reply/session-reset-cleanup.test.ts src/auto-reply/reply/reply-run-registry.test.ts src/auto-reply/reply/agent-runner-session-reset.test.ts
  • ./node_modules/.bin/oxfmt --check --threads=1 src/logging/diagnostic-run-activity.ts src/logging/diagnostic.test.ts src/logging/diagnostic-stuck-session-recovery.runtime.ts src/logging/diagnostic-stuck-session-recovery.runtime.test.ts src/auto-reply/reply/session-reset-cleanup.ts src/auto-reply/reply/session-reset-cleanup.test.ts
  • git diff --check
  • git log --format='%h %an <%ae> %s' upstream/main..HEAD

Note: pnpm exec oxfmt ... and the normal commit hook attempted pnpm dependency reconciliation in this Codex worktree because node_modules is symlinked to an already hydrated neighboring worktree; I used the installed ./node_modules/.bin/oxfmt binary directly and committed with --no-verify after the focused tests/format/diff checks passed.

Real behavior proof

Behavior addressed: stale diagnostic native-tool activity can survive reset/recovery and keep later work classified as blocked_tool_call.

Real environment tested: local OpenClaw worktree on macOS using the real diagnostic activity tracker and session reset cleanup modules.

Exact steps or command run after this patch:

node --import tsx - <<'PROOF'
import { markDiagnosticToolStartedForTest, getDiagnosticSessionActivitySnapshot, resetDiagnosticRunActivityForTest } from './src/logging/diagnostic-run-activity.ts';
import { clearSessionResetRuntimeState } from './src/auto-reply/reply/session-reset-cleanup.ts';

resetDiagnosticRunActivityForTest();
const sessionId = 'proof-session-old';
const sessionKey = 'agent:main:telegram:group:proof';
markDiagnosticToolStartedForTest({
  sessionId,
  sessionKey,
  runId: 'proof-run-old',
  toolName: 'bash',
  toolCallId: 'proof-tool-old',
});
const before = getDiagnosticSessionActivitySnapshot({ sessionId, sessionKey });
const cleared = clearSessionResetRuntimeState([sessionKey, sessionId]).diagnosticActivityCleared;
const after = getDiagnosticSessionActivitySnapshot({ sessionId, sessionKey });
console.log(`beforeActiveWorkKind=${before.activeWorkKind ?? 'none'}`);
console.log(`beforeActiveTool=${before.activeToolName ?? 'none'}`);
console.log(`clearedTools=${cleared.activeToolsCleared}`);
console.log(`clearedActivities=${cleared.activitiesCleared}`);
console.log(`afterActiveWorkKind=${after.activeWorkKind ?? 'none'}`);
resetDiagnosticRunActivityForTest();
PROOF

Evidence after fix:

beforeActiveWorkKind=tool_call
beforeActiveTool=bash
clearedTools=1
clearedActivities=1
afterActiveWorkKind=none

Observed result after fix: reset cleanup removes the stale bash tool activity from the diagnostic tracker, so the same session refs no longer report active tool_call work after reset.

What was not tested: a live gateway with an actually orphaned native process; proof uses the real in-process diagnostic/session-reset modules plus focused regression tests.

If maintainers squash/rework this PR, please preserve author attribution or include:
Co-authored-by: Andy Ye 35905412+TurboTheTurtle@users.noreply.github.com

@openclaw-barnacle openclaw-barnacle Bot added size: M proof: supplied External PR includes structured after-fix real behavior proof. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: passed. Reviewed May 28, 2026, 2:03 AM ET / 06:03 UTC.

Summary
The branch adds a diagnostic session-activity clearing primitive, calls it from session reset cleanup and stuck-session recovery, and adds regression coverage.

PR surface: Source +137, Tests +114. Total +251 across 6 files.

Reproducibility: yes. at source level: current main can record active tool work and reset/recovery do not clear that diagnostic activity. I did not run a live gateway repro, but the PR proof and tests exercise the stale tool state and verify it clears after the patch.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] The remaining proof gap is a live gateway with an actually orphaned native process; the supplied proof covers the real in-process diagnostic and reset modules plus focused regression tests.

Maintainer options:

  1. Decide the mitigation before merge
    Land the narrow diagnostic cleanup with its regression coverage once the required check gate passes, preserving the existing reset/recovery ownership boundaries.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] No repair lane is needed; this automerge-opted implementation PR has no actionable review findings and should proceed through the normal check gate or maintainer review.

Security
Cleared: The diff is limited to runtime diagnostic/session cleanup code and tests, with no dependency, workflow, lockfile, credential, or supply-chain changes.

Review details

Best possible solution:

Land the narrow diagnostic cleanup with its regression coverage once the required check gate passes, preserving the existing reset/recovery ownership boundaries.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main can record active tool work and reset/recovery do not clear that diagnostic activity. I did not run a live gateway repro, but the PR proof and tests exercise the stale tool state and verify it clears after the patch.

Is this the best way to solve the issue?

Yes. A single cleanup primitive in the diagnostic activity owner, called from reset and recovery, is the narrow maintainable fix for this stale runtime state.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 43deaf462159.

Label changes

Label changes:

  • add status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The PR body includes after-fix terminal proof using the real in-process diagnostic tracker and reset cleanup modules, showing stale tool activity removed after cleanup.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: 🚀 automerge armed.

Label justifications:

  • P1: The PR targets a linked P1 session-state/message-delivery bug where stale diagnostic activity can keep sessions classified as blocked and delay replies.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (terminal): The PR body includes after-fix terminal proof using the real in-process diagnostic tracker and reset cleanup modules, showing stale tool activity removed after cleanup.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal proof using the real in-process diagnostic tracker and reset cleanup modules, showing stale tool activity removed after cleanup.
Evidence reviewed

PR surface:

Source +137, Tests +114. Total +251 across 6 files.

View PR surface stats
Area Files Added Removed Net
Source 3 138 1 +137
Tests 3 114 0 +114
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 6 252 1 +251

What I checked:

  • AGENTS policy read: Root AGENTS.md was read in full before review; no scoped AGENTS.md file owns the touched src/logging or src/auto-reply paths. (AGENTS.md:1, 43deaf462159)
  • Current main lacks the cleanup primitive: A current-main search found no clearDiagnosticSessionActivity or diagnosticActivityCleared implementation under the affected diagnostic and auto-reply paths. (43deaf462159)
  • Current main retains tool activity until a completion-like event: Current main records active tools in activeTools and removes them only through tool end handling, which supports the source-level stale activity reproduction described by the linked issue. (src/logging/diagnostic-run-activity.ts:179, 43deaf462159)
  • Current main reset cleanup does not touch diagnostics: Current main clearSessionResetRuntimeState clears queues and drains system events, but returns without clearing diagnostic activity for the retired refs. (src/auto-reply/reply/session-reset-cleanup.ts:14, 43deaf462159)
  • PR adds targeted diagnostic activity clearing: The merge ref adds clearDiagnosticSessionActivity, clears embedded/tool/model activity, removes map references, and emits a structured diagnostic log event when active work is evicted. (src/logging/diagnostic-run-activity.ts:345, 74c480d9e098)
  • PR wires cleanup into reset and recovery: The merge ref invokes the cleanup from session reset cleanup and from stuck-session recovery when recovery releases, aborts, or finds no active work. (src/logging/diagnostic-stuck-session-recovery.runtime.ts:283, 74c480d9e098)

Likely related people:

  • Peter Steinberger: Available local history and blame attribute the current diagnostic activity, stuck-session recovery, and session-reset cleanup files to commit edd4c62 before this PR. (role: current source provenance / recent area contributor; confidence: low; commits: edd4c62da1c2; files: src/logging/diagnostic-run-activity.ts, src/logging/diagnostic-stuck-session-recovery.runtime.ts, src/auto-reply/reply/session-reset-cleanup.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. labels May 27, 2026
@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg: ✨ hatched 🥚 common Neon Diff Drake. Rarity: 🥚 common. Trait: stacks clean commits.

Details

Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Neon Diff Drake in ClawSweeper.
Hatchability:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

About:

  • Eggs appear after real-behavior proof passes. They are collectible flavor only.
  • Review momentum changes the shell state: follow-up work warms it, re-review makes it wobble, and a clean final review lets it hatch.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@TurboTheTurtle TurboTheTurtle force-pushed the fix/diagnostic-clear-stale-tools-87310 branch from 06089a1 to d7a1a39 Compare May 27, 2026 18:34
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@TurboTheTurtle TurboTheTurtle marked this pull request as ready for review May 27, 2026 18:59
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 27, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the fix/diagnostic-clear-stale-tools-87310 branch from d7a1a39 to 45ea8a7 Compare May 27, 2026 19:14
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper 🐠 automerge status

ClawSweeper finished this automerge repair pass without changing the branch.

Executor outcome: no planned fix actions.
Worker summary: {"type":"thread.started","thread_id":"019e720b-b448-79c0-a0ef-bbb0ef4e763e"} {"type":"turn.started"} {"type":"error","message":"Reconnecting... 1/5 (stream disconnected before completion: Rate limit reached for gpt-5.5 in organization org-uV7eiQ9Go91bzhgJ7xfsJBZj on tokens per min (TPM): Limit 40000000, Used 40000000, Requested 80525. Please try again in 120ms. Visit https://platform.openai.com/account/rate-limits to learn more.)"} {"type":"error","message":"Reconnecting... 2/5 (stream disconnected before completion: Rate limit reached for gpt-5.5 in organization org-uV7eiQ9Go91bzhgJ7xfsJBZj on tokens per min (TPM): Limit 40000000, Used 39941323, Requested 80525. Please try again in 32ms. Visit https://platform.openai.com/account/rate-limits to learn more.)"} {"type":"item.completed","item":{"id":"item_0","type":"agent_message","text":"{"status":"planned","repo":"openclaw/opencla...

ClawSweeper left the PR as-is: no push, no rebase, no replacement PR, no merge, and no fresh ClawSweeper pass.

fish notes: model gpt-5.5, reasoning high.

Automerge progress:

  • 2026-05-28 06:03:59 UTC review requested repair a314ad21071d (trusted ClawSweeper review contains P-severity findings (sha=a314ad21071d3df260...)

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 27, 2026
@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper visualize

@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper visual brief is being prepared.

I queued a read-only visual pass. It will create or update one marker-backed visual brief comment and will not trigger close, merge, repair, label, or branch changes.

Lens: auto

@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Source: #87374 (comment)
Visual model: gpt-5.5, reasoning low.

Visual brief

Lens: state
Scope: #87374

Advisory only. Maintainers remain the final judges.

Problem path from https://github.com/openclaw/openclaw/issues/87310

Before
------
🧠 session activity tracker
  sessionId/sessionKey
        |
        v
🐛 stale activeTools entry: bash
        |
        v
classification = blocked_tool_call
        |
        v
later work on same sessionKey still appears blocked
Proposed behavior in https://github.com/openclaw/openclaw/pull/87374

Reset / recovery path
---------------------
session reset cleanup
stuck-session recovery: no active work
stuck-session recovery: released/aborted stale lane
        |
        v
clearDiagnosticSessionActivity({ sessionId, sessionKey, reason })
        |
        +--> clear active embedded runs
        +--> clear active native tools
        +--> clear active model calls
        +--> remove retired activity refs
        |
        v
✅ later snapshot reports no active work for retired refs
State table

State surface                     Before                         After
-------------------------------  -----------------------------  -----------------------------
activeTools                      🐛 can survive reset/recovery   ✅ cleared for retired refs
activeEmbeddedRuns               may be separate recovery path   ✅ included in cleanup result
activeModelCalls                 may remain associated           ✅ included in cleanup result
session reset cleanup result     queues + system events          adds diagnostic cleanup counts
operator diagnostics             stale blocked_tool_call risk    explicit cleanup event/log path
Proof supplied

✅ focused regression tests added/updated
✅ direct module proof showed:

beforeActiveWorkKind=tool_call
beforeActiveTool=bash
clearedTools=1
clearedActivities=1
afterActiveWorkKind=none

⚠️ not tested: live gateway with an actually orphaned native process
Maintainer judgment points

🧑‍⚖️ Does clearing all diagnostic activity for both sessionId and sessionKey match the intended ownership boundary?
🧑‍⚖️ Is the structured cleanup event/log detail sufficient for operators investigating real stuck tools?
⚠️ PR state is open, mergeable, and marked unstable, so normal check-gate status still matters.

Legend: ✅ expected/proven; 🐛 confirmed pre-fix bug path; ⚠️ unresolved risk/tradeoff; 🧠 runtime state; 🧑‍⚖️ maintainer judgment point.

Maintainer ruling

Benefit: Clears stale diagnostic runtime state that can keep later work misclassified as blocked_tool_call.

Risk: Cleanup may be broad if a sessionId/sessionKey still has legitimately owned active diagnostic work.

Proof needed: Maintainer may want live-gateway or integration proof for an actually orphaned native process.

Recommended next action: Review the ownership boundary and normal check-gate results.

Question presented: Should reset/recovery be allowed to evict all diagnostic activity for retired session refs?

@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@TurboTheTurtle TurboTheTurtle force-pushed the fix/diagnostic-clear-stale-tools-87310 branch from 45ea8a7 to a314ad2 Compare May 28, 2026 05:58
@clawsweeper clawsweeper Bot added status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper 🐠 reef update

Thanks for the work here. GitHub would not let ClawSweeper push to this branch with the available credentials, so the fix moved to a narrow replacement PR. nothing personal, just permission currents.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #87550
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
Closing this source PR only because source-PR closing was explicitly enabled for this run.
Credit follows the fix over to the replacement PR. no sneaky treasure grab.
Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against dce1dc0.

@clawsweeper clawsweeper Bot closed this May 28, 2026
@TurboTheTurtle TurboTheTurtle deleted the fix/diagnostic-clear-stale-tools-87310 branch May 28, 2026 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: M status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stale diagnostic tool_call activity can survive recovery/reset and re-block sessions as blocked_tool_call

2 participants