fix(cli-runner): write-side flush gate + orphan-tool-use invalidator#84234
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 30, 2026, 12:32 AM ET / 04:32 UTC. Summary PR surface: Source +231, Tests +788. Total +1019 across 22 files. Reproducibility: yes. source-reproducible: current main resumes Claude CLI sessions through the missing-transcript probe and lacks the PR's write-side stale-binding gate and orphan-tool invalidator. I did not establish fresh live reproduction of the latest PR head. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Keep the coordinated Claude CLI recovery direction, but land it only after the write-side probe has one bounded timing budget and latest-head proof shows both stale binding clear and orphan invalidation in a real session. Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible: current main resumes Claude CLI sessions through the missing-transcript probe and lacks the PR's write-side stale-binding gate and orphan-tool invalidator. I did not establish fresh live reproduction of the latest PR head. Is this the best way to solve the issue? No, not yet. The recovery direction is appropriate, but the write-side flush gate should not retry a 250ms grace-aware helper three times, and the latest head still needs real behavior proof before merge. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 7b3104fe4c1f. Label changesLabel justifications:
Evidence reviewedPR surface: Source +231, Tests +788. Total +1019 across 22 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Sunspot Clawlet Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
martingarramon
left a comment
There was a problem hiding this comment.
Both fixes verified on pr-84234-head.
Fix 1 — isCliBindingFlushed:
- Provider guard (
isClaudeCliProvider(provider)) returnstrueunconditionally for non-claude-cli providers — those don't write to~/.claude/projects, so probing them would always fail. - Three-probe loop at
cli-runner.ts:89runs at 0 / 50 / 150 ms. - The bare
sessionIdzeroing atcli-runner.ts:574matters:bindingFlushOk === falseclears thesessionIdin the returned run result sosession-store.ts:setCliSessionIddoesn't re-persist the unflushed sid — both writes must drop together (comment at line 565 explains the dependency). isCliBindingFlushedalso runs on thesession_expiredretry success path aroundcli-runner.ts:737— that handler is pre-existing code.
Fix 2 — claudeCliSessionTranscriptHasOrphanedToolUse:
- Sidechain skip at
attempt-execution.helpers.ts:168matches the precedent atcli-session-history.claude.ts:224; the comment explains why sidechain orphans don't block the main conversation. lastAssistantToolUseIdsresets on each new assistant message, so only the final unpaired tool_use triggers invalidation — earlier completed tool rounds are not relevant."orphaned-tool-use"appears inRAW_TRANSCRIPT_RESEED_ALLOWED_REASONSin the diff, so the reseed path handles this invalidation reason.
Coordination with #81048:
The PR author calls this out: if #81048 lands first (migrating claudeCliSessionTranscriptHasContent to deterministic v4 path), the new claudeCliSessionTranscriptHasOrphanedToolUse helper should follow — otherwise two scan strategies coexist for the same JSONL probe family. If this PR lands first, #81048 should still decide whether to migrate the orphan scan as explicit follow-up work. Worth coordinating landing order with @benjamin1492.
CI: Verified via gh pr view --json statusCheckRollup — 0 failures.
|
Thanks for the thorough verification, Martin. Good to see the provider guard and the dual-write zeroing confirmed independently — the |
a6df1dc to
afdbdd8
Compare
153ce5b to
7e281c9
Compare
…flush When a claude-cli turn produces a session id but the underlying claude subprocess fails to flush an assistant-role record to its ~/.claude/projects/<cwd>/<sid>.jsonl transcript (e.g. mid-turn kill from a concurrent fingerprint-mismatched turn, supervisor restart, internal failure), buildCliRunResult was still persisting that session id into cliSessionBinding. The next turn ran claudeCliSessionTranscriptHasContent, didn't find the file, logged 'cli session reset: reason=missing-transcript', and started a brand-new claude session with empty memory. End-user symptom: agent forgets prior conversation between turns. Gate the cliSessionBinding spread on the same predicate the next-turn invalidator uses, evaluated at write time. Also clear agentMeta.sessionId in the same case so the session-store fallback at command/session-store.ts (which reads agentMeta.sessionId via setCliSessionId when the binding is absent) doesn't re-persist the unflushed sid through a different field path. The fallback is what makes the binding-only gate insufficient on its own; both writes must drop together. The gate only fires for claude-cli providers — other CLI providers don't write to ~/.claude/projects, so probing them would always return false and incorrectly strip valid binding metadata. isCliBindingFlushed now takes the provider id and returns true unconditionally for non-claude-cli sessions. A bounded retry (0 / 50 / 150 ms) tolerates the brief gap between claude-cli's stdio close and the OS making the JSONL line visible to readers (cooperative fsync semantics on APFS, but not guaranteed under stress). The transcript-probe is exposed as an injectable dep (setCliRunnerTestDeps / restoreCliRunnerTestDeps) mirroring the existing pattern in src/agents/cli-runner/prepare.ts so isCliBindingFlushed is testable without touching ~/.claude/projects. AI-assisted: yes. Tooling: Claude Opus + claude-cli. Codex review caught the fallback path and the missing provider gate before this hit upstream. Real-Behavior-Proof: dist-side patch on M5 gateway; branch-build follow-up pending — see PR body.
…-tool
A claude-cli session whose JSONL transcript ends with an assistant
`tool_use` content block that was never answered by a `tool_result` user
message cannot resume — claude-cli will sit waiting for the missing
`tool_result`, hit its no-output watchdog, and the runtime kills it
with `reason=abort`. The dispatcher then sees an empty payload and emits
NO_REPLY, which to the user looks like the agent silently ignored their
message — same end-user symptom as the binding-flush amnesia bug, but a
different root cause.
The orphan can be left behind when:
- Gateway restarts mid-tool (brew upgrade, manual kickstart, OOM,
crash) — claude was waiting on a tool result that never arrived.
- `claude-live-session.ts` no-output watchdog fires while a tool is
actively running and OC kills the subprocess.
- The tool itself crashed or hung past its own deadline.
In all cases the resumed session is dead until the binding gets cleared,
because every subsequent resume hits the same trailing tool_use and the
same kill cycle. Observed in production on a personal OpenClaw gateway
(3d-engineer agent, 50-message-deep transcript ending in a Bash
`tool_use`; every Telegram message after the orphan landed silently
aborted at the 180s no-output mark).
Add `claudeCliSessionTranscriptHasOrphanedToolUse` to the helpers that
walks the JSONL, finds the last assistant message, and returns true if
any of its `tool_use` ids has no matching `tool_result` later in the
file. Wire into `prepareCliRunContext` as a second invalidator gate
alongside `missing-transcript`. The new `invalidatedReason:
"orphaned-tool-use"` follows the same path as missing-transcript: the
binding is dropped, this turn starts a fresh session, and the prior
context is reseeded into the new session via `RAW_TRANSCRIPT_RESEED`.
Detection only considers TRAILING orphans — an unanswered tool_use
deeper in history is inert because a later assistant message already
moved past it. Only the most recent assistant message's tool_use ids
matter for forward progress.
Probe runs only for claude-cli providers and only when the transcript-
content gate already passed, so we add no I/O on already-invalidated
sessions and no behavior change for non-claude providers.
AI-assisted: yes. Tooling: Claude Opus + claude-cli.
7e281c9 to
52c2b34
Compare
|
Landed via rebase onto main.
Thanks @adele-with-a-b! |
fix(cli-runner): write-side flush gate + orphan-tool-use invalidator
Summary
Two narrow fixes that together close the most common Claude-CLI session-recovery context-loss failures on the write side of the binding lifecycle. Both ship as a unit because the orphan-tool-use invalidator depends on the binding-flush gate's
setCliRunnerTestDepstest seam to test cleanly.Partial fix for #77974 (write-side; the read-side is in #81048 by @benjamin1492).
This is a split-out from PR #81821 (still open as draft) that contains only the two solidly-proven commits. The reseed and diagnostic commits in #81821 are dropped from this PR and will be pursued separately.
1. Binding-flush gate (
6952778fc6/ original80cad2240a)When a claude-cli turn produces a session id but the underlying claude subprocess fails to flush an assistant-role record to its
~/.claude/projects/<cwd>/<sid>.jsonltranscript (mid-turn kill from a concurrent fingerprint-mismatched turn, supervisor restart, internal failure),buildCliRunResultwas still persisting that session id intocliSessionBinding. The next turn ranclaudeCliSessionTranscriptHasContent, didn't find the file, loggedcli session reset: reason=missing-transcript, and started a brand-new claude session with empty memory.End-user symptom: agent forgets prior conversation between turns.
The fix adds an
isCliBindingFlushed(sessionId, provider)predicate that probes the transcript with a bounded retry (0 / 50 / 150 ms). When the gate fails:cliSessionBindingis dropped from the result (so next-turn binding lookup returns nothing rather than a ghost id).agentMeta.sessionIdis also cleared in the same case. This is load-bearing: the session-store fallback atcommand/session-store.tsreadsagentMeta.sessionIdviasetCliSessionIdwhen the binding is absent, so a binding-only gate would not fully remove the unflushed sid — both writes must drop together.The gate fires only for
claude-cliproviders; other CLIs don't write to~/.claude/projectsso probing them would always return false and incorrectly strip valid binding metadata.isCliBindingFlushedtakes the provider id and returnstrueunconditionally for non-claude-cli sessions.The transcript-probe is exposed as an injectable dep (
setCliRunnerTestDeps/restoreCliRunnerTestDeps) mirroring the existing pattern insrc/agents/cli-runner/prepare.ts, soisCliBindingFlushedis testable without touching~/.claude/projects.2. Orphan-tool-use invalidator (
1803c146a7/ originaldfa2617117)When a claude-cli session crashes mid-tool (gateway OOM, kickstart, manual kill), the project JSONL transcript ends with an assistant
tool_useblock that has no matching usertool_result. The next turn's--resumekeeps that orphan in the resumed context; claude-cli then trips[Request interrupted by user]and the agent emits no reply at all (NO_REPLY-on-resume).The fix adds
claudeCliSessionTranscriptHasOrphanedToolUse({sessionId})insrc/agents/command/attempt-execution.helpers.ts. The helper walks the project JSONL, skips sidechain entries, tracks the latest assistanttool_useids, and returnstrueif the transcript ends with atool_usethat has no matchingtool_result. When that fires,prepareCliRunContextsetsreusableCliSession = { invalidatedReason: "orphaned-tool-use" }so the next turn starts a fresh session.The new invalidation reason is added to the
CliInvalidatedReasonunion and to theloadCliSessionReseedMessagesallowed-reasons set so the reseed path can pull the prior transcript across the invalidation boundary.Coordination with PR #81048
PR #81048 (also open, by @benjamin1492) addresses the read-side of the same family of issues — it tightens
claudeCliSessionTranscriptHasContentwith a deterministic<homeDir>/.claude/projects/<encoded(workspaceDir)>/<sessionId>.jsonlpath resolution and a single-step grace window, replacing the v3 "scan all subdirs" strategy.The two PRs touch the same helper file (
attempt-execution.helpers.ts) but the changes don't textually conflict. They DO, however, represent two different strategies for the same probe family:claudeCliSessionTranscriptHasOrphanedToolUsehelper, mirroring the shape of the existingclaudeCliSessionTranscriptHasContentat the time of branching.claudeCliSessionTranscriptHasContentto requireworkspaceDirand use the deterministic-path strategy.If #81048 lands first, this PR's
claudeCliSessionTranscriptHasOrphanedToolUseshould migrate to the same v4 deterministic-path strategy in a follow-up (call sites atprepare.tswould also need to threadworkspaceDirthrough). If this PR lands first, #81048's signature change toclaudeCliSessionTranscriptHasContentwould land cleanly on top.Happy to adapt to whichever order maintainers prefer.
Out of scope
Two commits from the original #81821 are intentionally NOT in this PR; they'll be pursued separately:
3a6de1b62ain fix(agents/cli-runner): gate cliSessionBinding persist on transcript flush #81821) — covers the user-visible-amnesia tail of the orphan-tool-use case by reading the invalidated transcript throughbuildClaudeCliFallbackContextPreludeand prepending it as a retry prelude. Has unit-test coverage but I have not verified it end-to-end in production after cleaning up my local symlink-heavy test environment narrowed the testable scenarios. Will refile when real-runtime evidence is available.8b8e82584din fix(agents/cli-runner): gate cliSessionBinding persist on transcript flush #81821) — logs which fingerprint key triggered a live-session restart. Did its job for me locally; arguable whether it's general-utility production telemetry. Easier to land separately if maintainers want it.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Test plan
pnpm install pnpm build node scripts/run-vitest.mjs run \ src/agents/cli-runner.binding-flush.test.ts \ src/agents/cli-runner/prepare.test.ts \ src/agents/command/attempt-execution.test.ts pnpm tsgo:core pnpm tsgo:core:test pnpm exec oxfmt --check --threads=1 \ src/agents/cli-runner.ts \ src/agents/cli-runner.binding-flush.test.ts \ src/agents/cli-runner/prepare.ts \ src/agents/cli-runner/session-history.ts \ src/agents/cli-runner/types.ts \ src/agents/command/attempt-execution.helpers.ts \ src/agents/command/attempt-execution.test.ts node scripts/run-oxlint.mjs \ src/agents/cli-runner.ts \ src/agents/cli-runner.binding-flush.test.ts \ src/agents/cli-runner/prepare.ts \ src/agents/cli-runner/session-history.ts \ src/agents/cli-runner/types.ts \ src/agents/command/attempt-execution.helpers.ts \ src/agents/command/attempt-execution.test.tsAll green locally on
fix/cli-runner/binding-flush-orphan-toolrebased onupstream/mainat78d226bb3b. 154/154 tests pass across the touched surface.Real behavior proof
External-contributor real-environment proof, captured on macOS / Node 22 /
2026.5.12brew-install of OpenClaw with the Telegram channel + Claude CLI backend.Behavior addressed:
cliSessionBinding.sessionIdon the session entry. The next turn finds the JSONL missing on disk, logscli session reset: reason=missing-transcript, and starts a fresh session with no prior context. Symptom: agent forgets prior conversation between turns on a working setup.tool_useblock that has no matching usertool_result. Resuming via--resumereturns[Request interrupted by user]from claude-cli; OpenClaw emits no reply (NO_REPLY-on-resume). Symptom: agent stops responding after gateway/Claude crash mid-turn until the session is manually reset.Real environment tested: OpenClaw
2026.5.12brew-installed on M5 Max / macOS 15.x / Node 22; Anthropic claude-cli backend over Telegram. Both fixes have been running on my M5 gateway as adist-side patch for several days; the binding-flush gate was specifically tested by inducing the race (concurrent fingerprint-mismatched turns) and confirming the ghost binding no longer persists.Exact steps or command run after this patch:
git checkout fix/cli-runner/binding-flush-orphan-toolpnpm installpnpm build— cleannode scripts/run-vitest.mjs run src/agents/cli-runner.binding-flush.test.ts src/agents/cli-runner/prepare.test.ts src/agents/command/attempt-execution.test.ts— 154 tests passpnpm tsgo:coreandpnpm tsgo:core:test— both cleanpnpm exec oxfmt --check --threads=1 <touched files>— cleannode scripts/run-oxlint.mjs <touched .ts files>— 0 warnings, 0 errorsEvidence after fix:
Bundle proof that both predicates flow through:
Observed result after fix:
cliSessionBinding.sessionIdANDagentMeta.sessionIdare cleared in the result. The next turn sees no binding and starts a fresh session with the prior OpenClaw transcript reseeded, instead of resuming a missing claude-cli session and losing context.tool_use, the next turn'sprepareCliRunContextsetsinvalidatedReason: "orphaned-tool-use", forcing a fresh claude-cli session and avoiding the[Request interrupted by user]failure.What was not tested: end-to-end live reproduction of an orphan-tool-use scenario with the freshly-built bundle. The orphan path was reproduced and verified on my M5 gateway via the
dist-side patch (the gateway was killed mid-Bashtool; the next turn correctly invalidated and started fresh). The unit-level test atsrc/agents/command/attempt-execution.test.tscovers the helper's matching logic; the integration-level test atsrc/agents/cli-runner/prepare.test.tscovers the prepareCliRunContext branch that consumes it.Reviewer Pass before push
Ran our own reviewer agent on this diff before pushing (same posture as PR #80046's rebase). Findings addressed in this commit:
cli-runner.binding-flush.test.ts:54with avi.useFakeTimers()version that asserts the scheduled delays (0 + 50 + 150 ms) and the probe-call count, instead of measuring real elapsed time.Closes #77974→Partial fix for #77974— the binding-flush gate is the write-side fix; the read-side is in fix(command): retry claude-cli transcript probe to close flush race #81048. Closing the issue here would prematurely close it before the read-side lands.76ce72cbe5) now causesexecuteCliAttemptto throwFailoverError(reason: "empty_response")before the binding-flush gate runs. That's correct (no point flushing a binding for an empty response), but worth knowing — the gate's reachability shrank slightly post-cherry-pick from where the original commit was authored.Implementation notes
isCliBindingFlushedreturns true unconditionally for non-claude-cli providers; the per-provider gate is intentional rather than generalized so that future codex-cli / gemini-cli probes (with their own transcript layouts) can be added explicitly when those layouts are known.setCliRunnerTestDepsif a future environment needs different bounds.loadCliSessionReseedMessagesallowed-reason set entry ("orphaned-tool-use") keeps the reseed path consistent with the new invalidation reason — without it, the orphan-invalidated session would lose access to its prior OpenClaw transcript when reseeding.