fix(cli-runner): log discarded prompt metadata on aborted Claude CLI turns [AI-assisted]#85339
fix(cli-runner): log discarded prompt metadata on aborted Claude CLI turns [AI-assisted]#85339orihakimi wants to merge 3 commits into
Conversation
…turns [AI-assisted] When the Claude CLI live session aborts mid-turn (no-output timeout, total turn timeout, output limit exceeded, subprocess exit, or external abort), the user prompt that triggered the turn was discarded with no trace. The prompt is written to the CLI subprocess via stdin in `writeTurnInput` (`claude-live-session.ts:735` on the original line numbers) and never persisted gateway-side — chat.send's `emitSessionTranscriptUpdate` is a notification event, not a writer; the JSONL transcript is owned by the CLI subprocess, which only writes user turns after it processes them. `failTurn` warns with `error=<errorKind>` and `closeLiveSession` with `reason=abort`, but neither line carries a correlator linking back to the matching `cli exec: ... promptChars=…` entry. An operator reading `gateway.log` after a real abort cannot identify which user input was discarded; the user just sees their message go unanswered and a later turn starts fresh with no continuity. Track `promptChars` and an 8-char SHA-256 prefix on the `ClaudeLiveTurn` when it's created, and emit both in the `failTurn` warn line. The hash lets an operator confirm whether a recurrence is the same lost prompt or a different one; the char count cross-references the existing `promptChars=…` field on `cli exec`. The change is scoped to `src/agents/cli-runner/claude-live-session.ts` and does not change any behavior outside the failure log line. Adds a focused unit test on the new `summarizePromptForLog` helper covering: length reporting, the 8-hex-char SHA-256 prefix shape, stable output for equal inputs, and divergence for different inputs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Codex review: needs maintainer review before merge. Latest ClawSweeper review: 2026-05-23 09:03 UTC / May 23, 2026, 5:03 AM ET. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. by source inspection and supplied redacted logs: current main writes the prompt to Claude CLI stdin while failTurn lacks a turn-level correlator back to cli exec. I did not run a live abort in this read-only review. PR rating What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Next step before merge Security Review detailsBest possible solution: Land this bounded diagnostic logging change after normal maintainer merge checks; keep durable gateway-side prompt outbox or user-visible recovery as a separate architectural follow-up. Do we have a high-confidence way to reproduce the issue? Yes, by source inspection and supplied redacted logs: current main writes the prompt to Claude CLI stdin while failTurn lacks a turn-level correlator back to cli exec. I did not run a live abort in this read-only review. Is this the best way to solve the issue? Yes. A shared generated turnId plus a single basePrompt-derived summary passed into the live session is the narrow maintainable diagnostics fix without introducing prompt persistence or recovery semantics. Label changes:
Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 492d656d7485. |
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Cosmic Shellbean Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
… and basePrompt source-of-truth [AI-assisted]
ClawSweeper rated the prior change 🦐 gold shrimp because the abort warn
line's promptChars came from the post-transform live-session prompt,
while the originating `cli exec: ...` log line uses basePrompt.length.
Bootstrap warnings, plugin text replacements, and image-marker injection
happen between the two sites, so the two values diverged — defeating the
correlation that the patch claimed to provide.
Thread a shared 8-hex turnId through both log sites, generated once in
`executePreparedCliRun`. Compute the prompt summary ONCE in execute.ts
from basePrompt — the same value cli exec already logs the length of —
and pass {turnId, promptChars, promptHash} into runClaudeLiveSessionTurn
as a ClaudeLiveTurnCorrelation. The live session stores the correlation
on the turn and emits it verbatim on failure; it no longer hashes the
prompt locally, eliminating the divergence by construction.
Extract `buildClaudeLiveTurnFailedLogLine` as a pure helper (mirroring
`buildCliExecLogLine`) so the warn-line contract is unit-testable
without spinning up the live runtime.
Move `summarizePromptForLog` and add `generateCliTurnId` to execute.ts
(source-of-truth lives where the correlator is generated).
Regression test asserts that a synthetic basePrompt and a deliberately
different post-transform prompt still produce a single, shared turnId
and matching promptChars across the cli exec and abort-failure lines.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
The previous run timed out at 25m on the network-ssrf-boundary query while the other five CodeQL queries in the same workflow passed in ~1m. This patch adds no network egress surface (no new fetch, URL construction, or HTTP clients) — almost certainly a CodeQL infrastructure flake.
Summary
runClaudeLiveSessionTurninsrc/agents/cli-runner/claude-live-session.tshands the user prompt to the Claude CLI subprocess via stdin (writeTurnInput→stdin.write(createClaudeUserInputMessage(prompt))) without persisting theprompt anywhere gateway-side. The JSONL transcript at
agents/<agent>/sessions/<id>.jsonlis owned by the CLI subprocess — itonly gets a user-message line after the CLI processes the turn. The
emitSessionTranscriptUpdatecall inchat.sendis a pub-sub notification(
src/sessions/transcript-events.ts), not a writer.When the live session is aborted before the CLI produces output —
noOutputTimer(no-output timeout),timeoutTimer(total turn timeout),output-limit checks, subprocess exit, or an external
abortSignal— theturn rejects through
failTurn, the CLI is killed, and the prompt isgone. Before this PR,
failTurnwarned withprovider=… model=… durationMs=… error=…only;closeLiveSessioninfo-loggedreason=abort. Neither line carried any correlator back to the originatingcli exec: … promptChars=…entry, so an operator readinggateway.logafter a real incident could not identify which user input was discarded.
User-visible effect: the message goes unanswered, and the next inbound
turn starts a fresh CLI session (
session=none resumeSession=none historyPrompt=none) with no continuity to the lost one.Fix
Thread a shared
turnId(8 hex characters, generated once perexecutePreparedCliRun) through both log sites:cli exec: provider=… model=… **turnId=<8hex>** promptChars=… …claude live session turn failed: provider=… model=… durationMs=… error=… **turnId=<8hex>** promptChars=… promptHash=…Compute the prompt summary (
{chars, hash}) once inexecute.tsfrombasePrompt— the same valuecli execalready logs the length of — andpass
{turnId, promptChars, promptHash}intorunClaudeLiveSessionTurnas a
ClaudeLiveTurnCorrelationpayload. The live session stores thecorrelation on the turn and emits it verbatim on failure; it no longer
hashes the prompt locally.
This eliminates by construction the divergence flagged by ClawSweeper on
the previous revision: bootstrap warnings, plugin text transforms, and
image-marker injection happen between the
cli execlog site and thelive-session call, so hashing the post-transform prompt would have
produced a
promptCharsvalue that did not match the existingcli execlog line in the most common failure case (fresh session, no resume,
history prompt routed in).
Why turnId in addition to promptChars
turnIdis the canonical correlator: a single value, immune to anyfuture transform, that lets an operator grep
gateway.logfor oneidentifier and find every line a single turn produced.
promptChars+promptHashcharacterize the lost user input (same prompt being lostrepeatedly vs. distinct losses); they are sourced from the same
basePromptvaluecli execalready records, so the two lines alwaysagree.
Why not a gateway-side outbox
The architectural fix is to pre-write the user turn to a gateway-owned
sidecar before invoking the CLI, so the message survives any
subprocess-side abort and can be re-delivered or surfaced to the user.
That is a much larger change: new persistence schema, retention,
recovery flow on session start, dedupe against the CLI's own subsequent
user-turn write. It also crosses multiple module boundaries (chat.send,
session transcript persistence, channel reply pipeline). This PR is the
minimum-viable fix that turns a previously silent failure into an
operator-actionable one. Outbox is a separate follow-up.
Real behavior proof
Behavior addressed: a real user prompt arrived via webchat,
started a fresh CLI live session, and the CLI subprocess produced no
output before being killed by the abort timer ~6m24s later. The
transcript JSONL contained only the CLI's
{type:"session"}header— no user-message line — and
chat.historyreturned an empty messagearray for that session key. The next inbound message opened a
brand-new CLI session with no history continuity, and the original
prompt was unrecoverable from any persisted store.
Real environment tested: macOS, Apple Silicon, OpenClaw 2026.5.20
installed via npm (
/opt/homebrew/lib/node_modules/openclaw), webchatchannel, Anthropic provider,
claude-opus-4-7model, Claude CLIauthenticated via OAuth profile. Single agent with model-scoped
claude-cliruntime. Bug reproduced organically — not engineered.Exact steps or command run after this patch:
installed bundled module
(
/opt/homebrew/lib/node_modules/openclaw/dist/claude-live-session-*.jsand
execute-*.js). The bundled-JS edits mirror the source diff:turnIdgenerated inexecutePreparedCliRun,basePromptSummarycomputed once,
turnIdfield added to thecli execlog line,turnCorrelationthreaded intorunClaudeLiveSessionTurn, andthe
buildClaudeLiveTurnFailedLogLinebuilder emitting thecorrelation fields.
openclaw gateway restart.node scripts/run-vitest.mjs src/agents/cli-runner/claude-live-session.test.ts src/agents/cli-runner.spawn.test.tsagainst the source tree — 98 passed across 3 files (8 new tests inclaude-live-session.test.ts).node scripts/run-vitest.mjs src/agents/cli-runner/— 208 passed across 25 files, no regressions.pnpm tsgoandpnpm tsgo:test— both green.pnpm lint --quietscoped to the four touched files — clean.Evidence after fix:
Before-fix gateway log excerpt (sensitive identifiers redacted,
timestamps stripped, run/conn/session IDs replaced):
The only trace is the
reason=abortline.failTurnwarned witherror=<errorKind>and no correlator. An operator could not identifywhich prompt was lost.
After-fix gateway log excerpt (same scrubbing):
The
turnId=<8hex>value is identical on both lines — byconstruction, since it is generated once in
executePreparedCliRunand passed verbatim into the live session.
promptCharsmatches byconstruction too: both lines use
basePromptSummary.chars,computed once from the same
basePrompt.promptHash=<8hex>fingerprints the same
basePromptcontent, so a recurrence of thesame lost prompt produces an identical hash.
Observed result after fix:
src/agents/cli-runner/tests pass (208/208 across 25 files).claude-live-session.test.tsbuilds asynthetic
basePromptand a deliberately divergent post-transformprompt (bootstrap warning prepended, image markers appended) and
asserts the abort warn line still carries the basePrompt-derived
promptCharsvalue, not the post-transform length — preventing afuture regression of exactly the divergence ClawSweeper flagged.
pnpm tsgoandpnpm tsgo:testare clean.pnpm lint --quietover the touched files is clean.diff in this PR semantically; nothing else in the install was
modified.
What was not tested: an end-to-end webchat reproduction of the
failure mode through the actual CLI binary on this PR's source tree.
The bug reproduces organically when the CLI subprocess hangs (the
motivating incident on the live install was a real abort observed in
gateway.logbefore the patch), but reliably synthesizing a hanginside a unit test would require either a long timer or a custom
process supervisor mock — both inflate the diff well beyond the
bounded refactor this PR aims for. The unit tests cover the helpers
and the correlation contract; the warn-line template inclusion is
enforced by the type system (
ClaudeLiveTurnCorrelationis requiredon the turn record).
Tests added / updated
src/agents/cli-runner/claude-live-session.test.ts(now coverssummarizePromptForLog,generateCliTurnId, andbuildClaudeLiveTurnFailedLogLine):reports the UTF-16 length of the promptreturns the first 8 hex characters of the prompt's SHA-256 digestyields a stable hash across repeated calls for the same promptyields different hashes for different promptsreturns an 8-character lowercase hex string(turnId shape)yields a different id on successive callsemits the correlation fields verbatim from the inputstays correlated with the cli exec line even when downstream transforms change the prompt length— the regression guardsrc/agents/cli-runner.spawn.test.ts:buildCliExecLogLinetest asserts the newturnId=<8hex>field appears.runClaudeLiveSessionTurnintegration call sites passthe new
turnCorrelationpayload.Response to ClawSweeper review
ClawSweeper rated the previous revision 🦐 gold shrimp citing:
Both rank-up moves are now in place:
turnIdis generated once inexecutePreparedCliRunand threaded through both log sites — immuneto any downstream prompt transform.
summarizePromptForLog(basePrompt)runs once inexecute.ts; boththe
cli execlog line and the live session's turn correlation usethat one result.
stays correlated with the cli exec line even when downstream transforms change the prompt lengthtest asserts the abort warnline carries the
basePrompt-derivedpromptChars, not thepost-transform value, even when the two are deliberately
constructed to differ.
Related
test) follows the recent
fix(agents/harness): route CLI backend aliases through PI for in-process compactionPR — same file, samescope expectation.
recovery + user-visible "your message wasn't processed" surface) is
intentionally not in scope here.
AI-assisted disclosure
This patch was authored end-to-end via Claude Code (claude-opus-4-7).
The PR author validated the bundled-JS equivalent on the live install
that originally exhibited the bug, read every line before pushing, and
ran the targeted test + repo
tsgo+lintshards.