fix(agents): preserve streamed assistant text when Claude CLI result event is empty#90450
Conversation
…event is empty The claude stream-json dialect can emit a final result event with empty text even though assistant text deltas were streamed (or the turn was tool-only). Both JSONL parsers trusted the final event and discarded the accumulated text, which made the CLI runner throw "CLI backend returned an empty response." and triggered model fallback, losing a reply that had already fully arrived. - createCliJsonlStreamingParser: fall back to the accumulated streamed text (or collected message texts) when the result event has no text, mirroring how session metadata is already preserved. - parseCliJsonl: accumulate streamed deltas and apply the same fallback. - CliOutput.hadToolCalls: structured flag set when tool_use / server_tool_use / mcp_tool_use blocks are seen, so tool-only turns are not treated as empty replies. - cli-runner: skip the empty-response FailoverError when the turn had tool calls. Genuinely empty turns keep the previous behavior: with no deltas the accumulated text is also empty, so the FailoverError still fires.
|
Codex review: needs maintainer review before merge. Reviewed June 9, 2026, 12:10 AM ET / 04:10 UTC. Summary PR surface: Source +76, Tests +243. Total +319 across 4 files. Reproducibility: yes. by source inspection and production logs: current main returns the empty Claude Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the parser-level preservation after maintainer acceptance of the Claude fallback semantics, with the PR body updated to say truly empty/tool-only turns still follow the existing silent-empty policy. Do we have a high-confidence way to reproduce the issue? Yes, by source inspection and production logs: current main returns the empty Claude Is this the best way to solve the issue? Yes: preserving deltas inside the CLI JSONL parser is the narrowest owner-boundary fix because both normal and live Claude CLI paths consume that parser output. The runner now keeps the existing default failover contract for truly empty output. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 9fdd56da2106. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +76, Tests +243. Total +319 across 4 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
Addressing the two P2 risk notes from the review, since they're the design trade-offs maintainers need to accept: 1. Malformed empty result with a tool_use block → silent turn instead of fallback This is a real trade-off, but silence is the safer side of it. By the time the empty result arrives, the turn's tool calls have already executed (side effects included). Falling back re-answers the turn on a model that never saw those tool calls or their results, so the fallback reply is generated without the context of actions that already happened — arguably worse than a silent turn. Also note the window is narrow: the guard only suppresses the failover when the turn verifiably emitted tool_use blocks (structured detection, not heuristics) and both the final result text and every streamed text delta were empty. 2. Requested-provider/no-fallback contract for tool-only turns
If maintainers prefer a more conservative rollout, gating the Separately: we left an instrumented hook in our production gateway that dumps the raw JSONL stream the next time Claude emits an empty final result after streaming deltas; we'll attach a sanitized capture here when it fires. |
…er merge Fix import statement for CliBackendConfig type.
|
The malformed-import blocker from the last review was a casualty of syncing this branch with import { normalizeLowercaseStringOrEmpty } from "@openclaw/normalization-core/string-coerce";The prior verdict was computed against On the two P2 notes (tool-only → silent vs. fallback): covered in my earlier comment — silence is the safer side, since by the time the empty result arrives the tool calls have already executed, so a fallback re-answers without that context. Still happy to gate the @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…antReplyAsSilent The empty-response guard skipped failover whenever a Claude tool_use block was seen, even when allowEmptyAssistantReplyAsSilent was false — turning an existing fallback answer into no visible reply. Gate tool-only empty output on the existing silent-empty policy: empty turns (tool-only or not) fail over by default and stay silent only when the caller opted in. Parser-level streamed-text preservation is unchanged. Add cli-runner.reliability coverage for both policy states on tool-only empty output (default -> FailoverError, explicit silent -> SILENT_REPLY_TOKEN). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Pushed Change ( Tests (
Local verification on This directly addresses the P1 at @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
The jsonl/text ternary widened to string without a const assertion (tsc TS2322), but a full type assertion tripped oxlint no-unnecessary-type-assertion. Use per-branch `as const` to satisfy both lint and typecheck. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
stream-jsondialect can emit a finalresultevent with empty text even though assistant text deltas were already streamed (or the turn was tool-only). Both JSONL parsers trust the final event and discard everything accumulated, so the CLI runner throwsCLI backend returned an empty response.(reason=empty_response) and the gateway falls back to the next model — losing a reply that had already fully arrived.resultevent has no text, fall back to the accumulated streamed deltas instead of dropping them, in bothcreateCliJsonlStreamingParserandparseCliJsonl; add a structuredCliOutput.hadToolCallsflag (via the existingisClaudeToolUseBlockType) so tool-only turns stop triggering the empty-response failover.parseClaudeCliJsonlResultdeliberately keeps session metadata when the result text is empty ("Keep the resolved session handle and usage instead of dropping them"). This PR extends the same reasoning to the streamed assistant text itself.FailoverErrorfires exactly as before. Non-Claude dialects are untouched (hadToolCallsstaysundefined).output.hadToolCalls !== trueguard incli-runner.ts.if (result) { output = result; return; }) is present in the published2026.6.1and2026.6.2-beta.1npm tarballs (dist/claude-live-session-*.js).Tests
cli-output.test.ts: 4 new cases — streamed text preserved on empty result (both parsers), tool-only flag set (both parsers); updated the existing "preserves Claude session metadata even when the final result text is empty" expectation (hadToolCalls: false).cli-runner.reliability.test.ts: existing empty-response throw/silent tests pass unchanged.pnpm build && pnpm check && pnpm testrun locally (macOS): the only failures are environment-specific (host locale leaking intoIntlformatting asserts and/tmp→/private/tmpsymlink canonicalization), none in the touched files.Real behavior proof (required for external PRs)
Behavior or issue addressed: Claude CLI turns streamed a full reply, but the final
resultevent arrived with empty text; the gateway threwCLI backend returned an empty response.(reason=empty_response) and fell back toopenai/gpt-5.5, discarding the streamed reply. Tool-only turns hit the same false failover.Real environment tested: production OpenClaw 2026.6.1 gateway on an Ubuntu VPS (systemd service),
anthropic/claude-sonnet-4-6andclaude-haiku-4-5via the claude-cli backend, fallback chain toopenai/gpt-5.5, live Discord/WhatsApp channels and cron agents.Exact steps or command run after this patch: applied the equivalent fix as a dist-bundle patch (same logic as this PR's source change) to
dist/claude-live-session-*.jsanddist/cli-runner-*.js, validated withnode --check, thensystemctl restart openclaw-gateway(2026-06-04 13:56 BRT) and watchedjournalctl -u openclaw-gateway -funder normal production traffic; later verified withsshto the host andjournalctl -u openclaw-gateway --since "14:26" | grep -c empty_response.Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): redacted runtime logs / copied live output below. After the restart:
journalctl -u openclaw-gateway --since "14:26" | grep -c empty_response→0(vs 3 occurrences earlier that same day), and the gateway boot shows claude sessions healthy:Observed result after fix: zero
empty_responsefallbacks across several hours of live traffic since the restart; claude turns that previously producedoutBytes=0now deliver the streamed text and stay on the requested model. The tool-only half of this fix has been running in the same production gateway since 2026-05-31 with no falseempty_responsefallbacks.What was not tested: non-Claude CLI backends (gemini/codex dialects — code paths untouched,
hadToolCallsstaysundefinedfor them);allowEmptyAssistantReplyAsSilent=truebeyond existing unit coverage.Proof limitations or environment constraints: the empty-result event is intermittent (3 occurrences on 2026-06-04 before the patch, none reproducible on demand); an instrumented hook in the production gateway dumps the raw JSONL stream on the next occurrence, and we will attach a sanitized capture as a follow-up comment.
Before evidence (optional but encouraged): three occurrences on 2026-06-04 before the patch. Note
outHash=e3b0c44298fc— the SHA-256 of the empty string — after 40/66/86 streamed JSONL lines: a full stream arrived, exactly zero bytes were delivered, and the turn was answered by the fallback model: