feat(cli-output): emit thinking_delta events; handle redacted single-block shape #85381
adele-with-a-b wants to merge 1 commit into
Codex review: needs maintainer review before merge. Reviewed June 2, 2026, 12:25 AM ET / 04:25 UTC.
Summary
PR surface: Source +125, Tests +145. Total +270 across 2 files.
Reproducibility: not applicable as a feature PR; current main lacks the callback, and the PR body supplies after-fix live CLI replay for the streaming path.
Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Keep this as a parser-only callback if maintainers accept the boundary, then require downstream consumers to wire it through existing reasoning gates with live proof.
Do we have a high-confidence way to reproduce the issue? Not applicable as a feature PR; current main lacks the callback, and the PR body supplies after-fix live CLI replay for the streaming path.
Is this the best way to solve the issue? Mostly yes; the parser is the narrow owner for decoding Claude CLI stream-json thinking events, while delivery should remain a separate gated downstream change.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 63ed9adfe90a.
Label changes
Label justifications:
Evidence reviewed
PR surface: Source +125, Tests +145. Total +270 across 2 files.
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
@clawsweeper re-review PR body's Real behavior proof section now contains live opus-4-6 stream-json capture + end-to-end driver showing the patched parser emits 13 |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
This comment was marked as spam.
…block shape
Addresses anagnorisis2peripeteia review 4345523435.
Mirrors the onAssistantDelta surface for thinking events:
- Streaming path: accumulate per-index, emit {text,delta} per chunk
- Single-block path: emit one full event when content_block_stop
fires for a thinking block that received no deltas (redacted
thinking on adaptive models returns the block fully-formed)
Downstream consumers openclaw#82285 (Telegram interleave) and openclaw#81851
(cli-interactive backend) consume this surface directly. Without
the single-block detection, redacted thinking would silently
drop on those rendering paths.
9f6d40e to
83741c7
Compare
@clawsweeper re-review Rebased onto current Remaining content: What the rebase touched:
Validation (post-rebase):
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
Summary
Mirrors the `onAssistantDelta` surface for thinking events on the claude-cli stream-json parser in `src/agents/cli-output.ts`. Adds an `onThinkingDelta` callback to `createCliJsonlStreamingParser` and a public `CliThinkingDelta` type that two downstream PRs depend on:

Stacked on #80046; merge after that lands. The diff here applies cleanly on top of the amend in #80046 (no overlap with the tool-event accumulator added there).

Two emission shapes are handled:

- Streaming-delta path (extended-thinking on cli-interactive backend). `content_block_start` with `block.type === "thinking"` initialises a per-index accumulator. Each `content_block_delta` with `delta.type === "thinking_delta"` and a string `delta.thinking` field appends to the accumulator and emits `{text: cumulative, delta: chunk}`, the same shape as `onAssistantDelta`.
- Single-block path (redacted thinking on adaptive models). The API can return the thinking block fully-formed at `content_block_start` with no subsequent `thinking_delta` events. We detect this at `content_block_stop` time: if the block was a thinking type, no deltas were received, and the start payload carried a `text` or `thinking` field, emit one event with that as the full content. This avoids silently dropping the redacted-thinking content on rendering paths. Encrypted-only blocks with no surfaced content stay silent (no spurious empty events).

Addresses anagnorisis2peripeteia review 4345523435.
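The two emission shapes described above can be sketched as a small per-index state machine. This is a minimal illustration, not the code in `src/agents/cli-output.ts`: the names `createThinkingTracker`, `StreamEvent`, `BlockState`, and the `content_block` field name are assumptions made for the example, and the choice to set `delta` equal to the full text on the single-block path is also an assumption (the PR only says one event carries the full content).

```typescript
// Hypothetical shapes for illustration; the real types live in src/agents/cli-output.ts.
type ThinkingDelta = { text: string; delta: string };

type StreamEvent =
  | {
      type: "content_block_start";
      index: number;
      content_block: { type: string; thinking?: string; text?: string };
    }
  | {
      type: "content_block_delta";
      index: number;
      delta: { type: string; thinking?: unknown };
    }
  | { type: "content_block_stop"; index: number };

interface BlockState {
  text: string; // cumulative text built from thinking_delta chunks
  sawDelta: boolean; // true once at least one thinking_delta arrived
  seed: string; // content carried by the start payload, if any
}

function createThinkingTracker(onThinkingDelta: (d: ThinkingDelta) => void) {
  const blocks = new Map<number, BlockState>();

  return (event: StreamEvent): void => {
    switch (event.type) {
      case "content_block_start": {
        if (event.content_block.type !== "thinking") return;
        const seed = event.content_block.thinking || event.content_block.text || "";
        blocks.set(event.index, { text: "", sawDelta: false, seed });
        return;
      }
      case "content_block_delta": {
        const state = blocks.get(event.index);
        const chunk = event.delta.thinking;
        if (!state || event.delta.type !== "thinking_delta" || typeof chunk !== "string") {
          return;
        }
        // Streaming path: accumulate and emit cumulative text plus this chunk.
        state.text += chunk;
        state.sawDelta = true;
        onThinkingDelta({ text: state.text, delta: chunk });
        return;
      }
      case "content_block_stop": {
        const state = blocks.get(event.index);
        if (!state) return;
        blocks.delete(event.index);
        // Single-block path: emit once only if no deltas arrived and a seed exists.
        // Assumption: delta mirrors the full text for this one-shot event.
        if (!state.sawDelta && state.seed.length > 0) {
          onThinkingDelta({ text: state.seed, delta: state.seed });
        }
        return;
      }
    }
  };
}
```

A consumer would call `createThinkingTracker` once per stream and feed it each decoded event in order. Keeping state in a `Map` keyed by block index mirrors the per-index accumulator the summary describes and lets a block with no seed and no deltas end without any callback.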
Real behavior proof (required for external PRs)
External-contributor real-environment proof, captured 2026-05-22 on macOS 15.x / Node 22 / claude-cli `2.1.148` against a live Anthropic Bedrock-routed Claude Opus 4.6 stream.

Behavior or issue addressed: `createCliJsonlStreamingParser` did not surface `thinking_delta` events emitted by `claude-cli --output-format stream-json --include-partial-messages` against extended-thinking-enabled models. Downstream consumers (Telegram interleave preview, cli-interactive backend) had no API to subscribe to thinking output, so live thinking content was silently dropped on rendering paths even when the model emitted it record-by-record.

Real environment tested: Live `claude-cli` invocation against `claude-opus-4-6` (production-shape model: same model the OpenClaw gateway uses on M5 today). Captured the raw stream-json output and drove it through the patched `createCliJsonlStreamingParser` source on this branch (no mocks, no stub harness; direct import of `src/agents/cli-output.ts`).

Exact steps or command run after this patch:

- `git checkout feature/cli-output-thinking-delta`
- `tsx`: `drive-parser.mts` instantiates `createCliJsonlStreamingParser` from `src/agents/cli-output.ts` with `onThinkingDelta` + `onAssistantDelta` counters and replays the stream record-by-record.
- `pnpm test src/agents/cli-output.test.ts`: 28 tests pass.

Evidence after fix:
Live opus-4-6 stream produced:
One representative `thinking_delta` record (signature redacted, content preserved verbatim from the raw capture for shape proof):

Patched parser end-to-end against that opus stream:
Unit-test surface still green:
Three new tests covering both emission shapes:

- `emits onThinkingDelta for streaming thinking_delta chunks with accumulating text` (streaming-delta path)
- `emits a single onThinkingDelta when a thinking block arrives fully-formed at content_block_start with no subsequent deltas` (single-block redacted path)
- `does not emit onThinkingDelta when a thinking block has no seed content and receives no deltas` (silent-on-empty)

Observed result after fix: the patched `createCliJsonlStreamingParser` emits `onThinkingDelta` callbacks for every `content_block_delta` of type `thinking_delta` in a real opus stream-json output, with cumulative text + per-chunk delta, matching the existing `onAssistantDelta` shape. The 25 pre-existing tests on this file continue to pass unchanged (no regression to assistant-delta or tool-event surfaces).

What was not tested: the single-block redacted-thinking path was not exercised against a live model in this proof; opus-4-6 in `--include-partial-messages` mode emits the full streaming-delta sequence rather than the fully-formed-at-start single-block shape. The single-block path is exercised by the unit test (it constructs the API-documented redacted shape and asserts a single callback). Live coverage of that path will land via the downstream cli-interactive backend PR (feat(anthropic): claude-cli-interactive backend - stream reasoning via local TLS proxy #81851), which targets the redacted-thinking surface explicitly.
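One way to cross-check the replay evidence is to count delta records in the raw capture independently of the parser and compare those counts with the callback counters. The sketch below is not the `drive-parser.mts` driver from this PR; `countDeltaTypes` and `isRecord` are hypothetical helpers, and the assumption that partial-message records may arrive wrapped as `{ "type": "stream_event", "event": ... }` should be checked against the actual capture.

```typescript
// Hypothetical helper: narrows an unknown JSON value to a plain object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Counts content_block_delta records per delta type in a JSONL capture.
// Assumption: a record is either the raw event or wraps it under `event`
// with type "stream_event"; lines that are not valid JSON are skipped.
function countDeltaTypes(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();

  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;

    let record: unknown;
    try {
      record = JSON.parse(trimmed);
    } catch {
      continue; // tolerate non-JSON noise in the capture
    }
    if (!isRecord(record)) continue;

    const event =
      record.type === "stream_event" && isRecord(record.event) ? record.event : record;
    if (event.type !== "content_block_delta") continue;

    const delta = event.delta;
    const deltaType =
      isRecord(delta) && typeof delta.type === "string" ? delta.type : "unknown";
    counts.set(deltaType, (counts.get(deltaType) ?? 0) + 1);
  }

  return counts;
}
```

If the `thinking_delta` count from this pass matches the number of `onThinkingDelta` callbacks observed during replay, the "every `content_block_delta` of type `thinking_delta`" claim holds for that capture.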