feat(cli-output): emit thinking_delta events; handle redacted single-block shape by adele-with-a-b · Pull Request #85381 · openclaw/openclaw

adele-with-a-b · 2026-05-22T13:48:41Z

Summary

Mirrors the onAssistantDelta surface for thinking events on the claude-cli stream-json parser in src/agents/cli-output.ts. Adds an onThinkingDelta callback to createCliJsonlStreamingParser and a public CliThinkingDelta type that two downstream PRs depend on:

feat(telegram): interleave CLI tool-progress + reasoning + rolling timer in one Telegram message #82285 (Telegram interleave) — needs the streaming-delta path to render thinking inline alongside assistant text in Telegram message previews.
feat(anthropic): claude-cli-interactive backend — stream reasoning via local TLS proxy #81851 (cli-interactive backend) — needs both shapes to render extended-thinking and redacted-thinking output through the cli-interactive surface.

Stacked on #80046; merge after that lands. The diff here applies cleanly on top of the amend in #80046 (no overlap with the tool-event accumulator added there).

Two emission shapes are handled:

Streaming-delta path (extended-thinking on cli-interactive backend). content_block_start with block.type === "thinking" initialises a per-index accumulator. Each content_block_delta with delta.type === "thinking_delta" and a string delta.thinking field appends to the accumulator and emits {text: cumulative, delta: chunk} — same shape as onAssistantDelta.
Single-block path (redacted thinking on adaptive models). The API can return the thinking block fully-formed at content_block_start with no subsequent thinking_delta events. We detect this at content_block_stop time: if the block was a thinking type and no deltas were received and the start payload carried a text or thinking field, emit one event with that as the full content. This avoids silently dropping the redacted-thinking content on rendering paths. Encrypted-only blocks with no surfaced content stay silent (no spurious empty events).
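The two paths above can be sketched as a small per-index tracker. This is a minimal sketch under stated assumptions, not the code in src/agents/cli-output.ts. The names createThinkingTracker, TrackedBlock, StartPayload, and the event types are hypothetical. The accumulator starts empty, and seed content from the start payload is used only when no deltas arrive. When both fields are present on the start payload, `thinking` is preferred over `text`.

```typescript
// Hypothetical shapes modeled on the stream_event records quoted in this PR.
type ThinkingDelta = { text: string; delta: string };
type StartPayload = { type: string; thinking?: unknown; text?: unknown };
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: StartPayload }
  | { type: "content_block_delta"; index: number; delta: { type: string; thinking?: unknown } }
  | { type: "content_block_stop"; index: number };

// Per-index state: cumulative text, whether any delta arrived, optional seed.
type TrackedBlock = { text: string; sawDelta: boolean; seed?: string };

function nonEmptyString(value: unknown): string | undefined {
  return typeof value === "string" && value.length > 0 ? value : undefined;
}

function createThinkingTracker(onThinkingDelta: (d: ThinkingDelta) => void) {
  const blocks = new Map<number, TrackedBlock>();
  return function handle(event: StreamEvent): void {
    switch (event.type) {
      case "content_block_start": {
        if (event.content_block.type !== "thinking") return;
        // Assumption: a fully-formed block carries its content in `thinking` or `text`.
        const seed =
          nonEmptyString(event.content_block.thinking) ?? nonEmptyString(event.content_block.text);
        blocks.set(event.index, { text: "", sawDelta: false, seed });
        return;
      }
      case "content_block_delta": {
        const tracked = blocks.get(event.index);
        const chunk = event.delta.thinking;
        if (!tracked || event.delta.type !== "thinking_delta" || typeof chunk !== "string") return;
        // Streaming path: append and emit cumulative text plus this chunk.
        tracked.sawDelta = true;
        tracked.text += chunk;
        onThinkingDelta({ text: tracked.text, delta: chunk });
        return;
      }
      case "content_block_stop": {
        const tracked = blocks.get(event.index);
        blocks.delete(event.index);
        // Single-block path: no deltas arrived but the start payload had content.
        // Blocks with no seed and no deltas stay silent.
        if (tracked && !tracked.sawDelta && tracked.seed !== undefined) {
          onThinkingDelta({ text: tracked.seed, delta: tracked.seed });
        }
        return;
      }
    }
  };
}
```

Keying by index mirrors the per-index accumulator described above. Deleting the entry at content_block_stop keeps the map from growing over a long stream.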

Addresses anagnorisis2peripeteia review 4345523435.

Real behavior proof (required for external PRs)

External-contributor real-environment proof, captured 2026-05-22 on macOS 15.x / Node 22 / claude-cli 2.1.148 against a live Anthropic Bedrock-routed Claude Opus 4.6 stream.

Behavior or issue addressed: createCliJsonlStreamingParser did not surface thinking_delta events emitted by claude-cli --output-format stream-json --include-partial-messages against extended-thinking-enabled models. Downstream consumers (Telegram interleave preview, cli-interactive backend) had no API to subscribe to thinking output, so live thinking content was silently dropped on rendering paths even when the model emitted it record-by-record.
Real environment tested: Live claude-cli invocation against claude-opus-4-6 (production-shape model: same model the OpenClaw gateway uses on M5 today). Captured the raw stream-json output and drove it through the patched createCliJsonlStreamingParser source on this branch (no mocks, no stub harness — direct import of src/agents/cli-output.ts).
Exact steps or command run after this patch:
1. git checkout feature/cli-output-thinking-delta
2. Capture a real claude-cli stream:
```
echo "Show a brief vivid mental image of a sunrise. Then immediately stop." \
  | claude -p \
      --output-format stream-json \
      --include-partial-messages \
      --model opus \
      --verbose \
      > opus-stream.jsonl
```
3. Drive the captured stream through the patched parser via tsx:
```
node --import tsx drive-parser.mts < opus-stream.jsonl
```
  where drive-parser.mts instantiates createCliJsonlStreamingParser from src/agents/cli-output.ts with onThinkingDelta + onAssistantDelta counters and replays the stream record-by-record.
4. pnpm test src/agents/cli-output.test.ts — 28 tests pass.
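drive-parser.mts itself is not included in the PR. As a rough stand-in for its record-by-record replay, this sketch parses each JSONL line and tallies content_block_delta records by delta type. The function name countDeltaTypes is hypothetical, and the code assumes the stream_event envelope quoted in this PR. A grep over the raw file counts string matches. This tally instead counts one entry per parsed record, which gives a baseline to compare callback counts against.

```typescript
// Loose view of one stream-json record; only the fields this tally reads.
type Envelope = {
  type?: unknown;
  event?: { type?: unknown; delta?: { type?: unknown } };
};

// Counts content_block_delta records by delta type, one entry per parsed line.
function countDeltaTypes(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim(); // also strips a trailing \r
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not complete JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as Envelope;
    const event = record.event;
    if (record.type !== "stream_event" || event?.type !== "content_block_delta") continue;
    const rawType = event.delta?.type;
    const deltaType = typeof rawType === "string" ? rawType : "unknown";
    counts.set(deltaType, (counts.get(deltaType) ?? 0) + 1);
  }
  return counts;
}
```

In a Node script the capture could be read from stdin with `fs.readFileSync(0, "utf8")`. Per step 3, the actual driver feeds each record to createCliJsonlStreamingParser instead of tallying.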

Evidence after fix:

Live opus-4-6 stream produced:

```
$ grep -oE '"type":"[^"]*"' opus-stream.jsonl | sort | uniq -c | sort -rn | head
     49 "type":"stream_event"
     42 "type":"content_block_delta"
     27 "type":"text_delta"
     14 "type":"thinking_delta"
      4 "type":"system"
      3 "type":"message"
      2 "type":"thinking"
      2 "type":"text"
      2 "type":"content_block_stop"
      2 "type":"content_block_start"
```

A representative thinking block from the capture (thinking text and signature redacted; record structure preserved verbatim from the raw capture for shape proof):

```
{"type":"stream_event","event":{"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":"","signature":""}},...}
{"type":"stream_event","event":{"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"[redacted-thinking-chunk]"}},...}
...
{"type":"stream_event","event":{"type":"content_block_stop","index":0},...}
```
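A consumer that reads these records directly can narrow the thinking_delta shape above from untyped JSON with a type guard. This is a minimal sketch. ThinkingDeltaRecord, isObject, and isThinkingDeltaRecord are hypothetical names, and only the fields visible in the quoted records are modeled.

```typescript
// Shape of the thinking_delta stream record quoted above (other fields elided).
interface ThinkingDeltaRecord {
  type: "stream_event";
  event: {
    type: "content_block_delta";
    index: number;
    delta: { type: "thinking_delta"; thinking: string };
  };
}

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrows one parsed JSONL value to a thinking_delta record.
function isThinkingDeltaRecord(value: unknown): value is ThinkingDeltaRecord {
  if (!isObject(value) || value.type !== "stream_event") return false;
  const event = value.event;
  if (!isObject(event) || event.type !== "content_block_delta" || typeof event.index !== "number") {
    return false;
  }
  const delta = event.delta;
  return isObject(delta) && delta.type === "thinking_delta" && typeof delta.thinking === "string";
}
```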

Patched parser end-to-end against that opus stream:

```
$ node --import tsx drive-parser.mts < opus-stream.jsonl

--- opus stream → patched parser results ---
onThinkingDelta callbacks: 13
onAssistantDelta callbacks: 27
```

Unit-test surface still green:

```
Test Files  1 passed (1)
     Tests  28 passed (28)
```

Three new tests covering both emission shapes:

emits onThinkingDelta for streaming thinking_delta chunks with accumulating text (streaming-delta path)
emits a single onThinkingDelta when a thinking block arrives fully-formed at content_block_start with no subsequent deltas (single-block redacted path)
does not emit onThinkingDelta when a thinking block has no seed content and receives no deltas (silent-on-empty)

Observed result after fix: the patched createCliJsonlStreamingParser emits onThinkingDelta callbacks for every content_block_delta of type thinking_delta in a real opus stream-json output, with cumulative text + per-chunk delta — matching the existing onAssistantDelta shape. The 25 pre-existing tests on this file continue to pass unchanged (no regression to assistant-delta or tool-event surfaces).
What was not tested: the single-block redacted-thinking path was not exercised against a live model in this proof — opus-4-6 in --include-partial-messages mode emits the full streaming-delta sequence rather than the fully-formed-at-start single-block shape. The single-block path is exercised by the unit test (it constructs the API-documented redacted shape and asserts a single callback). Live coverage of that path will land via the downstream cli-interactive backend PR (feat(anthropic): claude-cli-interactive backend — stream reasoning via local TLS proxy #81851) which targets the redacted-thinking surface explicitly.

clawsweeper · 2026-05-22T13:50:49Z

Codex review: needs maintainer review before merge. Reviewed June 2, 2026, 12:25 AM ET / 04:25 UTC.

Summary
The PR adds an optional Claude CLI JSONL onThinkingDelta parser callback/type with accumulated thinking_delta handling and focused parser tests.

PR surface: Source +125, Tests +145. Total +270 across 2 files.

Reproducibility: not applicable as a feature PR; current main lacks the callback, and the PR body supplies after-fix live CLI replay for the streaming path.

Review metrics: 1 noteworthy metric.

New raw-thinking parser surface: 1 optional callback and 1 exported type added. The new surface is small, but it carries reasoning text that must be gated correctly by downstream consumers.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P2] Confirm the real Claude CLI single-block redacted shape before downstream consumers depend on it.

Risk before merge

[P1] The new callback intentionally exposes raw model thinking text; downstream channel or interactive consumers must keep it behind the existing reasoning visibility gates rather than final assistant text, logs, or ungated previews.
[P1] The single-block redacted path is unit-only in the PR proof, and official SDK types use redacted_thinking with data, so maintainers should confirm the real Claude CLI shape before relying on that branch downstream.
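One way to keep the two shapes apart is to classify each content_block_start payload before deciding what to render. This assumes the SDK's redacted_thinking block, which carries an opaque encrypted `data` field, can appear in the Claude CLI stream. This is a hypothetical sketch: classifyStartBlock and its result type are illustrative names. Whether the CLI actually emits redacted_thinking, or a thinking block with seed text, is exactly what the risk above asks maintainers to confirm.

```typescript
// Loose view of a content_block_start payload; fields are unknown until checked.
type StartBlock = { type: string; thinking?: unknown; text?: unknown; data?: unknown };

type ThinkingStartKind =
  | { kind: "thinking"; seed: string } // readable; may be empty until deltas arrive
  | { kind: "redacted" } // encrypted payload; nothing human-readable to render
  | { kind: "other" };

function classifyStartBlock(block: StartBlock): ThinkingStartKind {
  if (block.type === "redacted_thinking") {
    // The encrypted `data` is deliberately not returned as renderable text.
    return { kind: "redacted" };
  }
  if (block.type === "thinking") {
    const seed =
      typeof block.thinking === "string" ? block.thinking : typeof block.text === "string" ? block.text : "";
    return { kind: "thinking", seed };
  }
  return { kind: "other" };
}
```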

Maintainer options:

Keep Parser-Only Boundary (recommended)
Land the callback as an opt-in parser surface after maintainer review, and require downstream PRs to prove reasoning-gated delivery before any channel-visible use.
Prove Redacted Shape First
Ask the author or downstream feat(anthropic): claude-cli-interactive backend — stream reasoning via local TLS proxy #81851 to add real Claude CLI evidence for the single-block redacted shape before merge.
Pause For Downstream Design
Leave this draft paused if maintainers want the parser, cli-interactive, and Telegram reasoning surfaces reviewed as one product boundary.

Next step before merge

[P2] Needs human review because accepting a raw thinking parser surface is a visibility/security boundary, not an automation repair.

Security
Cleared: No concrete supply-chain or code-execution issue was found; the diff is parser/test only, with raw thinking exposure tracked as a maintainer-visible boundary risk.

Review details

Best possible solution:

Keep this as a parser-only callback if maintainers accept the boundary, then require downstream consumers to wire it through existing reasoning gates with live proof.
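A downstream consumer could keep the callback behind a reasoning gate by wrapping it before passing it to the parser. This sketch assumes a hypothetical ReasoningGate interface, a gateThinking helper, and a generic sink. OpenClaw's existing reasoning-visibility gates are not shown in this PR and may have a different shape.

```typescript
type ThinkingDelta = { text: string; delta: string };

// Hypothetical gate; stands in for whatever visibility check a channel already has.
interface ReasoningGate {
  reasoningVisible(): boolean;
}

// Returns a callback that forwards thinking deltas only while the gate allows it.
function gateThinking(
  gate: ReasoningGate,
  sink: (d: ThinkingDelta) => void,
): (d: ThinkingDelta) => void {
  return (d) => {
    // Checked per event so a mid-stream visibility change takes effect.
    if (!gate.reasoningVisible()) return; // dropped here, not rerouted to assistant text or logs
    sink(d);
  };
}
```

Gating at the wrapper keeps the parser itself policy-free, which matches the parser-only boundary recommended above.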

Do we have a high-confidence way to reproduce the issue?

Not applicable as a feature PR; current main lacks the callback, and the PR body supplies after-fix live CLI replay for the streaming path.

Is this the best way to solve the issue?

Mostly yes; the parser is the narrow owner for decoding Claude CLI stream-json thinking events, while delivery should remain a separate gated downstream change.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 63ed9adfe90a.

Label changes

Label justifications:

P2: This is a normal-priority agent CLI streaming feature with limited current-main blast radius and clear downstream value.
merge-risk: 🚨 security-boundary: The PR creates an opt-in raw thinking-text surface, so downstream routing could expose sensitive reasoning if maintainers do not preserve existing gates.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes live terminal proof replaying a real Claude CLI stream through the patched parser and observing onThinkingDelta callbacks; the single-block branch remains unit-only.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes live terminal proof replaying a real Claude CLI stream through the patched parser and observing onThinkingDelta callbacks; the single-block branch remains unit-only.

Evidence reviewed

PR surface:

Source +125, Tests +145. Total +270 across 2 files.


Area	Files	Added	Net
Source	1	125	+125
Tests	1	145	+145
Docs	0	0	0
Config	0	0	0
Generated	0	0	0
Other	0	0	0
Total	2	270	+270

Acceptance criteria:

[P1] Review supplied node scripts/run-vitest.mjs run src/agents/cli-output.test.ts result after the rebase.
[P1] Review supplied pnpm tsgo:core and pnpm build results after the rebase.
[P1] Before downstream merge, verify real redacted single-block Claude CLI output shape.

What I checked:

PR state and scope: Live PR API shows this PR is open, draft, mergeable clean, and scoped to 2 files with +270/-0 at head 83741c7. (83741c73db52)
Current main gap: Current main createCliJsonlStreamingParser accepts assistant/tool callbacks only; rg -n onThinkingDelta found no current-main implementation. (src/agents/cli-output.ts:612, 63ed9adfe90a)
PR parser surface: The diff adds CliThinkingDelta, a ThinkingTracker, dispatchClaudeCliStreamingThinkingEvent, and optional onThinkingDelta wiring into the existing parser. (src/agents/cli-output.ts:42, 83741c73db52)
PR tests: The diff adds tests for streaming thinking_delta accumulation, a fully formed seed-content block, and empty thinking blocks staying silent. (src/agents/cli-output.test.ts:893, 83741c73db52)
Caller boundary: Current live and JSONL CLI runners construct the parser with assistant/tool callbacks only, so the PR is parser-only until downstream callers wire the new callback. (src/agents/cli-runner/execute.ts:551, 63ed9adfe90a)
Existing reasoning bridge: Current Claude CLI reply flow bridges assistant text into reasoning as a coarse existing workaround, which explains why a native thinking callback is a distinct improvement. (src/auto-reply/reply/agent-runner-cli-dispatch.ts:209, 63ed9adfe90a)

Likely related people:

@adele-with-a-b: Authored the merged fix(agents/cli): bridge CLI tool_use lifecycle events to channel preview #80046 CLI tool-event parser path and this PR's one remaining thinking-parser commit. (role: recent area contributor; confidence: high; commits: 9de6abd8d775, 83741c73db52; files: src/agents/cli-output.ts, src/agents/cli-output.test.ts)
@obviyus: Merged fix(agents/cli): bridge CLI tool_use lifecycle events to channel preview #80046 and appears in recent CLI/autoreply history for claude-cli routing and session behavior. (role: merger and adjacent owner; confidence: medium; commits: 9de6abd8d775, 2b726457d898; files: src/agents/cli-output.ts, src/auto-reply/reply/agent-runner-cli-dispatch.ts)
@steipete: Recent commits refactored normalization/core parser-adjacent code and current-main blame in the parser region points through recent main work. (role: recent area contributor; confidence: medium; commits: b9fe0894a6da, 00d8d7ead059, 722af385d29a; files: src/agents/cli-output.ts, src/agents/cli-runner/execute.ts)
@benjamin1492: Recent merged claude-cli transcript/session work touched the same parser file and adjacent CLI runtime behavior. (role: adjacent contributor; confidence: low; commits: de455304cc1c; files: src/agents/cli-output.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper · 2026-05-22T13:59:05Z

ClawSweeper PR egg

✨ Hatched: 🥚 common Frosted Proofling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

Merged PRs are hatchable.
Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: stacks clean commits.
Image traits: location proof lagoon; accessory CI status badge; palette moonlit blue and soft silver; mood mischievous; pose holding its accessory up for inspection; shell matte ceramic shell; lighting subtle sparkle highlights; background subtle branch markers.

What is this egg doing here?

Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

adele-with-a-b · 2026-05-22T18:18:49Z

@clawsweeper re-review

PR body's Real behavior proof section now contains live opus-4-6 stream-json capture + end-to-end driver showing the patched parser emits 13 onThinkingDelta + 27 onAssistantDelta callbacks against the real stream. Replaces the previous unit-test-only proof.

clawsweeper · 2026-05-22T18:18:52Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26304728436
Updated: 2026-05-22T18:28:35.295Z

…block shape

Addresses anagnorisis2peripeteia review 4345523435.

Mirrors the onAssistantDelta surface for thinking events:
- Streaming path: accumulate per-index, emit {text,delta} per chunk
- Single-block path: emit one full event when content_block_stop fires for a thinking block that received no deltas (redacted thinking on adaptive models returns the block fully-formed)

Downstream consumers openclaw#82285 (Telegram interleave) and openclaw#81851 (cli-interactive backend) consume this surface directly. Without the single-block detection, redacted thinking would silently drop on those rendering paths.

adele-with-a-b · 2026-06-02T04:15:57Z

@clawsweeper re-review

Rebased onto current upstream/main (1e7a0d89). The PR is now scoped to a single commit (83741c73) — the previous two commits (ee29656a and d53614568d) were absorbed upstream when #80046 merged 2026-05-29 (squash commit 9de6abd8, fix(agents): bridge CLI tool progress events). Both input_json_delta accumulation and server_tool_use/mcp_tool_use recognition shipped in that squash, so this PR no longer needs to add them.

Remaining content: thinking_delta event surfacing + redacted-single-block-shape handling. That's the unique merit of #85381 and it's NOT in upstream.

What the rebase touched:

Conflict in src/agents/cli-output.test.ts — import block. Both CliThinkingDelta (this PR's addition) and CliToolUseStartDelta (upstream's addition via the squash) needed to coexist. Resolved by taking both.
Two pre-existing oxlint warnings surfaced post-rebase (no-useless-return at cli-output.ts:722 and an unused import of CliToolResultDelta); both are fixed in the amended commit.

Validation (post-rebase):

node scripts/run-vitest.mjs run src/agents/cli-output.test.ts — 29/29 passing
pnpm exec oxfmt --check — clean
node scripts/run-oxlint.mjs src/agents/cli-output.ts src/agents/cli-output.test.ts — 0 warnings 0 errors
pnpm tsgo:core — clean
pnpm build — clean (61s)

clawsweeper · 2026-06-02T04:15:59Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26798045352
Updated: 2026-06-02T04:25:43.776Z

openclaw-barnacle Bot added labels agents (Agent runtime and tooling), size: XL, and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 22, 2026

adele-with-a-b changed the title ~~feat(cli-output): emit thinking_delta events; handle redacted single-block shape [AI]~~ feat(cli-output): emit thinking_delta events; handle redacted single-block shape May 22, 2026

adele-with-a-b mentioned this pull request May 22, 2026

fix(agents/cli): bridge CLI tool_use lifecycle events to channel preview #80046

Merged

openclaw-barnacle Bot added label proof: supplied (External PR includes structured after-fix real behavior proof.) and removed label triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 22, 2026

This was referenced May 26, 2026

feat(anthropic): claude-cli-interactive backend — stream reasoning via local TLS proxy #81851

Open

feat(telegram): interleave CLI tool-progress + reasoning + rolling timer in one Telegram message #82285

Closed

This comment was marked as spam.


anagnorisis2peripeteia mentioned this pull request May 28, 2026

feat(telegram): opt-in interleaved progress lane #87072

Merged


adele-with-a-b force-pushed the feature/cli-output-thinking-delta branch from 9f6d40e to 83741c7 on June 2, 2026 04:15

openclaw-barnacle Bot added size: M and removed size: XL labels Jun 2, 2026

adele-with-a-b wants to merge 1 commit into openclaw:main from adele-with-a-b:feature/cli-output-thinking-delta

2 participants
