feat(cli-output): emit thinking_delta events; handle redacted single-block shape #85381

Draft
adele-with-a-b wants to merge 1 commit into openclaw:main from adele-with-a-b:feature/cli-output-thinking-delta

adele-with-a-b commented May 22, 2026

Summary

Mirrors the onAssistantDelta surface for thinking events on the claude-cli stream-json parser in src/agents/cli-output.ts. Adds an onThinkingDelta callback to createCliJsonlStreamingParser and a public CliThinkingDelta type that two downstream PRs depend on: #82285 (Telegram interleave) and #81851 (cli-interactive backend).

Stacked on #80046; merge after that lands. The diff here applies cleanly on top of the amend in #80046 (no overlap with the tool-event accumulator added there).

Two emission shapes are handled:

  1. Streaming-delta path (extended-thinking on cli-interactive backend). content_block_start with block.type === "thinking" initialises a per-index accumulator. Each content_block_delta with delta.type === "thinking_delta" and a string delta.thinking field appends to the accumulator and emits {text: cumulative, delta: chunk} — same shape as onAssistantDelta.

  2. Single-block path (redacted thinking on adaptive models). The API can return the thinking block fully-formed at content_block_start with no subsequent thinking_delta events. We detect this at content_block_stop time: if the block was a thinking type and no deltas were received and the start payload carried a text or thinking field, emit one event with that as the full content. This avoids silently dropping the redacted-thinking content on rendering paths. Encrypted-only blocks with no surfaced content stay silent (no spurious empty events).
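The two emission shapes above can be sketched as a small per-index tracker. This is a minimal illustration under stated assumptions, not the code in src/agents/cli-output.ts: the names createThinkingTracker, StreamEvent, and BlockState are hypothetical, only the {text, delta} payload shape comes from the description, and the choices to prefer the thinking field over text as the seed and to report the seed as both text and delta on the single-block path are guesses.

```typescript
// Hypothetical sketch of the two thinking emission paths described above.
// Only the { text, delta } payload shape is taken from the PR description.
type ThinkingDelta = { text: string; delta: string };

type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: string; thinking?: string; text?: string } }
  | { type: "content_block_delta"; index: number; delta: { type: string; thinking?: unknown } }
  | { type: "content_block_stop"; index: number };

// Per-block state: accumulated delta text, content seen at start, and whether any delta arrived.
type BlockState = { text: string; seed: string; sawDelta: boolean };

function createThinkingTracker(
  onThinkingDelta: (d: ThinkingDelta) => void,
): (event: StreamEvent) => void {
  const blocks = new Map<number, BlockState>();

  return (event) => {
    switch (event.type) {
      case "content_block_start": {
        const block = event.content_block;
        if (block.type !== "thinking") return;
        // Assumption: prefer `thinking`, fall back to `text`, else empty.
        const seed = block.thinking || block.text || "";
        blocks.set(event.index, { text: "", seed, sawDelta: false });
        return;
      }
      case "content_block_delta": {
        const state = blocks.get(event.index);
        const chunk = event.delta.thinking;
        if (!state || event.delta.type !== "thinking_delta" || typeof chunk !== "string") return;
        // Streaming-delta path: append and emit cumulative text plus this chunk.
        state.text += chunk;
        state.sawDelta = true;
        onThinkingDelta({ text: state.text, delta: chunk });
        return;
      }
      case "content_block_stop": {
        const state = blocks.get(event.index);
        blocks.delete(event.index);
        // Single-block path: no deltas arrived, but the start payload carried content.
        if (state && !state.sawDelta && state.seed.length > 0) {
          onThinkingDelta({ text: state.seed, delta: state.seed });
        }
        return;
      }
    }
  };
}
```

A block that starts with an empty seed and receives no deltas produces no callback in this sketch, which corresponds to the silent-on-empty behavior for encrypted-only blocks.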

Addresses anagnorisis2peripeteia review 4345523435.

Real behavior proof (required for external PRs)

External-contributor real-environment proof, captured 2026-05-22 on macOS 15.x / Node 22 / claude-cli 2.1.148 against a live Anthropic Bedrock-routed Claude Opus 4.6 stream.

  • Behavior or issue addressed: createCliJsonlStreamingParser did not surface thinking_delta events emitted by claude-cli --output-format stream-json --include-partial-messages against extended-thinking-enabled models. Downstream consumers (Telegram interleave preview, cli-interactive backend) had no API to subscribe to thinking output, so live thinking content was silently dropped on rendering paths even when the model emitted it record-by-record.

  • Real environment tested: Live claude-cli invocation against claude-opus-4-6 (production-shape model: same model the OpenClaw gateway uses on M5 today). Captured the raw stream-json output and drove it through the patched createCliJsonlStreamingParser source on this branch (no mocks, no stub harness — direct import of src/agents/cli-output.ts).

  • Exact steps or command run after this patch:

    1. git checkout feature/cli-output-thinking-delta
    2. Capture a real claude-cli stream:
      echo "Show a brief vivid mental image of a sunrise. Then immediately stop." \
        | claude -p \
            --output-format stream-json \
            --include-partial-messages \
            --model opus \
            --verbose \
            > opus-stream.jsonl
      
    3. Drive the captured stream through the patched parser via tsx:
      node --import tsx drive-parser.mts < opus-stream.jsonl
      
      where drive-parser.mts instantiates createCliJsonlStreamingParser from src/agents/cli-output.ts with onThinkingDelta + onAssistantDelta counters and replays the stream record-by-record.
    4. pnpm test src/agents/cli-output.test.ts — 28 tests pass.
  • Evidence after fix:

    Live opus-4-6 stream produced:

    $ grep -oE '"type":"[^"]*"' opus-stream.jsonl | sort | uniq -c | sort -rn | head
         49 "type":"stream_event"
         42 "type":"content_block_delta"
         27 "type":"text_delta"
         14 "type":"thinking_delta"
          4 "type":"system"
          3 "type":"message"
          2 "type":"thinking"
          2 "type":"text"
          2 "type":"content_block_stop"
          2 "type":"content_block_start"
    

    Representative thinking-block record sequence (signature and thinking text redacted; record shape preserved verbatim from the raw capture for shape proof):

    {"type":"stream_event","event":{"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":"","signature":""}},...}
    {"type":"stream_event","event":{"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"[redacted-thinking-chunk]"}},...}
    ...
    {"type":"stream_event","event":{"type":"content_block_stop","index":0},...}
    

    Patched parser end-to-end against that opus stream:

    $ node --import tsx drive-parser.mts < opus-stream.jsonl
    
    --- opus stream → patched parser results ---
    onThinkingDelta callbacks: 13
    onAssistantDelta callbacks: 27
    

    Unit-test surface still green:

    Test Files  1 passed (1)
         Tests  28 passed (28)
    

    Three new tests covering both emission shapes:

    • emits onThinkingDelta for streaming thinking_delta chunks with accumulating text (streaming-delta path)
    • emits a single onThinkingDelta when a thinking block arrives fully-formed at content_block_start with no subsequent deltas (single-block redacted path)
    • does not emit onThinkingDelta when a thinking block has no seed content and receives no deltas (silent-on-empty)
  • Observed result after fix: the patched createCliJsonlStreamingParser emits onThinkingDelta callbacks for every content_block_delta of type thinking_delta in a real opus stream-json output, with cumulative text + per-chunk delta — matching the existing onAssistantDelta shape. The 25 pre-existing tests on this file continue to pass unchanged (no regression to assistant-delta or tool-event surfaces).

  • What was not tested: the single-block redacted-thinking path was not exercised against a live model in this proof — opus-4-6 in --include-partial-messages mode emits the full streaming-delta sequence rather than the fully-formed-at-start single-block shape. The single-block path is exercised by the unit test (it constructs the API-documented redacted shape and asserts a single callback). Live coverage of that path will land via the downstream cli-interactive backend PR (feat(anthropic): claude-cli-interactive backend — stream reasoning via local TLS proxy #81851) which targets the redacted-thinking surface explicitly.
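The event-type tally in the evidence above was produced with grep. As a rough cross-check, the same kind of count can be sketched in TypeScript by parsing each JSONL record and counting every string-valued type field at any nesting depth. The function name countTypes is hypothetical, and the result only approximates the grep pipeline: grep also matches "type":"..." text that appears inside string values, which this walk does not.

```typescript
// Hypothetical helper: count every string-valued "type" field, at any depth,
// across the records of a JSONL capture. Lines that are not valid JSON are skipped.
function countTypes(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();

  const visit = (value: unknown): void => {
    if (Array.isArray(value)) {
      for (const item of value) visit(item);
      return;
    }
    if (value === null || typeof value !== "object") return;
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      const child = obj[key];
      if (key === "type" && typeof child === "string") {
        counts.set(child, (counts.get(child) ?? 0) + 1);
      } else {
        visit(child);
      }
    }
  };

  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;
    try {
      visit(JSON.parse(trimmed));
    } catch (_err) {
      // Not a JSON record (for example, stray CLI output); ignore it.
    }
  }
  return counts;
}
```

Comparing the thinking_delta count from such a tally with the number of onThinkingDelta callbacks is one way to check that the parser is not dropping records.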

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 22, 2026
clawsweeper Bot commented May 22, 2026

Codex review: needs maintainer review before merge. Reviewed June 2, 2026, 12:25 AM ET / 04:25 UTC.

Summary
The PR adds an optional Claude CLI JSONL onThinkingDelta parser callback/type with accumulated thinking_delta handling and focused parser tests.

PR surface: Source +125, Tests +145. Total +270 across 2 files.

Reproducibility: not applicable, as this is a feature PR; current main lacks the callback, and the PR body supplies an after-fix live CLI replay for the streaming path.

Review metrics: 1 noteworthy metric.

  • New raw-thinking parser surface: 1 optional callback and 1 exported type added. The new surface is small, but it carries reasoning text that must be gated correctly by downstream consumers.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Confirm the real Claude CLI single-block redacted shape before downstream consumers depend on it.

Risk before merge

  • [P1] The new callback intentionally exposes raw model thinking text; downstream channel or interactive consumers must keep it behind the existing reasoning visibility gates rather than final assistant text, logs, or ungated previews.
  • [P1] The single-block redacted path is unit-only in the PR proof, and official SDK types use redacted_thinking with data, so maintainers should confirm the real Claude CLI shape before relying on that branch downstream.
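The first risk above, keeping raw thinking text behind a visibility gate, could look roughly like the wrapper below on the consumer side. This is a sketch only: gateThinking and isReasoningVisible are hypothetical names, and the actual reasoning visibility gates in the repository are not shown in this PR.

```typescript
// Hypothetical consumer-side gate: forward thinking deltas only while a
// caller-supplied visibility check passes. Nothing is buffered or logged
// when the gate is closed, so gated-off thinking text is simply dropped.
type ThinkingDelta = { text: string; delta: string };
type ThinkingSink = (d: ThinkingDelta) => void;

function gateThinking(isReasoningVisible: () => boolean, sink: ThinkingSink): ThinkingSink {
  return (d) => {
    if (isReasoningVisible()) {
      sink(d);
    }
  };
}
```

A consumer would pass the gated sink as its onThinkingDelta callback rather than the raw sink, so that the check runs on every delta and a mid-stream change in visibility takes effect immediately.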

Maintainer options:

  1. Keep Parser-Only Boundary (recommended)
    Land the callback as an opt-in parser surface after maintainer review, and require downstream PRs to prove reasoning-gated delivery before any channel-visible use.
  2. Prove Redacted Shape First
    Ask the author or downstream feat(anthropic): claude-cli-interactive backend — stream reasoning via local TLS proxy #81851 to add real Claude CLI evidence for the single-block redacted shape before merge.
  3. Pause For Downstream Design
    Leave this draft paused if maintainers want the parser, cli-interactive, and Telegram reasoning surfaces reviewed as one product boundary.

Next step before merge

  • [P2] Needs human review because accepting a raw thinking parser surface is a visibility/security boundary, not an automation repair.

Security
Cleared: No concrete supply-chain or code-execution issue was found; the diff is parser/test only, with raw thinking exposure tracked as a maintainer-visible boundary risk.

Review details

Best possible solution:

Keep this as a parser-only callback if maintainers accept the boundary, then require downstream consumers to wire it through existing reasoning gates with live proof.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a feature PR; current main lacks the callback, and the PR body supplies after-fix live CLI replay for the streaming path.

Is this the best way to solve the issue?

Mostly yes; the parser is the narrow owner for decoding Claude CLI stream-json thinking events, while delivery should remain a separate gated downstream change.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 63ed9adfe90a.

Label changes

Label justifications:

  • P2: This is a normal-priority agent CLI streaming feature with limited current-main blast radius and clear downstream value.
  • merge-risk: 🚨 security-boundary: The PR creates an opt-in raw thinking-text surface, so downstream routing could expose sensitive reasoning if maintainers do not preserve existing gates.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes live terminal proof replaying a real Claude CLI stream through the patched parser and observing onThinkingDelta callbacks; the single-block branch remains unit-only.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes live terminal proof replaying a real Claude CLI stream through the patched parser and observing onThinkingDelta callbacks; the single-block branch remains unit-only.
Evidence reviewed

PR surface:

Source +125, Tests +145. Total +270 across 2 files.

Area       Files  Added  Removed  Net
Source     1      125    0        +125
Tests      1      145    0        +145
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      2      270    0        +270

Acceptance criteria:

  • [P1] Review supplied node scripts/run-vitest.mjs run src/agents/cli-output.test.ts result after the rebase.
  • [P1] Review supplied pnpm tsgo:core and pnpm build results after the rebase.
  • [P1] Before downstream merge, verify real redacted single-block Claude CLI output shape.
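For the last criterion, one way to inspect a real capture is to classify each content_block_start record by the shape of its block. The sketch below assumes the stream_event wrapper seen in the PR's captured records and the redacted_thinking block type mentioned in the review; classifyBlockStart and the BlockShape labels are hypothetical names introduced here.

```typescript
// Hypothetical classifier for content_block_start records in a stream-json capture.
// Returns null for lines that are not content_block_start records.
type BlockShape = "thinking-empty-seed" | "thinking-seeded" | "redacted_thinking" | "other";

function asObject(value: unknown): Record<string, unknown> | null {
  return value !== null && typeof value === "object" && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : null;
}

function classifyBlockStart(line: string): BlockShape | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch (_err) {
    return null;
  }
  const record = asObject(parsed);
  if (!record) return null;
  // Accept either a bare event or one wrapped in a stream_event record.
  const event = record.type === "stream_event" ? asObject(record.event) : record;
  if (!event || event.type !== "content_block_start") return null;

  const block = asObject(event.content_block);
  if (!block) return "other";
  if (block.type === "redacted_thinking") return "redacted_thinking";
  if (block.type === "thinking") {
    const seed =
      typeof block.thinking === "string" && block.thinking.length > 0
        ? block.thinking
        : typeof block.text === "string"
          ? block.text
          : "";
    return seed.length > 0 ? "thinking-seeded" : "thinking-empty-seed";
  }
  return "other";
}
```

Running this over a live capture and tallying the results would show whether the CLI ever emits a seeded thinking block or a redacted_thinking block, which is the open question for the single-block branch.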

What I checked:

Likely related people:

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. labels May 22, 2026
clawsweeper Bot commented May 22, 2026

ClawSweeper PR egg

✨ Hatched: 🥚 common Frosted Proofling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: stacks clean commits.
Image traits: location proof lagoon; accessory CI status badge; palette moonlit blue and soft silver; mood mischievous; pose holding its accessory up for inspection; shell matte ceramic shell; lighting subtle sparkle highlights; background subtle branch markers.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@adele-with-a-b adele-with-a-b changed the title feat(cli-output): emit thinking_delta events; handle redacted single-block shape [AI] feat(cli-output): emit thinking_delta events; handle redacted single-block shape May 22, 2026
@adele-with-a-b

Contributor Author

@clawsweeper re-review

PR body's Real behavior proof section now contains live opus-4-6 stream-json capture + end-to-end driver showing the patched parser emits 13 onThinkingDelta + 27 onAssistantDelta callbacks against the real stream. Replaces the previous unit-test-only proof.

clawsweeper Bot commented May 22, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 22, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 22, 2026
@BingqingLyu

This comment was marked as spam.

…block shape

Addresses anagnorisis2peripeteia review 4345523435.

Mirrors the onAssistantDelta surface for thinking events:
- Streaming path: accumulate per-index, emit {text,delta} per chunk
- Single-block path: emit one full event when content_block_stop
  fires for a thinking block that received no deltas (redacted
  thinking on adaptive models returns the block fully-formed)

Downstream consumers openclaw#82285 (Telegram interleave) and openclaw#81851
(cli-interactive backend) consume this surface directly. Without
the single-block detection, redacted thinking would silently
drop on those rendering paths.
adele-with-a-b force-pushed the feature/cli-output-thinking-delta branch from 9f6d40e to 83741c7 on June 2, 2026 04:15
@adele-with-a-b

Contributor Author

@clawsweeper re-review

Rebased onto current upstream/main (1e7a0d89). The PR is now scoped to a single commit (83741c73) — the previous two commits (ee29656a and d53614568d) were absorbed upstream when #80046 merged 2026-05-29 (squash commit 9de6abd8, fix(agents): bridge CLI tool progress events). Both input_json_delta accumulation and server_tool_use/mcp_tool_use recognition shipped in that squash, so this PR no longer needs to add them.

Remaining content: thinking_delta event surfacing plus handling of the redacted single-block shape. This is the part unique to #85381, and it is not in upstream.

What the rebase touched:

  • Conflict in src/agents/cli-output.test.ts — import block. Both CliThinkingDelta (this PR's addition) and CliToolUseStartDelta (upstream's addition via the squash) needed to coexist. Resolved by taking both.
  • Two pre-existing oxlint warnings surfaced post-rebase (no-useless-return at cli-output.ts:722 and an unused import of CliToolResultDelta); both are fixed in the amended commit.

Validation (post-rebase):

  • node scripts/run-vitest.mjs run src/agents/cli-output.test.ts — 29/29 passing
  • pnpm exec oxfmt --check — clean
  • node scripts/run-oxlint.mjs src/agents/cli-output.ts src/agents/cli-output.test.ts — 0 warnings 0 errors
  • pnpm tsgo:core — clean
  • pnpm build — clean (61s)

clawsweeper Bot commented Jun 2, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

Labels

  • agents (Agent runtime and tooling)
  • merge-risk: 🚨 security-boundary (🚨 May affect sandboxing, authorization, credentials, or sensitive data.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: M
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

2 participants