feat(auto-reply): durable inter-tool commentary via verbose standalone progress (supersedes #89850/#89890) #91976
Codex review: needs changes before merge. Reviewed June 10, 2026, 10:40 PM ET / 02:40 UTC. Summary: PR surface is Source +233, Tests +436, total +669 across 10 files. Reproducibility: yes, from source inspection: enable verbose progress with a commentary-enabled Slack or Microsoft Teams draft and emit a preamble item; the channel draft handles it while shared dispatch also creates the durable progress message. Review metrics: 2 noteworthy metrics.
Merge readiness: overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Use one explicit shared progress-ownership contract, migrate every current commentary-draft consumer in the same change, and preserve active/inactive exactly-once regression coverage per affected channel.
Do we have a high-confidence way to reproduce the issue? Yes, from source inspection: enable verbose progress with a commentary-enabled Slack or Microsoft Teams draft and emit a preamble item; the channel draft handles it while shared dispatch also creates the durable progress message.
Is this the best way to solve the issue? No. Durable verbose commentary is a reasonable direction, but changing the generic dispatch lane while coordinating only Telegram and Discord is not the narrowest complete fix; every existing commentary-draft consumer must adopt or be gated from the contract.
Full review comments:
Overall correctness: patch is incorrect. AGENTS.md: found and applied where relevant. Codex review notes: reasoning high; reviewed against d4fcc3869621.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
@clawsweeper re-review
Addressed the P1: Discord now consumes the same durable-lane visibility getter as Telegram and yields its preamble draft lines while standalone verbose commentary is active (commit 4, with an active/inactive regression pair proving exactly-once rendering).
Also folded in the sibling backend gap this review surfaced indirectly: CLI-backed runs emitted no durable tool summaries under verbose at all (the parser's tool result events were dropped at the runner bridge), so the durable lane was commentary-only on claude-cli. Commit 3 forwards those events and renders the same formatToolAggregate summaries the embedded runner emits.
Fresh real-Telegram captures in the body now show the full interleaved durable record (commentary + tool summaries + clean answer) in both streaming modes.
🦞🧹 I asked ClawSweeper to review this item again.
… progress messages When verbose progress is enabled, preamble item events now flush as durable standalone progress messages through the same delivery path as tool summaries, instead of living only in ephemeral channel streaming drafts. The latest text per item id is buffered so snapshot-style producers send one message per item; the buffer flushes when the producer moves on (next item, tool event, block reply, or final reply) and drains before the final answer. Verbose runs also force commentary classification on (commentaryProgressEnabled), so inter-tool text routes to the commentary lane rather than being folded into the final answer text. Dispatch additionally exposes a live verbose-progress visibility getter via the new onVerboseProgressVisibility reply option, so draft-rendering channels can route progress to the durable lane while it is active.
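The buffering rule described in this commit message (keep only the latest text per item id, flush when the producer moves on, drop retracted blocks) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `createCommentaryBuffer`, `onPreamble`, and `flush` are hypothetical names, and the flush on a new item id is an assumption about how "the producer moves on" is detected.

```typescript
// Hypothetical sketch; none of these names come from the PR.
type Pending = { itemId: string; text: string };

function createCommentaryBuffer(deliver: (text: string) => void) {
  let pending: Pending | undefined;

  // Called on tool events, block replies, and before the final reply.
  const flush = (): void => {
    if (!pending) return;
    const { text } = pending;
    pending = undefined;
    deliver(text);
  };

  // Snapshot-style producers resend the whole text for the same item id,
  // so only the latest snapshot is kept until the next flush.
  const onPreamble = (itemId: string, text: string): void => {
    if (pending && pending.itemId !== itemId) flush(); // producer moved on
    if (text.trim() === "") {
      pending = undefined; // retraction: drop the unsent block
      return;
    }
    pending = { itemId, text };
  };

  return { onPreamble, flush };
}

const sent: string[] = [];
const buffer = createCommentaryBuffer((text) => {
  sent.push(text);
});
buffer.onPreamble("item-1", "I'll run");
buffer.onPreamble("item-1", "I'll run these one at a time");
buffer.flush(); // e.g. a tool start event arrives
buffer.onPreamble("item-2", "scratch");
buffer.onPreamble("item-2", ""); // retraction
buffer.flush(); // drain before the final answer
```

In this run only one message is delivered: the two snapshots of `item-1` collapse into the latest text, and the retracted `item-2` block is never sent.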
…to the streaming draft With streaming on, the dispatcher diverted tool-kind payloads (including the new durable commentary messages) into the ephemeral progress draft, where they were discarded when the final answer arrived - so verbose runs lost their progress record whenever streaming was enabled. While the durable verbose lane is active (per the dispatch visibility getter), tool payloads are now sent as real standalone messages and the draft yields its commentary lines; tool/plan draft lines keep the draft since they have no durable counterpart. Reasoning lane and tool status reactions are unaffected.
…sults The CLI parser already emits tool result events (name, toolCallId, isError, sanitized result), but the runner bridge dropped them, so CLI-backed runs had no durable tool record under verbose while embedded runs did. The bridge now forwards result events, and both runners feed a summary tracker that renders the same formatToolAggregate line the embedded runner emits (meta captured from the start event args), plus the tool output block when full verbose output is enabled. Delivery rides each runner's existing tool-result route, so verbose gating, ordering ahead of the final answer, and the Telegram durable routing all apply unchanged.
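A summary tracker of the kind described here might look like the sketch below, assuming start events carry the tool arguments and result events carry `isError` and a sanitized result. The names `createToolSummaryTracker` and `ToolEvent`, the `command` argument key, and the summary line format are all illustrative assumptions; the real `formatToolAggregate` output is not reproduced.

```typescript
// Hypothetical sketch; the event shape and line format are assumptions.
type ToolEvent =
  | { phase: "start"; toolCallId: string; name: string; args: Record<string, unknown> }
  | { phase: "result"; toolCallId: string; name: string; isError: boolean; result: string };

function createToolSummaryTracker(
  onSummary: (line: string) => void,
  fullOutput: boolean,
) {
  // Meta derived from start-event args, keyed by tool call id.
  const metaByCall = new Map<string, string>();

  return (event: ToolEvent): void => {
    if (event.phase === "start") {
      const command = event.args["command"];
      metaByCall.set(event.toolCallId, typeof command === "string" ? command : "");
      return;
    }
    // Result: combine the captured meta with the outcome into one line.
    const meta = metaByCall.get(event.toolCallId) ?? "";
    metaByCall.delete(event.toolCallId);
    let line = event.name;
    if (meta) line += `: ${meta}`;
    if (event.isError) line += " (error)";
    // Full verbose output appends the tool output block.
    if (fullOutput && event.result) line += `\n${event.result}`;
    onSummary(line);
  };
}

const summaries: string[] = [];
const track = createToolSummaryTracker((line) => {
  summaries.push(line);
}, false);
track({ phase: "start", toolCallId: "call-1", name: "exec", args: { command: "date -u" } });
track({ phase: "result", toolCallId: "call-1", name: "exec", isError: false, result: "ok" });
```

Keying the meta by `toolCallId` lets the result event recover the arguments even when several tool calls are in flight at once.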
…ss is active Discord consumes the dispatch verbose-progress visibility getter the same way Telegram does: while the durable lane is delivering commentary as standalone messages, the ephemeral progress draft skips its preamble lines so commentary renders exactly once. Covered by an active/inactive regression pair.
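The exactly-once rule for draft-rendering channels can be illustrated with a small filter, assuming the channel receives a live getter for whether the durable lane is active. `DraftLine` and `renderDraftLines` are hypothetical names; only the behavior (preamble lines yield to the durable lane, tool and plan lines stay in the draft) is taken from the commit messages above.

```typescript
// Hypothetical sketch of a draft that yields commentary to the durable lane.
type DraftLine = { kind: "preamble" | "tool" | "plan"; text: string };

function renderDraftLines(
  lines: DraftLine[],
  isDurableLaneActive: () => boolean,
): string[] {
  // Read the getter at render time so a mid-run change is respected.
  const durable = isDurableLaneActive();
  return lines
    .filter((line) => !(durable && line.kind === "preamble"))
    .map((line) => line.text);
}

const draft: DraftLine[] = [
  { kind: "preamble", text: "Checking the clock first" },
  { kind: "tool", text: "exec: date -u" },
];
const whenActive = renderDraftLines(draft, () => true);
const whenInactive = renderDraftLines(draft, () => false);
```

With the lane active, the draft shows only the tool line; with it inactive, the draft keeps both lines, which mirrors the active/inactive regression pair.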
1b1fc20 to f8d867e
Landed via rebase onto main (head
Thanks @anagnorisis2peripeteia, and for the careful reshape from the persist-mode PRs to this core-owned design!
…ane is off After openclaw#91976, the claude-cli JSONL parser reclassifies assistant text that precedes a tool_use block as commentary. The classification gate (commentaryProgressEnabled !== undefined) was looser than the delivery gate (commentaryProgressEnabled === true && onItemEvent), so any channel that defined the flag as false engaged classification with no consumer wired: flushPendingClaudeCommentaryText() called an undefined onCommentaryText and silently discarded the text. On Discord with verbose off this dropped all inter-tool narration and the pre-final-answer preamble text.
Two-layer fix:
- Align the classify gate with the delivery gate in both CLI dispatch sites (agent-runner-execution, followup-runner) so classification only engages when a commentary consumer exists.
- Defense in depth: flushPendingClaudeCommentaryText() now falls back to the assistant text lane instead of discarding when no consumer is wired, so no future gate mismatch can silently eat model output.
Reported on Discord: claude-cli backend lost interleaved narration and the regular-text reasoning preamble with or without /verbose on.
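The two-layer fix can be sketched as one shared predicate plus a fallback, assuming the options object carries `commentaryProgressEnabled` and `onItemEvent` as the commit message states. The helper names `resolveCommentaryConsumer`, `shouldClassifyCommentary`, and `flushPendingCommentary` are hypothetical stand-ins, not the functions in the repository.

```typescript
// Hypothetical sketch: one gate for both classification and delivery.
type CommentaryOptions = {
  commentaryProgressEnabled?: boolean;
  onItemEvent?: (text: string) => void;
};

// A consumer exists only when the flag is exactly true AND a callback is wired.
function resolveCommentaryConsumer(opts: CommentaryOptions) {
  return opts.commentaryProgressEnabled === true ? opts.onItemEvent : undefined;
}

// Layer 1: classification uses the same gate as delivery.
function shouldClassifyCommentary(opts: CommentaryOptions): boolean {
  return resolveCommentaryConsumer(opts) !== undefined;
}

// Layer 2: never discard; fall back to the assistant text lane.
function flushPendingCommentary(
  text: string,
  opts: CommentaryOptions,
  onAssistantText: (text: string) => void,
): "commentary" | "assistant" {
  const consumer = resolveCommentaryConsumer(opts);
  if (consumer) {
    consumer(text);
    return "commentary";
  }
  onAssistantText(text);
  return "assistant";
}

const assistantLane: string[] = [];
const verboseOff: CommentaryOptions = { commentaryProgressEnabled: false };
const classified = shouldClassifyCommentary(verboseOff);
const lane = flushPendingCommentary("Next: uname -a", verboseOff, (text) => {
  assistantLane.push(text);
});
```

Deriving both gates from the same function removes the `!== undefined` versus `=== true` mismatch, and the fallback keeps the text visible even if a future caller bypasses the classify gate.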
The pin-from-here mirror replays the origin run's agent-event bus into each pinned target's own dispatch. The CLI runner emits stream:"tool" events with both phase:"start" and phase:"result", but the resolver routed EVERY tool event to onToolStart — so phase:"result" events (which drive the durable verbose tool summary, openclaw#91976) were mis-rendered and the mirror lost its tool record while still showing commentary. Run the bus tool events through the same createCliToolSummaryTracker the native CLI dispatch uses: "start" captures args-meta by toolCallId; "result" formats the aggregate and delivers it via the target dispatch's onToolResult (which still gates the actual send on verbose). The mirror's tool summaries are now byte-identical to a native turn's, in both streaming modes. toolProgressDetail is threaded from the origin config so the args detail matches. Tests: resolver renders a durable summary from start+result and routes result to onToolResult (not onToolStart); error propagation; existing stream-routing regression. echo-mirror-resolver 9 + mirror-dispatch 5 green.
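The routing bug described here (every tool event going to `onToolStart`) comes down to ignoring the `phase` field. A minimal sketch of phase-aware routing follows; `BusToolEvent`, `TargetDispatch`, and `routeToolEvent` are illustrative names, and the real resolver additionally runs events through the summary tracker before calling `onToolResult`.

```typescript
// Hypothetical sketch of phase-aware routing for replayed bus events.
type BusToolEvent = {
  stream: "tool";
  phase: "start" | "result";
  toolCallId: string;
  name: string;
};

type TargetDispatch = {
  onToolStart: (event: BusToolEvent) => void;
  onToolResult: (event: BusToolEvent) => void;
};

function routeToolEvent(event: BusToolEvent, target: TargetDispatch): void {
  switch (event.phase) {
    case "start":
      target.onToolStart(event);
      break;
    case "result":
      // Result events drive the durable summary, so they must not be
      // rendered as tool starts.
      target.onToolResult(event);
      break;
  }
}

const routed: string[] = [];
const target: TargetDispatch = {
  onToolStart: (event) => {
    routed.push(`start:${event.toolCallId}`);
  },
  onToolResult: (event) => {
    routed.push(`result:${event.toolCallId}`);
  },
};
routeToolEvent({ stream: "tool", phase: "start", toolCallId: "c1", name: "exec" }, target);
routeToolEvent({ stream: "tool", phase: "result", toolCallId: "c1", name: "exec" }, target);
```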
Supersedes #89850 and #89890, reshaped per @obviyus's guidance there: instead of persisting the ephemeral streaming draft (new `persistProgress` config key, Telegram-only), this adds a second consumer for inter-tool commentary in core's verbose standalone-progress path (the same place tool summaries become durable messages in `dispatch-from-config`), gated by the existing `/verbose on`. No new config key, and it works on every channel that uses the generic dispatch path.

It also closes the backend gap that made "verbose already does tool summaries" untrue for CLI runs: the CLI runner now emits the same durable tool summaries the embedded runner does, so `/verbose on` yields the full interleaved record (💬 commentary + 🛠️ tool summaries) on both backends and in both streaming modes.

What it does
Commit 1: core (`dispatch-from-config`):
- `kind:"preamble"` item events (inter-tool commentary, emitted by the Claude CLI parser since "feat(cli): emit commentary progress events from Claude CLI parser" #89834, and natively by codex) are delivered as standalone 💬 progress messages through the same delivery/guard chain as tool summaries (`shouldSendToolSummaries`, progress-delivery suppression, late-text drop, route-to-originating vs dispatcher).
- The latest text per `itemId` is buffered so snapshot-style producers send one message per commentary block; the buffer flushes when the producer moves on (next item, tool start/result, block reply, final reply) and always drains before the final answer. Retractions (empty text for a buffered item) drop unsent blocks.
- Verbose runs force commentary classification on (`commentaryProgressEnabled: true`), so inter-tool narration routes to the commentary lane instead of being folded into the final answer text; the answer message stays purely the final answer.
- New `onVerboseProgressVisibility` reply option: dispatch hands the channel a live getter for whether the durable verbose lane is active.

Commit 2: telegram:

Commit 3: CLI tool summaries:
- The runner bridge forwards the CLI parser's tool result events, and CLI runs render the same `formatToolAggregate` summaries the embedded runner emits (args-derived meta captured at tool start; output block under `/verbose full`), delivered through each runner's existing tool-result route.

Commit 4: discord (review finding):
Result (with `/verbose on`)
- `streaming.mode: "off"`: 💬 commentary + 🛠️ tool messages in real time, clean answer last
- `streaming.mode: "progress"`: 💬 + 🛠️ record; draft keeps live tool status; answer last

Real behavior proof
- `/verbose on` had no durable commentary record. After this PR, commentary lands as standalone durable 💬 progress messages in both streaming modes.
- `pnpm openclaw gateway` in a Linux container with a desktop; real Telegram bot + real Telegram account on Telegram Desktop; real `claude-cli/claude-opus-4-8` backend (live Anthropic API); tdlib user-driver sending real DMs. The baseline build (merge-base `050c0813b39`) ran in the identical setup for before/after.
- `/new`, then a DM asking the agent to run `date -u` and `uname -a` via exec, narrating before each command; repeated with `streaming.mode: "off"` and `streaming.mode: "progress"` (`agents.defaults.verboseDefault: "on"` in both); screen-recorded Telegram Desktop throughout. This recreates the Mantis telegram-desktop proof flow locally (same mechanism: telegram-desktop + user-driver + screen capture).
- `[telegram] sendMessage ok chat=… message=425`, `message=426` (commentary), then `outbound send ok … messageId=427` (final answer).

Baseline runs answer with no visible commentary or tool record in both modes. PR runs show the full interleaved durable record (💬 "I'll run these one at a time…" → 🛠️ `date -u` → 💬 "Got the time. Next: uname -a…" → 🛠️ `uname -a`) landing in real time before the clean final answer, in both streaming modes.

Motion captures (drafts building, messages landing live): after-stream-off · after-stream-on · before-stream-off · before-stream-on
Tests
- `dispatch-from-config` cases: ordering (commentary before its tool summary; trailing commentary before final), snapshot collapse per `itemId`, item-transition flush, retraction drop, verbose-off passthrough (channel callback still forwarded, nothing standalone, no classification forced), visibility getter on/off.
- `tsgo` clean; `dispatch-from-config` (200) + `followup-runner` / `agent-runner-execution` / `agent-runner-cli-dispatch` + discord `message-handler.process`: 572 tests green on Windows and the dispatch suite in-box on Linux. The one telegram-suite failure on Windows ("records streamed final replies into the prompt context cache") is byte-identical on pristine upstream/main (pre-existing environment issue).