feat(telegram): interleave CLI tool-progress + reasoning + rolling timer in one Telegram message#82285
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 26, 2026, 4:48 PM ET / 20:48 UTC. Summary PR surface: Source +646, Tests +20. Total +666 across 7 files. Reproducibility: yes. for the blocking delivery issues by source inspection: PR head emits tool progress into assistantText and also emits a structured stream:"tool" event consumed by Telegram onToolStart. I did not run a live Telegram reproduction in this read-only review. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance: Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Land after structured tool events are the single source for Telegram tool-progress delivery, CLI thinking-delta scope is either implemented or removed, and current-head Telegram proof shows interleave, timer ticking, cleanup, and no duplicate answer text. Do we have a high-confidence way to reproduce the issue? Yes for the blocking delivery issues by source inspection: PR head emits tool progress into assistantText and also emits a structured stream:"tool" event consumed by Telegram onToolStart. I did not run a live Telegram reproduction in this read-only review. Is this the best way to solve the issue? No for the current patch. The UX direction is useful, but the maintainable fix is to keep one tool-progress delivery source per consumer and either support native CLI thinking_delta records or remove the dead reasoning routing from this PR. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 80655fe9552f. Label changesLabel justifications:
Evidence reviewedPR surface: Source +646, Tests +20. Total +666 across 7 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
b7a08fa to
afc942f
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
… subscription path
New `claude-cli-interactive` backend that runs `claude` without `-p`
under a per-turn `bun` wrapper. Bypasses the headless mode that
Anthropic's June 15, 2026 split moves onto the separate programmatic
credit pool — every event flows through a loopback HTTPS MITM proxy
that taps the API SSE stream and re-emits it as
`claude -p --output-format stream-json` JSONL records on the
wrapper's own stdout. Interactive subscription stays in play.
Backend wiring
- `extensions/anthropic/cli-backend-interactive.ts`: registers the
new backend (`claude-cli-interactive`); inherits user-configured
`claude-cli` overrides via `inheritUserConfigFrom` with `-p`-mode
flags stripped; normalizeConfig classifies inherited-vs-direct-
override using the user-configured claude-cli.command in
OpenClawConfig (no more basename heuristic). serialize:true to
queue same-session resumes. maxPromptArgChars: 30k (Windows
CreateProcess hard limit) / 200k (Unix ARG_MAX).
- `extensions/anthropic/interactive-proxy/wrapper.ts`: per-turn
process owning the proxy, cert lifecycle, claude spawn,
request-source classification routing, and the `result` record
on `end_turn`. Heartbeat ping every 30s of stdout silence so the
gateway keeps the Telegram typing indicator alive through long
tool-execution gaps. Spilled-prompt recovery: read overflow off
wrapper's stdin, write to a per-run subdir at
`${tmpdir()}/openclaw-interactive-proxy/run-<32-hex>/` (0700) +
file 0600, inject `--add-dir <perRunDir>` so claude's Read tool
resolves the path; per-run dir isolates concurrent runs.
- `extensions/anthropic/interactive-proxy/mitm-server.ts`: two-stage
proxy (CONNECT → loopback TLS terminator). Classifies each
outbound `/v1/messages` request body into `auxiliary` (no tools)
/ `tool_followup` (last msg has tool_result) / `compaction`
(compact.ts prompt markers) / `normal`. Tags every emitted SSE
event with `_reqId` + `_requestType`. Synthetic `api_error` event
on non-SSE 4xx/5xx so the wrapper exits nonzero and the CLI runner
triggers failover. Both stages bind port 0.
- `extensions/anthropic/interactive-proxy/cert-manager.ts`: lazy
CA + leaf cert generation under `~/.openclaw/proxy-certs/`. CERT
dir created at 0700 BEFORE openssl writes keys so there's no
world-readable window. Key files hardened to 0600 with fail-closed
regeneration if hardening fails.
- `extensions/anthropic/interactive-proxy/tty-spoof.cjs`: NODE_OPTIONS
preload — minimal TTY-property + tty.isatty patches that keep
claude in interactive mode under piped stdio.
Routing (wrapper)
- `auxiliary` → drop entire stream (haiku title-gen / classifier /
skill-search side-calls; OpenClaw injects MCP tools so the real
user turn always has tools).
- `compaction` → rewrite text_delta as thinking_delta so the
summary surfaces as model reasoning, not as the user's reply.
- `tool_followup` + `normal` → emit events through; emit `result`
record on `end_turn`. Belt-and-braces response-content fingerprint
(`<analysis>` + Primary-Request-and-Intent + Pending-Tasks)
catches anything that slipped past request-level classification.
Framework
- `src/agents/cli-backends.ts`: propagate `inheritUserConfigFrom`
through `resolveSetupCliBackendPolicy` so cold-setup / live-test /
fallback paths consult the same inheritance metadata as the
registered-backend path. Lookup now uses
`registered?.inheritUserConfigFrom ?? fallbackPolicy?.
inheritUserConfigFrom`.
- `src/plugins/cli-backend.types.ts`: declare
`inheritUserConfigFrom?` on `CliBackendPlugin`.
- `src/agents/command/attempt-execution.ts`: fallback session
binding uses `params.originalProvider` rather than the hardcoded
"claude-cli" id.
Scope boundary
- Existing `claude-cli` backend is untouched (its predicate
`liveSession === "claude-stdio"` continues to match it alone).
- Default backend selection is unchanged; new backend is opt-in
via the `claude-cli-interactive` id.
- Tool-use rendering / waiting-on-tool injection lands in openclaw#82285,
not here — this PR establishes the `stream_event` surface that
builds on.
(cherry picked from commit f4b9491)
(cherry picked from commit 724e260194516a0798620a9a876b91d120fa331d)
(cherry picked from commit 0e2a13e3743934efe4757393fb7dee785b9ed838)
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
No dependency changes detected. Learn more about Socket for GitHub. 👍 No dependency changes detected in pull request |
…lling-timer message Completes the rendering side of openclaw#82285's "tool-use injection + rolling-timer indicator". cli-output now emits structured stream:"tool" agent events (previous commit refactored away the in-text injection); this commit ports the live dist patch that consumes those events and renders them into one continuously-updating Telegram message in the reasoning lane. New state, scoped alongside splitReasoningOnNextStream / draftLaneEventQueue: - interleavedOutput — cumulative body (italicized reasoning chunks + timestamped tool/item/plan/approval/command-output/patch lines) - rawReasoningCheckpoint — index in the upstream reasoning text we've already mirrored, so re-deliveries don't duplicate - activeTimerSuffix / activeTimerInterval / activeToolStartTime — rolling `\n_Ns — HH:MM:SS_` timer state painted between tool starts New helpers: - clearActiveTimer — clears the interval and resets the suffix - formatTimerClock — local HH:MM:SS - updateInterleavedDisplay — writes "Thinking\n\n" + body + timer suffix into the reasoning lane stream (marks hasStreamedMessage so the lane isn't garbage-collected as empty) - startToolTimer — setInterval(3000) that repaints + flushes the lane so the timer actually rolls past the typing-indicator throttle. Catches flush errors so a torn-down lane doesn't crash the interval. - injectToolLineIntoInterleave(line, { startTimer? }) — enqueues onto the existing draftLaneEventQueue, appends `\n[HH:MM:SS] {line}\n`, optionally restarts the timer, updates the display, and flushes. Returns false when the reasoning lane isn't active so callers can fall back to the legacy per-tool channel-progress message. Callback integration: - onReasoningStream — replaced the ingestDraftLaneSegments-based path with a checkpoint-driven append into interleavedOutput. Each new portion of the upstream reasoning text gets italicized line-by-line and appended, the active timer is cleared, the lane is marked finalized AFTER the append (so post-run cleanup keeps the message instead of clearing it), and reasoningStepState bookkeeping fires. - onReasoningEnd — sets reasoningLane.finalized = true when hasStreamedMessage is set, plugging the same cleanup-deletes-message hole on the no-final-reasoning-burst path. - onToolStart — tries injectToolLineIntoInterleave first with startTimer:true; falls back to pushStreamToolProgress when no reasoning lane is active (room events, suppressed reasoning level, no streaming). Doesn't yet wire injectToolLineIntoInterleave into onItemEvent / onPlanUpdate / onApprovalEvent / onCommandOutput / onPatchSummary — those already render via pushStreamToolProgress today and the interleave-first pattern there is a follow-up; tool-start is the high-frequency signal that drives the rolling-timer UX. (cherry picked from commit 1346e0b630db9c96dd4f3700e7f7a498671558d3) (cherry picked from commit 60e90e7067628d37505801b0aedd220968da0079)
…vedOutput is set Mirrors a local hot-patch the runtime added today: when a turn ends in a path that doesn't fire onReasoningEnd (turn aborted, error mid-reasoning, supersession), the lane.finalized check in the dispatch finally block falls into the stream.clear() branch and the interleaved message is deleted from Telegram. Guard the lane-iteration loop with a forced finalized=true when interleavedOutput has content AND hasStreamedMessage is set — guarantees the visible message survives any non-onReasoningEnd exit path. (cherry picked from commit c958dc8b25acec95acef59794117199c872b03a4) (cherry picked from commit 45725d2a18ae0a95bfd955046013376e5d7db842)
The previous commit ported the dist hot-patch verbatim, including its explicit `await reasoningLane.stream.flush()` after each 3s timer tick and each tool-line injection. That flush was belt-and-braces — the draft-stream loop (createDraftStreamLoop) already self-pumps on update(): when `now - lastSentAt >= throttleMs` the next update fires startBackgroundFlush() immediately. The tool-timer ticks every 3000ms and DEFAULT_THROTTLE_MS is 1000ms, so every tick is already past the throttle window and delivers on its own. The explicit flush() actively hurt us during Telegram backpressure: a hard send-now bypasses the natural retry-after gating in the send path, making rolling timers contribute to 429 storms instead of riding the existing rate-limit handling. Dropping it lets Telegram's per-stream throttle govern delivery cadence and the existing 429 retry-after machinery in network-errors.ts gate us automatically. User-visible effect: timer continues to roll every ~1-3s (gated by the existing throttleMs), but during 429 backoff both the tick and the underlying send pause together — which is the correct behaviour. (cherry picked from commit 772444a608288e6cf71e67f9dfe1bc329ff9cbf5) (cherry picked from commit 967fe511e17144d5d9de9a88902ae6da62787621)
…eper P1s Two findings from the most recent ClawSweeper pass on openclaw#82285: 1. P1 — src/agents/cli-runner/execute.ts: `normalizeVerboseLevel`, `getAgentRunContext`, and `loadSessionEntryByKey` were left as imports when the previous commit (7b1e440) stripped the `shouldInjectToolInlineMarkers` resolver they fed. `tsconfig.core.json` has `noUnusedLocals: true`, so this fails typecheck before any runtime path can be exercised. Strip the three imports. 2. P1 — extensions/telegram/src/bot-message-dispatch.ts: `startToolTimer()` creates a `setInterval(3000ms)` repeating handle. The terminal cleanup path stops or clears the lane draft streams but never calls `clearActiveTimer()`, so a turn that ends mid-tool (or is superseded while a tool is running) leaks the interval for the rest of the process — every 3s it wakes to repaint a torn-down lane. Add `clearActiveTimer()` at the top of the dispatch `finally` block so it fires regardless of which exit path the turn takes. (cherry picked from commit 751db41387008467e71e3a636c74687f980e459e) (cherry picked from commit 83de5a7ff1fd14bafcedd290853f16b4106843b5)
(cherry picked from commit c292a6e1e39710dbe5971ceabdf6f3199f399ea8)
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…aude-stream-json backends
Tool calls during a Claude turn now render as an in-stream textual
marker (with a rolling 8s timer indicator) AND a structured
`stream: "tool"` agent event. Without this, the assistant text
stream just pauses for however long the tool runs — often tens of
seconds on `Read` / `Bash` / `Grep` / `WebFetch` — and the
end-user UI freezes with no signal about which tool is running.
Gated on `usesClaudeStreamJsonDialect({backend, providerId})` —
matches `backend.jsonlDialect === "claude-stream-json"` or
`providerId === "claude-cli"`. Non-Claude backends are unaffected.
Wiring
- `src/agents/cli-output.ts`: `createClaudeToolUseTracker` closure
watches `content_block_start` / `content_block_delta`
(`input_json_delta` partial-JSON accumulation) /
`content_block_stop` for `tool_use` / `server_tool_use` /
`mcp_tool_use` blocks. On block-stop, sanitizes args via
`sanitizeToolArgs`, paints an inline marker through the existing
`onAssistantDelta` path, and arms a `setInterval` keepalive that
refreshes the rolling timer line every 8s.
- `src/agents/cli-output.ts`: new optional `onToolEvent` callback
on `createCliJsonlStreamingParser` — additive, existing callers
unaffected. Emits `{phase: "start", name, args, itemId}`.
- `src/agents/cli-runner/execute.ts` and
`src/agents/cli-runner/claude-live-session.ts`: wire the parser
callback at both Claude code paths (headless JSONL + live
session) and forward to the agent event bus as
`emitAgentEvent({stream: "tool", data: {...}})`.
- `src/gateway/live-chat-projector.ts` +
`src/gateway/server-chat.ts`: thread the `replacement` flag so
the terminal timer-strip actually replaces the painted text
rather than appending a duplicate; resolve tool-verbose at
emission time (per-session, not once at parser construction).
Cleanup contract
- `clearToolKeepalive()` runs on `result`, on `finish()`, and on
the next `text_delta` — the parser closure owns the interval
handle and clears it on every terminal path.
- `sanitizeToolArgs` truncates and redacts known-sensitive keys
before either emission path so file paths / WebFetch URLs /
Bash commands don't leak to Telegram or the agent event bus.
Scope boundary
- Gated by dialect — no impact on OpenAI / xAI / other CLI
backends.
- `onToolEvent` is opt-in: existing parser consumers that only
subscribe to `onAssistantDelta` still work.
- No tool-output / tool-result event (only tool-start); the result
is reflected in claude's next `text_delta` and the timer-strip
on resumed text handles the UX.
(cherry picked from commit 834599963a2c0bcb5c1ed400d82ffa48f17a86e5)
(cherry picked from commit adbf73208065d6bc36085c62ed0e716427217b7a)
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…oning + drop stale thinking buffer
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…ext replace contract ClawSweeper flagged that the assistant-stream events carried a bespoke `replacement: true` field while the live-chat merger only honours `replace`. The field was never consumed: the CLI assistant bridge forwards full text (it does not append deltas), so `resolveDraftPartialText` already replaces by `text` and the rolling-timer terminal cleanup wins on the existing contract. Remove the parallel field end-to-end (CliStreamingDelta type, the two execute.ts emit sites, and the tick-cleanup emitter) instead of wiring a second replacement vocabulary. Cleanup behaviour is unchanged.
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…+ dedupe reasoning header - Restore the rolling-timer terminal-cleanup signal on the existing live-chat `replace` contract (cli-output -> execute agent event) and forward it through server-chat's emitChatDelta into resolveMergedAssistantText, so a shorter timer-stripped prefix replaces the buffer instead of being kept as a stale rollback in live chat / Control UI. Adds merge coverage for both the kept-prefix and replaced-prefix cases. - Interleaved reasoning display: store only the formatted body, not the full formatReasoningMessage output, so updateInterleavedDisplay's "Thinking" header is no longer duplicated.
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Superseded by #87072. That PR reimagines this work as an opt-in, projection-only Telegram interleaved progress lane built on the existing structured-event contract (no contract change), default-off, with unit tests — replacing this branch's bundled approach. Closing in favour of the cleaner split. |
Summary
Read/Bash/Grep/WebFetch) the user sees a frozen indicator and abandons a turn that's still in flight.stream-jsonJSONL parser now spotstool_use/server_tool_use/mcp_tool_useblocks and emits astream:"tool"agent-event ({phase, name, args, toolCallId}), whichfollowup-runneralready turns intoonToolStart.[HH:MM:SS] 🛠️ ToolName: detail+ a rollingNs — HH:MMtimer) instead of a separate channel-progress message, falling back to the legacy per-tool message when the reasoning lane isn't active.stream-jsondialect — non-Claude CLI backends untouched. Reuses upstream'sfollowup-runner(stream:"tool"→onToolStart) rather than adding a new consumer. No change to tool execution; the embedded path is unaffected (it already emits tool events upstream).Real behavior proof
[HH:MM:SS] 🛠️ ToolNamelines + a rolling elapsed timer in one updating message, instead of a silent gap or a separate progress message.unset NODE_OPTIONS && pnpm exec vitest run extensions/telegram/src/bot-message-dispatch.test.ts; then a live Telegram turn that invokes one or more tools; observe the interleaved tool lines + the rolling timer in the single reasoning message, and the timer stripping when assistant text resumes.[HH:MM:SS] 🛠️ …line is oneonToolStartinjected into the reasoning draft.[HH:MM:SS] 🛠️ ToolName: detaillines inside the reasoning message with the timer ticking; when assistant text resumes the timer line is stripped and text continues from a clean boundary.bot-message-dispatchinterleave unit test is the planned guardrail to land with this change.Change Type
Scope
Linked
onToolStartsurface thatfollowup-runnerdrives from thestream:"tool"agent-event bus — no new event surface introduced here.Risks and Mitigations
setIntervalleaks if a turn ends without a following assistant delta (e.g. abort mid-tool)._Ns — HH:MM_line could confuse a strict-Markdown consumer.Security Impact
Compatibility / Migration