feat(anthropic): claude-cli-interactive backend - stream reasoning via local TLS proxy #81851
|
Codex review: needs real behavior proof before merge.
Reviewed June 9, 2026, 9:54 PM ET / 01:54 UTC.
Summary
PR surface: Source +2330, Tests +411, Config +32. Total +2773 across 37 files.
Reproducibility: Yes, for the review blockers: PR-head source inspection shows inherited proxy aliases, fail-open CONNECT routing, an undeclared
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Mantis proof suggestion
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review details
Best possible solution: Land only a hardened opt-in backend after fail-closed proxy routing, proxy env isolation, declared build dependencies, approved/documented SDK and config surfaces, clean rebase, and redacted current-head cross-platform Telegram/proxy proof.
Do we have a high-confidence way to reproduce the issue? Yes for the review blockers: PR-head source inspection shows inherited proxy aliases, fail-open CONNECT routing, an undeclared
Is this the best way to solve the issue? No. The opt-in backend is a plausible direction, but this patch is not the best merge shape until the proxy is fail-closed, env-isolated, build-clean, SDK-approved/documented, rebased, and proven in real cross-platform transport runs.
Full review comments:
Overall correctness: patch is incorrect
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 9a1f2022b127.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +2330, Tests +411, Config +32. Total +2773 across 37 files.
Security concerns:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
|
a654df2 to 601e44d
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
601e44d to 8582e51
|
All alerts resolved. Learn more about Socket for GitHub. This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
cd7f5eb to a414825
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
7c64e27 to d40e83b
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…m-json JSONL
Extends createCliJsonlStreamingParser with optional callbacks
(onReasoningDelta, onToolEvent) and adds an internal tool-use tracker
that watches stream_event records of type content_block_{start,delta,stop}
for tool_use / server_tool_use / mcp_tool_use blocks.
On content_block_stop, the tracker:
- injects a textual marker into assistantText via onAssistantDelta,
formatted "\n\n[HH:MM:SS] 🛠️ ToolName: detail\n" (detail is the first
non-empty value from args.command|file_path|pattern|query|description|url,
truncated to 120 chars).
- emits a structured ClaudeToolEvent ({phase: "start", name, args, itemId,
sessionId, usage}) via onToolEvent for consumers that want to render the
tool call separately (dashboard, tool drawer, etc.).
- starts an 8s-interval rolling timer that repaints the tail of
assistantText as "_ <elapsed>s β <hh:mm:ss>_" until the next
text_delta, result, or finish() lands.
When a text_delta arrives while the timer is running, the timer is cleared,
the inline tick is stripped from assistantText, a \n\n separator is
inserted, and the resumed text delta is emitted.
When onReasoningDelta is provided, thinking_delta events are routed there
instead of through onAssistantDelta's legacy {thinkingDelta, thinkingText}
bridge. Existing callers that only provide onAssistantDelta keep the old
behaviour (back-compat).
Works on any backend with jsonlDialect="claude-stream-json" or providerId
"claude-cli", i.e. both the existing claude-cli backend and the new
claude-cli-interactive backend from PR openclaw#81851.
Downstream consumer wiring (execute.ts agent-event emitters,
agent-runner-execution.ts reasoning bridge) lands in a follow-up commit.
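The marker rule in the commit message above (first non-empty detail key, 120-character cap, timestamped line) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `pickToolDetail`, `formatClock`, and `formatToolMarker` are hypothetical names, the tool emoji is assumed to be 🛠️, and treating only non-empty strings as "non-empty" is an assumption.

```typescript
// Hypothetical helper names; only the key order, the 120-char cap, and the
// marker layout are taken from the commit message.
const DETAIL_KEYS = ["command", "file_path", "pattern", "query", "description", "url"] as const;
const MAX_DETAIL_CHARS = 120;

type ToolArgs = Record<string, unknown>;

/** Returns the first non-empty string among the known keys, cut to 120 chars. */
function pickToolDetail(args: ToolArgs): string {
  for (const key of DETAIL_KEYS) {
    const value = args[key];
    // Assumption: non-string and empty-string values are skipped.
    if (typeof value === "string" && value.length > 0) {
      return value.slice(0, MAX_DETAIL_CHARS);
    }
  }
  return "";
}

/** Zero-padded local wall-clock time as HH:MM:SS. */
function formatClock(now: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${pad(now.getHours())}:${pad(now.getMinutes())}:${pad(now.getSeconds())}`;
}

/** Builds the inline marker that would be appended to the assistant text. */
function formatToolMarker(name: string, args: ToolArgs, now: Date): string {
  return `\n\n[${formatClock(now)}] 🛠️ ${name}: ${pickToolDetail(args)}\n`;
}
```

Keeping the detail selection separate from the formatting would let the structured event path reuse the same `detail` value, if that is desired.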
|
@clawsweeper re-review |
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review Pushed
Remaining: [P1] cross-platform proof: the proxy/Bun/OpenSSL path is still only proven on Windows; Linux/macOS live runs are pending. And the local-MITM design decision itself (decrypting |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review Pushed
Remaining are maintainer/operator-gated, not code defects: explicit acceptance of the opt-in local-MITM trust boundary (documented), and Linux/macOS live proof (the path is currently proven on Windows). |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞 Command router queued. I will update this comment with the next step. Re-review progress:
|
|
@clawsweeper re-review Pushed two commits addressing both [P2] code findings:
Local: oxlint clean, tsgo:core clean, mitm-server tests green. Remaining items are the [P1]s: cross-platform (Linux/macOS) proof, the rebase onto current main, and maintainer acceptance of the loopback-MITM trust boundary. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review Pushed The interactive-proxy tsconfig sets That clears all three code findings (CONNECT normalization, doctor parity, bun-types). Remaining are the [P1]s that need you/a maintainer: cross-platform (Linux/macOS) proof, the security trust-boundary acceptance, and the rebase against current main. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Dependency Guard
This PR changes dependency-related files. Maintainers should confirm these changes are intentional. Changed files:
Maintainer follow-up:
|
Dependency graph changes are blocked
OpenClaw does not accept dependency graph changes through PRs unless a repository admin or security explicitly authorizes the current head SHA. Dependency updates are generated internally by maintainers so external PRs cannot change the resolved graph. Detected dependency graph changes:
If this PR intentionally needs a dependency graph change, ask a repository admin or member of The action will approve the current head SHA ( |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…a local TLS proxy
Add a new `claude-cli-interactive` backend that spawns the Claude CLI binary behind a local TLS MITM proxy, enabling real-time streaming of reasoning/thinking tokens to channels that support interleaved progress.
Key changes:
- Interactive proxy server with auto-generated localhost TLS certificates
- Wrapper around the CLI process that captures proxy traffic
- Agent runner integration for lifecycle management and text bridging
- Certificate manager with filesystem caching in OPENCLAW_STATE_DIR
- Overflow recovery with scoped Read permission grants
- De-duplication of CLI assistant text when reasoning bridge is active
The squash/rebase dropped the systemPromptWhen === "always" guard from the systemPromptFile and systemPromptFileArg conditions in helpers.ts and execute.ts. Without it, resumed CLI sessions skip the system prompt even when the backend declares systemPromptWhen: "always".
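A minimal sketch of the guard this commit restores, under stated assumptions: only the `"always"` value and the `systemPromptFileArg` option name come from the commit message; the `"first"` value, the `PromptBackendConfig` type, the `buildSystemPromptArgs` helper, and the example flag are hypothetical and are not taken from helpers.ts or execute.ts.

```typescript
// Hypothetical types: only "always" and systemPromptFileArg appear in the
// commit message; "first" stands in for any non-"always" mode.
type SystemPromptWhen = "always" | "first";

interface PromptBackendConfig {
  systemPromptWhen: SystemPromptWhen;
  systemPromptFileArg?: string; // CLI flag that takes the prompt file path
}

/**
 * Returns the CLI arguments that pass the system prompt file, or an empty
 * list when the prompt should be skipped for this turn.
 */
function buildSystemPromptArgs(
  cfg: PromptBackendConfig,
  promptFilePath: string,
  isResumedSession: boolean,
): string[] {
  // The restored guard: "always" wins even when the session is resumed.
  const wanted = cfg.systemPromptWhen === "always" || !isResumedSession;
  if (!wanted || !cfg.systemPromptFileArg) {
    return [];
  }
  return [cfg.systemPromptFileArg, promptFilePath];
}
```

Without the `=== "always"` clause the condition collapses to `!isResumedSession`, which matches the regression described: resumed sessions lose the prompt regardless of configuration.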
…l_use
When Claude produces text -> tool_use -> text, the CLI JSONL streaming parser concatenated the second text block directly after the first without any whitespace. Now detects content_block_start type:"text" after a tool_use block and inserts a paragraph break into the accumulated assistant text.
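The fix described above can be illustrated with a small accumulator. This is a sketch under assumptions, not the parser's actual code: `AssistantTextAccumulator` is a hypothetical name, and the event shape is reduced to the fields of the streaming events (`content_block_start`, `content_block_delta` with `text_delta`) that matter here.

```typescript
// Reduced event shape; real stream events carry more fields (index, etc.).
interface StreamEvent {
  type: string;
  content_block?: { type: string };
  delta?: { type: string; text?: string };
}

// Hypothetical accumulator showing the paragraph-break rule only.
class AssistantTextAccumulator {
  private text = "";
  private sawToolUse = false;

  push(event: StreamEvent): void {
    if (event.type === "content_block_start" && event.content_block) {
      const blockType = event.content_block.type;
      if (blockType === "tool_use") {
        this.sawToolUse = true;
      } else if (blockType === "text") {
        // A text block that follows a tool_use block starts a new paragraph.
        if (this.sawToolUse && this.text.length > 0 && !this.text.endsWith("\n\n")) {
          this.text += "\n\n";
        }
        this.sawToolUse = false;
      }
      return;
    }
    if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
      this.text += event.delta.text ?? "";
    }
  }

  get value(): string {
    return this.text;
  }
}
```

The `endsWith("\n\n")` check is an added safeguard in this sketch so that a break already inserted by another path is not doubled.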
… background tasks
The MITM proxy was relaying a spawned sub-agent's end_turn as the primary turn's end, so the agent reported the sub-agent's findings and stopped instead of continuing.
- mitm classifier: tag sub-agent requests "subagent" via the Agent-tool signature. The primary turn always carries the `Agent` (Task) tool; sub-agents (websearch, research/Explore Task) are spawned without it. A tool-bearing normal/tool_followup request lacking `Agent` (or declaring server-side web_search) is a sub-agent. A max_tokens retry of the primary still carries `Agent`, so the primary is never mis-tagged.
- wrapper: handle "subagent" like compaction-plus: text_delta -> thinking_delta (reasoning lane, never a new reply), message_start/message_stop suppressed, while tool_use content_block_start/input_json_delta/content_block_stop are forwarded so the tool rows render interleaved with the thinking. The primary's continuation (a later tool_followup carrying `Agent`) is the user-facing reply.
- wrapper: set CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1 in claudeEnv. The tty-spoof makes claude believe it has a TTY (subscription mode), re-enabling the background-task feature that headless `-p` disables; its follow-up polls would surface as spurious extra turns.
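The wrapper-side handling of "subagent" traffic described above might look roughly like this. It is a sketch, not the wrapper's code: `rewriteSubagentEvent` and the reduced `SseEvent` shape are hypothetical, and forwarding every event type the commit message does not name is an assumption.

```typescript
// Reduced SSE event shape; only the fields used below are modelled.
interface SseEvent {
  type: string;
  index?: number;
  content_block?: { type: string };
  delta?: { type: string; text?: string; thinking?: string; partial_json?: string };
}

/**
 * Maps one sub-agent SSE event to what the wrapper would re-emit:
 * null means "suppress", otherwise the (possibly rewritten) event.
 */
function rewriteSubagentEvent(event: SseEvent): SseEvent | null {
  // Sub-agent turn boundaries must not look like a new primary reply.
  if (event.type === "message_start" || event.type === "message_stop") {
    return null;
  }
  // Sub-agent prose moves to the reasoning lane.
  if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
    return { ...event, delta: { type: "thinking_delta", thinking: event.delta.text ?? "" } };
  }
  // tool_use start, input_json_delta, and content_block_stop pass through
  // unchanged so tool rows still render (assumption: so does everything else).
  return event;
}
```

Returning `null` rather than throwing keeps the function usable as a simple filter-map step over the tapped event stream.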
…ignal)
Replace the single-`Agent`-tool heuristic with an exported, unit-tested classifyRequest. Layered, first-decisive-wins: positive primary detection via a configurable spawner-tool set plus a conservative structural matcher (so a renamed/disguised Task spawner is still recognized as the primary); keep the web-search sub-agent catch; gate the by-absence rule on a spawner having been seen this run, so a deny-listed-Agent run can't be mis-suppressed into a hang. Add request-classifier.test.ts (12 cases) covering each layer.
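The layered, first-decisive-wins ordering described above can be sketched as below. This is an illustration under assumptions, not the exported `classifyRequest`: the function is named `classifyRequestSketch` to make that explicit, the two-label result is a simplification, the default spawner set and the `prompt` + `subagent_type` structural test are guesses at what a "conservative structural matcher" might check, and the `web_search` type prefix is assumed.

```typescript
type RequestClass = "primary" | "subagent";

interface ToolDecl {
  name: string;
  type?: string; // server tools carry a versioned type string
  input_schema?: { properties?: Record<string, unknown> };
}

// Per-run state: the by-absence rule is only trusted once a spawner was seen.
interface RunState {
  spawnerSeen: boolean;
}

// Assumed default; the commit says the set is configurable.
const DEFAULT_SPAWNER_TOOLS: ReadonlySet<string> = new Set(["Agent", "Task"]);

// Hypothetical structural test for a renamed spawner tool.
function looksLikeSpawner(tool: ToolDecl): boolean {
  const props = tool.input_schema?.properties ?? {};
  return "prompt" in props && "subagent_type" in props;
}

function classifyRequestSketch(
  tools: ToolDecl[],
  state: RunState,
  spawnerTools: ReadonlySet<string> = DEFAULT_SPAWNER_TOOLS,
): RequestClass {
  // Layer 1: positive primary detection (by name or by structure).
  if (tools.some((t) => spawnerTools.has(t.name) || looksLikeSpawner(t))) {
    state.spawnerSeen = true;
    return "primary";
  }
  // Layer 2: a request declaring server-side web search is a sub-agent.
  if (tools.some((t) => t.type?.startsWith("web_search") ?? false)) {
    return "subagent";
  }
  // Layer 3: by-absence rule, gated on a spawner having been seen this run.
  if (tools.length > 0 && state.spawnerSeen) {
    return "subagent";
  }
  // Default: never suppress a request we cannot positively classify.
  return "primary";
}
```

The gate in layer 3 is what protects a run where the spawner tool is deny-listed: with no spawner ever seen, tool-bearing requests fall through to "primary" instead of being suppressed.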
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Summary
New opt-in `claude-cli-interactive` backend. Each turn spawns `bun wrapper.ts`, which boots a loopback HTTPS MITM proxy on 127.0.0.1, taps every Anthropic API SSE event, and re-emits them as `claude -p --output-format stream-json` JSONL records. This lets Claude run in interactive (subscription) mode while preserving the streaming surface that reasoning and tool-use rendering depend on.
Why: Starting June 15, 2026, `claude -p` (headless/programmatic) mode draws from a separate fixed monthly credit pool billed at API rates. The interactive backend avoids this by keeping Claude in subscription mode.
Scope boundary
- `claude-cli` backend untouched; shared gate functions extended via `isClaudeCliCompatibleBackend(provider)` to recognise both backend IDs
- `inheritUserConfigFrom` with `-p`-mode flags stripped
- `NODE_EXTRA_CA_CERTS` scoped to spawned child only)
- `api.anthropic.com` CONNECT tunnels pass through unmodified
Key files
- `extensions/anthropic/src/interactive-proxy/wrapper.ts`, `mitm-server.ts`, `cert-manager.ts`, `tty-spoof.cjs`
- `extensions/anthropic/src/cli-backend-interactive.ts`, `normalizeConfig`
- `src/agents/cli-backends.ts`, `provider-id.ts`, `model-runtime-aliases.ts`
- `src/auto-reply/reply/agent-runner-execution.ts` (reasoning bridge dedup)
Security notes
- `~/.openclaw/proxy-certs/` at 0700/0600
Test plan
- `cli-backend-interactive.test.ts`: 5 cases covering `normalizeConfig` branches
Live proof (2026-05-29, current head)
Gateway startup:
Turn dispatch (interactive proxy + session resume):
Telegram - interactive proxy streaming reasoning + tool events: