fix: stabilize code-mode follow-up tool display and replay by jalehman · Pull Request #80663 · openclaw/openclaw

jalehman · 2026-05-11T13:31:54Z

Summary

Problem: code-mode follow-up turns could hide bridged inner tool identity and leave strict provider replay exposed to missing or delayed tool-result pairs.
Why it matters: follow-up runs could show generic wrapped tool labels, miss embedded Tool Search controls, or replay corrupted session histories after code-mode/tool-search activity.
What changed: renders bridged tool calls using native inner-tool labels, keeps Tool Search controls independent/exposed in embedded runs, preserves delayed tool results across display turns, repairs persisted missing code-mode results, and hardens provider replay/tool-call ID handling.
What did NOT change (scope boundary): this does not change plugin permission policy, provider auth, or public tool execution capability; the residual lossless-claw copied-transcript repair warning is handled separately in lossless PR #652 (Cron jobs with wakeMode: "now" mark complete without invoking agent session).

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

Behavior or issue addressed: Code-mode/tool-search bridge events should render with the underlying tool identity, embedded Tool Search controls should be available when configured, and follow-up replay should not manufacture missing tool-result noise when real results exist or can be repaired safely. The broader Telegram verbose/progress-vs-final-answer separation is tracked separately in #80294 (Telegram: keep verbose tool results separate from final answers).
Real environment tested: Local OpenClaw development checkout on macOS 15 / Node 22.17.0 / pnpm 11.0.8.
Exact steps or command run after this patch:
- Rebased branch on latest upstream/main.
- Ran targeted Vitest shard set for touched Codex, OpenAI provider, gateway, agent transcript/repair, and tool-display surfaces.
- Ran repo build.
Evidence after fix:
- CI=true OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test extensions/codex/src/app-server/event-projector.test.ts extensions/openai/openai-provider.test.ts src/agents/pi-embedded-runner.guard.test.ts src/agents/pi-embedded-runner.guard.waitforidle-before-flush.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/pi-embedded-runner/run/attempt.transcript-policy.test.ts src/agents/pi-tools.create-openclaw-coding-tools.test.ts src/agents/session-file-repair.test.ts src/agents/session-transcript-repair.test.ts src/agents/tool-display.test.ts src/gateway/server-chat.agent-events.test.ts src/plugins/provider-replay-helpers.test.ts
  - passed 5 Vitest shards in 24.19s.
- git diff --check upstream/main...HEAD
  - passed with no output.
- CI=true pnpm build
  - passed, including tsdown, CLI bootstrap import guard, runtime postbuild, plugin SDK dts, plugin SDK export check, and bundled plugin asset copy.
Observed result after fix: targeted tests and build pass after rebase; embedded Tool Search enablement test passes; transcript/file repair tests pass; Codex event projector tests pass.
What was not tested: live Telegram delivery screenshot on this final rebased branch; earlier live diagnosis drove the patch, but final verification here is unit/seam/build focused.
Before evidence: prior behavior was observed as mixed Telegram progress/final output plus live-history missing-tool-result repair noise around code-mode tool calls.

Root Cause (if applicable)

Root cause: code-mode bridge projection used wrapper-level display too often; embedded runs did not pass Tool Search enablement through one callsite; transcript repair/pairing paths were too narrow around provider tool-call shapes, delayed tool results, and persisted missing-result repair.
Missing detection / guardrail: coverage existed for individual replay paths, but not for bridge projection, embedded Tool Search enablement, or persisted code-mode missing-result repair together.
Contributing context (if known): lossless-claw also carried a stale copied transcript repair implementation; that separate copied-logic issue explained some residual warning noise and is not fixed by this OpenClaw PR.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- extensions/codex/src/app-server/event-projector.test.ts
- src/gateway/server-chat.agent-events.test.ts
- src/agents/pi-embedded-runner.guard.test.ts
- src/agents/pi-embedded-runner.guard.waitforidle-before-flush.test.ts
- src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts
- src/agents/pi-tools.create-openclaw-coding-tools.test.ts
- src/agents/session-file-repair.test.ts
- src/agents/session-transcript-repair.test.ts
- src/agents/tool-display.test.ts
- extensions/openai/openai-provider.test.ts
- src/plugins/provider-replay-helpers.test.ts
Scenario the test should lock in: bridged tool events project native labels, Tool Search controls are exposed in embedded runs when configured, and tool-result repair preserves/matches real results before inserting deterministic synthetic repairs.
Why this is the smallest reliable guardrail: these tests exercise the exact projection, gateway event, embedded-runner, and transcript repair seams without broad E2E setup.
Existing test that already covers this (if any): partial coverage existed for replay repair; this PR extends targeted coverage around the missing seams.
If no new test is added, why not: N/A.
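
The scenario above (match real results before inserting deterministic synthetic repairs) can be sketched roughly as follows. This is a minimal illustration, not the code in src/agents/session-transcript-repair.ts: the `Message` shape, the `repairMissingToolResults` name, and the placeholder text are all assumptions made for the example. It only shows the "look everywhere before synthesizing" step and does not reorder delayed results next to their calls.

```typescript
// Hypothetical transcript shape; the real OpenClaw message types are not shown in this PR.
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; toolCalls: { id: string; name: string }[] }
  | { role: "toolResult"; toolCallId: string; content: string; synthetic?: boolean };

// Hypothetical helper: add a placeholder result only for calls that have
// no real result anywhere in the transcript.
function repairMissingToolResults(transcript: Message[]): Message[] {
  // Pass 1: index every result that exists, wherever it sits in the
  // transcript, so a delayed result still counts as a match.
  const answered = new Set<string>();
  for (const message of transcript) {
    if (message.role === "toolResult") answered.add(message.toolCallId);
  }

  // Pass 2: copy the transcript and insert a placeholder right after any
  // assistant turn whose call has no result at all.
  const repaired: Message[] = [];
  for (const message of transcript) {
    repaired.push(message);
    if (message.role !== "assistant") continue;
    for (const call of message.toolCalls) {
      if (answered.has(call.id)) continue;
      repaired.push({
        role: "toolResult",
        toolCallId: call.id,
        // Derived only from the call, so repeated repairs produce the same text.
        content: `[repaired] no result was recorded for ${call.name}`,
        synthetic: true,
      });
      answered.add(call.id);
    }
  }
  return repaired;
}
```

Because the placeholder depends only on the call it answers, running the repair a second time on its own output changes nothing, which is the property a persisted-session repair needs.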

User-visible / Behavior Changes

Bridged code-mode tool calls should display as the underlying tool instead of generic wrapper labels.
Follow-up turns after code-mode/tool-search activity should be more robust against malformed or incomplete persisted session history.
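
As a rough illustration of the first change, a display helper can unwrap a bridged call and label it with the inner tool. The wrapper names, the `tool`/`arguments` payload keys, and the `resolveDisplayTarget` name below are hypothetical and chosen only for this sketch; the PR description does not show the actual shared display helper.

```typescript
// Hypothetical event shape for a tool call as seen by the display layer.
interface ToolCallEvent {
  toolName: string;
  args: Record<string, unknown>;
}

interface DisplayTarget {
  label: string;
  args: Record<string, unknown>;
}

// Assumed wrapper names, for illustration only.
const BRIDGE_WRAPPERS = new Set(["tool_search_call", "code_mode_bridge"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Pick what the UI should show: the inner tool for bridged calls,
// the tool itself for native calls.
function resolveDisplayTarget(event: ToolCallEvent): DisplayTarget {
  if (!BRIDGE_WRAPPERS.has(event.toolName)) {
    return { label: event.toolName, args: event.args };
  }
  const inner = event.args["tool"];
  const innerArgs = event.args["arguments"];
  if (typeof inner === "string" && inner.length > 0) {
    return { label: inner, args: isRecord(innerArgs) ? innerArgs : {} };
  }
  // Malformed bridge payload: fall back to the wrapper label rather than guess.
  return { label: event.toolName, args: event.args };
}
```

Routing every surface (Codex projection, gateway events, tool display) through one helper of this kind is what keeps the labels consistent between native and bridged calls.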

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: macOS 15 / Darwin 24.6.0 arm64
Runtime/container: Node 22.17.0, pnpm 11.0.8
Model/provider: OpenAI-compatible/Codex-oriented code-mode surfaces; provider replay tests include OpenAI-compatible helpers.
Integration/channel (if any): Telegram/gateway event delivery and Codex app-server projection surfaces.
Relevant config (redacted): Tool Search enabled in embedded-runner test fixtures; no secrets printed.

Steps

Run the targeted test command listed in Real behavior proof.
Run git diff --check upstream/main...HEAD.
Run CI=true pnpm build.

Expected

Targeted tests pass.
Diff check reports no whitespace/conflict-marker issues.
Build completes successfully.

Actual

Targeted tests passed: 5 Vitest shards in 24.19s.
Diff check passed with no output.
Build completed successfully.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios: targeted tests for Codex event projection, gateway agent events, embedded Tool Search enablement, transcript/file repair, tool-display rendering, provider replay helpers, and build after rebase. Broader Telegram verbose/final separation remains covered by #80294 (Telegram: keep verbose tool results separate from final answers).
Edge cases checked: delayed tool results across display turns; persisted missing code-mode results; embedded Tool Search controls; OpenAI-compatible replay payload handling; bridged tool display labels.
What you did not verify: live Telegram screenshot on the final rebased branch.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: transcript repair could insert synthetic tool results too eagerly.
- Mitigation: repair checks for matching real results globally and tests cover missing, delayed, duplicate, and orphan result behavior.
Risk: display projection could expose wrapper/internal labels incorrectly.
- Mitigation: shared display helper and Codex/gateway/tool-display tests lock native inner-tool projection behavior.
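
The first mitigation names four result categories: missing, delayed, duplicate, and orphan. One way to picture the distinction is a small classifier over calls and results, where only the "missing" bucket would justify a synthetic result. The `turn` field, the type names, and `classifyPairs` are assumptions for this sketch and are not taken from the PR's code.

```typescript
// Hypothetical records: `turn` is the display turn in which the entry was recorded.
interface ToolCall { id: string; turn: number }
interface ToolResult { toolCallId: string; turn: number }

interface PairingReport {
  missing: string[];   // calls with no result anywhere
  delayed: string[];   // calls whose result arrived in a later turn
  duplicate: string[]; // calls with more than one result
  orphan: string[];    // result ids that match no known call
}

function classifyPairs(calls: ToolCall[], results: ToolResult[]): PairingReport {
  // Group results by the call id they claim to answer.
  const byCall = new Map<string, ToolResult[]>();
  for (const result of results) {
    const list = byCall.get(result.toolCallId) ?? [];
    list.push(result);
    byCall.set(result.toolCallId, list);
  }

  const report: PairingReport = { missing: [], delayed: [], duplicate: [], orphan: [] };
  const known = new Set<string>();
  for (const call of calls) {
    known.add(call.id);
    const matches = byCall.get(call.id) ?? [];
    if (matches.length === 0) {
      report.missing.push(call.id);
      continue;
    }
    if (matches.length > 1) report.duplicate.push(call.id);
    if (matches.some((r) => r.turn > call.turn)) report.delayed.push(call.id);
  }

  // Any result group whose id was never issued as a call is an orphan.
  byCall.forEach((_list, id) => {
    if (!known.has(id)) report.orphan.push(id);
  });
  return report;
}
```

A delayed result is still a real result, so under this reading it would be preserved and matched rather than replaced by a placeholder.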

clawsweeper · 2026-05-11T13:35:35Z

Codex review: needs real behavior proof before merge.

Summary
Review failed before ClawSweeper could summarize the requested change.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Real behavior proof
Not applicable: Real behavior proof was not assessed because the Codex review failed.

Next step before merge
Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

What I checked:

failure reason: timeout.
codex failure detail: Codex review failed for this PR: spawnSync codex ETIMEDOUT.

Likely related people:

unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

Remaining risk / open question:

No close action taken because the review did not complete.

Codex review notes: model gpt-5.5, reasoning high; reviewed against fe0bb3083e99.

* fix: project tool-search bridge event display
* fix: keep codex tool progress out of final replies
* fix: preserve tool result pairs on cleanup
* fix: restore tool search display target helper
* fix: keep tool search controls independent
* fix: render bridged tool calls like native tools
* fix: abort timed out tool search bridge calls
* fix: preserve code-mode tool results across display turns
* fix: repair missing code-mode tool results on disk
* fix: expose tool search controls in embedded runs
* docs: add code-mode followups changelog
* fix: update session repair agent-core import
* fix: harden code-mode follow-up repair
* fix: use stable session repair ids

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>

# Conflicts:
# src/agents/pi-embedded-runner/run/attempt.subscription-cleanup.ts
# src/agents/pi-embedded-runner/run/attempt.ts
# src/agents/transcript-state-repair.ts

openclaw-barnacle Bot added the gateway (Gateway runtime), agents (Agent runtime and tooling), extensions: openai, extensions: codex, size: L, and maintainer (Maintainer-authored PR) labels May 11, 2026

clawsweeper Bot added the mantis: telegram-visible-proof label (Mantis should capture Telegram visible proof.) May 11, 2026

jalehman self-assigned this May 11, 2026

jalehman force-pushed the buce/code-mode-followups-pr branch from e3bd15e to 731f13c Compare May 11, 2026 21:07

clawsweeper Bot removed the mantis: telegram-visible-proof label (Mantis should capture Telegram visible proof.) May 11, 2026

jalehman and others added 14 commits May 11, 2026 15:22

- 694ef40 fix: project tool-search bridge event display
- 0824c31 fix: keep codex tool progress out of final replies
- ab72c02 fix: preserve tool result pairs on cleanup
- a6807b9 fix: restore tool search display target helper
- 52e82fc fix: keep tool search controls independent
- 3841a1c fix: render bridged tool calls like native tools
- df98960 fix: abort timed out tool search bridge calls
- 403d391 fix: preserve code-mode tool results across display turns
- ad4332e fix: repair missing code-mode tool results on disk
- 539fb69 fix: expose tool search controls in embedded runs
- ed36298 docs: add code-mode followups changelog
- 0982f4e fix: update session repair agent-core import
- 7f93d33 fix: harden code-mode follow-up repair
- c14df1c fix: use stable session repair ids

jalehman force-pushed the buce/code-mode-followups-pr branch from 731f13c to c14df1c Compare May 11, 2026 22:23

jalehman merged commit 4bfd741 into openclaw:main May 11, 2026
85 of 87 checks passed

jalehman deleted the buce/code-mode-followups-pr branch May 11, 2026 22:31

github-actions Bot mentioned this pull request May 11, 2026

📡 Upstream Digest — 2026-05-11 22:46 UTC curtismercier/openclaw-mods#834

Open

clawsweeper Bot mentioned this pull request May 12, 2026

Bug: reopen can leave unresolved tail tool calls and relies on local transcript compensation #64530

Open

clawsweeper Bot mentioned this pull request May 16, 2026

fix(sessions): drop dead-end orphan entries when retry forks parentId chain (#48810) #79635

Open

clawsweeper Bot mentioned this pull request May 24, 2026

[Bug]: 2026.5.22 Telegram direct gets generic /new fallback after successful tool turn #86184

Open
