fix: render WebChat message tool replies #81144

Closed
100yenadmin wants to merge 4 commits into openclaw:main from electricsheephq:fix/webchat-message-tool-renderer


@100yenadmin 100yenadmin commented May 12, 2026

Contributor

This PR restores the visible WebChat reply for Codex same-session message(action="send") calls by carrying the sanitized tool-result text into the Codex telemetry/rendering path. The safety boundary is that WebChat displays the sanitized result detail, not the raw tool arguments that may include markdown or same-session routing internals.

Summary

This is the option-2 alternative for #81109, alongside #81110. It keeps the Codex message tool path available for WebChat so the model can deliberately send the visible reply after a tool-heavy turn, but now renders only the sanitized same-session message text. That preserves the personality-restoring message tool behavior without leaking raw reasoning-tag content from original tool arguments.

  • Problem: same-session WebChat message(action="send") calls could be treated as external delivery or suppressed, and the first renderer version could have reused raw telemetry text.
  • Why it matters: the message tool is the mechanism that lets Codex recover a warm, user-facing reply after tool execution instead of ending with a sterile final answer or a bare "Sent."
  • What changed: same-session WebChat sends return status: "ok", keep semantic deliveryStatus: "sent", feed sanitized result text into Codex telemetry, and render that text through source-suppression-safe reply payloads.
  • What did NOT change (scope boundary): no duplicate Codex-native workspace tools, no Codex app-server filtering change, and no external channel delivery behavior change.
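The result shape described above can be sketched as follows. This is a minimal illustration, not the code in `src/agents/tools/message-tool.ts`: the field names `status`, `deliveryStatus`, and `details.message` come from this PR description, while `SendArgs`, `stripReasoningTags`, and `sendSameSession` are hypothetical names, and the sanitizer here only handles `<think>` spans.

```typescript
// Hypothetical shapes; only the field names status, deliveryStatus, and
// details.message are taken from the PR description.
interface SendArgs {
  action: "send";
  message: string;
}

interface SameSessionResult {
  status: "ok"; // bridge-recognized success status
  deliveryStatus: "sent"; // semantic delivery state
  details: { message: string }; // sanitized visible text
}

// Simplified stand-in for the real sanitizer: drops <think>...</think> spans only.
function stripReasoningTags(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
}

function sendSameSession(args: SendArgs): SameSessionResult {
  // Sanitize a copy so the caller's original args are left untouched.
  const params = { ...args, message: stripReasoningTags(args.message) };
  return {
    status: "ok",
    deliveryStatus: "sent",
    details: { message: params.message },
  };
}
```

Because the sanitizer works on a copy, anything that still reads the original args sees the unsanitized text, which is why the display text has to come from the result.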

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: WebChat turns configured for message_tool_only can use the message tool for the visible reply without routing through an external channel, without being suppressed by final source-reply suppression, and without rendering unsanitized reasoning-tag text.
  • Real environment tested: local OpenClaw checkout at /Volumes/LEXAR/repos/openclaw-webchat-message-renderer, branch fix/webchat-message-tool-renderer, latest patch 45e0d6de92a, using real source modules and focused Vitest coverage. This is not a full browser WebChat E2E run.
  • Exact steps or command run after this patch:
OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test src/agents/tools/message-tool.test.ts extensions/codex/src/app-server/dynamic-tools.test.ts src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run/tool-media-payloads.test.ts
pnpm exec oxfmt --check --threads=1 src/agents/tools/message-tool.ts src/agents/tools/message-tool.test.ts extensions/codex/src/app-server/dynamic-tools.ts extensions/codex/src/app-server/dynamic-tools.test.ts
git diff --check
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):
[test] passed 3 Vitest shards in 9.61s
Test Files  2 passed (2), 1 passed (1), 1 passed (1)
Tests  29 passed (29), 50 passed (50), 25 passed (25)

oxfmt: All matched files use the correct format.
git diff --check: passed

Focused regressions now assert:

same-session WebChat result.details.message === "Actual visible reply after tools."
Codex bridge telemetry.messagingToolSentTexts === ["Visible reply from Codex."]
Codex bridge telemetry target text === "Visible reply from Codex."
  • Observed result after fix: same-session WebChat message sends are bridge-successful (status: "ok", deliveryStatus: "sent"), the renderer receives source-suppression-safe payload metadata, and Codex telemetry prefers sanitized tool-result details over raw original args.
  • What was not tested: full browser WebChat DOM rendering and broad repo type/lint. Those should remain GitHub CI/Testbox work per local-resource policy.
  • Before evidence (optional but encouraged): ClawSweeper identified that raw attempt.messagingToolSentTexts could render <think>hidden</think>Visible reply even though createMessageTool sanitized its copied params. The new test locks the sanitized result path.

Root Cause (if applicable)

  • Root cause: the first renderer consumed Codex bridge messaging telemetry collected from original tool args, while createMessageTool strips reasoning tags on a copied params object before returning same-session result details.
  • Missing detection / guardrail: coverage proved the WebChat message-tool payload existed, but did not assert reasoning-tag sanitization at the Codex bridge telemetry boundary.
  • Contributing context (if known): WebChat same-session delivery is intentionally not an outbound channel send, so the safe display text must come from the sanitized tool result, not raw transport args.
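A minimal sketch of the telemetry preference described in the root cause, assuming a hypothetical `ToolCallRecord` shape and a hypothetical `telemetryText` helper. The fallback to the raw argument is an assumption made for illustration; the PR only states that sanitized result details are preferred over the original args.

```typescript
// Hypothetical record of one message-tool call as seen by the bridge.
interface ToolCallRecord {
  args: { message?: string }; // raw original args, possibly unsanitized
  result?: { details?: { message?: string } }; // sanitized by the message tool
}

// Prefer the sanitized result detail; fall back to the raw arg only when the
// tool returned no detail text (assumed behavior, not confirmed by the PR).
function telemetryText(call: ToolCallRecord): string | undefined {
  const sanitized = call.result?.details?.message;
  if (typeof sanitized === "string" && sanitized.length > 0) {
    return sanitized;
  }
  return call.args.message;
}
```

With a record whose args contain `<think>hidden</think>Visible reply` and whose result detail is `Visible reply`, the helper returns the result detail, so the reasoning-tag text never reaches the renderer.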

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/tools/message-tool.test.ts, extensions/codex/src/app-server/dynamic-tools.test.ts
  • Scenario the test should lock in: same-session WebChat message sends sanitize reasoning tags before display, and Codex dynamic bridge telemetry records sanitized result text.
  • Why this is the smallest reliable guardrail: it covers both the source sanitizer and the bridge telemetry seam without requiring a browser WebChat E2E.
  • Existing test that already covers this (if any): none before the sanitizer follow-up.
  • If no new test is added, why not: N/A, new regressions are included.

User-visible / Behavior Changes

WebChat users can receive a real visible assistant reply from the message tool path in message_tool_only mode. The rendered text is sanitized and can pass through source-reply suppression intentionally.
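One way to picture "passing through source-reply suppression intentionally" is a filter that drops ordinary final replies but keeps payloads flagged as deliberate message-tool sends. The `ReplyPayload` type, the `fromMessageTool` flag, and `deliverablePayloads` are hypothetical names for this sketch; the real payload metadata in the embedded runner may look different.

```typescript
interface ReplyPayload {
  text: string;
  // Hypothetical marker for payloads produced by a deliberate message-tool send.
  fromMessageTool?: boolean;
}

function deliverablePayloads(
  payloads: ReplyPayload[],
  suppressSourceReply: boolean,
): ReplyPayload[] {
  // Without suppression, everything is delivered unchanged.
  if (!suppressSourceReply) return payloads;
  // With suppression, only flagged message-tool payloads survive.
  return payloads.filter((p) => p.fromMessageTool === true);
}
```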

Diagram (if applicable)

flowchart LR
  call["message(action=send)"] --> sanitize["Message tool sanitizes visible text"]
  sanitize --> result["Tool result detail text"]
  result --> telemetry["Codex dynamic-tool telemetry"]
  telemetry --> payload["Embedded-run payload"]
  payload --> webchat["WebChat visible reply"]
  raw["Raw original args"] -. "not used for visible text" .-> telemetry
Before:
[Codex calls message tool] -> [message tool sanitizes copied params]
  -> [Codex telemetry stores raw original args]
  -> [WebChat renderer could display raw telemetry]

After:
[Codex calls message tool] -> [message tool returns sanitized same-session result]
  -> [Codex telemetry prefers sanitized result.details.message]
  -> [WebChat renderer displays sanitized visible reply]

Security Impact (required)

  • New permissions/capabilities? (Yes/No): No
  • Secrets/tokens handling changed? (Yes/No): No
  • New/changed network calls? (Yes/No): No
  • Command/tool execution surface changed? (Yes/No): No
  • Data access scope changed? (Yes/No): No
  • If any Yes, explain risk + mitigation: N/A. The security-sensitive review finding was about reasoning-tag text disclosure; the patch mitigates it by rendering sanitized result text.

Repro + Verification

Environment

  • OS: macOS local development host
  • Runtime/container: local Lexar-backed OpenClaw checkout
  • Model/provider: not model-provider dependent
  • Integration/channel (if any): WebChat same-session message tool path and Codex dynamic bridge
  • Relevant config (redacted): sourceReplyDeliveryMode: "message_tool_only", current channel webchat

Steps

  1. Run the focused message tool and Codex bridge tests.
  2. Verify same-session result details contain sanitized message text.
  3. Verify bridge telemetry records sanitized result text and classifies status: "ok" as success.
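The success classification checked in step 3 can be sketched like this. `DynamicToolResult` and `toBridgeResponse` are hypothetical names, and treating every status other than `"ok"` as a failure is an assumption for illustration rather than the bridge's documented rule.

```typescript
interface DynamicToolResult {
  status?: string;
  deliveryStatus?: string;
}

// Only the bridge-level status decides success in this sketch;
// deliveryStatus is carried along as semantic detail and not inspected.
function toBridgeResponse(result: DynamicToolResult): { success: boolean } {
  return { success: result.status === "ok" };
}
```

Under this assumption, a result of `{ status: "ok", deliveryStatus: "sent" }` is classified as successful, while a result that carries only a delivery state and no `"ok"` status is not.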

Expected

  • Same-session WebChat message send returns success semantics.
  • Renderable telemetry contains sanitized visible text only.
  • Source-suppression payload metadata survives.

Actual

  • Focused tests passed, 104 total checks across 3 Vitest shards.
  • Formatter and whitespace checks passed.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: same-session WebChat message result, Codex bridge success classification, sanitized telemetry, payload metadata preservation.
  • Edge cases checked: reasoning-tag message args, status: "ok" plus deliveryStatus: "sent", external sends remain outside this same-session path.
  • What you did not verify: browser DOM rendering and full GitHub CI matrix.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

ClawSweeper P2 addressed by 45e0d6de92a: render sanitized WebChat message-tool text.

Compatibility / Migration

  • Backward compatible? (Yes/No): Yes
  • Config/env changes? (Yes/No): No
  • Migration needed? (Yes/No): No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

openclaw-barnacle Bot added the labels agents (Agent runtime and tooling), size: M, triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.), and proof: supplied (External PR includes structured after-fix real behavior proof.), then removed triage: needs-real-behavior-proof, on May 12, 2026

clawsweeper Bot commented May 12, 2026

Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR adds a same-session WebChat message-tool result path, prefers sanitized tool-result details in Codex telemetry, and marks the resulting embedded-run payloads deliverable despite source reply suppression.

Reproducibility: yes, from source inspection, but not by executing current main: current main can suppress WebChat final delivery in message_tool_only mode while Codex telemetry reads original message-tool args instead of the sanitized result details.

Real behavior proof
Needs real behavior proof before merge: The PR body and comments provide copied focused Vitest/format output, but before merge the contributor should add redacted live WebChat/Gateway proof such as a terminal log, screenshot, recording, or linked artifact. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Human handling is needed to choose this renderer approach versus the open automatic-delivery alternative and to require live behavior proof beyond tests before merge.

Security
Cleared: No concrete security or supply-chain issue found; the diff adds no dependencies, permissions, secret handling, or network calls, and it reduces the reasoning-tag disclosure risk by preferring sanitized result details.

Review details

Best possible solution:

Pick one canonical fix for #81109: land this sanitized renderer path with live behavior proof, or land #81110 and close the unused alternative.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection, but not by executing current main: current main can suppress WebChat final delivery in message_tool_only mode while Codex telemetry reads original message-tool args instead of the sanitized result details.

Is this the best way to solve the issue?

Unclear as a product choice. The implementation is a maintainable sanitized renderer option, but the narrower alternative is to keep internal WebChat replies automatic via #81110.

Acceptance criteria:

  • Contributor proof should exercise a real WebChat/Gateway path in message_tool_only mode and show the sanitized visible reply, with private endpoints, tokens, and user data redacted.
  • If maintainers choose this PR, focused validation should include pnpm test src/agents/tools/message-tool.test.ts extensions/codex/src/app-server/dynamic-tools.test.ts src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run/tool-media-payloads.test.ts plus the relevant changed gate.

What I checked:

Likely related people:

  • @steipete: Recent history shows central work on message-tool delivery, message-tool-only reply guidance, and bounded Codex dynamic-tool responses in the affected paths. (role: feature owner / adjacent owner; confidence: high; commits: 5e8e77ed83eb, b62166301efd, 09baec68eac7; files: src/agents/tools/message-tool.ts, extensions/codex/src/app-server/dynamic-tools.ts)
  • @pashpashpash: Path history for the Codex app-server dynamic-tool and harness surfaces includes deferred dynamic tools, structured tool replies, and Codex runtime policy work. (role: Codex dynamic-tool area contributor; confidence: high; commits: 3f217964d1f9, 439d8edf68e2, 02fe0d8978db; files: extensions/codex/src/app-server/dynamic-tools.ts, src/auto-reply/reply/dispatch-from-config.ts)
  • @vincentkoc: Recent merged work hardened Codex harness control surfaces and sanitizer-related agent behavior near the same telemetry and message rendering boundaries. (role: adjacent owner; confidence: medium; commits: ac3cd1a0ca8c, 92d33e4de85a, 47f6a98909b5; files: extensions/codex/src/app-server/dynamic-tools.ts, src/agents/tools/message-tool.ts, src/plugin-sdk/channel-streaming.test.ts)

Remaining risk / open question:

  • The supplied proof is only focused Vitest/format output; it does not show a live WebChat browser or Gateway path rendering the sanitized reply after the fix.
  • Maintainers still need to choose between this message-tool renderer shape and the open automatic-delivery alternative, "fix: keep codex webchat replies automatic" (#81110).

Codex review notes: model gpt-5.5, reasoning high; reviewed against 8a6c18a08a03.

clawsweeper Bot added the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) on May 12, 2026

100yenadmin commented May 12, 2026

Contributor Author

Addressed ClawSweeper's P2 in cf2766eb792:

  • same-session WebChat sends now return bridge-recognized status: "ok" and preserve semantic delivery state as deliveryStatus: "sent"
  • added Codex dynamic bridge coverage proving the same-session WebChat message(action="send") result is classified as success: true instead of an error

Validation run after the patch:

OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test src/agents/tools/message-tool.test.ts extensions/codex/src/app-server/dynamic-tools.test.ts src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run/tool-media-payloads.test.ts
# passed: 3 Vitest shards, 102 tests

pnpm exec oxfmt --check --threads=1 src/agents/tools/message-tool.ts src/agents/tools/message-tool.test.ts extensions/codex/src/app-server/dynamic-tools.test.ts
# passed

git diff --check
# passed before commit

Re-review progress:

openclaw-barnacle Bot removed the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) on May 12, 2026
@100yenadmin

Contributor Author

Follow-up on the CI failures from my previous push:

  • Fixed the extension-boundary violation by removing the direct ../../../../src/agents/tools/message-tool.js import from extensions/codex/src/app-server/dynamic-tools.test.ts.
  • Kept the proof split correctly: src/agents/tools/message-tool.test.ts verifies the real WebChat same-session message tool result now reports status: "ok" plus deliveryStatus: "sent", while the Codex extension bridge test verifies that status: "ok" message sends are classified as successful dynamic tool results.
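The extension-boundary rule behind the first fix can be illustrated with a small path check. This is a hypothetical sketch, not the implementation of `lint:plugins:no-extension-test-core-imports`; `violatesExtensionBoundary` is an invented name, and the check only handles repo-relative file paths and relative import specifiers.

```typescript
// Hypothetical check: flags relative imports from a file under extensions/
// that resolve into the core src/ tree at the repo root.
function violatesExtensionBoundary(filePath: string, importSpecifier: string): boolean {
  if (!filePath.startsWith("extensions/")) return false;
  if (!importSpecifier.startsWith(".")) return false;

  // Resolve the specifier against the importing file's directory.
  const parts = filePath.split("/").slice(0, -1);
  for (const segment of importSpecifier.split("/")) {
    if (segment === "..") {
      parts.pop();
    } else if (segment !== "." && segment !== "") {
      parts.push(segment);
    }
  }
  return parts[0] === "src";
}
```

Applied to `extensions/codex/src/app-server/dynamic-tools.test.ts`, the removed `../../../../src/agents/tools/message-tool.js` import resolves to `src/agents/tools/message-tool.js` and is flagged, while a sibling import such as `./dynamic-tools.js` is not.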

Local validation from /Volumes/LEXAR/repos/openclaw-webchat-message-renderer:

  • OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test extensions/codex/src/app-server/dynamic-tools.test.ts src/agents/tools/message-tool.test.ts passed.
  • pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/dynamic-tools.test.ts src/agents/tools/message-tool.ts src/agents/tools/message-tool.test.ts passed.
  • pnpm run lint:plugins:no-extension-test-core-imports passed.
  • pnpm tsgo:extensions:test passed.

clawsweeper Bot added the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) on May 12, 2026
openclaw-barnacle Bot removed the proof: sufficient label on May 12, 2026
clawsweeper Bot added the proof: sufficient label on May 12, 2026
@100yenadmin

Contributor Author

Addressed the ClawSweeper sanitizer finding in 45e0d6de92a: Codex bridge telemetry now prefers sanitized message tool result details over raw original args, and same-session WebChat message output is covered at both the message-tool and Codex dynamic bridge seams. Local proof: focused 3-shard command passed 104 tests; oxfmt --check and git diff --check passed. @clawsweeper re-review please

openclaw-barnacle Bot removed the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) on May 12, 2026
@100yenadmin

Contributor Author

@pashpashpash looks like #81586 landed the broader internal-UI message-tool sink for the same WebChat/Codex issue this PR was carrying as the option-2 renderer path. All good on the final shape, but I think your clanker forgot two housekeeping bits: close this as superseded and credit me / #81144 in the changelog 😄

@steipete

Contributor

Thanks for working on this. This WebChat/TUI current-run message-tool path has now been fixed on main by #81586, merged as 78eb92e.

I rechecked the current code path: the message tool now returns the internal UI source reply sink, Codex telemetry extracts it, and the Pi payload builder projects it back into visible WebChat/TUI reply payloads plus transcript mirroring. Since this PR is superseded by the landed broader fix, I’m closing it to keep the queue clean.

@steipete

Contributor

Superseded by #81586, which is merged on main.

@steipete steipete closed this May 15, 2026

Labels

agents (Agent runtime and tooling), extensions: codex, proof: supplied (External PR includes structured after-fix real behavior proof.), size: M

2 participants