Skip to content

fix(agents): suppress raw subagent tool output#80110

Open
steipete wants to merge 2 commits into
mainfrom
maint/subagent-no-raw-output
Open

fix(agents): suppress raw subagent tool output#80110
steipete wants to merge 2 commits into
mainfrom
maint/subagent-no-raw-output

Conversation

@steipete

@steipete steipete commented May 10, 2026

Copy link
Copy Markdown
Contributor

Summary

This keeps the correct core fix from #80049: selectSubagentOutputText() must not fall through to snapshot.latestRawText when a child subagent produced only tool/toolResult output. Tool output is not user-facing completion text, so the requester session should receive (no output) or a post-compaction assistant reply instead of raw command/search output.

The replacement also fixes the gaps in #80049:

  • updates both public docs pages that described sanitized tool/toolResult fallback
  • adds direct regression coverage for readSubagentOutput()
  • strengthens announce-format e2e assertions to prove raw tool text is absent
  • adds a changelog entry with contributor and reporter credit
  • includes fresh Crabbox proof from a real Linux Testbox run

Verification

  • pnpm test src/agents/subagent-announce-output.test.ts src/agents/subagent-announce.format.e2e.test.ts src/agents/subagent-announce-delivery.test.ts src/gateway/server-methods/agent.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/automation/tasks.md docs/tools/subagents.md src/agents/subagent-announce-output.ts src/agents/subagent-announce-output.test.ts src/agents/subagent-announce.format.e2e.test.ts
  • Crabbox/Testbox tbx_01kr83we61rvr3xjxxbtdme3x2, workflow run https://github.com/openclaw/openclaw/actions/runs/25620248580

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Subagent completion announce could select raw child tool/toolResult content as completion text when no assistant-produced text existed. In Telegram this can surface code/search/exec dumps from the child transcript. This PR makes tool-only histories produce no selected completion output, so the existing announce fallback reports (no output) instead of leaking raw tool text.
  • Real environment tested: Blacksmith Testbox via Crabbox, Linux runner, Node 24.13.0, repo branch maint/subagent-no-raw-output synced from this PR checkout.
  • Exact steps or command run after this patch: Ran /Users/steipete/Projects/crabbox/bin/crabbox run --provider blacksmith-testbox --blacksmith-org openclaw --blacksmith-workflow .github/workflows/ci-check-testbox.yml --blacksmith-job check --blacksmith-ref main --idle-timeout 90m --ttl 240m --timing-json --shell -- '<probe tool-only child history>; oxfmt --check; targeted pnpm test ...'. The probe imports src/agents/subagent-announce-output.ts, stubs Gateway chat.history with assistant-empty + toolResult: raw grep output, calls readSubagentOutput("agent:main:subagent:child"), and fails if the result contains raw tool output or returns any non-undefined text.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): Crabbox/Testbox terminal capture from tbx_01kr83we61rvr3xjxxbtdme3x2, workflow run https://github.com/openclaw/openclaw/actions/runs/25620248580:
{"scenario":"replacement-tool-only-subagent-output","result":null,"leaked":false}
Checking formatting...
All matched files use the correct format.
[test] passed 3 Vitest shards in 15.91s
blacksmith run summary sync=delegated command=1m2.483s total=1m4.837s exit=0

Pre-fix comparison from Crabbox/Testbox tbx_01kr83fbqn923dkp3zja0stp6d:

{"scenario":"prefix-tool-only-subagent-output","result":"raw grep output","leaked":true}
  • Observed result after fix: Tool-only subagent completion history no longer returns raw tool output from readSubagentOutput(). The Crabbox probe result is null/undefined with leaked:false, and the announce-format tests verify requester completion handoff contains (no output) and does not contain the raw tool text.
  • What was not tested: A live Telegram bot DM was not exercised. The tested path covers the runtime selector that fed raw child tool text into subagent completion announces; the separate duplicate parent progress symptom in [Bug] Subagent announce leaks raw tool output to Telegram and duplicates parent progress reply #79986 is not claimed fixed by this PR.
  • Before evidence (optional but encouraged): Pre-fix Crabbox/Testbox probe returned raw tool output with leaked:true, shown above.

Root Cause

  • Root cause: The completion-output selector kept a sanitized raw-text fallback for tool-only child transcripts, even though tool/toolResult content is not assistant-authored completion text.
  • Missing detection / guardrail: Existing e2e coverage checked formatted completion output but did not directly assert that readSubagentOutput() rejects tool-only histories or that post-compaction assistant text still wins over raw tool text.
  • Contributing context (if known): Public docs still documented tool/toolResult fallback behavior, so the selector and docs drifted together.

Regression Test Plan

  • Coverage level that should have caught this: Unit test and seam/e2e announce-format test.
  • Target test or file: src/agents/subagent-announce-output.test.ts and src/agents/subagent-announce.format.e2e.test.ts.
  • Scenario the test should lock in: Tool-only child histories produce no selected completion text; post-compaction assistant replies still produce selected text; requester handoff does not include raw tool output.
  • Why this is the smallest reliable guardrail: The bug is in the selector seam, with one e2e announce-format check to prove the user-visible fallback text.
  • Existing test that already covers this (if any): None covered the selector directly before this patch.
  • If no new test is added, why not: New tests are added.

User-visible / Behavior Changes

Subagent completion announces no longer surface raw tool output when the child produced no assistant text. They now use the existing (no output) fallback for that case.

Diagram

N/A

Based on #80049.

Co-authored-by: Blasius Patrick <blasius.patrick@gmail.com>
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 10, 2026
@clawsweeper

clawsweeper Bot commented May 10, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Summary
The PR changes subagent completion-output selection, docs, changelog, and regression tests so tool-only child histories produce (no output) instead of raw tool text.

Reproducibility: yes. Current main source and tests show the selector can return raw tool/toolResult text when no assistant text exists, and the PR body includes a pre-fix Testbox probe demonstrating the leak.

Real behavior proof
Sufficient (terminal): The PR body includes copied Testbox terminal output for the after-fix probe and a pre-fix comparison showing the raw-output leak no longer occurs.

Next step before merge
The remaining action is maintainer review/landing of an open protected-label PR, not a ClawSweeper repair job; no actionable review findings were found.

Security
Cleared: The diff does not touch dependencies, workflows, permissions, secrets, or package/install code, and it reduces exposure of raw transcript/tool output.

Review details

Best possible solution:

Land this focused selector/docs/test fix after normal maintainer review and exact-head checks, while tracking any remaining duplicate-progress behavior separately if it still reproduces.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main source and tests show the selector can return raw tool/toolResult text when no assistant text exists, and the PR body includes a pre-fix Testbox probe demonstrating the leak.

Is this the best way to solve the issue?

Yes. Returning undefined only after assistant, silent, and timeout partial-progress paths are exhausted is the narrow maintainable fix, and it preserves post-compaction assistant fallback plus bounded (no output) behavior.

What I checked:

  • Current main still returns raw tool text: On current main, selectSubagentOutputText() returns snapshot.latestRawText after assistant, silent, and timeout partial-progress paths do not produce text. (src/agents/subagent-announce-output.ts:304, 6d89bf65e08a)
  • Caller already has bounded fallback: The announce flow already uses childCompletionFindings || reply || "(no output)", so returning undefined from the selector feeds the intended bounded fallback. (src/agents/subagent-announce.ts:467, 6d89bf65e08a)
  • PR removes the raw fallback: The PR diff replaces return snapshot.latestRawText with return undefined and documents that raw tool output is not user-facing completion text. (src/agents/subagent-announce-output.ts:304, e98df82540af)
  • Regression coverage targets the leak: The PR adds direct readSubagentOutput() coverage for tool-only histories and changes announce-format e2e assertions to require (no output) and reject the raw tool text. (src/agents/subagent-announce-output.test.ts:104, e98df82540af)
  • Docs currently describe the old behavior: Current main docs still say completion result content falls back to sanitized tool/toolResult text; the PR updates both public docs pages to match the safer behavior. Public docs: docs/tools/subagents.md. (docs/tools/subagents.md:99, 6d89bf65e08a)
  • Exact-head checks are green: GitHub check-runs for PR head e98df82540afa067f16e1e8b75593dc0694b49f0 include successful check, check-docs, check-lint, check-prod-types, and agent/core test shards. (e98df82540af)

Likely related people:

  • steipete: Current blame for the selector and announce fallback path points to Peter Steinberger, and history shows prior refactoring that split the subagent announce output helpers. (role: recent area contributor; confidence: high; commits: 207bcd6b207c, 540b98b23f7b; files: src/agents/subagent-announce-output.ts, src/agents/subagent-announce.ts)
  • Tyler Yust: Subagent announce history includes multiple delivery and nested-run fixes by Tyler Yust, including descendant gating, frozen result refresh, cron retry, and extension-channel announce delivery work. (role: adjacent feature-history contributor; confidence: medium; commits: 81b93b9ce04b, 41cf93efff4d, fe57bea088c7; files: src/agents/subagent-announce.ts, src/agents/subagent-registry-lifecycle.ts, src/agents/subagent-announce-output.ts)
  • Roshan Singh: Git history identifies structured subagent announce output and run-outcome inclusion as earlier foundational work in this behavior area. (role: foundational feature contributor; confidence: low; commits: 1baa55c1456b; files: src/agents/subagent-announce.ts, src/agents/subagent-announce-output.ts)

Remaining risk / open question:

  • The linked report also mentioned duplicate parent progress delivery; this PR intentionally scopes to the raw subagent tool-output leak and should not be treated as fixing that separate symptom.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6d89bf65e08a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling docs Improvements or additions to documentation maintainer Maintainer-authored PR proof: sufficient ClawSweeper judged the real behavior proof convincing. size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Subagent announce leaks raw tool output to Telegram and duplicates parent progress reply

1 participant