fix(agents): exclude tool result details from guard budget by zqchris · Pull Request #75525 · openclaw/openclaw

zqchris · 2026-05-01T08:06:46Z

Summary

Problem: estimateMessageChars adds serialized toolResult.details to the per-tool-result content size before applying tool-result weighting, so large diagnostic metadata that the provider boundary already strips inflates guard pressure and triggers unnecessary truncation/compaction.
Why it matters: normalizeMessagesForLlmBoundary / stripToolResultDetails already removes details before model conversion, so the guard was charging context the model never sees.
What changed: drop details from the shared estimator's tool-result content size calculation. Provider-boundary stripping is unchanged.
What did NOT change: persistence/truncation/session JSONL policy, provider-boundary stripping, and aggregate weighting.
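
For readers who want the shape of the change without opening the diff, here is a minimal sketch of the two estimator behaviors. It is not the production code: the message type, the function names, the `JSON.stringify`-based details sizing, and the weighting factor of 2 are all assumptions made for illustration.

```typescript
// Hypothetical message shape; the real types live in the OpenClaw codebase.
type SketchTextBlock = { type: "text"; text: string };
type SketchToolResult = {
  role: "toolResult";
  content: SketchTextBlock[];
  details?: unknown; // diagnostic metadata, stripped before the model call
};

// Assumed tool-result weighting factor (illustrative only).
const SKETCH_TOOL_RESULT_WEIGHT = 2;

// Characters the model actually sees.
function visibleCharCount(result: SketchToolResult): number {
  return result.content.reduce((sum, block) => sum + block.text.length, 0);
}

// Old behavior: serialized details are added before weighting.
function estimateWithDetails(result: SketchToolResult): number {
  const detailsChars =
    result.details === undefined ? 0 : JSON.stringify(result.details).length;
  return (visibleCharCount(result) + detailsChars) * SKETCH_TOOL_RESULT_WEIGHT;
}

// New behavior: only model-visible content is charged.
function estimateWithoutDetails(result: SketchToolResult): number {
  return visibleCharCount(result) * SKETCH_TOOL_RESULT_WEIGHT;
}

const exampleResult: SketchToolResult = {
  role: "toolResult",
  content: [{ type: "text", text: "ok" }],
  details: { logs: "abc" }, // serializes to 14 characters
};

console.log(estimateWithDetails(exampleResult)); // 32
console.log(estimateWithoutDetails(exampleResult)); // 4
```

The gap between the two numbers grows linearly with the size of details, which is why a large diagnostic payload dominates the guard budget under the old behavior.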

Real behavior proof

Behavior or issue addressed: A tool-result with small model-visible content (e.g. "exit code 0") but a 200KB diagnostic details payload inflates the shared estimateMessageChars count by ~4 orders of magnitude (409644 vs 22 weighted characters in the repro below), even though the provider boundary strips details before sending to the model. The result is spurious truncation / preemptive compaction in installToolResultContextGuard.
Real environment tested: Local OpenClaw checkout on macOS Darwin 25.4.0, Node 22, pnpm 10.33.2 against the rebased PR head 0a0c1e2b7e on top of upstream/main 95a1c91531. The actual production estimateMessageCharsCached and estimateContextChars exports from src/agents/pi-embedded-runner/tool-result-char-estimator.ts are invoked directly from a node runner (pnpm exec tsx proof.ts) with a representative tool-result payload (200KB diagnostic details).
Exact steps or command run after this patch:
1. Rebase the branch onto upstream/main and pnpm install.
2. Author proof.ts that imports the production estimator and feeds it a tool-result with content: [{type:"text", text:"exit code 0"}] plus a 200KB details.logs string.
3. Run pnpm exec tsx proof.ts against the patched estimator, then git checkout upstream/main -- src/agents/pi-embedded-runner/tool-result-char-estimator.ts and re-run to capture the unpatched estimator.
4. Restore the patched file.

Evidence after fix:

Captured live node runtime log / console output excerpt below.

After the patch (pnpm exec tsx proof.ts):

2026-05-07T16:54:32+00 tool-result-guard repro start: {"case":"tool-result with 200KB details payload"}
2026-05-07T16:54:32+00 tool-result-guard estimate: {"contentText":"exit code 0","detailsBytes":204800,"estimatedChars":22}
2026-05-07T16:54:32+00 tool-result-guard context estimate: {"messages":1,"contextChars":22}
2026-05-07T16:54:32+00 tool-result-guard repro end: {"ok":true}

Before the patch (git checkout upstream/main -- src/agents/pi-embedded-runner/tool-result-char-estimator.ts && pnpm exec tsx proof.ts):

2026-05-07T16:54:33+00 tool-result-guard repro start: {"case":"tool-result with 200KB details payload"}
2026-05-07T16:54:33+00 tool-result-guard estimate: {"contentText":"exit code 0","detailsBytes":204800,"estimatedChars":409644}
2026-05-07T16:54:33+00 tool-result-guard context estimate: {"messages":1,"contextChars":409644}
2026-05-07T16:54:33+00 tool-result-guard repro end: {"ok":true}

Observed result after fix: With the same tool-result message (200KB details, 11-char visible content), the production estimator now reports 22 weighted characters of guard pressure instead of 409644. The patched figure matches what the model actually sees after stripToolResultDetails. The installToolResultContextGuard per-result truncation check and aggregate preemptive overflow check both consume this estimator, so the fix removes the spurious truncation / compaction trigger.
What was not tested: A live full-session preemptive compaction/truncation roundtrip with a real provider. The live decision path inside installToolResultContextGuard calls the same estimator exercised above (no mocking), so the live-runtime numbers should track this 1:1.
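
The logged figures can be reproduced with simple arithmetic. The sketch below is one set of assumptions that happens to match the numbers above: details is taken to be `{ logs: <204800-char string> }`, its size is taken to be the length of its `JSON.stringify` output, and the tool-result weighting is taken to be 2. None of this is copied from the production estimator.

```typescript
// Visible content from the repro: "exit code 0".
const visibleChars = "exit code 0".length; // 11

// Assumed details payload: a single 200KB logs string.
const assumedDetails = { logs: "x".repeat(200 * 1024) };

// Serialized size: 204800 payload chars plus 11 chars of JSON wrapper ({"logs":"..."}).
const serializedDetailsChars = JSON.stringify(assumedDetails).length; // 204811

// Assumed tool-result weighting factor.
const ASSUMED_WEIGHT = 2;

const estimateBefore = (visibleChars + serializedDetailsChars) * ASSUMED_WEIGHT;
const estimateAfter = visibleChars * ASSUMED_WEIGHT;
const inflationFactor = estimateBefore / estimateAfter;

console.log({
  visibleChars,
  serializedDetailsChars,
  estimateBefore, // 409644
  estimateAfter, // 22
  inflationFactor: Math.round(inflationFactor), // 18620
});
```

Under these assumptions the unpatched estimate is roughly 18,600 times the patched one.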

Change Type

Bug fix

Scope

Skills / tool execution
Memory / storage
API / contracts

Linked Issue/PR

This PR fixes a bug or regression

Root Cause

src/agents/pi-embedded-runner/tool-result-char-estimator.ts:127 includes serializedDetailsLength in the per-tool-result content size, even though details is never sent to the model.

Regression Test Plan

Coverage level: unit test in src/agents/pi-embedded-runner/tool-result-context-guard.test.ts.
- does not count tool-result details toward the context budget
- ignores large tool-result details when deciding preemptive overflow

User-visible / Behavior Changes

Tool-result details (logs/metadata that the model never sees) no longer trigger spurious truncation or preemptive compaction. The provider-boundary details strip continues unchanged.

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Verification

Targeted: pnpm test src/agents/pi-embedded-runner/tool-result-context-guard.test.ts passes 37 tests on PR head; the new regression tests fail on plain upstream/main when only the estimator is reverted.
Changed gate: pnpm check:changed --base upstream/main. The only failures are 2 pre-existing tsgo:core:test errors in src/agents/openai-transport-stream.test.ts and src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts that reproduce on plain upstream/main and are unrelated.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

clawsweeper · 2026-05-01T08:07:54Z

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by maintainer comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors can comment @clawsweeper re-review or @clawsweeper re-run on their own open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR removes toolResult.details from tool-result guard character estimation, adds two context-guard regression tests, and records the fix in the changelog.

Reproducibility: yes. Source inspection shows current main counts toolResult.details in the shared estimator while the LLM boundary strips those details, and the PR body supplies before/after live estimator output for the oversized hidden-details case.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix live output from the production estimator plus a before/after comparison for the oversized hidden-details payload.

Next step before merge
The branch already contains the narrow fix, regression coverage, changelog, sufficient proof, and clean review; remaining action is human approval/merge handling rather than an automated repair.

Security
Cleared: The diff is limited to estimator logic, tests, and changelog text with no dependency, workflow, permission, secret, network, or code-execution surface change.

Review details

Best possible solution:

Land the focused estimator, regression-test, and changelog change after org-member approval and final current-head checks, while preserving provider-boundary stripping and persistence behavior.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main counts toolResult.details in the shared estimator while the LLM boundary strips those details, and the PR body supplies before/after live estimator output for the oversized hidden-details case.

Is this the best way to solve the issue?

Yes. Excluding only hidden details from the shared estimator aligns guard accounting with model-visible content without changing persistence, truncation policy, or provider-boundary stripping.

What I checked:

Current main overcounts hidden details: At current main, estimateMessageChars adds estimateUnknownChars(details) to tool-result content before applying the tool-result weighting, so diagnostic metadata contributes to guard pressure. (src/agents/pi-embedded-runner/tool-result-char-estimator.ts:127, 59efd95669c7)
Guard consumes the shared estimator: The context guard uses estimateMessageCharsCached for per-result truncation checks and estimateContextChars for aggregate preemptive overflow checks, so the inflated details count reaches both paths. (src/agents/pi-embedded-runner/tool-result-context-guard.ts:148, 59efd95669c7)
Provider boundary strips tool-result details: normalizeMessagesForLlmBoundary calls stripToolResultDetails, and that helper clones toolResult messages and deletes details before model-facing conversion. (src/agents/pi-embedded-runner/run/attempt.ts:883, 59efd95669c7)
PR diff is narrow and on the live head: The live PR head 745dccce99e61285ae31b2515e0cc9bd90c93349 removes only the details read/addition from the estimator, adds two focused guard tests, and adds one changelog line. (src/agents/pi-embedded-runner/tool-result-char-estimator.ts:127, 745dccce99e6)
Contributor proof is real behavior proof: The PR body supplies before/after live Node/tsx output from the production estimator: a 200KB hidden-details payload estimates as 409644 chars before the patch and 22 chars after. (745dccce99e6)
Review and CI state: Live reviews include COMMENTED LGTMs from martingarramon, with a note that external contributors cannot submit formal APPROVE reviews; current-head API data reports the PR open, cleanly mergeable, and on head 745dccce99e61285ae31b2515e0cc9bd90c93349. (745dccce99e6)
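
As an illustration of the clone-and-delete behavior described in the provider-boundary item above, a minimal sketch might look like the following. The message type and the function name are hypothetical stand-ins, not the actual stripToolResultDetails implementation.

```typescript
// Hypothetical message shape for illustration only.
type BoundaryMessage = { role: string; content: unknown; details?: unknown };

// Returns a new array in which tool-result messages are shallow-cloned
// without their details field; all other messages pass through untouched.
function stripToolResultDetailsSketch(
  messages: readonly BoundaryMessage[],
): BoundaryMessage[] {
  return messages.map((entry) => {
    if (entry.role !== "toolResult" || !("details" in entry)) {
      return entry;
    }
    const clone: BoundaryMessage = { ...entry };
    delete clone.details; // the source object keeps its details
    return clone;
  });
}

const sessionMessages: BoundaryMessage[] = [
  { role: "user", content: "run the build" },
  { role: "toolResult", content: "exit code 0", details: { logs: "verbose output" } },
];

const boundaryMessages = stripToolResultDetailsSketch(sessionMessages);
console.log(boundaryMessages.some((entry) => "details" in entry)); // false
console.log(sessionMessages.some((entry) => "details" in entry)); // true
```

Because the strip works on clones, the persisted session messages still carry details, which is why the guard estimator (and not the stored data) is the place to stop counting them.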

Likely related people:

steipete: Current blame on the estimator, guard call sites, LLM-boundary normalization, and shared detail stripping points to 868315aef0be; earlier history shows the estimator split and shared toolResult.details stripping work under the same login. (role: recent area contributor; confidence: high; commits: 868315aef0be, 2d033d2aa842, a4bf6195228f; files: src/agents/pi-embedded-runner/tool-result-char-estimator.ts, src/agents/pi-embedded-runner/tool-result-context-guard.ts, src/agents/pi-embedded-runner/run/attempt.ts)
tyler6204: History search for the guard surface found 087dca8f, which touched tool-result-context-guard.ts and run/attempt.ts while hardening read-tool overflow guards. (role: feature-history contributor; confidence: medium; commits: 087dca8fa9f5; files: src/agents/pi-embedded-runner/tool-result-context-guard.ts, src/agents/pi-embedded-runner/run/attempt.ts)
martingarramon: Live PR review comments checked the detail-stripping design, confirmed the regression tests and fix logic, and noted that a formal org-member approval is still needed. (role: PR reviewer; confidence: medium; files: src/agents/pi-embedded-runner/tool-result-char-estimator.ts, src/agents/pi-embedded-runner/tool-result-context-guard.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 59efd95669c7.

zqchris · 2026-05-01T15:35:35Z

@clawsweeper re-review

martingarramon

stripToolResultDetails is called at src/agents/compaction.ts:106,310 and src/agents/btw.ts:209 — those are the grep-visible strippers. Excluding details from the guard estimate is consistent with that design.

One test-strengthening suggestion: the "does not count tool-result details toward the context budget" test asserts expect((contextForNextCall[0] as { details?: unknown }).details).toBeDefined() after the guard runs. That proves the guard doesn't strip details from the source object (correct), but it doesn't fence the PR's load-bearing assumption that stripToolResultDetails always runs before any model call along every path. A regression that refactors stripToolResultDetails out of btw.ts:209 or one of the compaction.ts call sites (or adds a new code path that bypasses both) would silently leak details to the wire, and the guard would systematically undercount. A unit-level fence would be: at the model-call entry point in an end-to-end path, assert that (message as { details?: unknown }).details is undefined for every tool-result before the prompt is dispatched.
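
A rough sketch of what such a fence could look like, assuming a hypothetical helper called just before dispatch (the type and function names here are illustrative, not existing OpenClaw APIs):

```typescript
// Hypothetical wire-message shape for illustration only.
type WireMessage = { role: string; content: unknown };

// Throws if any tool-result message about to be dispatched still carries details.
function assertNoToolResultDetails(messages: readonly WireMessage[]): void {
  messages.forEach((wireMessage, index) => {
    const leaked = (wireMessage as { details?: unknown }).details;
    if (wireMessage.role === "toolResult" && leaked !== undefined) {
      throw new Error(`tool-result at index ${index} still carries details`);
    }
  });
}

// Passes: details were removed before dispatch.
assertNoToolResultDetails([
  { role: "user", content: "run the build" },
  { role: "toolResult", content: "exit code 0" },
]);
```

In a test, the same check could run against whatever message array the model-call entry point receives, so a path that skips the strip fails loudly instead of silently undercounting.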

The assistant-message branch correctly stays untouched (no details field on assistant content blocks at tool-result-char-estimator.ts:95-125).

LGTM.

zqchris · 2026-05-07T15:49:22Z

@clawsweeper re-review

Updated this PR in response to the prior review:

Rebased the branch onto current upstream/main (0a0c1e2b7e); the changelog hunk now applies cleanly under the active Unreleased Fixes section.
Rewrote the PR body with structured Real behavior proof. Both new regression tests (does not count tool-result details toward the context budget and ignores large tool-result details when deciding preemptive overflow) FAIL on unpatched upstream/main and PASS on this PR head — captured before/after vitest output is included.
pnpm check:changed --base upstream/main ran cleanly through changelog/lint/typecheck-core/import-cycle gates. The only failure under tsgo:core:test is in two unrelated files (src/agents/openai-transport-stream.test.ts, src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts) that this PR does not touch; those same errors reproduce on plain upstream/main with no PR diff applied — they are pre-existing.

clawsweeper · 2026-05-17T15:55:41Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/25995605157
Updated: 2026-05-17T16:02:56.628Z

Clean branch rebuilt from 9d21df2 with patch inventory commits:
- 8b5260a fix(agents): stop counting tool-result details toward context guard budget

altaywtf · 2026-05-17T16:13:39Z

@clawsweeper review

altaywtf · 2026-05-17T16:15:05Z

Merged via squash.

Prepared head SHA: 4efe09450703b827094264c33e42515d3a50bb9f
Merge commit: ac848d318d950cd506acf88663bee2b77b67e9dd

Thanks @zqchris!

clawsweeper · 2026-05-17T16:15:14Z

🦞🧹
ClawSweeper could not start a re-review for this item.

Reason: re-review requires an open issue or PR.

@altaywtf