fix(agents): exclude tool result details from guard budget#75525
Conversation
|
Codex review: needs maintainer review before merge. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. Source inspection shows current main counts Real behavior proof Next step before merge Security Review detailsBest possible solution: Land the focused estimator, regression-test, and changelog change after org-member approval and final current-head checks, while preserving provider-boundary stripping and persistence behavior. Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows current main counts Is this the best way to solve the issue? Yes. Excluding only hidden What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 59efd95669c7. |
c8b9b5b to
f72795a
Compare
|
@clawsweeper re-review |
f72795a to
277ebef
Compare
martingarramon
left a comment
There was a problem hiding this comment.
stripToolResultDetails is called at src/agents/compaction.ts:106,310 and src/agents/btw.ts:209 — those are the grep-visible strippers. Excluding details from the guard estimate is consistent with that design.
One test-strengthening suggestion: the "does not count tool-result details toward the context budget" test asserts (contextForNextCall[0] as { details?: unknown }).details).toBeDefined() after the guard runs. That proves the guard doesn't strip details from the source object (correct), but it doesn't fence the PR's load-bearing assumption that stripToolResultDetails always runs before any model call along every path. A regression that refactors stripToolResultDetails out of btw.ts:209 or one of the compaction.ts call sites — or adds a new code path that bypasses both — would silently leak details to the wire and the guard would systematically undercount. A unit-level fence would be: at the model-call entry point in an end-to-end path, assert (message as { details?: unknown }).details is undefined for every tool-result before the prompt is dispatched.
The assistant-message branch correctly stays untouched (no details field on assistant content blocks at tool-result-char-estimator.ts:95-125).
LGTM.
277ebef to
0a0c1e2
Compare
|
@clawsweeper re-review Updated this PR in response to the prior review:
|
6f8db49 to
745dccc
Compare
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
7be75a5 to
4efe094
Compare
|
@clawsweeper review |
|
Merged via squash.
Thanks @zqchris! |
|
🦞🧹 Reason: re-review requires an open issue or PR. |
* fix(gateway): clear CLI bindings on session reset * fix(gateway): preserve spawned sessions in configured lists * fix(channels): clear canonical stale routes * fix(telegram): preserve forum topic origin targets * fix(agents): skip fallback for session coordination errors * fix(agents): persist subagent registry before returning accepted (openclaw#83132) (openclaw#83238) * fix(memory): catch up stale sessions on startup (openclaw#82341) * fix(memory): preserve qmd lexical search for hyphenated queries (openclaw#81423) * fix(anthropic): preserve Claude image capability (openclaw#83756) * fix(agents): exclude tool result details from guard budget (openclaw#75525) * fix(provider): use Together video API endpoint * fix(telegram): preserve implicit default account (openclaw#82794) * fix(gateway): allow trusted-proxy local-direct password fallback (openclaw#82953) * fix(discord): return subagent thread delivery origin * fix: add missing prerequisites for upstream-ported fixes Add SessionWriteLockTimeoutError class and hasSessionWriteLockTimeout helper needed by the ported fix(agents) skip-fallback commit. Remove route property references from session-delivery.ts that don't exist in gemmaclaw's SessionEntry type. Add authorizePasswordAuth helper that was present in upstream but missing from gemmaclaw's auth.ts. * fix: remove route assertions incompatible with gemmaclaw SessionEntry Remove test assertions using .route property that exists in upstream's SessionEntry type but not in gemmaclaw's, restoring typecheck green. * fix(memory-core): yield event loop during fallback vector search (openclaw#81172) (openclaw#83758) Summary: - The branch changes memory-core fallback vector search to scan chunks in 256-row rowid batches with `setImmediate` yields, updates regression tests, and adds a changelog entry. - Reproducibility: yes. from source and supplied live output. Current main synchronously scans fallback vector ... and the PR body shows the before/after heartbeat behavior through the actual `searchVector` fallback path. Automerge notes: - PR branch already contained follow-up commit before automerge: test(memory-core): add boundary, parity, and concurrent-insert covera… - PR branch already contained follow-up commit before automerge: fix(memory-core): yield event loop during fallback vector search (#81… Validation: - ClawSweeper review passed for head 0ede3d7. - Required merge gates passed before the squash merge. Prepared head SHA: 0ede3d7 Review: openclaw#83758 (comment) Co-authored-by: NW <nitinwadhawan66@gmail.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com> * fix(subagents): collect unresolved announce batches (openclaw#83701) Summary: - The PR changes collect-mode follow-up queue routing so unresolved-origin items can batch with a single resolved route and later compatible items can resume batching after a true cross-channel drain. - Reproducibility: yes. at source level: current main treats unkeyed-plus-same-keyed queue items as cross-chan ... failing path is directly visible in `src/utils/queue-helpers.ts` and `src/auto-reply/reply/queue/drain.ts`. Automerge notes: - PR branch already contained follow-up commit before automerge: Merge remote-tracking branch 'origin/main' into maint-83701-20260518 Validation: - ClawSweeper review passed for head e6ad029. - Required merge gates passed before the squash merge. Prepared head SHA: e6ad029 Review: openclaw#83701 (comment) Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com> * fix(config): accept gateway remote port * fix: restore Array<{}> closing bracket in manager-search.ts Cherry-pick 68b3729 accidentally dropped the '>' from '}>', producing a syntax error. Restore '}>;' as it was in origin/main. * fix: add remotePort to GatewayRemoteConfig and GatewayRemoteConfigSchema * fix(agents): prioritize manual session turns (openclaw#82765) * fix(agents): prioritize manual session turns * docs: update changelog for session priority --------- Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com> * revert: fix(agents): prioritize manual session turns (openclaw#82765) - upstream deps not in gemmaclaw * fix: resolve undefined variable errors in cherry-picked extension code * fix(tui): preserve draft while chat is busy * fix(tui): add pendingChatRunId to TuiStateAccess for cherry-picked tui commit * fix(memory-wiki): make wiki_lint tool output path-safe (openclaw#83687) * fix(ui): render session-scoped tool events (openclaw#83734) * chore: regenerate base config schema after upstream cherry-picks * fix(agents): add persistSubagentRunsToDiskOrThrow to subagent-registry test mock New export added to subagent-registry-state.ts was missing from the vi.mock definition, causing all tests in the suite to skip and the module to fail to load. * fix(telegram): wire buildTelegramInboundOriginTarget into session context Cherry-pick 675e053 added the helper and the test assertion but did not update bot-message-context.session.ts to use it. OriginatingTo now correctly includes :topic:<id> for forum groups. * fix(memory): correct session path format in startup-catchup test sessionPathForFile returns sessions/<basename> (no agent dir), but the cherry-picked test used sessions/main/<basename>. The clean-file test always failed because the path mismatch made every file look unindexed. * fix(together): update video generation test URL from v1 to v2 The source uses TOGETHER_VIDEO_BASE_URL = https://api.together.xyz/v2 but the cherry-picked test still asserted the old v1 URL. --------- Co-authored-by: nitinjwadhawan <nitinwadhawan66@gmail.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com> Co-authored-by: Andy Ye <35905412+TurboTheTurtle@users.noreply.github.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> Co-authored-by: Galin Iliev <iliev@galcho.com> Co-authored-by: Galin Iliev <Galin.Iliev@microsoft.com> Co-authored-by: Harry Xie <harryhsieh963@yahoo.com>
Summary
estimateMessageCharsadds serializedtoolResult.detailsto the per-tool-result content size before applying tool-result weighting, so large diagnostic metadata that the provider boundary already strips inflates guard pressure and triggers unnecessary truncation/compaction.normalizeMessagesForLlmBoundary/stripToolResultDetailsalready removesdetailsbefore model conversion, so the guard was charging context the model never sees.detailsfrom the shared estimator's tool-result content size calculation. Provider-boundary stripping is unchanged.Real behavior proof
Behavior or issue addressed: A tool-result with a small model-visible
content(e.g."exit code 0") but a 200KB diagnosticdetailspayload inflates the sharedestimateMessageCharscount by ~6 orders of magnitude, even though the provider boundary stripsdetailsbefore sending to the model. The result is spurious truncation / preemptive compaction ininstallToolResultContextGuard.Real environment tested: Local OpenClaw checkout on macOS Darwin 25.4.0, Node 22, pnpm 10.33.2 against the rebased PR head
0a0c1e2b7eon top of upstream/main95a1c91531. The actual productionestimateMessageCharsCachedandestimateContextCharsexports fromsrc/agents/pi-embedded-runner/tool-result-char-estimator.tsare invoked directly from anoderunner (pnpm exec tsx proof.ts) with a representative tool-result payload (200KB diagnosticdetails).Exact steps or command run after this patch:
pnpm install.proof.tsthat imports the production estimator and feeds it a tool-result withcontent: [{type:"text", text:"exit code 0"}]plus a 200KBdetails.logsstring.pnpm exec tsx proof.tsagainst the patched estimator, thengit checkout upstream/main -- src/agents/pi-embedded-runner/tool-result-char-estimator.tsand re-run to capture the unpatched estimator.Evidence after fix:
Captured live
noderuntime log / console output excerpt below.After the patch (
pnpm exec tsx proof.ts):Before the patch (
git checkout upstream/main -- src/agents/pi-embedded-runner/tool-result-char-estimator.ts && pnpm exec tsx proof.ts):Observed result after fix: With the same tool-result message (200KB
details, 11-char visiblecontent), the production estimator now reports22weighted characters of guard pressure instead of409644. That ratio matches what the model actually sees post-stripToolResultDetails. TheinstallToolResultContextGuardper-result truncation check and aggregate preemptive overflow check both consume this estimator, so the fix removes the spurious truncation / compaction trigger.What was not tested: A live full-session preemptive compaction/truncation roundtrip with a real provider; the live decision path inside
installToolResultContextGuardcalls precisely the estimator exercised above (no mocking), so the live-runtime numbers track this 1:1.Change Type
Scope
Linked Issue/PR
Root Cause
src/agents/pi-embedded-runner/tool-result-char-estimator.ts:127includesserializedDetailsLengthin the per-tool-result content size, even thoughdetailsis never sent to the model.Regression Test Plan
src/agents/pi-embedded-runner/tool-result-context-guard.test.ts.does not count tool-result details toward the context budgetignores large tool-result details when deciding preemptive overflowUser-visible / Behavior Changes
Tool-result
details(logs/metadata that the model never sees) no longer trigger spurious truncation or preemptive compaction. The provider-boundarydetailsstrip continues unchanged.Security Impact
Verification
pnpm test src/agents/pi-embedded-runner/tool-result-context-guard.test.ts— 37 tests passing on PR head; the new regressions fail on plain upstream/main when only the estimator is reverted.pnpm check:changed --base upstream/main. The only failures are 2 pre-existingtsgo:core:testerrors insrc/agents/openai-transport-stream.test.tsandsrc/agents/pi-embedded-runner/openai-stream-wrappers.test.tsthat reproduce on plain upstream/main and are unrelated.Compatibility / Migration