fix(agents): truncate tool results before overflow compaction #81190
LLagoon3 wants to merge 2 commits
Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 12:54 AM ET / 04:54 UTC.
Summary
PR surface: Source +51, Tests +56. Total +107 across 2 files.
Reproducibility: yes. Current main still attempts explicit overflow compaction before tool-result truncation, and the linked report includes runtime logs where compaction timed out for 912840ms before truncating 281 tool results.
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Rebase the focused ordering change onto src/agents/embedded-agent-runner, keep the persisted-turn guard and fallback tests, and land after targeted overflow proof plus maintainer acceptance of truncation-first recovery.
Do we have a high-confidence way to reproduce the issue? Yes. Current main still attempts explicit overflow compaction before tool-result truncation, and the linked report includes runtime logs where compaction timed out for 912840ms before truncating 281 tool results.
Is this the best way to solve the issue? Yes, with rebase work. Reusing the existing truncation heuristic before LLM compaction is a narrow fix, but the current branch is not merge-ready until it is adapted to the renamed runner path and maintainers accept the fallback-order change.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against f7c32fc8befd.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +51, Tests +56. Total +107 across 2 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
|
|
Thanks, ClawSweeper. I addressed the P2 persisted-turn retry guard in 7ec0b4b: successful pre-compaction tool-result truncation now uses the same transcript-continuation + suppress-next-user-append guard as the compaction-success path when the inbound message was already persisted. I also added a regression test for that exact case.

Validation:
|
|
Added Docker-isolated patched-runtime proof to the PR body. The proof image is built from this PR checkout, runs with a temporary OPENCLAW_HOME and synthetic tool-result transcript only, and shows the runtime truncation path executing before compaction fallback.
|
|
@clawsweeper re-review
Added Docker-isolated patched-runtime proof to the PR body. The Real behavior proof check is now green, and the proof shows the packaged OpenClaw runtime truncating tool results before compaction fallback with synthetic data and no production credentials.
|
🦞🧹 I asked ClawSweeper to review this item again.
|
|
This pull request has been automatically marked as stale due to inactivity.
|
ClawSweeper PR egg: ✨ hatched 🥚 common Cosmic Diff Drake. Rarity: 🥚 common. Trait: collects tiny proofs.
|
Summary
AI-assisted: yes — implemented with an AI coding assistant and manually reviewed/verified locally.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
OpenClaw 2026.5.12-beta.1 (7ec0b4b)) plus the original live OpenClaw Telegram direct-session before log, with private identifiers redacted in Overflow recovery should truncate tool results before waiting full auto-compaction timeout #81182. The Docker proof used a temporary OPENCLAW_HOME=/tmp/openclaw-proof-home, synthetic tool-result payloads only, and no production Telegram token, host gateway, or user data.
Built the package-installed Docker runtime image from this PR checkout with the repo's existing Docker E2E package pipeline.
Ran a Docker-isolated runtime proof harness against the packaged /app OpenClaw runtime. The harness creates a temporary transcript with an oversized tool result, runs the same sessionLikelyHasOversizedToolResults(...) + truncateOversizedToolResultsInSession(...) path used by the new overflow branch, and asserts compaction was not reached before successful truncation.
Redacted live runtime log from the original OpenClaw Telegram setup, showing the real failure mode this patch targets:
Docker-isolated patched-runtime terminal capture:
Targeted regression test terminal capture from the patched local checkout:
durationMs=912840 followed by immediate truncation of 281 tool result(s).
Root Cause (if applicable)
Regression Test Plan (if applicable)
src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
User-visible / Behavior Changes
Tool-heavy overflow recovery can retry sooner by truncating tool results before waiting for LLM auto-compaction. If truncation does not help, behavior falls back to the existing compaction path.
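The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in run.ts: the `OverflowRecoveryDeps` interface, the `recoverFromOverflow` function, and the outcome names are all hypothetical, and "truncation does not help" is modeled here simply as zero tool results being truncated.

```typescript
// Hypothetical shapes for illustration only; not the types used in run.ts.
type RecoveryOutcome = "truncated" | "compacted" | "failed";

interface OverflowRecoveryDeps {
  // Cheap local check: does the transcript look dominated by large tool results?
  likelyHasOversizedToolResults: () => boolean;
  // Truncates oversized tool results; resolves to how many were truncated.
  truncateToolResults: () => Promise<number>;
  // Slow path: LLM-driven compaction; resolves to true on success.
  compact: () => Promise<boolean>;
}

async function recoverFromOverflow(
  deps: OverflowRecoveryDeps,
): Promise<RecoveryOutcome> {
  // Try the cheap, local fix first so the run can retry without
  // waiting on a compaction request that may time out.
  if (deps.likelyHasOversizedToolResults()) {
    const truncatedCount = await deps.truncateToolResults();
    if (truncatedCount > 0) {
      return "truncated";
    }
  }

  // Truncation was skipped or changed nothing: fall back to compaction.
  const compacted = await deps.compact();
  return compacted ? "compacted" : "failed";
}
```

Injecting the three steps as functions keeps the ordering testable with mocks, which is roughly the style a run-loop regression test would need; the actual wiring in the runner may differ.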
Diagram (if applicable)
Security Impact (required)
Yes/No) No
Yes/No) No
Yes/No) No
Yes/No) No
Yes/No) No
Yes, explain risk + mitigation: N/A
Repro + Verification
Environment
openai-codex/gpt-5.5; live before evidence used openai-codex/gpt-5.4; unit tests use the mocked embedded run-loop provider
Steps
Expected
Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
OPENCLAW_SKIP_DOCKER_BUILD=1 OPENCLAW_RUNTIME_PROOF_LOG=/tmp/openclaw-pr-81190-runtime-proof.log scripts/proof/docker-runtime-proof.sh -- node /proof/overflow-truncation-runtime-proof.mjs
CI=true node scripts/run-vitest.mjs run src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
git diff --check passes.
pnpm exec oxfmt --write --threads=1 src/agents/pi-embedded-runner/run.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
pnpm build && pnpm check && pnpm test
Review Conversations
Compatibility / Migration
Yes/No) Yes
Yes/No) No
Yes/No) No
Risks and Mitigations
sessionLikelyHasOversizedToolResults heuristic detects tool-result pressure, and falls back to compaction when truncation does not help.
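For readers unfamiliar with this kind of heuristic, a minimal sketch of a "detect, then truncate" pair is shown below. It is not the repository's sessionLikelyHasOversizedToolResults or truncateOversizedToolResultsInSession: the transcript shape, the function names, the 2000-character threshold, and the truncation notice are all assumptions chosen only for illustration.

```typescript
// Hypothetical transcript shape; the real session format is not shown in this PR text.
interface TranscriptMessage {
  role: "user" | "assistant" | "toolResult";
  text: string;
}

// Assumed values, picked only to make the example concrete.
const MAX_TOOL_RESULT_CHARS = 2000;
const TRUNCATION_NOTICE = "\n[tool result truncated]";

// Heuristic: true if any tool result exceeds the size budget.
function hasOversizedToolResults(
  messages: TranscriptMessage[],
  maxChars: number = MAX_TOOL_RESULT_CHARS,
): boolean {
  return messages.some(
    (m) => m.role === "toolResult" && m.text.length > maxChars,
  );
}

// Returns a new transcript where each oversized tool result is cut down to
// at most maxChars (notice included), plus the number of results changed.
// Assumes maxChars is larger than the notice itself.
function truncateOversizedToolResults(
  messages: TranscriptMessage[],
  maxChars: number = MAX_TOOL_RESULT_CHARS,
): { messages: TranscriptMessage[]; truncatedCount: number } {
  let truncatedCount = 0;
  const keep = Math.max(0, maxChars - TRUNCATION_NOTICE.length);
  const next = messages.map((m) => {
    if (m.role !== "toolResult" || m.text.length <= maxChars) {
      return m; // user/assistant turns and small results pass through untouched
    }
    truncatedCount += 1;
    return { ...m, text: m.text.slice(0, keep) + TRUNCATION_NOTICE };
  });
  return { messages: next, truncatedCount };
}
```

A truncatedCount of zero is the natural signal for the caller to fall back to compaction, since nothing in the transcript was made smaller.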