Bug
Context overflow recovery has a blind spot: sessionLikelyHasOversizedToolResults() checks each tool result's size against a per-result threshold (30% of the context window in tokens × 4 chars per token), but it does not detect multiple medium-sized results that collectively overflow the context. As a result, recovery skips truncation and proceeds directly to session reset, destroying all conversation history.
Version
2026.2.12
Steps to Reproduce
- Use a 200K context model (e.g., Claude Opus)
- Build up context to ~77% capacity (in our case via load-topic loading 142 daily notes)
- In a single assistant turn, read 3 email threads via parallel tool calls:
  - Thread A: 38,662 chars
  - Thread B: 155,016 chars
  - Thread C: 136,693 chars
  - Combined: 330,371 chars (~82K tokens) added in one step
- Context jumps from 154,535 → 390,607 tokens (nearly 2× the 200K limit)
Expected Behavior
Recovery should truncate the oversized tool results and retry:
- ✅ Auto-compaction attempted → fails (the prefix to be summarized is itself 238K tokens > 200K)
- ❌ Tool result truncation should detect that aggregate tool results are too large
- Truncate the 3 large results and retry
- Only reset session as absolute last resort
Actual Behavior
- ✅ Auto-compaction attempted → fails (238K prefix > 200K)
- ❌ sessionLikelyHasOversizedToolResults() returns false because no individual result exceeds the threshold
- ❌ Truncation skipped entirely
- 💥 Session reset — all history destroyed
Root Cause Analysis
// pi-embedded-DxwVpEx9.js
const MAX_TOOL_RESULT_CONTEXT_SHARE = 0.3;
const HARD_MAX_TOOL_RESULT_CHARS = 400000;

function calculateMaxToolResultChars(contextWindowTokens) {
  return Math.min(
    Math.floor(contextWindowTokens * 0.3) * 4, // = 240,000 for 200K window
    400000
  );
}

function sessionLikelyHasOversizedToolResults({ messages, contextWindowTokens }) {
  const maxChars = calculateMaxToolResultChars(contextWindowTokens);
  for (const msg of messages) {
    if (msg.role !== "toolResult") continue;
    if (getToolResultTextLength(msg) > maxChars) return true; // ← individual check only
  }
  return false; // ← returns false even if aggregate far exceeds context
}
For a 200K context window, the threshold is 240,000 chars per result. Our three results (39K, 155K, 137K) are all individually under this threshold, but combined they add ~330K chars (~82K tokens) to an already-full context.
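The arithmetic can be checked directly. This is a minimal sketch that reuses calculateMaxToolResultChars as quoted above, together with the three thread sizes from the reproduction steps. The 4 chars-per-token ratio is the same rough estimate the threshold itself uses, not a measured value.

```javascript
// Threshold logic as quoted in the root cause analysis.
function calculateMaxToolResultChars(contextWindowTokens) {
  return Math.min(Math.floor(contextWindowTokens * 0.3) * 4, 400000);
}

const contextWindowTokens = 200000;
const resultChars = [38662, 155016, 136693]; // Threads A, B, C

const maxChars = calculateMaxToolResultChars(contextWindowTokens);
const anyOversized = resultChars.some((len) => len > maxChars);
const totalChars = resultChars.reduce((sum, len) => sum + len, 0);
const estimatedTokens = Math.round(totalChars / 4); // rough 4 chars/token estimate

console.log(maxChars);        // 240000
console.log(anyOversized);    // false: the per-result check never fires
console.log(totalChars);      // 330371
console.log(estimatedTokens); // 82593
```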
Gateway Logs (evidence)
11:16:48.205Z [context-overflow-diag] 390607 tokens > 200000 max, compactionAttempts=0
11:16:48.207Z context overflow detected (attempt 1/3); attempting auto-compaction
11:16:48.834Z auto-compaction failed: prefix summarization 238273 tokens > 200000 max
11:16:48.861Z Restarting session → new session ID
No [context-overflow-recovery] Attempting tool result truncation log line — confirming truncation was never entered.
Suggested Fix
sessionLikelyHasOversizedToolResults() should also check aggregate tool result size:
function sessionLikelyHasOversizedToolResults({ messages, contextWindowTokens }) {
  const maxChars = calculateMaxToolResultChars(contextWindowTokens);
  let totalToolResultChars = 0;
  for (const msg of messages) {
    if (msg.role !== "toolResult") continue;
    const len = getToolResultTextLength(msg);
    if (len > maxChars) return true; // individual check
    totalToolResultChars += len;
  }
  // Also flag if aggregate tool results exceed context budget
  const aggregateMax = contextWindowTokens * 4; // full context in chars
  return totalToolResultChars > aggregateMax * 0.5; // >50% of context is tool results
}
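One thing worth checking about the 50% cutoff: for a 200K window it works out to 400,000 chars, and the three email threads alone total 330,371 chars. Whether the aggregate check fires in the reported session therefore depends on how much of the earlier context was also tool results, which this report does not state. The sketch below runs the suggested function against synthetic messages. getToolResultTextLength is a stand-in (the real one is not shown here), and the 70,000-char earlier result is an invented figure for illustration only.

```javascript
// Assumed stand-in: the real getToolResultTextLength is not shown in this report.
const getToolResultTextLength = (msg) => msg.text.length;

function calculateMaxToolResultChars(contextWindowTokens) {
  return Math.min(Math.floor(contextWindowTokens * 0.3) * 4, 400000);
}

// The suggested fix, unchanged.
function sessionLikelyHasOversizedToolResults({ messages, contextWindowTokens }) {
  const maxChars = calculateMaxToolResultChars(contextWindowTokens);
  let totalToolResultChars = 0;
  for (const msg of messages) {
    if (msg.role !== "toolResult") continue;
    const len = getToolResultTextLength(msg);
    if (len > maxChars) return true; // individual check
    totalToolResultChars += len;
  }
  const aggregateMax = contextWindowTokens * 4; // full context in chars
  return totalToolResultChars > aggregateMax * 0.5;
}

// Synthetic tool results of a given size (hypothetical message shape).
const toolResult = (chars) => ({ role: "toolResult", text: "x".repeat(chars) });
const threads = [38662, 155016, 136693].map(toolResult);

const threadsOnly = sessionLikelyHasOversizedToolResults({
  messages: threads,
  contextWindowTokens: 200000,
});
const withEarlierResult = sessionLikelyHasOversizedToolResults({
  messages: [toolResult(70000), ...threads], // invented earlier result
  contextWindowTokens: 200000,
});

console.log(threadsOnly);       // false: 330,371 chars is under the 400,000 cutoff
console.log(withEarlierResult); // true: 400,371 chars exceeds it
```

If the check is meant to catch cases like this one regardless of earlier results, comparing the aggregate against the remaining context budget rather than a fixed 50% share may be worth considering.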
Additionally, the truncation logic in truncateOversizedToolResultsInSession should be able to truncate the largest tool results even if none individually cross the threshold, when the aggregate is causing overflow.
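One possible shape for aggregate-aware truncation is to find a single per-result cap such that the capped results fit a total character budget, so that only the largest results are cut and small ones are left intact. The following is a sketch under stated assumptions, not the behavior of truncateOversizedToolResultsInSession: findUniformCap is a hypothetical helper, and the 180,000-char budget is an illustrative number.

```javascript
// Hypothetical helper: returns the largest per-result cap (in chars) such that
// sum(min(len, cap)) <= budgetChars, or Infinity if nothing needs truncating.
function findUniformCap(lengths, budgetChars) {
  const sorted = lengths.slice().sort((a, b) => a - b);
  let remaining = Math.max(0, budgetChars);
  for (let i = 0; i < sorted.length; i++) {
    // Even share of the remaining budget among the results not yet placed.
    const share = Math.floor(remaining / (sorted.length - i));
    if (sorted[i] > share) return share; // this and all larger results get capped
    remaining -= sorted[i]; // small result fits whole; give its slack to the rest
  }
  return Infinity;
}

const lengths = [38662, 155016, 136693]; // Threads A, B, C
const cap = findUniformCap(lengths, 180000); // illustrative budget
const truncated = lengths.map((len) => Math.min(len, cap));

console.log(cap);       // 70669
console.log(truncated); // [ 38662, 70669, 70669 ]
```

With these numbers Thread A is untouched while Threads B and C are each cut to 70,669 chars, for a total of exactly 180,000.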
Related Issues
Environment