fix(core): auto-compact subagent context to prevent overflow #3735
Subagent chats accumulated history without ever compacting, so a long multi-turn run could hit "max context length exceeded" before the compaction logic the main session uses had a chance to fire. Move compaction down into the chat layer so both main agent and subagent auto-compress at the configured threshold, and surface the result via a new chat stream event that bridges into the existing ChatCompressed UI path. The main-session wrapper still owns full /compress reset. Closes #3664
E2E Test Report
Ran the plan at
Test 1 — Subagent path (the bug)
Prompt:
Compression first fires on round 5 (~27–28k tokens against the 30k limit), then continues throughout the run as the subagent keeps adding filler. Decisive contrast vs. pre-fix.
Test 2 — Main agent path (control)
Prompt:
Mock-log excerpt (input tokens / isCompression):
Behavior matches pre-refactor: every time
Verdict
Both paths pass. The fix delivers compaction for subagents end-to-end and preserves the main-agent compaction behavior unchanged.
E2E Test Summary — UI Compression Display
Goal: verify the on-screen compression UI is intact after moving auto-compaction into the chat layer.
Method: interactive (tmux) against the issue-3664 mock server. Threshold lowered to
Verdict: both UI surfaces display correctly. Token counts (24472 → 20509) propagate end-to-end through the new event chain. No visual regression introduced by the refactor.
Environment: macOS 25.3.0, Node v20.18.1
// GeminiChat already mutated its own history; surface to the debug
// log so subagent compactions show up alongside the main session's.
if (streamEvent.type === 'compressed') {
  this.runtimeContext
[Critical] Subagent lastPromptTokenCount never seeded — auto-compression silently skipped on first send.
GeminiChat.setLastPromptTokenCount() is designed for "chats created with inherited history (forks, subagents, speculation)" per its JSDoc, but createChat() never calls it. As a result lastPromptTokenCount defaults to 0, the threshold check 0 < threshold * contextWindow always passes, and tryCompress returns NOOP on the first sendMessageStream. For fork subagents with large inherited history, the first API call can 400.
This debug log (the only subagent compression surface) will never fire because compression never triggers. AgentCore already tracks lastPromptTokenCount (field at L216); seed it in createChat():
this.runtimeContext
// In createChat(), after `new GeminiChat(...)` returns:
chat.setLastPromptTokenCount(this.lastPromptTokenCount);
return chat;
— deepseek-v4-pro via Qwen Code /review
Fixed in 3239bbe — createChat now seeds the new chat with this.lastPromptTokenCount so the threshold gate sees the inherited size on first send. Agreed the realistic blast radius is small (the 40-entry truncation in forkedAgent.ts defangs the fork case in practice), but a one-line seed is cheap insurance against future call sites that pass large extraHistory/initialMessages.
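To make the first-send failure mode concrete, here is a minimal sketch of the threshold gate. The function and constant names are hypothetical, and treating "exactly at the threshold" as eligible is an assumption; only the 0.7 ratio and the "counter defaults to 0" behavior come from this thread.

```typescript
// Hypothetical stand-in for the cheap gate described in the review.
// 0.7 mirrors the threshold quoted elsewhere in this conversation.
const COMPRESSION_THRESHOLD_SKETCH = 0.7;

function shouldAttemptCompression(
  lastPromptTokenCount: number,
  contextWindowSize: number,
  threshold: number = COMPRESSION_THRESHOLD_SKETCH,
): boolean {
  // Below the gate the chat layer would return NOOP without doing any work.
  return lastPromptTokenCount >= threshold * contextWindowSize;
}

// Unseeded subagent chat: the counter is 0, so the gate stays closed on
// the first send regardless of how large the inherited history is.
const unseededGate = shouldAttemptCompression(0, 30_000);

// Seeded from the parent's last prompt size: the gate sees the real size.
const seededGate = shouldAttemptCompression(27_500, 30_000);

console.log({ unseededGate, seededGate });
```

With the seed in place the decision depends on the inherited size rather than on a default, which is the behavior the one-line fix is after.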
// Re-enable auto-compaction so a forced /compress recovers a chat
// that an earlier auto-attempt latched off.
this.hasFailedCompressionAttempt = false;
} else if (
[Critical] COMPRESSION_FAILED_TOKEN_COUNT_ERROR missing from else-if — hasFailedCompressionAttempt never set, causing infinite retry.
The else-if chain checks for FAILED_INFLATED_TOKEN_COUNT and FAILED_EMPTY_SUMMARY but omits FAILED_TOKEN_COUNT_ERROR. When a token counting error occurs, the sticky hasFailedCompressionAttempt flag is never set, so every subsequent sendMessageStream wastes an API call retrying compression that will never succeed. No log or event is emitted either.
} else if (
} else if (
  info.compressionStatus ===
    CompressionStatus.COMPRESSION_FAILED_INFLATED_TOKEN_COUNT ||
  info.compressionStatus ===
    CompressionStatus.COMPRESSION_FAILED_EMPTY_SUMMARY ||
  info.compressionStatus ===
    CompressionStatus.COMPRESSION_FAILED_TOKEN_COUNT_ERROR
) {
— deepseek-v4-pro via Qwen Code /review
Fixed in 3239bbe — added COMPRESSION_FAILED_TOKEN_COUNT_ERROR to the else-if so token-counting failures now latch hasFailedCompressionAttempt and stop retrying compression every send.
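The latch behavior discussed in this thread can be sketched in isolation as follows. The status names are taken from the review; the `CompressionLatchSketch` class and its methods are hypothetical and only illustrate the intended state transitions, not the actual `GeminiChat` code.

```typescript
// Status names as quoted in the review; the string values are assumptions.
enum CompressionStatus {
  COMPRESSED = 'compressed',
  NOOP = 'noop',
  COMPRESSION_FAILED_INFLATED_TOKEN_COUNT = 'failed_inflated_token_count',
  COMPRESSION_FAILED_EMPTY_SUMMARY = 'failed_empty_summary',
  COMPRESSION_FAILED_TOKEN_COUNT_ERROR = 'failed_token_count_error',
}

// Every failure that should stop further automatic attempts.
const LATCHING_FAILURES: ReadonlySet<CompressionStatus> = new Set([
  CompressionStatus.COMPRESSION_FAILED_INFLATED_TOKEN_COUNT,
  CompressionStatus.COMPRESSION_FAILED_EMPTY_SUMMARY,
  CompressionStatus.COMPRESSION_FAILED_TOKEN_COUNT_ERROR,
]);

class CompressionLatchSketch {
  private hasFailedCompressionAttempt = false;

  // Update the latch from the outcome of one compression attempt.
  record(status: CompressionStatus): void {
    if (status === CompressionStatus.COMPRESSED) {
      this.hasFailedCompressionAttempt = false; // success re-enables auto-compaction
    } else if (LATCHING_FAILURES.has(status)) {
      this.hasFailedCompressionAttempt = true; // stop retrying on every send
    }
    // NOOP leaves the latch untouched.
  }

  // A forced /compress bypasses the latch; automatic attempts respect it.
  shouldSkip(force: boolean): boolean {
    return this.hasFailedCompressionAttempt && !force;
  }
}
```

Keeping the latching statuses in one set means a newly added failure status only has to be registered in a single place, which is one way to avoid the omission flagged above.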
// Surface auto-compaction that fired inside the chat stream as the
// top-level ChatCompressed event so existing UI handlers stay
// connected. GeminiClient.sendMessageStream typically runs the
// compression pass earlier and emits this event itself; this branch
[Critical] Turn.run() compressed → ChatCompressed event bridge has zero test coverage.
This is the sole path for auto-compression events to surface from the chat layer to the client layer. If the bridge breaks (e.g., type mismatch after rename), sendMessageStream never receives ChatCompressed, forceFullIdeContext is never flipped, and IDE context is lost after auto-compress. turn.test.ts should cover this:
// compression pass earlier and emits this event itself; this branch
// In turn.test.ts: construct a StreamEvent with type 'compressed',
// verify run() yields GeminiEventType.ChatCompressed with the same info payload.
— deepseek-v4-pro via Qwen Code /review
Test added in 3239bbe — turn.test.ts now feeds a compressed StreamEvent through Turn.run() and asserts the yielded ChatCompressed event carries the same info payload.
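As an illustration of what such a bridge test pins down, the mapping can be reduced to a pure function. All type shapes and the `toTurnEvent` name below are hypothetical simplifications; the real `Turn.run()` is a streaming generator and its event payloads may differ.

```typescript
// Hypothetical, simplified event shapes for illustration only.
interface ChatCompressionInfo {
  originalTokenCount: number;
  newTokenCount: number;
}

type StreamEvent =
  | { type: 'chunk'; text: string }
  | { type: 'compressed'; info: ChatCompressionInfo };

type TurnEvent =
  | { type: 'content'; value: string }
  | { type: 'chat_compressed'; value: ChatCompressionInfo };

// Map one chat-layer event to the client-layer event the UI listens for.
function toTurnEvent(event: StreamEvent): TurnEvent {
  if (event.type === 'compressed') {
    // Pass the info payload through unchanged so token counts survive.
    return { type: 'chat_compressed', value: event.info };
  }
  return { type: 'content', value: event.text };
}
```

The property worth asserting is identity of the payload: the compression info that enters the bridge is the one that leaves it.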
// via the `compressed → ChatCompressed` bridge in turn.ts. Manual /compress
// still calls tryCompressChat directly for the full reset (env refresh +
// forceFullIdeContext flip).
const sessionTokenLimit = this.config.getSessionTokenLimit();
[Critical] sessionTokenLimit check now fires before auto-compression runs — behavioral regression.
The old code called tryCompressChat first, then checked sessionTokenLimit with the (potentially reduced) post-compression count. The new code checks the limit at L726 before the Turn starts; auto-compression only happens later inside chat.sendMessageStream(). If the context grew past the session limit on the previous turn, the new code terminates the session immediately, whereas the old code would have attempted compression first and might have reduced the context below the limit.
const sessionTokenLimit = this.config.getSessionTokenLimit();
// Auto-compaction happens inside GeminiChat.sendMessageStream and surfaces
// via the `compressed → ChatCompressed` bridge in turn.ts. Manual /compress
// still calls tryCompressChat directly for the full reset (env refresh +
// forceFullIdeContext flip).
//
// NOTE: sessionTokenLimit check moved after Turn completes so that
// auto-compression inside sendMessageStream has a chance to reduce
// the token count before we decide to terminate the session.
The check should be moved to after the Turn's stream is fully consumed (after the for await loop in sendMessageStream), or a pre-turn compression attempt should be added before the limit check.
— glm-5.1 via Qwen Code /review
Acknowledged trade-off — the PR description calls this out: sessionTokenLimit is being deprecated in favor of the auto-compaction path, so reading the pre-compaction count here is intentional. In the new model, auto-compaction lives inside chat.sendMessageStream and runs during the Turn, not as a pre-call gate; the sessionTokenLimit check is best-effort safety for the legacy code path. Leaving as-is.
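The ordering difference under discussion can be shown with a small sketch. The helper names and the numbers are hypothetical; the point is only that the old flow compared the post-compaction count against the limit, while the new flow compares the previous turn's count.

```typescript
// Hypothetical helpers illustrating the two orderings.
function exceedsLimit(tokenCount: number, sessionTokenLimit: number): boolean {
  return tokenCount > sessionTokenLimit;
}

// Old flow: compress first, then check the (possibly reduced) count.
function oldOrderTerminates(
  tokenCount: number,
  sessionTokenLimit: number,
  compress: (count: number) => number,
): boolean {
  return exceedsLimit(compress(tokenCount), sessionTokenLimit);
}

// New flow: check the previous turn's count; compaction runs later,
// inside the chat layer's send.
function newOrderTerminates(
  tokenCount: number,
  sessionTokenLimit: number,
): boolean {
  return exceedsLimit(tokenCount, sessionTokenLimit);
}

// Illustrative numbers: a session just over the limit that compaction
// would have brought back under it.
const compressToTwentyK = (_count: number): number => 20_000;
console.log(oldOrderTerminates(32_000, 30_000, compressToTwentyK)); // false
console.log(newOrderTerminates(32_000, 30_000)); // true
```

This is the "may trip one turn earlier" case described in the PR's Known Limitations.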
// immediately after this event still triggers the finally below;
// otherwise `streamDoneResolver` never fires and the next send hangs.
if (
  compressionInfo.compressionStatus === CompressionStatus.COMPRESSED
[Critical] COMPRESSED stream event emission has zero test coverage.
This yield is the sole path for auto-compression results to propagate from the chat layer to callers. The old client.test.ts tests for ChatCompressed events (deleted in this diff) had no replacement added. If this yield path breaks (e.g., compressionInfo checked before tryCompress resolves, or generator short-circuits), the main agent UI won't display compression feedback and subagents won't see the debug log in agent-core.ts:547.
// Suggested test in geminiChat.test.ts:
// Mock ChatCompressionService.prototype.compress to return COMPRESSED,
// call sendMessageStream, consume stream events, and assert the first
// event has type === StreamEventType.COMPRESSED with correct info payload.
— glm-5.1 via Qwen Code /review
Test added in 3239bbe — geminiChat.test.ts now mocks ChatCompressionService.compress to return COMPRESSED, runs chat.sendMessageStream, consumes the stream, and asserts the first event has type === StreamEventType.COMPRESSED with the expected info payload.
  this.hasFailedCompressionAttempt = true;
}
const compressedHistory = this.getChat().getHistory();
await this.startChat(compressedHistory);
[Critical] getFileReadCache().clear() call silently lost — reverts the fix from PR (3717).
PR (3717) added this.config.getFileReadCache().clear() to the old tryCompressChat after startChat to prevent stale file_unchanged placeholders after compression. This rewrite (based on an earlier merge base) never included it, and the new auto-compression path in GeminiChat.tryCompress() (L396-404) also has no cache clearing. After compression, FileReadCache retains entries for files whose content was summarized away — subsequent ReadFile hits the cache and returns stale file_unchanged pointing at content the model can no longer retrieve.
await this.startChat(compressedHistory);
await this.startChat(compressedHistory);
this.config.getFileReadCache()?.clear();
this.getChat().setLastPromptTokenCount(info.newTokenCount);
The same fix is needed in geminiChat.ts tryCompress after this.setHistory(newHistory) at L401.
— glm-5.1 via Qwen Code /review
Good catch — the clear was lost during the merge resolution from main. Restored in 3239bbe at both sites:
- `client.tryCompressChat`: clears after `startChat(compressedHistory)`. Note that `startChat` constructs a fresh `GeminiChat` and does NOT route through `client.setHistory` (which would also clear), so the explicit clear here is necessary.
- `GeminiChat.tryCompress`: clears after `this.setHistory(newHistory)`. The chat-layer `setHistory` is a plain history assignment — only the `GeminiClient` wrapper clears the cache — so this path also needs the explicit clear.
Both paths now log [FILE_READ_CACHE] clear so it shows up in debug traces.
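For readers unfamiliar with the stale-placeholder problem, the following toy model shows why the clear matters. `FileReadCacheSketch` and `compactHistory` are hypothetical stand-ins, not the real `FileReadCache` API; the sketch assumes a cache that answers repeat reads of unchanged content with a `file_unchanged` placeholder.

```typescript
// Hypothetical toy cache: repeat reads of unchanged content get a placeholder.
type ReadResult =
  | { kind: 'full'; content: string }
  | { kind: 'file_unchanged' };

class FileReadCacheSketch {
  private readonly seen = new Map<string, string>();

  read(path: string, content: string): ReadResult {
    if (this.seen.get(path) === content) {
      // The cache assumes the earlier full read is still in the conversation.
      return { kind: 'file_unchanged' };
    }
    this.seen.set(path, content);
    return { kind: 'full', content };
  }

  clear(): void {
    this.seen.clear();
  }
}

// Compaction replaces history with a summary, so earlier full reads are
// gone; clearing the cache forces the next read to return real content.
function compactHistory(
  history: string[],
  cache: FileReadCacheSketch,
): string[] {
  const summarized = [`summary of ${history.length} entries`];
  cache.clear();
  return summarized;
}
```

Without the `clear()` call, the read after compaction would return the placeholder even though the referenced content no longer exists in history.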
  }
});

it('NOOPs when the keep slice has too few tool rounds to absorb', async () => {
[Suggestion] Test "NOOPs when the keep slice has too few tool rounds to absorb" exits at the threshold gate — never reaches findCompressSplitPoint.
The test sets originalTokenCount=20000 with contextWindowSize=30000. Since 20000 < 0.7 * 30000 = 21000, the threshold check at compress() L203 returns NOOP before reaching findCompressSplitPoint or MIN_COMPRESSION_FRACTION. The test passes for the wrong reason.
it('NOOPs when the keep slice has too few tool rounds to absorb', async () => {
  // Raise originalTokenCount above the threshold so the test actually
  // exercises the findCompressSplitPoint + MIN_COMPRESSION_FRACTION path:
  const originalTokenCount = 22_000; // 22000 > 0.7 * 30000 = 21000
— glm-5.1 via Qwen Code /review
Fixed in 3239bbe — bumped originalTokenCount to 22000 (above the cheap-gate threshold of 0.7 × 30000 = 21000) so the test now exercises findCompressSplitPoint and the MIN_COMPRESSION_FRACTION decision instead of short-circuiting.
// Re-send a full IDE context blob on the next regular message — auto
// compaction inside chat.sendMessageStream may have summarized away
// the previous IDE-context turn.
[Suggestion] forceFullIdeContext flip on auto-compression path has no test.
The forceFullIdeContext = true flip on ChatCompressed at L866-867 only has tests for the manual /compress path (via tryCompressChat). The auto-compression path goes through the event loop here, but in tests tryCompressChat is mocked to NOOP so no ChatCompressed event is ever yielded through this path. A test should verify that when a ChatCompressed event flows through sendMessageStream, forceFullIdeContext becomes true.
— glm-5.1 via Qwen Code /review
Test added in 3239bbe — client.test.ts now mocks Turn.run to yield a ChatCompressed event, runs sendMessageStream through to completion, and asserts forceFullIdeContext is true after the loop. Covers the auto-compaction path that the manual tryCompressChat tests missed.
  info,
  compressedHistory: newHistory,
});
this.setHistory(newHistory);
[Suggestion] Auto-compression no longer refreshes environment context — setHistory replaces in-place without startChat.
The old auto-compression path called tryCompressChat → startChat → getInitialChatHistory() which prepended a fresh environment snapshot (file tree, git status, working directory). The new in-place setHistory at L401 does not call startChat, so environment context is lost into the compression summary (a lossy representation). The comment at client.ts:724 acknowledges this is intentional for the auto path, but it should be documented as an accepted trade-off in the code here.
— glm-5.1 via Qwen Code /review
Comment added in 03ed878 — pointer at the setHistory call site directs readers to GeminiClient for the env-refresh trade-off, so the layering decision is discoverable from geminiChat.ts without duplicating the rationale across files.
Overview
This PR moves automatic chat compaction from
Additional collateral changes:
Correctness
The split-point fallback rewrite is the riskiest part, and it is well-tested. A few smaller things worth flagging:
Code Quality / Style
Test Coverage
Strong overall:
Test churn in
Risks
Summary
The fix correctly addresses the documented bug (subagent overflow) by relocating compaction to the layer that already owns the conversation state, with thoughtful handling of edge cases (in-flight tool calls, role alternation, send-lock leakage on compression failure, latch reset). Two small follow-ups worth taking before merging: refresh the stale comment in
…text-compaction # Conflicts: # packages/core/src/core/client.test.ts # packages/core/src/core/client.ts
wenshao
left a comment
Overview
Moves the auto-compaction trigger from GeminiClient.sendMessageStream (main-agent only) into GeminiChat.sendMessageStream, so subagents — which use GeminiChat directly via AgentCore — get the same threshold-driven compaction as the main session. Adds a per-chat lastPromptTokenCount and hasFailedCompressionAttempt, switches ChatCompressionService.compress to a single options-object signature, and surfaces the new StreamEventType.COMPRESSED up through Turn to GeminiEventType.ChatCompressed so existing UI handlers stay wired. Also extends findCompressSplitPoint with an in-flight model+functionCall fallback that compresses-most rather than NOOPing when the trailing entry is an unmatched tool call (the dominant subagent-loop shape).
Issues
1. Bug — FileReadCache.clear() is dropped on both compaction paths (regression)
The pre-PR client.ts cleared the file-read cache after every successful compaction with this comment, which still applies:
Compaction rewrites the prompt history: prior full-Read tool results may have been summarised away, but the FileReadCache still believes those reads are "in this conversation". A follow-up Read could then return the file_unchanged placeholder pointing at content the model can no longer retrieve.
After the PR:
- Manual `/compress` path — `client.ts:1397-1417` (`tryCompressChat`) no longer calls `getFileReadCache().clear()`. Neither does the new `GeminiChat.tryCompress` it now delegates to.
- Auto-compaction path — the `ChatCompressed` event handler in `client.ts:997-999` only flips `forceFullIdeContext`. Same gap.
Compare with surrounding code that does clear correctly: truncateHistory (client.ts:295), resetChat (client.ts:320), and microcompaction (client.ts:814 — its comment explicitly references "mirroring the post-compaction clear in tryCompressChat", which no longer exists). The same `file_unchanged`-pointing-at-summarized-bytes scenario the original comment described will reappear after every compaction, including the new auto-path you're enabling for subagents (where Read tools are common).
Suggested fix: clear the cache inside GeminiChat.tryCompress on COMPRESSED (centralizes for both main + subagent), or — if you'd rather not pull Config.getFileReadCache into the chat layer — add it to both client-side handlers (tryCompressChat after the startChat call, and the ChatCompressed event branch in the iterator).
2. Hard-coded English continuation bridge text
chatCompressionService.ts:355-364:
```ts
text: 'Continue with the prior task using the context above.',
```
When keepNeedsContinuationBridge is true, this synthetic user turn is injected into history. Other prompt strings in the surrounding compaction code (getCompressionPrompt, the 'Got it. Thanks…' ack) are equally hardcoded English, so this matches existing style — but it is a new model-visible synthetic instruction whose phrasing affects what the model thinks it was told to do. Worth at least a constant near the other compression strings, for the same reason COMPRESSION_TOKEN_THRESHOLD lives at the top of the file.
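A minimal sketch of the suggested extraction might look like this. The constant name, the `ContentSketch` shape, and the builder function are all hypothetical; only the bridge text itself is quoted from the PR.

```typescript
// Hypothetical constant, kept near the other compression strings so the
// model-visible wording is easy to find and review.
const CONTINUATION_BRIDGE_TEXT =
  'Continue with the prior task using the context above.';

// Simplified content shape for illustration; the real type may differ.
interface ContentSketch {
  role: string;
  parts: Array<{ text: string }>;
}

// Build the synthetic user turn that is injected when the kept slice
// needs a bridge to preserve role alternation.
function buildContinuationBridge(): ContentSketch {
  return { role: 'user', parts: [{ text: CONTINUATION_BRIDGE_TEXT }] };
}
```

A named constant also gives tests a single symbol to assert against instead of repeating the literal.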
3. Scope creep — unrelated formatting churn
About 10 of 21 files are pure whitespace/formatting (docs/users/features/code-review.md, packages/cli/src/commands/review/*.ts, packages/core/src/skills/{skill-manager,symlinkScope}.{ts,test.ts}, monitorRegistry.ts). They have nothing to do with subagent compaction and bloat the review surface. Recommend splitting into a separate "prettier sweep" PR — keeps git blame clean and lets this PR be cherry-picked safely.
Minor
- `agent-core.ts:540-552` — the `streamEvent.type === 'compressed'` string-literal check works because `StreamEventType.COMPRESSED = 'compressed'`, but the surrounding code in `turn.ts:311` does the same check without importing the enum. Fine as-is, but inconsistent with the discriminated-union style elsewhere; using `StreamEventType.COMPRESSED` would survive an enum value rename.
- Per-chat `lastPromptTokenCount = 0` for subagents on first send — explicitly acknowledged in "Known Limitations" as accepted. Worth adding a TODO on `GeminiChat.setLastPromptTokenCount` pointing at the agent-core construction site so a future fix is discoverable.
- `tryCompress` clears `hasFailedCompressionAttempt = false` on every successful COMPRESSED, including auto-compactions (`geminiChat.ts:1791`). The comment justifies this only for forced compactions ("a forced /compress recovers a chat that an earlier auto-attempt latched off"), but the code applies to both. That is actually the correct behavior — an auto-success means the state changed enough that a retry is now reasonable — but the comment undersells it.
Strengths
- Send-lock release on `tryCompress` throw (`geminiChat.ts:1819-1830`) — exactly the kind of finicky bug that would deadlock production sessions weeks later. Test in `geminiChat.test.ts:1369-1409` is a clear regression guard.
- `originalTokenCount` is now an explicit parameter rather than a hidden global read — makes the service unit-testable per-chat and is the right abstraction for the subagent case.
- In-flight fallback in `findCompressSplitPoint` — clean two-phase doc comment, the `splitPointRetainingTrailingPairs` helper is small and focused, and the four new test cases cover (default retain, fewer-than-retain, zero-pairs, override) thoroughly.
- Test coverage — `chatCompressionService.test.ts` adds the realistic "tool-loop subagent absorption" scenario including assertions on strict role alternation in the joined history; `geminiChat.test.ts` covers per-chat state in isolation from the service.
Risk Assessment
- The `FileReadCache` regression is the only blocking concern — easy to fix but real (will cause stale-read confusion for users after every `/compress` or auto-compaction).
- The `sessionTokenLimit` ordering shift is correctly called out as accepted; the feature is on its way out.
- Subagent first-send overflow is a known and bounded edge case.
- Manual `/compress` semantics are preserved (forced compaction still bypasses the failed-attempt latch and resets it on success).
Code fixes:
- Seed `lastPromptTokenCount` on subagent chats so the first-send threshold gate sees the inherited history's true size.
- Add `COMPRESSION_FAILED_TOKEN_COUNT_ERROR` to the fail-latch chain so token-counting failures stop retrying compression every send.
- Restore `FileReadCache.clear()` after compaction in both the manual /compress wrapper and the auto-compaction path inside GeminiChat, preventing post-summary `file_unchanged` placeholders from pointing at content the model can no longer retrieve.
- Refresh stale comment on the `compressed → ChatCompressed` bridge in turn.ts now that this path is the primary route, not a fallback.

Tests:
- turn.test.ts asserts the compressed → ChatCompressed bridge.
- geminiChat.test.ts asserts COMPRESSED yields as the first stream event after auto-compaction succeeds.
- chatCompressionService.test.ts bumps originalTokenCount above the cheap-gate so the NOOP test exercises findCompressSplitPoint.
- client.test.ts asserts forceFullIdeContext flips when a ChatCompressed event flows through sendMessageStream's loop.
…tion trade-off

- Remove the `as StreamEvent` cast at the COMPRESSED yield site — the literal already matches the union member.
- Add a 4-line comment at the auto-compaction setHistory point that points readers to GeminiClient for the env-refresh trade-off rationale, so readers don't have to chase the layering decision back across files.
Thanks for the thorough review @wenshao. Pushed two follow-up commits (3239bbe + 03ed878) addressing the points raised:
Code fixes
Test additions
Acknowledged trade-offs (not changing)
Code Review
Overview
The PR fixes #3664 by moving auto-compaction from
Strengths
Issues / Suggestions
Correctness
Scope / Noise
Testing
Minor
Risk Assessment
Verdict
Solid fix for a real bug, with thoughtful architectural cleanup and good test coverage. Two requests before merge:
Otherwise LGTM — the in-flight fallback design, the deadlock-safety dance, and the per-chat-counter decoupling are all the right calls.
wenshao
left a comment
LGTM. Solid fix — moving compaction to the chat layer is the right call, the send-lock release on throw is well-covered, and the in-flight fallback in findCompressSplitPoint handles the subagent tool-loop case cleanly. Two non-blocking nits in the review comment (split out prettier noise, refresh the stale Known Limitation #2).
wenshao
left a comment
Two new Suggestion-level findings below. No new Critical issues that aren't already captured in existing Qwen Code comments.
  force = false,
  signal?: AbortSignal,
): Promise<ChatCompressionInfo> {
  const service = new ChatCompressionService();
[Suggestion] new ChatCompressionService() allocated on the hot path — every sendMessageStream call (main + subagent, every turn).
The service is stateless (single method, no instance fields), so this allocation is pure GC pressure. Consider making compress a static method or using a module-level singleton.
const service = new ChatCompressionService();
- const service = new ChatCompressionService();
- const { newHistory, info } = await service.compress(this, {
+ const { newHistory, info } = await ChatCompressionService.compress(this, {
— deepseek-v4-pro via Qwen Code /review
  !lastContent?.parts?.some((part) => part.functionCall)
) {
  return contents.length;
if (lastContent?.role === 'model') {
[Suggestion] No debug logging when the new in-flight functionCall fallback triggers (splitPointRetainingTrailingPairs path).
The regular scan path also has no logging at the split decision point. This makes it impossible to distinguish at runtime which path was taken without instrumenting and redeploying — critical for debugging "model lost context" reports after compression.
if (lastContent?.role === 'model') {
const lastContent = contents[contents.length - 1];
if (lastContent?.role === 'model') {
  if (!hasFunctionCall(lastContent)) return contents.length;
- return splitPointRetainingTrailingPairs(contents, retainCount);
+ const splitPoint = splitPointRetainingTrailingPairs(contents, retainCount);
+ debugLogger.debug(
+   `[COMPRESS-SPLIT] in-flight-fc fallback: splitPoint=${splitPoint}, total=${contents.length}`
+ );
+ return splitPoint;
— deepseek-v4-pro via Qwen Code /review
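For context on what a split point in the in-flight fallback could mean, here is one plausible reading, sketched under loud assumptions. The PR's actual `splitPointRetainingTrailingPairs` is not shown in this conversation, so the `Entry` shape, the pairing rule, and the function below are hypothetical: the sketch keeps the trailing in-flight call plus up to `retainCount` complete call/response pairs immediately before it, and treats everything earlier as compressible.

```typescript
// Hypothetical simplified history entry; real entries carry parts, not a kind tag.
type Entry = {
  role: 'user' | 'model';
  kind: 'text' | 'functionCall' | 'functionResponse';
};

// Returns the index where the kept slice starts. Assumes the last entry
// is an in-flight model functionCall with no response yet.
function splitPointRetainingTrailingPairsSketch(
  contents: Entry[],
  retainCount: number,
): number {
  let split = contents.length - 1; // always keep the in-flight call itself
  let retained = 0;
  while (retained < retainCount && split >= 2) {
    const call = contents[split - 2];
    const response = contents[split - 1];
    const isPair =
      call?.role === 'model' &&
      call?.kind === 'functionCall' &&
      response?.role === 'user' &&
      response?.kind === 'functionResponse';
    if (!isPair) break; // stop at the first entry that is not a complete pair
    split -= 2;
    retained += 1;
  }
  return split;
}
```

Logging the returned index next to `contents.length`, as the suggestion above proposes, would show at a glance how much of the history was handed to the summarizer.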
wenshao
left a comment
[Suggestion] ACP session now triggers double compression per turn.
packages/cli/src/acp-integration/session/Session.ts (#sendMessageStreamWithAutoCompression) calls geminiClient.tryCompressChat() before each turn AND then calls chat.sendMessageStream(), which now internally runs this.tryCompress after this PR. Each ACP turn gets two tryCompress calls — the first from the wrapper, the second from inside sendMessageStream. The second call usually NOOPs (token count already updated), but still incurs unnecessary CompressOptions allocation and threshold checks.
More subtly: if the first call succeeds via startChat (replacing the chat object), the second tryCompress runs on the new GeminiChat whose lastPromptTokenCount = 0 — so the threshold comparison uses the wrong value.
The CLI/TUI path removed its manual pre-call in this PR. Recommend the ACP path follow suit.
— deepseek-v4-pro via Qwen Code /review
… auto-compaction redesign
- OOM reproduction report: root cause confirmed as structuredClone() positive feedback loop during auto-compaction (#3735, #3879), with real debug log evidence from crash session.
- Runtime diagnostics benchmark: process-tree RSS sampling results comparing installed CLI vs local rebuilt bundle.
- Auto-compaction threshold redesign: proposal for replacing the fixed 70% token threshold with RSS-aware graduated strategy.
Summary
`maximum context length exceeded` instead of compacting first.
`/compress` reset path is unchanged.
Validation
`Use the Explore subagent to map out the project structure.` (subagent path) and `Search this repo and report what you find.` (main-agent control).
Known Limitations
Two trade-offs reviewers should be aware of — both are intentional, not oversights:
- `sessionTokenLimit` gate ordering. The gate now reads the previous turn's prompt count instead of the post-compaction count, so a session that sat just over the limit may trip one turn earlier than before. Acceptable because the feature is slated for removal; not worth re-introducing the eager compression pre-call in the main session loop just to preserve it.
- `0`, so the first send doesn't trigger compaction even when the inherited parent history is already over the model limit. Subsequent sends compact normally once the first response populates the counter. We deliberately did not seed the counter from the parent: the plumbing would touch fork/speculation/btw paths that don't need compaction at all, and the worst case (first-send overflow) is rare in practice — most parents that big have already compacted themselves.
Scope / Risk
Testing Matrix
Verified macOS via `npm run build && npm run bundle` and the headless e2e plan above; other platforms unchanged.
Linked Issues / Bugs
Closes #3664