fix(core): guard oversized resumed history sends#4531
Conversation
|
This PR shares code with PR #4525, PR #4526, and PR #4528 — all modifying core token estimation and prompt handling logic. Key shared functions:
Recommendation: This PR guards oversized resumed history which shares client and token estimation code with other PRs in the group. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
|
This PR shares code with PR #4525, PR #4526, and PR #4528 — all modifying core token estimation and prompt handling logic. Key shared functions:
Recommendation: This PR guards oversized resumed history which shares client and token estimation code with other PRs in the group. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
| ? estimatePromptTokens( | ||
| this.getHistoryShallow(true), | ||
| userContent, | ||
| 0, |
There was a problem hiding this comment.
[Suggestion] lastPromptTokenCount is hardcoded to 0 here, forcing estimatePromptTokens into the char/4 heuristic branch even when this.lastPromptTokenCount has already been updated by tryCompress (set to info.newTokenCount on COMPRESSED, unchanged on failure). The codebase's own comments document char/4 as under-counting by ~15–20K tokens.
In the non-COMPRESSED branch (compression failure, history unchanged), shouldStopAfterHardRescue compares only localPromptTokensAfterCompression against hardLimit. But effectiveTokens — which triggered hard-rescue in the first place and uses the API-anchored lastPromptTokenCount when available — is never passed into the decision function. A resumed session where lastPromptTokenCount > 0 and the API-anchored count exceeds hard by less than ~15–20K tokens will trigger hard-rescue, fail compression, and then the guard will NOT fire because the char/4 re-estimate falls below hard. The oversized request proceeds to the API — the exact scenario this guard exists to prevent.
Consider passing this.lastPromptTokenCount instead of 0, or threading effectiveTokens into shouldStopAfterHardRescue and using Math.max(effectiveTokens, localPromptTokensAfterCompression) in the non-COMPRESSED branch (mirroring what getHardRescueFailureMessage already does for the error text).
— qwen3.7-max via Qwen Code /review
| ): string { | ||
| const tokenCount = | ||
| compressionInfo.compressionStatus === CompressionStatus.COMPRESSED | ||
| ? compressionInfo.newTokenCount |
There was a problem hiding this comment.
[Suggestion] In the COMPRESSED branch, tokenCount is always compressionInfo.newTokenCount — even when the guard fired because localPromptTokensAfterCompression >= hardLimit (while newTokenCount < hardLimit). In that scenario the error message reports a token count below the hard limit, leaving users unable to understand why the guard triggered.
Consider using Math.max(compressionInfo.newTokenCount, localPromptTokensAfterCompression) so the reported value always matches or exceeds the limit that was breached:
| ? compressionInfo.newTokenCount | |
| const tokenCount = | |
| compressionInfo.compressionStatus === CompressionStatus.COMPRESSED | |
| ? Math.max(compressionInfo.newTokenCount, localPromptTokensAfterCompression) | |
| : Math.max(effectiveTokens, localPromptTokensAfterCompression); |
— qwen3.7-max via Qwen Code /review
| imageTokenEstimate, | ||
| ) | ||
| : 0; | ||
| if ( |
There was a problem hiding this comment.
[Suggestion] When compression succeeds (COMPRESSED status), tryCompress has already called this.setHistory(newHistory) before this guard runs. If the guard then throws, the original pre-compression history is irreversibly replaced. The session file records the compressed (but still oversized) history, and --continue loads the same unusable state.
This is a design tradeoff — the original history was already too large to send — but it's worth documenting explicitly. Consider either:
- Adding a code comment noting that history mutation is intentional and the original is recoverable only from the session JSONL backup, or
- Snapshotting the history reference before
tryCompressand restoring it on guard failure, so the user's session file retains the original conversation.
— qwen3.7-max via Qwen Code /review
| compressionInfo, | ||
| localPromptTokensAfterCompression, | ||
| ); | ||
| debugLogger.warn(message); |
There was a problem hiding this comment.
[Suggestion] The existing hard-rescue warn at ~line 1565 uses a [compaction] prefix and includes effectiveTokens, hard, and consecutiveFailures for greppability and correlation. This new warn uses the user-facing error message as-is, with no [compaction] tag and no prompt_id. In log aggregation, there's no way to filter for all oversized-history guard triggers or correlate with the upstream hard-rescue warn.
Consider adding a structured log prefix:
| debugLogger.warn(message); | |
| debugLogger.warn( | |
| `[compaction] oversized-history guard rejected send: prompt_id=${prompt_id}, ` + | |
| `effectiveTokens=${effectiveTokens}, localPromptTokensAfterCompression=${localPromptTokensAfterCompression}, ` + | |
| `compressionNewTokenCount=${compressionInfo.newTokenCount}, hardLimit=${hard}, ` + | |
| `compressionStatus=${compressionInfo.compressionStatus}`, | |
| ); |
— qwen3.7-max via Qwen Code /review
| }); | ||
|
|
||
| it('rejects before request serialization when hard-rescue compression is still oversized', async () => { | ||
| chat.setHistory([ |
There was a problem hiding this comment.
[Suggestion] shouldStopAfterHardRescue has a branch that is not covered by either test: COMPRESSED + compressionInfo.newTokenCount < hard + localPromptTokensAfterCompression >= hard. This catches cases where the compression service's reported token count is optimistic but the local char/4 re-estimate still shows oversized. It's the branch most likely to catch real tokenizer mismatches in production.
Consider adding a third test where compression succeeds with newTokenCount just under hard but the returned newHistory is still large enough that local re-estimation exceeds the hard limit. For example, mock newTokenCount: 176_999 (just under hard) while keeping the 720K-char history content so char/4 estimates ~180K.
— qwen3.7-max via Qwen Code /review
| return localPromptTokensAfterCompression >= hardLimit; | ||
| } | ||
| return ( | ||
| compressionInfo.newTokenCount >= hardLimit || |
There was a problem hiding this comment.
[Suggestion] In the COMPRESSED branch, compressionInfo.newTokenCount >= hardLimit is logically redundant. After tryCompress succeeds, this.lastPromptTokenCount = info.newTokenCount (line ~1404), so localPromptTokensAfterCompression = lastPromptTokenCount + estimateContentTokens(userContent) + imageTokenEstimate. Since the user-content and image estimates are non-negative, localPromptTokensAfterCompression >= newTokenCount always holds. If newTokenCount >= hardLimit, then localPromptTokensAfterCompression >= hardLimit is guaranteed to fire first.
The function could simplify to:
if (!shouldForceFromHard) return false;
return localPromptTokensAfterCompression >= hardLimit;unless the explicit newTokenCount check serves as a defensive safety net worth documenting.
— qwen3.7-max via Qwen Code /review
| if ( | ||
| compressionInfo.compressionStatus === CompressionStatus.COMPRESSED && | ||
| historyBeforeHardRescue | ||
| ) { |
There was a problem hiding this comment.
[Suggestion] When compression succeeds (COMPRESSED) but the guard still rejects, tryCompress has already called chatRecordingService.recordChatCompression() (line ~1394), writing a compression checkpoint to the JSONL session log. This rollback restores in-memory state (history, lastPromptTokenCount, telemetry) but the JSONL record persists.
On --continue resume, buildApiHistoryFromConversation would use the stale compressed snapshot as the reconstruction base, even though the in-memory session was rolled back. Consider either:
- Deferring
recordChatCompressionuntil after the guard passes (move the call fromtryCompressinto the caller). - Emitting a compensating "compression rolled back" record after the restore.
- Documenting that this asymmetry is acceptable since the guard re-fires on resume anyway.
— qwen3.7-max via Qwen Code /review
| chat.setHistory([ | ||
| { role: 'user', parts: [{ text: 'x'.repeat(720_000) }] }, | ||
| { role: 'model', parts: [{ text: 'ack' }] }, | ||
| ]); |
There was a problem hiding this comment.
[Suggestion] Tests 2 and 3 both exercise the COMPRESSED-then-restore path but neither asserts chat.getLastPromptTokenCount() after the guard throws. The production code at geminiChat.ts:1622 restores lastPromptTokenCount, but if that line were accidentally removed, these tests would still pass.
Consider adding after the rejects.toThrow(...):
expect(chat.getLastPromptTokenCount()).toBe(0);For stronger coverage, set a non-trivial value before the send (e.g., chat.setLastPromptTokenCount(50_000)) and assert it's restored — this verifies the round-trip through compression's mutation and back.
— qwen3.7-max via Qwen Code /review
| return ( | ||
| `Context is too large to send safely after automatic compression. ` + | ||
| `Estimated prompt tokens: ${tokenCount}; hard limit: ${hardLimit}; ` + | ||
| `compression status: ${compressionInfo.compressionStatus}. ` + |
There was a problem hiding this comment.
[Suggestion] CompressionStatus is a numeric enum (COMPRESSED = 1, COMPRESSION_FAILED_EMPTY_SUMMARY = 4, etc.). The template literal ${compressionInfo.compressionStatus} renders as a raw integer (e.g., "compression status: 1") in this user-facing error message.
| `compression status: ${compressionInfo.compressionStatus}. ` + | |
| `compression status: ${CompressionStatus[compressionInfo.compressionStatus]}. ` + |
This also applies to the debug log at line 1630 where the same pattern appears.
— qwen3.7-max via Qwen Code /review
| }); | ||
|
|
||
| it('rejects when compressed history is below hard but the pending user message pushes it over', async () => { | ||
| chat.setHistory([ |
There was a problem hiding this comment.
[Suggestion] This test uses chat instead of chatWithRecording, so it cannot verify two important invariants that the other COMPRESSED+rollback tests check:
recordChatCompressionwas NOT called (the deferred-record invariant that this PR introduces)- History was rolled back to the original pre-compression state
This is the subtlest edge case (compressed history is below hard, but the pending user message tips it over) and the one most likely to regress. If a future change accidentally moves recordChatCompression before the guard, this test would not catch it.
Consider switching to chatWithRecording (same pattern as the "hard-rescue compression is still oversized" test above) and adding:
expect(recordChatCompression).not.toHaveBeenCalled();
expect(chatWithRecording.getHistory()[0].parts?.[0].text).toBe(
'x'.repeat(720_000),
);— qwen3.7-max via Qwen Code /review
| // Hard-rescue compression mutates in-memory history before this | ||
| // guard can compare the compressed prompt size. If the compressed | ||
| // prompt is still too large to send, restore the pre-compression | ||
| // state and defer the JSONL compression checkpoint until a guarded |
There was a problem hiding this comment.
[Suggestion] The comment says "defer the JSONL compression checkpoint until a guarded send is actually allowed below" — but the code throws at line 1647. There is no "below" that eventually writes the checkpoint; the send is rejected outright.
| // state and defer the JSONL compression checkpoint until a guarded | |
| // Hard-rescue compression mutates in-memory history before this | |
| // guard can compare the compressed prompt size. If the compressed | |
| // prompt is still too large to send, restore the pre-compression | |
| // state. The JSONL compression checkpoint is intentionally NOT | |
| // written because the send is about to be rejected. |
— qwen3.7-max via Qwen Code /review
|
Updated in 7541e55 for the latest comment wording suggestion. The hard-rescue rollback comment now states that the JSONL compression checkpoint is intentionally not written because the send is about to be rejected. No behavior changed. Validation run locally on Windows: npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-rescue|oversized resumed history|compressed history is below hard"
npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts
npx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts |
| */ | ||
| deferChatCompressionRecord?: boolean; | ||
| } | ||
|
|
There was a problem hiding this comment.
[Suggestion] The tryCompress method-level JSDoc (lines 1361-1367) states unconditionally that on COMPRESSED the method "recorded the event to chatRecordingService (if wired)" — but with this new deferChatCompressionRecord option, that's no longer true. When the option is set, recording is intentionally skipped inside tryCompress and deferred to the caller.
Consider updating the JSDoc to note the exception, e.g.:
* history, recorded the event to `chatRecordingService`
* (if wired, unless `options.deferChatCompressionRecord` is set), and
This prevents a future reader of the method signature from assuming the recording always happens inside tryCompress and introducing a double-write.
— qwen3.7-max via Qwen Code /review
Local Verification ReportPR: #4531 — 1. Build & Static Analysis
2. Targeted Tests (PR author's recommended commands)
3. Full Core Regression Suite
* Pre-existing failures in 4. New Test Coverage (4 new + 1 enhanced test case)
5. Code ReviewProblem solved (Fixes #4363): Solution — 3 changes:
Key invariants verified:
No issues found. Changes are minimal, well-documented (7 clean commits), and strictly scoped to the hard-rescue send-size guard flow. 6. SummaryAll 150 geminiChat tests pass (including 4 new + 1 enhanced). Build, typecheck, lint, and format all clean. The fix correctly prevents oversized resumed histories from writing misleading compression checkpoints and ensures the send-size guard rejects before any API call. Recommendation: Ready to merge ✅ |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
|
Hi @Jerry2003826, this PR is directly relevant to #4624 (resume OOM) which we're actively working on — the "guard oversized resumed history" direction makes sense as an additional safety net. Before we can merge, we need you to demonstrate the problem and the fix with concrete evidence:
We recently landed #4644 which addresses the transient OOM from Thanks for the contribution! |
yiliang114
left a comment
There was a problem hiding this comment.
Code-wise this looks reasonable to me: the rollback/recording order is safer, the guard prevents the final oversized send, and the targeted tests cover the new invariants.
Non-blocking suggestion: it would still be useful to add a short before/after log for the original #4363 synthetic resumed-history repro, just to confirm the real path reaches this guard rather than failing earlier during the compression request itself.
What this PR does
Adds a hard guard for resumed histories that still exceed the request-size limit after compression/rescue, and defers compression recording until the guard accepts the rescued history.
Why it's needed
A resumed session could persist an oversized or misleading compression checkpoint before the final send-size guard decided the request still had to be rejected.
Reviewer Test Plan
How to verify
Run: npm test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-tier rescue"; npm test --workspace=packages/core -- src/core/geminiChat.test.ts; npx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts; npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts; npm run typecheck --workspace=packages/core.
Evidence (Before & After)
Before: an oversized resumed history could record a compression checkpoint before the final hard-size guard rejected the send. After: regression tests cover accepted hard rescue, rejected hard rescue, delayed recording, and restored token accounting. This is core/session behavior, so before/after TUI screenshots are N/A.
Tested on
Environment (optional)
Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.
Risk & Scope
Linked Issues
Fixes #4363
中文说明
这个 PR 做了什么
为恢复会话后的超大历史增加最终请求大小保护,并且只有在保护通过后才记录压缩 checkpoint。
为什么需要
之前恢复会话在最终发送保护拒绝之前,可能先写入有误导性的压缩记录。
Reviewer Test Plan
本地在 Windows 跑了 geminiChat 相关定向测试、完整文件测试、eslint、prettier check 和 core typecheck;macOS/Linux 依赖 GitHub CI。
风险和范围
范围限制在 hard rescue 的发送大小保护和压缩记录顺序,不改公共 API。