Skip to content

fix(core): guard oversized resumed history sends#4531

Merged
wenshao merged 7 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-resumed-history-request-size-guard
May 31, 2026
Merged

fix(core): guard oversized resumed history sends#4531
wenshao merged 7 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-resumed-history-request-size-guard

Conversation

@Jerry2003826

@Jerry2003826 Jerry2003826 commented May 26, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Adds a hard guard for resumed histories that still exceed the request-size limit after compression/rescue, and defers compression recording until the guard accepts the rescued history.

Why it's needed

A resumed session could persist an oversized or misleading compression checkpoint before the final send-size guard decided the request still had to be rejected.

Reviewer Test Plan

How to verify

Run: npm test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-tier rescue"; npm test --workspace=packages/core -- src/core/geminiChat.test.ts; npx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts; npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts; npm run typecheck --workspace=packages/core.

Evidence (Before & After)

Before: an oversized resumed history could record a compression checkpoint before the final hard-size guard rejected the send. After: regression tests cover accepted hard rescue, rejected hard rescue, delayed recording, and restored token accounting. This is core/session behavior, so before/after TUI screenshots are N/A.

Tested on

OS Status
macOS CI only
Windows Tested locally
Linux CI only

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.

Risk & Scope

  • Main risk or tradeoff: The change is limited to hard-rescue send-size guard behavior and chat-compression recording order.
  • Not validated / out of scope: No unrelated refactors, public API changes, UI redesigns, or behavior outside the linked issue scope.
  • Breaking changes / migration notes: None expected.

Linked Issues

Fixes #4363

中文说明

这个 PR 做了什么

为恢复会话后的超大历史增加最终请求大小保护,并且只有在保护通过后才记录压缩 checkpoint。

为什么需要

之前恢复会话在最终发送保护拒绝之前,可能先写入有误导性的压缩记录。

Reviewer Test Plan

本地在 Windows 跑了 geminiChat 相关定向测试、完整文件测试、eslint、prettier check 和 core typecheck;macOS/Linux 依赖 GitHub CI。

风险和范围

范围限制在 hard rescue 的发送大小保护和压缩记录顺序,不改公共 API。

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4528 — all modifying core token estimation and prompt handling logic.

Key shared functions:

  • sendMessageStreampackages/core/src/core/client.ts
  • getLastPromptTokenCountpackages/core/src/agents/runtime/agent-interactive.ts, packages/core/src/core/geminiChat.ts
  • sendMessageStreampackages/core/src/core/geminiChat.ts
  • estimatePromptTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: This PR guards oversized resumed history which shares client and token estimation code with other PRs in the group.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4528 — all modifying core token estimation and prompt handling logic.

Key shared functions:

  • sendMessageStreampackages/core/src/core/client.ts
  • getLastPromptTokenCountpackages/core/src/agents/runtime/agent-interactive.ts, packages/core/src/core/geminiChat.ts
  • sendMessageStreampackages/core/src/core/geminiChat.ts
  • estimatePromptTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: This PR guards oversized resumed history which shares client and token estimation code with other PRs in the group.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

Comment thread packages/core/src/core/geminiChat.ts Outdated
? estimatePromptTokens(
this.getHistoryShallow(true),
userContent,
0,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] lastPromptTokenCount is hardcoded to 0 here, forcing estimatePromptTokens into the char/4 heuristic branch even when this.lastPromptTokenCount has already been updated by tryCompress (set to info.newTokenCount on COMPRESSED, unchanged on failure). The codebase's own comments document char/4 as under-counting by ~15–20K tokens.

In the non-COMPRESSED branch (compression failure, history unchanged), shouldStopAfterHardRescue compares only localPromptTokensAfterCompression against hardLimit. But effectiveTokens — which triggered hard-rescue in the first place and uses the API-anchored lastPromptTokenCount when available — is never passed into the decision function. A resumed session where lastPromptTokenCount > 0 and the API-anchored count exceeds hard by less than ~15–20K tokens will trigger hard-rescue, fail compression, and then the guard will NOT fire because the char/4 re-estimate falls below hard. The oversized request proceeds to the API — the exact scenario this guard exists to prevent.

Consider passing this.lastPromptTokenCount instead of 0, or threading effectiveTokens into shouldStopAfterHardRescue and using Math.max(effectiveTokens, localPromptTokensAfterCompression) in the non-COMPRESSED branch (mirroring what getHardRescueFailureMessage already does for the error text).

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
): string {
const tokenCount =
compressionInfo.compressionStatus === CompressionStatus.COMPRESSED
? compressionInfo.newTokenCount

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] In the COMPRESSED branch, tokenCount is always compressionInfo.newTokenCount — even when the guard fired because localPromptTokensAfterCompression >= hardLimit (while newTokenCount < hardLimit). In that scenario the error message reports a token count below the hard limit, leaving users unable to understand why the guard triggered.

Consider using Math.max(compressionInfo.newTokenCount, localPromptTokensAfterCompression) so the reported value always matches or exceeds the limit that was breached:

Suggested change
? compressionInfo.newTokenCount
const tokenCount =
compressionInfo.compressionStatus === CompressionStatus.COMPRESSED
? Math.max(compressionInfo.newTokenCount, localPromptTokensAfterCompression)
: Math.max(effectiveTokens, localPromptTokensAfterCompression);

— qwen3.7-max via Qwen Code /review

imageTokenEstimate,
)
: 0;
if (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] When compression succeeds (COMPRESSED status), tryCompress has already called this.setHistory(newHistory) before this guard runs. If the guard then throws, the original pre-compression history is irreversibly replaced. The session file records the compressed (but still oversized) history, and --continue loads the same unusable state.

This is a design tradeoff — the original history was already too large to send — but it's worth documenting explicitly. Consider either:

  1. Adding a code comment noting that history mutation is intentional and the original is recoverable only from the session JSONL backup, or
  2. Snapshotting the history reference before tryCompress and restoring it on guard failure, so the user's session file retains the original conversation.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
compressionInfo,
localPromptTokensAfterCompression,
);
debugLogger.warn(message);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The existing hard-rescue warn at ~line 1565 uses a [compaction] prefix and includes effectiveTokens, hard, and consecutiveFailures for greppability and correlation. This new warn uses the user-facing error message as-is, with no [compaction] tag and no prompt_id. In log aggregation, there's no way to filter for all oversized-history guard triggers or correlate with the upstream hard-rescue warn.

Consider adding a structured log prefix:

Suggested change
debugLogger.warn(message);
debugLogger.warn(
`[compaction] oversized-history guard rejected send: prompt_id=${prompt_id}, ` +
`effectiveTokens=${effectiveTokens}, localPromptTokensAfterCompression=${localPromptTokensAfterCompression}, ` +
`compressionNewTokenCount=${compressionInfo.newTokenCount}, hardLimit=${hard}, ` +
`compressionStatus=${compressionInfo.compressionStatus}`,
);

— qwen3.7-max via Qwen Code /review

});

it('rejects before request serialization when hard-rescue compression is still oversized', async () => {
chat.setHistory([

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] shouldStopAfterHardRescue has a branch that is not covered by either test: COMPRESSED + compressionInfo.newTokenCount < hard + localPromptTokensAfterCompression >= hard. This catches cases where the compression service's reported token count is optimistic but the local char/4 re-estimate still shows oversized. It's the branch most likely to catch real tokenizer mismatches in production.

Consider adding a third test where compression succeeds with newTokenCount just under hard but the returned newHistory is still large enough that local re-estimation exceeds the hard limit. For example, mock newTokenCount: 176_999 (just under hard) while keeping the 720K-char history content so char/4 estimates ~180K.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
return localPromptTokensAfterCompression >= hardLimit;
}
return (
compressionInfo.newTokenCount >= hardLimit ||

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] In the COMPRESSED branch, compressionInfo.newTokenCount >= hardLimit is logically redundant. After tryCompress succeeds, this.lastPromptTokenCount = info.newTokenCount (line ~1404), so localPromptTokensAfterCompression = lastPromptTokenCount + estimateContentTokens(userContent) + imageTokenEstimate. Since the user-content and image estimates are non-negative, localPromptTokensAfterCompression >= newTokenCount always holds. If newTokenCount >= hardLimit, then localPromptTokensAfterCompression >= hardLimit is guaranteed to fire first.

The function could simplify to:

if (!shouldForceFromHard) return false;
return localPromptTokensAfterCompression >= hardLimit;

unless the explicit newTokenCount check serves as a defensive safety net worth documenting.

— qwen3.7-max via Qwen Code /review

if (
compressionInfo.compressionStatus === CompressionStatus.COMPRESSED &&
historyBeforeHardRescue
) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] When compression succeeds (COMPRESSED) but the guard still rejects, tryCompress has already called chatRecordingService.recordChatCompression() (line ~1394), writing a compression checkpoint to the JSONL session log. This rollback restores in-memory state (history, lastPromptTokenCount, telemetry) but the JSONL record persists.

On --continue resume, buildApiHistoryFromConversation would use the stale compressed snapshot as the reconstruction base, even though the in-memory session was rolled back. Consider either:

  1. Deferring recordChatCompression until after the guard passes (move the call from tryCompress into the caller).
  2. Emitting a compensating "compression rolled back" record after the restore.
  3. Documenting that this asymmetry is acceptable since the guard re-fires on resume anyway.

— qwen3.7-max via Qwen Code /review

chat.setHistory([
{ role: 'user', parts: [{ text: 'x'.repeat(720_000) }] },
{ role: 'model', parts: [{ text: 'ack' }] },
]);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Tests 2 and 3 both exercise the COMPRESSED-then-restore path but neither asserts chat.getLastPromptTokenCount() after the guard throws. The production code at geminiChat.ts:1622 restores lastPromptTokenCount, but if that line were accidentally removed, these tests would still pass.

Consider adding after the rejects.toThrow(...):

expect(chat.getLastPromptTokenCount()).toBe(0);

For stronger coverage, set a non-trivial value before the send (e.g., chat.setLastPromptTokenCount(50_000)) and assert it's restored — this verifies the round-trip through compression's mutation and back.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
return (
`Context is too large to send safely after automatic compression. ` +
`Estimated prompt tokens: ${tokenCount}; hard limit: ${hardLimit}; ` +
`compression status: ${compressionInfo.compressionStatus}. ` +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] CompressionStatus is a numeric enum (COMPRESSED = 1, COMPRESSION_FAILED_EMPTY_SUMMARY = 4, etc.). The template literal ${compressionInfo.compressionStatus} renders as a raw integer (e.g., "compression status: 1") in this user-facing error message.

Suggested change
`compression status: ${compressionInfo.compressionStatus}. ` +
`compression status: ${CompressionStatus[compressionInfo.compressionStatus]}. ` +

This also applies to the debug log at line 1630 where the same pattern appears.

— qwen3.7-max via Qwen Code /review

});

it('rejects when compressed history is below hard but the pending user message pushes it over', async () => {
chat.setHistory([

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This test uses chat instead of chatWithRecording, so it cannot verify two important invariants that the other COMPRESSED+rollback tests check:

  1. recordChatCompression was NOT called (the deferred-record invariant that this PR introduces)
  2. History was rolled back to the original pre-compression state

This is the subtlest edge case (compressed history is below hard, but the pending user message tips it over) and the one most likely to regress. If a future change accidentally moves recordChatCompression before the guard, this test would not catch it.

Consider switching to chatWithRecording (same pattern as the "hard-rescue compression is still oversized" test above) and adding:

expect(recordChatCompression).not.toHaveBeenCalled();
expect(chatWithRecording.getHistory()[0].parts?.[0].text).toBe(
  'x'.repeat(720_000),
);

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
// Hard-rescue compression mutates in-memory history before this
// guard can compare the compressed prompt size. If the compressed
// prompt is still too large to send, restore the pre-compression
// state and defer the JSONL compression checkpoint until a guarded

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The comment says "defer the JSONL compression checkpoint until a guarded send is actually allowed below" — but the code throws at line 1647. There is no "below" that eventually writes the checkpoint; the send is rejected outright.

Suggested change
// state and defer the JSONL compression checkpoint until a guarded
// Hard-rescue compression mutates in-memory history before this
// guard can compare the compressed prompt size. If the compressed
// prompt is still too large to send, restore the pre-compression
// state. The JSONL compression checkpoint is intentionally NOT
// written because the send is about to be rejected.

— qwen3.7-max via Qwen Code /review

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Updated in 7541e55 for the latest comment wording suggestion.

The hard-rescue rollback comment now states that the JSONL compression checkpoint is intentionally not written because the send is about to be rejected. No behavior changed.

Validation run locally on Windows:

npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-rescue|oversized resumed history|compressed history is below hard"
npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts
npx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts

*/
deferChatCompressionRecord?: boolean;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The tryCompress method-level JSDoc (lines 1361-1367) states unconditionally that on COMPRESSED the method "recorded the event to chatRecordingService (if wired)" — but with this new deferChatCompressionRecord option, that's no longer true. When the option is set, recording is intentionally skipped inside tryCompress and deferred to the caller.

Consider updating the JSDoc to note the exception, e.g.:

* history, recorded the event to `chatRecordingService`
* (if wired, unless `options.deferChatCompressionRecord` is set), and

This prevents a future reader of the method signature from assuming the recording always happens inside tryCompress and introducing a double-write.

— qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 28, 2026

Copy link
Copy Markdown
Collaborator

Local Verification Report

PR: #4531fix(core): guard oversized resumed history sends
Verified by: Maintainer (local build + test + code review)
Date: 2026-05-28


1. Build & Static Analysis

Check Result
Core build (npm run build) PASS
Typecheck (tsc --noEmit) PASS — zero errors
ESLint PASSgeminiChat.ts + geminiChat.test.ts both clean
Prettier PASS — all matched files use Prettier code style

2. Targeted Tests (PR author's recommended commands)

Test Count Result
geminiChat.test.ts -t 'hard-tier rescue' 6 tests ALL PASSED
geminiChat.test.ts -t 'rejects before request serialization' 2 tests ALL PASSED
geminiChat.test.ts (full file) 150 tests ALL PASSED

3. Full Core Regression Suite

Metric Result
Total tests 9428
Passed 9396
Failed 29 (pre-existing*)
Skipped 3
geminiChat.test.ts 150/150 passed, 0 failures

* Pre-existing failures in gitDiff.test.ts (21), crawler.test.ts (7), anthropicContentGenerator.test.ts (1) — all unrelated to this PR. Same failures observed on other branches.

4. New Test Coverage (4 new + 1 enhanced test case)

Test Case What it verifies
hard-rescue forces compression and sends (enhanced) Deferred recordChatCompression is called exactly once with correct payload after guard passes
rejects before request serialization when oversized resumed history cannot be compressed Guard rejects when compression fails (COMPRESSION_FAILED_EMPTY_SUMMARY), generateContentStream never called, token count preserved at 0
rejects before request serialization and restores history when hard-rescue compression is still oversized Guard rejects when compressed result still >= hard limit, history rolled back to pre-compression state, recordChatCompression never called, lastPromptTokenCount restored
rejects when compressed history is below hard but the pending user message pushes it over Guard catches the case where compression alone brings tokens under hard limit, but adding the pending user message pushes total back over
seeds inherited token count (fixed) Now mocks contextWindowSize: 200_000 to prevent hard-rescue interference

5. Code Review

Problem solved (Fixes #4363):
A resumed session could persist an oversized or misleading compression checkpoint to the JSONL recording before the final send-size guard decided the request still had to be rejected.

Solution — 3 changes:

  1. deferChatCompressionRecord option (line 193): tryCompress() now conditionally skips recordChatCompression() when the caller needs to run post-compression guards first.

  2. Post-compression guard (lines 1607-1648): After hard-rescue compression, re-estimates prompt tokens including the pending user message. If still >= hard limit:

    • Restores pre-compression history snapshot
    • Restores lastPromptTokenCount and telemetry counter
    • Throws descriptive error (no API call, no JSONL checkpoint)
  3. Deferred recording (lines 1649-1657): Only writes the compression checkpoint after the guard passes — ensuring the JSONL accurately reflects what will actually be sent.

Key invariants verified:

  • generateContentStream is never called when the guard rejects (no wasted API request)
  • History is restored to pre-compression state on rejection (no data loss)
  • lastPromptTokenCount and telemetry are restored on rejection (accurate accounting)
  • recordChatCompression is called exactly once on success, zero times on rejection
  • Helper functions (shouldStopAfterHardRescue, getHardRescueFailureMessage) are module-private — correct encapsulation

No issues found. Changes are minimal, well-documented (7 clean commits), and strictly scoped to the hard-rescue send-size guard flow.

6. Summary

All 150 geminiChat tests pass (including 4 new + 1 enhanced). Build, typecheck, lint, and format all clean. The fix correctly prevents oversized resumed histories from writing misleading compression checkpoints and ensures the send-size guard rejects before any API call.

Recommendation: Ready to merge

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@yiliang114

Copy link
Copy Markdown
Collaborator

Hi @Jerry2003826, this PR is directly relevant to #4624 (resume OOM) which we're actively working on — the "guard oversized resumed history" direction makes sense as an additional safety net.

Before we can merge, we need you to demonstrate the problem and the fix with concrete evidence:

  1. Reproduction — show us a session that actually triggers this guard. For example:

    • How large does the .jsonl session file need to be?
    • What model / context window triggers the "still exceeds request-size limit after compression"?
    • Can you share the exact error or behavior you observed without this fix?
  2. Before/After screenshots or logs — something like:

    • Before: FATAL ERROR: ... or the specific error message when sending oversized history
    • After: the guard fires, user sees [message], session continues normally
  3. Threshold justification — how is the hard limit calculated? Is it contextWindowSize * tokensPerChar? A fixed byte cap? Why that specific value?

We recently landed #4644 which addresses the transient OOM from structuredClone peaks. Your PR would complement it by catching the "history is genuinely too large even after compression" edge case — but we need to see it actually triggered to validate the approach.

Thanks for the contribution!

@yiliang114 yiliang114 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise this looks reasonable to me: the rollback/recording order is safer, the guard prevents the final oversized send, and the targeted tests cover the new invariants.

Non-blocking suggestion: it would still be useful to add a short before/after log for the original #4363 synthetic resumed-history repro, just to confirm the real path reaches this guard rather than failing earlier during the compression request itself.

@wenshao wenshao merged commit 7a31c80 into QwenLM:main May 31, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Oversized resumed history can fail with Invalid string length

4 participants