Skip to content

fix(core): bound hard rescue compression retries#4526

Open
Jerry2003826 wants to merge 9 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-bound-hard-rescue-retries
Open

fix(core): bound hard rescue compression retries#4526
Jerry2003826 wants to merge 9 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-bound-hard-rescue-retries

Conversation

@Jerry2003826

@Jerry2003826 Jerry2003826 commented May 26, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Bounds hard-rescue compression retries so an oversized request cannot loop through repeated rescue attempts indefinitely.

Why it's needed

Hard-rescue compression needs a clear retry limit and deterministic exit path when the request remains too large.

Reviewer Test Plan

How to verify

Run the PR-targeted hard-rescue compression tests in packages/core, then run eslint/prettier/typecheck for the touched core files. The GitHub CI matrix also runs lint and tests on macOS, Windows, and Linux.

Evidence (Before & After)

Before: retry behavior could continue longer than intended when hard rescue stayed oversized. After: regression coverage pins bounded retry behavior and the terminal failure path. This is core request-size behavior, so TUI screenshots are N/A.

Tested on

OS Status
macOS CI only
Windows Tested locally
Linux CI only

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.

Risk & Scope

  • Main risk or tradeoff: Limited to the hard-rescue compression retry loop; normal compression thresholds and send paths are out of scope.
  • Not validated / out of scope: No unrelated refactors, public API changes, UI redesigns, or behavior outside the linked issue scope.
  • Breaking changes / migration notes: None expected.

Linked Issues

Fixes #4346

中文说明

这个 PR 做了什么

限制 hard-rescue 压缩重试次数,避免请求仍然超大时无限重试。

为什么需要

hard-rescue 需要明确的重试上限和可预测的退出路径。

Reviewer Test Plan

本地按 PR 相关 core 测试、lint、format check、typecheck 验证;macOS/Linux 由 GitHub CI 验证。

风险和范围

只影响 hard-rescue 压缩重试循环,不改普通压缩阈值。

@wenshao wenshao added the type/bug Something isn't working as expected label May 26, 2026
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Updated the PR description to the latest template with Reviewer Test Plan, Evidence, Tested on, Risk & Scope, and Chinese summary.

No code changes were needed for this update; CI is already green across macOS, Windows, Linux, lint, and CodeQL.

@wenshao

wenshao commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Hi @Jerry2003826, thanks for the PRs! 👋

I noticed that the Chinese text inside the <details> collapsible sections of several of your PRs is showing as garbled characters (???? / ??). The English parts render fine — only the Chinese inside <details> is corrupted.

Affected PRs:

PRs without this issue (for reference): #4524, #4520, #4518

This is likely an encoding issue when the <details> block content is pasted or posted via the GitHub API. A few things to check:

  1. If using gh pr create --body: make sure the body content is UTF-8 encoded. The --body flag reads the string as-is — if your terminal or editor produces a different encoding, Chinese characters will be mangled.
  2. If copying from a file: try gh pr create --body-file description.md instead, which reads the file with proper encoding.
  3. The <details> tag itself may be a factor: GitHub's markdown parser handles UTF-8 inside <details> fine, but some clipboard tools or paste mechanisms may strip non-ASCII chars from HTML-like blocks.

You can fix the existing descriptions with:

gh pr edit <number> --body-file fixed-description.md

No code changes needed — just a description update. Thanks!

wenshao
wenshao previously approved these changes May 26, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No review findings. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Thanks for pointing this out. I updated the affected PR descriptions using UTF-8 --body-file and replaced the garbled Chinese <details> sections in #4519, #4517, #4526, #4525, #4522, and #4521. Code is unchanged.

@wenshao

wenshao commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Local verification report (Linux, tmux)

Verified PR head 45e84deb7 in an isolated worktree against main on Linux (Node v22.22.2, npm 10.9.7) inside a dedicated tmux session (tmux: pr4526). All commands from the PR's "How to verify" section were run; results below.

Environment

Item Value
OS Linux (Debian 13, kernel 6.12)
Node v22.22.2
npm 10.9.7
Worktree pr-4526 @ 45e84deb7
Test runner vitest 3.2.4 (workspace @qwen-code/qwen-code-core)

Results

Command Outcome
npm install OK (1374 packages, prepare/build/bundle ran)
npm run test ... geminiChat.test.ts -t "hard-rescue" 4 passed, 146 skipped (150 total)
npm run test ... geminiChat.test.ts -t "hard-tier rescue" 6 passed, 144 skipped (150 total)
npm run test ... geminiChat.test.ts (full file) 150 passed / 150, ~10.9s
npx prettier --check geminiChat.ts geminiChat.test.ts All matched files use Prettier code style
npx eslint geminiChat.ts geminiChat.test.ts 0 errors / 0 warnings (exit 0)
npm run lint --workspace=@qwen-code/qwen-code-core exit 0
npm run typecheck --workspace=@qwen-code/qwen-code-core (tsc --noEmit) exit 0
git diff --check (PR files) no whitespace errors

Behavior covered by the new tests

The three new specs in packages/core/src/core/geminiChat.test.ts exercise exactly the bounding semantics described in the PR description and were observed passing:

  • stops pre-send hard-rescue after repeated failed hard-tier compactions — after MAX_CONSECUTIVE_FAILURES failed rescues, compress is not called on subsequent sends (skipped path), while sends still flow through generateContentStream. Asserts compressSpy calls = MAX_CONSECUTIVE_FAILURES over MAX_CONSECUTIVE_FAILURES + 1 iterations.
  • counts thrown hard-rescue attempts toward the retry bound — rejected promises (thrown errors) still consume a strike; once bounded, the next send proceeds without another rescue attempt.
  • refunds hard-rescue NOOP results so history-too-small does not trip the retry boundNOOP decrements the counter, so MAX_CONSECUTIVE_FAILURES + 1 iterations all attempt rescue (the strike is refunded each time). This isolates "rescue worked / nothing to compress" from "rescue is broken".

The pre-existing companion test forwards latched consecutiveFailures into hard-rescue (no pre-call reset); success recovers via the post-call branch also continues to pass, so the prior cheap-gate breaker semantics for consecutiveFailures remain intact — the new hardRescueFailureCount is a separate counter, not a replacement.

Diff-level sanity checks (read alongside the tests)

  • hardRescueFailureCount is incremented pessimistically before the rescue call in the shouldForceFromHard branch.
  • When isHardTier && !shouldForceFromHard (bound reached), the code short-circuits with a synthesized NOOP compressionInfo instead of calling tryCompress, so the existing reactive overflow path remains the fallback — no behavior change downstream.
  • NOOP is refunded via hardRescueFailureCount = Math.max(0, hardRescueFailureCount - 1).
  • COMPRESSED resets both consecutiveFailures and hardRescueFailureCount (post-call branch in the COMPRESSED handler), matching the recovery story.

Notes

  • PR description's "Tested on" table marks Linux as "not tested locally" — this report covers that gap (and the original three CI lanes already pass on the PR).
  • No UI surface changed; no manual TUI run was required.

Conclusion

LGTM from a verification standpoint. All commands listed in the PR's reviewer test plan pass locally on Linux; no extra failures or warnings observed outside the documented behavior. Safe to merge from this PR's own scope of changes.

Verification artifacts (logs) retained locally at /tmp/pr4526-{install,test-hard-rescue,test-hard-tier,test-full,prettier,eslint,lint,typecheck}.log.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4528, and PR #4531 — all modifying core token estimation and prompt handling logic.

Key shared functions:

  • sendMessageStreampackages/core/src/core/client.ts, packages/core/src/core/geminiChat.ts
  • setLastPromptTokenCountpackages/core/src/core/geminiChat.ts, packages/core/src/telemetry/uiTelemetry.ts
  • stripTrailingSessionStartContextBlockpackages/core/src/core/geminiChat.ts
  • tryCompresspackages/core/src/core/geminiChat.ts

Recommendation: This PR modifies rescue compression retries which shares geminiChat and token estimation code with other PRs.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4528, and PR #4531 — all modifying core token estimation and prompt handling logic.

Key shared functions:

  • sendMessageStreampackages/core/src/core/client.ts, packages/core/src/core/geminiChat.ts
  • setLastPromptTokenCountpackages/core/src/core/geminiChat.ts, packages/core/src/telemetry/uiTelemetry.ts
  • stripTrailingSessionStartContextBlockpackages/core/src/core/geminiChat.ts
  • tryCompresspackages/core/src/core/geminiChat.ts

Recommendation: This PR modifies rescue compression retries which shares geminiChat and token estimation code with other PRs.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@wenshao

wenshao commented May 27, 2026

Copy link
Copy Markdown
Collaborator

Maintainer Verification Report

PR: #4526 — fix(core): bound hard rescue compression retries
Branch: Jiarui/fix-bound-hard-rescue-retries
Tested on: macOS Darwin 25.4.0

Build & Lint

  • npm run buildPASS (no errors)
  • npx tsc -p packages/core/tsconfig.json --noEmitPASS (no type errors)
  • npx eslint on changed files — PASS (0 errors, 0 warnings)

Unit Tests

Test Suite Tests Result
geminiChat.test.ts 150 (including 3 new) PASS
chatCompressionService.test.ts 69 PASS

New Test Coverage (#4346)

All 3 new test cases pass:

  1. Bounded failed compactions: After MAX_CONSECUTIVE_FAILURES (3) failed hard-tier compactions, hard-rescue stops firing; compress called exactly 3 times across 4 sends — PASS
  2. Thrown errors count toward bound: Compression exceptions count as failures; after 3 throws, the next send skips hard-rescue and proceeds directly — PASS
  3. NOOP refund: NOOP results (history too small to compress) decrement the failure counter, so they don't consume retry budget — PASS

Code Review Notes

  • Scope: 2 files changed (geminiChat.ts +62/-45, geminiChat.test.ts +105) — focused on the hard-rescue retry path only
  • No regressions: All 150 existing geminiChat tests + 69 compression service tests pass
  • Design:
    • New hardRescueFailureCount counter is separate from consecutiveFailures (which governs the cheap-gate breaker) — correct separation of concerns
    • Counter increments before the rescue attempt (pre-increment), so thrown errors still consume budget
    • NOOP refund via Math.max(0, count - 1) prevents underflow
    • COMPRESSED success resets both consecutiveFailures and hardRescueFailureCount to 0
    • When hard-rescue is exhausted, a synthetic NOOP compressionInfo is returned and the send proceeds; reactive overflow recovery remains the fallback safety net
  • Debug observability: Added hardRescueFailureCount to the warn log when triggered, and a separate log when skipped — good for post-incident investigation

Verdict

Ready to merge. The fix is well-bounded, correctly separates hard-rescue retry tracking from the cheap-gate breaker, and all existing tests pass without regressions.


Verified by: wenshao

Comment thread packages/core/src/core/geminiChat.ts Outdated
const shouldForceFromHard =
isHardTier && this.hardRescueFailureCount < MAX_CONSECUTIVE_FAILURES;
if (shouldForceFromHard) {
this.hardRescueFailureCount += 1;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardRescueFailureCount is incremented before the async tryCompress call. If tryCompress throws (user abort via AbortSignal, transient network error), neither the NOOP refund (line 1557) nor the COMPRESSED reset (line 1393) applies, so the counter stays inflated.

Three user-initiated aborts during hard-tier rescue would exhaust MAX_CONSECUTIVE_FAILURES and permanently skip hard rescue for the rest of the session — even though compression was never truly attempted and failed.

Consider moving the increment after tryCompress returns (and only for non-NOOP, non-COMPRESSED results), or decrementing in a catch/finally block for thrown errors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3543b6c.

hardRescueFailureCount is no longer pre-incremented before tryCompress(). It is now incremented only after tryCompress() returns a hard-rescue compression failure status, so thrown compression errors/aborts do not consume the retry budget. The merge resolution also keeps main's hard-rescue history restore and deferred recording guard.

Validation:

  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-rescue|hard-tier rescue"
  • npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts
  • npx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts
  • npm run typecheck --workspace=packages/core

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

设计思路正确,NOOP refund 和 COMPRESSED reset 的逻辑都没问题。留了一个 inline comment 关于 hardRescueFailureCounttryCompress 之前 pre-increment —— 用户连续 abort 3 次会耗尽 rescue 预算。影响有限(用户可以 /compress 恢复),不 block,供参考。

@BZ-D

BZ-D commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

This PR has merge conflicts with main — please rebase or merge main to resolve before merging.

BZ-D
BZ-D previously approved these changes Jun 1, 2026
@Jerry2003826 Jerry2003826 dismissed stale reviews from BZ-D and wenshao via 3543b6c June 1, 2026 08:39
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Merged current origin/main in 3543b6c and resolved the geminiChat.ts hard-rescue conflict.

I also handled the inline hardRescueFailureCount feedback: thrown tryCompress() errors/aborts no longer consume the hard-rescue retry budget. The counter now increments only when tryCompress() returns a compression failure status, while compressed results still reset it and NOOP still leaves it unchanged.

Validation:

  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-rescue|hard-tier rescue"
  • npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts
  • npx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts
  • npm run typecheck --workspace=packages/core

Comment thread packages/core/src/core/geminiChat.ts Outdated
shouldForceFromHard &&
isCompressionFailureStatus(compressionInfo.compressionStatus)
) {
this.hardRescueFailureCount += 1;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] The hardRescueFailureCount bound is bypassed when compression returns COMPRESSED but the post-compression guard at shouldStopAfterHardRescue still rejects the send.

Root cause: tryCompress resets this.hardRescueFailureCount = 0 on COMPRESSED (line 1409). The post-guard then re-estimates the prompt size and throws if still >= hard, rolling back history. But the counter is already 0. On the next send, the counter is 0 again → hard-rescue fires → tryCompress returns COMPRESSED → counter resets → guard throws → repeat indefinitely. Each send costs one compression side-query that the PR's bound was designed to prevent.

Trace:

  1. Send N: hardRescueFailureCount=0shouldForceFromHard=true
  2. tryCompressCOMPRESSED (resets counter to 0)
  3. Post-guard: localPromptTokensAfterCompression >= hard → rollback + throw
  4. Send N+1: same cycle — counter never accumulates

The existing test "rejects before request serialization and restores history when hard-rescue compression is still oversized" only tests a single attempt, so it doesn't catch the loop.

Suggested change
this.hardRescueFailureCount += 1;
if (
shouldForceFromHard &&
(isCompressionFailureStatus(compressionInfo.compressionStatus) ||
compressionInfo.compressionStatus === CompressionStatus.COMPRESSED)
) {
this.hardRescueFailureCount += 1;
}

Alternatively, increment hardRescueFailureCount in the rollback block (around line 1637) when compressionInfo.compressionStatus === CompressionStatus.COMPRESSED. This keeps the increment co-located with the rejection decision rather than relying on the counter surviving the tryCompress reset.

— qwen-latest-series-invite-beta-v38 via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

// (or any successful compaction) recovers a chat whose breaker had
// tripped.
this.consecutiveFailures = 0;
this.hardRescueFailureCount = 0;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This unconditional reset creates a second unbounded path. When compression returns NOOP, isCompressionFailureStatus(NOOP) is false (line 1604), so the counter is never incremented — yet the post-guard throws when the prompt remains oversized (because the char/4 estimator over-counted). Like the COMPRESSED gap above, this lets one compression side-query burn per send indefinitely.

The test "does not count hard-rescue NOOP results toward the retry bound" explicitly verifies this, and the rationale ("a too-small history slice is not a broken rescue mechanism") is sound in principle. However, when the token estimator persistently over-counts by 15-20K (as documented in the code comments), the session enters a permanent per-send compression tax with no recovery path.

Consider adding a separate, higher bound for NOOP outcomes (e.g., MAX_CONSECUTIVE_FAILURES * 3), or at minimum a debug log when NOOP + oversized recurs more than a few times, so the pattern is visible in diagnostics.

— qwen-latest-series-invite-beta-v38 via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

@Jerry2003826 Jerry2003826 force-pushed the Jiarui/fix-bound-hard-rescue-retries branch from 3543b6c to 843ca1c Compare June 1, 2026 14:59

if (isHardTier && !shouldForceFromHard) {
compressionInfo = {
originalTokenCount: effectiveTokens,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The synthesized NOOP branch here is implicitly traversed by the post-loop recovery calls in the compressed-oversized and failed-compaction tests (the (MAX+1)-th send), but no test explicitly asserts the outcome. Consider adding an assertion after the post-loop call that proves compress() was NOT called on the (MAX+1)-th attempt and that the send succeeded via generateContentStream without compression:

// After the for-loop + post-loop send:
const callsBeforeBound = compressSpy.mock.calls.length;
// ... consume stream ...
expect(compressSpy).toHaveBeenCalledTimes(callsBeforeBound); // no new compression call

This makes the test's intent explicit and guards against regressions if the synthesized NOOP branch logic changes.

— qwen-latest-series-invite-beta-v38 via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

…-rescue-retries

# Conflicts:
#	packages/core/src/core/geminiChat.ts
@Jerry2003826 Jerry2003826 force-pushed the Jiarui/fix-bound-hard-rescue-retries branch from 843ca1c to a19b21b Compare June 1, 2026 16:34
BZ-D
BZ-D previously approved these changes Jun 2, 2026

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the diff and ran the targeted GeminiChat tests; no blocking issues found.

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Addressed the hard-rescue NOOP retry-bound feedback in 8b1f9c486.

Change:

  • A hard-rescue attempt that reaches the post-compression hard-limit guard now counts as one hard-rescue failure regardless of whether compression returned COMPRESSED, NOOP, or another non-throwing status.
  • COMPRESSED still restores the pre-compression history/token state before throwing when the compressed request remains too large.
  • Added/updated regression coverage so repeated NOOP hard-rescue results stop spending compression calls after the bound, and the next send proceeds without another compression attempt.

Validation:

  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-rescue"
  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts

wenshao
wenshao previously approved these changes Jun 4, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Critical findings. 7 Suggestions (6 inline + 1 below): the error log at the shouldStopAfterHardRescue throw site includes consecutiveFailures but omits hardRescueFailureCount — adding it would aid debugging hard-rescue boundary behavior.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
shouldForceFromHard &&
isCompressionFailureStatus(compressionInfo.compressionStatus)
) {
this.hardRescueFailureCount += 1;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This inline += 1 is dead code in all tested paths. When compression returns a failure status, tokens remain above hard, so shouldStopAfterHardRescue returns true and the snapshot+override at line ~1634 (= hardRescueFailureCountBeforeHardRescue + 1) overwrites this increment with the same value. Both produce N+1, so this write has no runtime effect.

The dual-write pattern makes the counter harder to reason about — future maintainers may think there are two independent increment paths, or that removing this line would change behavior.

Suggested change
this.hardRescueFailureCount += 1;

Remove this if block and rely solely on the post-guard override at line ~1634, which already handles all three failure modes (COMPRESSED-but-oversized, failure status, NOOP).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

Comment thread packages/core/src/core/geminiChat.ts Outdated
* Number of failed hard-tier rescue attempts for this chat. Hard rescue is
* forced and therefore bypasses the cheap-gate breaker, so it needs its own
* bound to avoid spending one compression side-query on every send when
* history repeatedly cannot shrink. NOOP is refunded because a too-small

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The doc says "NOOP is refunded" but NOOP results actually count toward the bound. Tracing the path: when tryCompress returns NOOP, isCompressionFailureStatus(NOOP) is false (so the inline += 1 at line 1609 doesn't fire), but shouldStopAfterHardRescue returns true because tokens are unchanged, and the snapshot+override at line ~1634 sets hardRescueFailureCount = snapshot + 1. The test "stops hard-rescue after repeated NOOP results are still oversized" confirms NOOP results exhaust the budget.

Future readers will believe NOOP is free and may misdiagnose why hard-rescue stops firing.

Suggested change
* history repeatedly cannot shrink. NOOP is refunded because a too-small
* bound to avoid spending one compression side-query on every send when
* history repeatedly cannot shrink. NOOP counts toward the bound via the
* post-compression guard (the prompt remains oversized); COMPRESSED resets
* this counter unless the post-compression hard-limit guard still rejects
* the send.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

Comment thread packages/core/src/core/geminiChat.ts Outdated
);
} else if (isHardTier) {
debugLogger.warn(
`[compaction] hard-tier rescue skipped after ${this.hardRescueFailureCount} failed attempts; relying on reactive overflow recovery. effectiveTokens=${effectiveTokens}, hard=${hard}.`,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This "skipped" log omits prompt_id, while the "triggered" log two lines above includes it. When grepping logs for a specific prompt_id to trace a user's failing request, the skip event is invisible — you see "triggered" for prompts 1–3, then silence.

Suggested change
`[compaction] hard-tier rescue skipped after ${this.hardRescueFailureCount} failed attempts; relying on reactive overflow recovery. effectiveTokens=${effectiveTokens}, hard=${hard}.`,
`[compaction] hard-tier rescue skipped after ${this.hardRescueFailureCount} failed attempts; relying on reactive overflow recovery. prompt_id=${prompt_id}, effectiveTokens=${effectiveTokens}, hard=${hard}.`,

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

? this.getHistoryShallow()
: undefined;
const lastPromptTokenCountBeforeHardRescue = this.lastPromptTokenCount;
const hardRescueFailureCountBeforeHardRescue =

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This snapshot exists for a subtle reason: tryCompress resets hardRescueFailureCount to 0 on COMPRESSED success (line ~1410), but the post-compression guard below may still reject the send. Without the snapshot, the override would compute 0 + 1 = 1 instead of N + 1, losing accumulated progress.

This is the most subtle pattern in the PR and the easiest landmine for a future maintainer — someone might think the snapshot is redundant and use this.hardRescueFailureCount directly, breaking the COMPRESSED-but-oversized path.

Consider adding an inline comment:

// Snapshot before tryCompress: tryCompress resets hardRescueFailureCount
// to 0 on COMPRESSED success, but the post-compression guard below may
// still reject the send. Without this snapshot, the override would
// compute 0 + 1 instead of N + 1.
const hardRescueFailureCountBeforeHardRescue =
  this.hardRescueFailureCount;

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

);
});

it('rejects when compressed history is below hard but the pending user message pushes it over', async () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This test verifies that after hard-rescue exhaustion, the next send proceeds to the model with a mocked success response. But the debug log at geminiChat.ts:1577 explicitly says "relying on reactive overflow recovery" — the designed fallback is that the API returns context-overflow, triggering reactive compression. No test exercises this end-to-end path.

Consider adding a test that: (1) exhausts hardRescueFailureCount, (2) sends one more message that enters the synthesized NOOP branch, (3) mocks the model to throw a context-overflow error, (4) asserts that reactive compression fires and consecutiveFailures increments.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

);
});

it('stops hard-rescue after repeated compressed results are still oversized', async () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] No test verifies the recovery path: after accumulating hard-rescue failures, a subsequent successful COMPRESSED send should reset hardRescueFailureCount to 0 (line ~1410 in tryCompress). If the reset line were accidentally removed, a session with past failures would permanently lose hard-rescue capability — a silent regression.

Consider adding a test: (1) accumulate 1–2 hard-rescue failures (below MAX), (2) trigger a successful COMPRESSED send, (3) send again in hard-tier and assert compress is called with force=true (proving hard-rescue fired again, which requires counter < MAX).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Collections.Hashtable.body

DragonnZhang
DragonnZhang previously approved these changes Jun 8, 2026

@DragonnZhang DragonnZhang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new findings beyond the existing inline comments. The hard-rescue retry bound is correctly implemented: hardRescueFailureCount is incremented only after shouldStopAfterHardRescue rejects (not pre-incremented), thrown tryCompress errors do not consume the budget, the snapshot/restore pattern preserves correct counter state across COMPRESSED-but-oversized paths, and the synthesized NOOP short-circuit avoids wasted compression calls when the bound is exhausted. Test coverage spans all five key paths (compressed-oversized, failed, noop, thrown, and reactive fallback). LGTM! ✅ — qwen3-coder-plus via Qwen Code /review

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Merged latest origin/main into this branch to clear the merge conflict.

Validation after merge:

  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "sendMessageStream hard-tier rescue"
  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts

@wenshao

wenshao commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /review

@github-actions

Copy link
Copy Markdown
Contributor
_Qwen Code review request accepted. Review is queued in [workflow run](https://github.com/QwenLM/qwen-code/actions/runs/27307216562)._

wenshao
wenshao previously approved these changes Jun 10, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The "hard-tier rescue stopped oversized prompt" debug log at geminiChat.ts:1716 includes consecutiveFailures but omits hardRescueFailureCount. Since consecutiveFailures is never incremented in the force=true path, it's diagnostically uninformative here. Adding hardRescueFailureCount=${this.hardRescueFailureCount} to this log would make it self-contained for debugging hard-rescue bound exhaustion.

— qwen3.7-max via Qwen Code /review

compressionInfo,
localPromptTokensAfterCompression,
);
if (shouldForceFromHard) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This increment only runs when shouldStopAfterHardRescue returns true (i.e., tryCompress returned normally with an oversized result). When compress() throws (e.g., network error, revoked API key), the exception propagates before reaching this block, so hardRescueFailureCount is never incremented. The test "does not count thrown hard-rescue attempts toward the retry bound" explicitly validates this: MAX_CONSECUTIVE_FAILURES + 1 throws all call compress(), proving the throw path is unbounded.

In a permanently broken compression service, every subsequent send wastes one compression side-query with no upper limit. Consider either adding a throw-path increment (e.g., in a catch around tryCompress in the hard-rescue path) or documenting this as an intentional tradeoff (transient errors should not permanently disable hard-rescue).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this behavior unchanged. Thrown hard-rescue compression attempts intentionally do not consume hardRescueFailureCount; the bound is for completed hard-rescue attempts whose result is still oversized or otherwise unusable. A thrown compression call still propagates as the original failure instead of hiding it behind the rescue bound.

This is covered by does not count thrown hard-rescue attempts toward the retry bound in geminiChat.test.ts.

@Jerry2003826 Jerry2003826 dismissed stale reviews from wenshao and DragonnZhang via e718dbf June 11, 2026 08:32
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Updated in e718dbfa8 for the latest diagnostic-log suggestion.

Changes:

  • Added hardRescueFailureCount to the hard-tier rescue stopped-oversized-prompt warning log.
  • Added a regression assertion so this field stays present in the hard-rescue bound path.

Validation:

  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "sendMessageStream hard-tier rescue"
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts

@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Runtime verification — real TUI, tmux-driven (maintainer merge reference)

Verdict: PASS — built PR head e718dbf locally and drove the actual TUI through the hard-rescue loop. The bound, the deterministic exit, the reactive-overflow fallback, and the counter reset all behave exactly as described, and the before/after contrast against current main reproduces #4346 cleanly. CI-independent details below.

Method

  • PR worktree built locally (npm install && npm run build), real TUI (packages/cli/dist/index.js) run inside tmux with an isolated $HOME.
  • A local mock OpenAI-compatible server made the scenario deterministic:
    • chat replies report usage.prompt_tokens=125000, which the chat adopts as lastPromptTokenCount, so every following send sits over the hard tier (contextWindowSize=8000hard=8000);
    • compression side-queries (recognized by the summarizer system prompt) return an empty summary (→ COMPRESSION_FAILED_EMPTY_SUMMARY) or a valid one (→ COMPRESSED), switchable per step;
    • optional one-shot HTTP 400 prompt is too long to trigger reactive overflow.
  • Three independent evidence streams: tmux pane captures, the mock's request log (chat vs compress), and ~/.qwen/debug [compaction] lines.

Steps (PR build)

  1. ✅ Send pre-release: fix ci #1 — reply ok, footer jumps to >100% context (token-count poisoning works).
  2. ✅ Sends Where is the config saved? #2Are you interested in AI Terminal? #4 — each rejected with compression status: COMPRESSION_FAILED_EMPTY_SUMMARY; debug log counts up precisely:
    [compaction] hard-tier rescue triggered: ... hardRescueAttempt=1, consecutiveFailures=0.
    [compaction] hard-tier rescue stopped oversized prompt: ... hardRescueFailureCount=1, ...
    [compaction] hard-tier rescue triggered: ... hardRescueAttempt=2, ...   → hardRescueFailureCount=2
    [compaction] hard-tier rescue triggered: ... hardRescueAttempt=3, ...   → hardRescueFailureCount=3
    
    Mock log: exactly one compress request per rejected send (3 total).
  3. ✅ Send TypeError in Authentication Selection Interface #5bound kicks in:
    [compaction] hard-tier rescue skipped after 3 failed attempts; relying on reactive overflow recovery. ... effectiveTokens=125005, hard=8000.
    
    No compress request hits the wire; the chat request goes through and the reply renders. The session is usable again.
  4. ✅ Sends OpenAI API Error: 401 Incorecct API Key provided #6API Error: Streaming setup timeout after 45s #9 (incl. a 6 KB paste) — steady state: zero further compress side-queries. The per-send side-query burn is gone.
  5. 🔍 /compress with a working summarizer → Chat history compressed from 125000 to 6050 tokens. Then the next oversized send logs hardRescueAttempt=1 (was 3) — COMPRESSED resets the bound, and consecutiveFailures stays independent throughout (visible in the same log lines).
  6. 🔍 Reactive-overflow fallback, fresh session: 3 failed attempts → bound exhausted → next send: rescue skipped → API rejects with 400 prompt is too long → reactive compression fires → retry succeeds:
    requests: chat ×2 → compress(fail) ×3 → chat → 400 overflow → compress(good) → chat OK
    TUI: "IMPORTANT: This conversation approached the input token limit for mock-model.
          A compressed context will be sent for future messages (compressed from: 125002 to 6052 tokens)."
    
    This is the unit test “falls back to reactive overflow recovery after the hard-rescue bound is exhausted” reproduced end-to-end at the real surface.
  7. 🔍 Failed manual /compress attempts (two COMPRESSION_FAILED_INFLATED_TOKEN_COUNT results during setup) neither incremented nor reset hardRescueFailureCount — consistent with the increment being scoped to shouldForceFromHard and the reset to COMPRESSED.

Baseline contrast (current main @ 5854e28, same mock, same scenario)

5 oversized sends → 5× "hard-tier rescue triggered" (no attempt field, no bound)
                 → 5× "hard-tier rescue stopped"  → 5 compress side-queries burned
                 → every send rejected, forever; 0× "rescue skipped"

The TUI shows the same Context is too large… error on every send with no way forward — the unbounded loop of #4346. With the PR: 3 strikes, then sends flow again while reactive overflow remains the safety net.

main (5854e28) PR (e718dbf)
Rescue attempts when history can't shrink unbounded (1 per send) exactly 3
Compression side-query cost per send after exhaustion 1 per send, forever 0
Chat usability after repeated failure stuck (every send rejected) recovers (send proceeds; API-side overflow handled reactively)
Recovery after a successful compression n/a bound re-arms (hardRescueAttempt=1)

Observations (non-blocking)

  • The new hardRescueAttempt=N / hardRescueFailureCount=N log fields made the runtime behavior precisely auditable — they're genuinely useful for support diagnostics, not just decoration.
  • The skip path deliberately lets an oversized request reach the API. In my run the (mock) API accepted it; when a real API rejects it, the reactive path recovers (step 6). Worth keeping in mind that after bound exhaustion users pay one doomed API round-trip per send instead of one compression side-query — that's the documented trade-off, and it's the better failure mode (progress is possible, and a successful reactive compression resets everything).
  • Per the unit test, thrown rescue attempts (side-query network/5xx errors) intentionally don't count toward the bound — so a summarizer that permanently throws still retries on every send. That's a defensible trade-off (counting throws would let 3 transient blips disable rescue, and a throw means "infra failed", not "history can't shrink"); flagging it only so the behavior is a conscious choice at merge time.
  • Pre-existing, unrelated to this PR: compress() derives newTokenCount from the side-query's API-reported usage (original - (input-1000) + output), so a provider reporting tiny usage on the side-query can make a genuinely-shrinking compression read as COMPRESSION_FAILED_INFLATED_TOKEN_COUNT. Only mentioning it because it bit my harness twice before I made the mock report realistic usage.

Build: npm install && npm run build on the PR worktree, clean. Tests were not re-run locally (CI already covers them on three OSes); this report is runtime evidence only.

中文版(验证报告)

运行时验证 — tmux 驱动真实 TUI(合并参考)

结论:PASS — 本地构建 PR head e718dbf,在 tmux 里驱动真实 TUI 走完 hard-rescue 循环。重试上限、确定性退出、reactive overflow 兜底、计数器重置全部符合 PR 描述;与当前 main 的对照完整复现了 #4346

方法

  • PR worktree 本地构建,真实 TUI 在 tmux 中运行,$HOME 完全隔离。
  • 本地 mock OpenAI 兼容服务器使场景可控:
    • chat 回复的 usage.prompt_tokens=125000 被会话采纳为 lastPromptTokenCount,配合 contextWindowSize=8000(该分支阈值算法下 hard=8000),让后续每次发送都处于 hard 层;
    • 压缩 side-query(按摘要器系统提示词识别)按开关返回空摘要(→ COMPRESSION_FAILED_EMPTY_SUMMARY)或有效摘要(→ COMPRESSED);
    • 可选一次性 HTTP 400 prompt is too long 触发 reactive overflow。
  • 三路独立证据:tmux 屏幕快照、mock 请求日志(chat/compress 分类)、~/.qwen/debug[compaction] 日志。

步骤(PR 构建)

  1. ✅ 发送 pre-release: fix ci #1 正常;footer 升至 >100% context。
  2. ✅ 发送 Where is the config saved? #2Are you interested in AI Terminal? #4 每次被拒(COMPRESSION_FAILED_EMPTY_SUMMARY),debug 日志精确计数:hardRescueAttempt=1→2→3hardRescueFailureCount=1→2→3;mock 日志每次恰好一个 compress 请求(共 3 个)。
  3. ✅ 发送 TypeError in Authentication Selection Interface #5 上限生效:日志 hard-tier rescue skipped after 3 failed attempts; relying on reactive overflow recovery.——不再发出 compress 请求,chat 请求直达 API,回复正常渲染,会话恢复可用。
  4. ✅ 发送 OpenAI API Error: 401 Incorecct API Key provided #6API Error: Streaming setup timeout after 45s #9(含 6KB 粘贴)稳态:零 compress side-query,每次发送烧一次压缩调用的问题消失。
  5. 🔍 /compress 摘要器正常时成功(125000 → 6050 tokens),随后的超大发送日志显示 hardRescueAttempt=1(此前是 3)——COMPRESSED 重置上限,且 consecutiveFailures 全程独立。
  6. 🔍 reactive overflow 兜底(全新会话):3 次失败耗尽上限后,下一次发送 → rescue skipped → API 返回 400 prompt is too long → reactive 压缩成功 → 自动重试成功;TUI 显示压缩横幅(125002 → 6052 tokens)。与单测 “falls back to reactive overflow recovery after the hard-rescue bound is exhausted” 端到端一致。
  7. 🔍 两次失败的手动 /compressINFLATED)既不增加也不重置 hardRescueFailureCount——增量仅限 shouldForceFromHard、重置仅限 COMPRESSED,与实现一致。

基线对照(当前 main @ 5854e28,同一 mock、同一场景)

5 次超大发送 → 5 次 rescue 触发(无 attempt 字段、无上限)→ 5 次被拒 → 烧掉 5 个压缩 side-query → 永远卡死,0 次 skip。TUI 每次发送都报同样的错误,没有任何出路——即 #4346 的无界循环。PR 构建下:3 次后恢复可用,reactive overflow 仍是安全网。

main(5854e28) PR(e718dbf)
历史无法缩小时的 rescue 次数 无界(每次发送 1 次) 恰好 3 次
耗尽后每次发送的压缩调用开销 每次 1 个,永远 0
反复失败后的会话可用性 卡死(每次发送被拒) 恢复(发送放行;API 侧溢出走 reactive)
压缩成功后的恢复 不适用 上限重置(hardRescueAttempt=1

观察(不阻塞合并)

  • 新增的 hardRescueAttempt / hardRescueFailureCount 日志字段让运行时行为可精确审计,对线上排障是实打实的加分项。
  • 上限耗尽后超大请求会真实到达 API:mock 接受了它;真实 API 拒绝时 reactive 路径能恢复(步骤 6)。耗尽后用户每次发送付出的是一次注定失败的 API 往返而非一次压缩 side-query——这是文档化的取舍,也是更好的失败模式(可推进,且 reactive 压缩成功后一切重置)。
  • 按单测设计,抛错的 rescue 尝试(side-query 网络/5xx 错误)有意不计入上限——摘要器若永久性抛错,每次发送仍会重试。这个取舍是合理的(把抛错计数会让 3 次瞬时抖动就禁用 rescue,且抛错代表"基础设施故障"而非"历史无法缩小");提出来只是让合并时这一行为是个有意识的选择。
  • 与本 PR 无关的既有行为:compress()newTokenCount 由 side-query 的 API usage 推导(original - (input-1000) + output),provider 若在 side-query 上报极小 usage,真实缩小的压缩也会被判 COMPRESSION_FAILED_INFLATED_TOKEN_COUNT。提它只是因为我的验证脚手架在把 mock 的 usage 调成真实比例前被坑了两次。

构建:PR worktree 上 npm install && npm run build 干净通过。本地未重跑测试(CI 已在三个平台覆盖);本报告只提供运行时证据。

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed runtime verification. The tmux-driven before/after evidence is very helpful, especially the confirmation that the bound stops repeated compression side-queries while reactive overflow still recovers correctly.

I agree with the noted tradeoff: after the hard-rescue bound is exhausted, the request may pay one provider round-trip instead of repeatedly burning compression calls, which is the intended safer failure mode here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bound hard-tier rescue retries with hardRescueFailureCount counter

6 participants