fix(core): bound hard rescue compression retries#4526
Conversation
|
Updated the PR description to the latest template with Reviewer Test Plan, Evidence, Tested on, Risk & Scope, and Chinese summary. No code changes were needed for this update; CI is already green across macOS, Windows, Linux, lint, and CodeQL. |
|
Hi @Jerry2003826, thanks for the PRs! 👋 I noticed that the Chinese text inside the Affected PRs:
PRs without this issue (for reference): #4524, #4520, #4518 This is likely an encoding issue when the
You can fix the existing descriptions with: gh pr edit <number> --body-file fixed-description.mdNo code changes needed — just a description update. Thanks! |
wenshao
left a comment
There was a problem hiding this comment.
No review findings. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Local verification report (Linux, tmux)Verified PR head Environment
Results
Behavior covered by the new testsThe three new specs in
The pre-existing companion test Diff-level sanity checks (read alongside the tests)
Notes
ConclusionLGTM from a verification standpoint. All commands listed in the PR's reviewer test plan pass locally on Linux; no extra failures or warnings observed outside the documented behavior. Safe to merge from this PR's own scope of changes. Verification artifacts (logs) retained locally at |
|
This PR shares code with PR #4525, PR #4528, and PR #4531 — all modifying core token estimation and prompt handling logic. Key shared functions:
Recommendation: This PR modifies rescue compression retries which shares geminiChat and token estimation code with other PRs. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
|
This PR shares code with PR #4525, PR #4528, and PR #4531 — all modifying core token estimation and prompt handling logic. Key shared functions:
Recommendation: This PR modifies rescue compression retries which shares geminiChat and token estimation code with other PRs. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
Maintainer Verification ReportPR: #4526 — fix(core): bound hard rescue compression retries Build & Lint
Unit Tests
New Test Coverage (#4346)All 3 new test cases pass:
Code Review Notes
VerdictReady to merge. The fix is well-bounded, correctly separates hard-rescue retry tracking from the cheap-gate breaker, and all existing tests pass without regressions. Verified by: wenshao |
| const shouldForceFromHard = | ||
| isHardTier && this.hardRescueFailureCount < MAX_CONSECUTIVE_FAILURES; | ||
| if (shouldForceFromHard) { | ||
| this.hardRescueFailureCount += 1; |
There was a problem hiding this comment.
hardRescueFailureCount is incremented before the async tryCompress call. If tryCompress throws (user abort via AbortSignal, transient network error), neither the NOOP refund (line 1557) nor the COMPRESSED reset (line 1393) applies, so the counter stays inflated.
Three user-initiated aborts during hard-tier rescue would exhaust MAX_CONSECUTIVE_FAILURES and permanently skip hard rescue for the rest of the session — even though compression was never truly attempted and failed.
Consider moving the increment after tryCompress returns (and only for non-NOOP, non-COMPRESSED results), or decrementing in a catch/finally block for thrown errors.
There was a problem hiding this comment.
Fixed in 3543b6c.
hardRescueFailureCount is no longer pre-incremented before tryCompress(). It is now incremented only after tryCompress() returns a hard-rescue compression failure status, so thrown compression errors/aborts do not consume the retry budget. The merge resolution also keeps main's hard-rescue history restore and deferred recording guard.
Validation:
npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-rescue|hard-tier rescue"npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.tsnpx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.tsnpm run typecheck --workspace=packages/core
BZ-D
left a comment
There was a problem hiding this comment.
设计思路正确,NOOP refund 和 COMPRESSED reset 的逻辑都没问题。留了一个 inline comment 关于 hardRescueFailureCount 在 tryCompress 之前 pre-increment —— 用户连续 abort 3 次会耗尽 rescue 预算。影响有限(用户可以 /compress 恢复),不 block,供参考。
|
This PR has merge conflicts with main — please rebase or merge main to resolve before merging. |
|
Merged current I also handled the inline Validation:
|
| shouldForceFromHard && | ||
| isCompressionFailureStatus(compressionInfo.compressionStatus) | ||
| ) { | ||
| this.hardRescueFailureCount += 1; |
There was a problem hiding this comment.
[Critical] The hardRescueFailureCount bound is bypassed when compression returns COMPRESSED but the post-compression guard at shouldStopAfterHardRescue still rejects the send.
Root cause: tryCompress resets this.hardRescueFailureCount = 0 on COMPRESSED (line 1409). The post-guard then re-estimates the prompt size and throws if still >= hard, rolling back history. But the counter is already 0. On the next send, the counter is 0 again → hard-rescue fires → tryCompress returns COMPRESSED → counter resets → guard throws → repeat indefinitely. Each send costs one compression side-query that the PR's bound was designed to prevent.
Trace:
- Send N:
hardRescueFailureCount=0→shouldForceFromHard=true tryCompress→COMPRESSED(resets counter to 0)- Post-guard:
localPromptTokensAfterCompression >= hard→ rollback + throw - Send N+1: same cycle — counter never accumulates
The existing test "rejects before request serialization and restores history when hard-rescue compression is still oversized" only tests a single attempt, so it doesn't catch the loop.
| this.hardRescueFailureCount += 1; | |
| if ( | |
| shouldForceFromHard && | |
| (isCompressionFailureStatus(compressionInfo.compressionStatus) || | |
| compressionInfo.compressionStatus === CompressionStatus.COMPRESSED) | |
| ) { | |
| this.hardRescueFailureCount += 1; | |
| } |
Alternatively, increment hardRescueFailureCount in the rollback block (around line 1637) when compressionInfo.compressionStatus === CompressionStatus.COMPRESSED. This keeps the increment co-located with the rejection decision rather than relying on the counter surviving the tryCompress reset.
— qwen-latest-series-invite-beta-v38 via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
| // (or any successful compaction) recovers a chat whose breaker had | ||
| // tripped. | ||
| this.consecutiveFailures = 0; | ||
| this.hardRescueFailureCount = 0; |
There was a problem hiding this comment.
[Suggestion] This unconditional reset creates a second unbounded path. When compression returns NOOP, isCompressionFailureStatus(NOOP) is false (line 1604), so the counter is never incremented — yet the post-guard throws when the prompt remains oversized (because the char/4 estimator over-counted). Like the COMPRESSED gap above, this lets one compression side-query burn per send indefinitely.
The test "does not count hard-rescue NOOP results toward the retry bound" explicitly verifies this, and the rationale ("a too-small history slice is not a broken rescue mechanism") is sound in principle. However, when the token estimator persistently over-counts by 15-20K (as documented in the code comments), the session enters a permanent per-send compression tax with no recovery path.
Consider adding a separate, higher bound for NOOP outcomes (e.g., MAX_CONSECUTIVE_FAILURES * 3), or at minimum a debug log when NOOP + oversized recurs more than a few times, so the pattern is visible in diagnostics.
— qwen-latest-series-invite-beta-v38 via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
3543b6c to
843ca1c
Compare
|
|
||
| if (isHardTier && !shouldForceFromHard) { | ||
| compressionInfo = { | ||
| originalTokenCount: effectiveTokens, |
There was a problem hiding this comment.
[Suggestion] The synthesized NOOP branch here is implicitly traversed by the post-loop recovery calls in the compressed-oversized and failed-compaction tests (the (MAX+1)-th send), but no test explicitly asserts the outcome. Consider adding an assertion after the post-loop call that proves compress() was NOT called on the (MAX+1)-th attempt and that the send succeeded via generateContentStream without compression:
// After the for-loop + post-loop send:
const callsBeforeBound = compressSpy.mock.calls.length;
// ... consume stream ...
expect(compressSpy).toHaveBeenCalledTimes(callsBeforeBound); // no new compression callThis makes the test's intent explicit and guards against regressions if the synthesized NOOP branch logic changes.
— qwen-latest-series-invite-beta-v38 via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
…-rescue-retries # Conflicts: # packages/core/src/core/geminiChat.ts
843ca1c to
a19b21b
Compare
BZ-D
left a comment
There was a problem hiding this comment.
Reviewed the diff and ran the targeted GeminiChat tests; no blocking issues found.
|
Addressed the hard-rescue NOOP retry-bound feedback in Change:
Validation:
|
wenshao
left a comment
There was a problem hiding this comment.
No Critical findings. 7 Suggestions (6 inline + 1 below): the error log at the shouldStopAfterHardRescue throw site includes consecutiveFailures but omits hardRescueFailureCount — adding it would aid debugging hard-rescue boundary behavior.
— qwen3.7-max via Qwen Code /review
| shouldForceFromHard && | ||
| isCompressionFailureStatus(compressionInfo.compressionStatus) | ||
| ) { | ||
| this.hardRescueFailureCount += 1; |
There was a problem hiding this comment.
[Suggestion] This inline += 1 is dead code in all tested paths. When compression returns a failure status, tokens remain above hard, so shouldStopAfterHardRescue returns true and the snapshot+override at line ~1634 (= hardRescueFailureCountBeforeHardRescue + 1) overwrites this increment with the same value. Both produce N+1, so this write has no runtime effect.
The dual-write pattern makes the counter harder to reason about — future maintainers may think there are two independent increment paths, or that removing this line would change behavior.
| this.hardRescueFailureCount += 1; |
Remove this if block and rely solely on the post-guard override at line ~1634, which already handles all three failure modes (COMPRESSED-but-oversized, failure status, NOOP).
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
| * Number of failed hard-tier rescue attempts for this chat. Hard rescue is | ||
| * forced and therefore bypasses the cheap-gate breaker, so it needs its own | ||
| * bound to avoid spending one compression side-query on every send when | ||
| * history repeatedly cannot shrink. NOOP is refunded because a too-small |
There was a problem hiding this comment.
[Suggestion] The doc says "NOOP is refunded" but NOOP results actually count toward the bound. Tracing the path: when tryCompress returns NOOP, isCompressionFailureStatus(NOOP) is false (so the inline += 1 at line 1609 doesn't fire), but shouldStopAfterHardRescue returns true because tokens are unchanged, and the snapshot+override at line ~1634 sets hardRescueFailureCount = snapshot + 1. The test "stops hard-rescue after repeated NOOP results are still oversized" confirms NOOP results exhaust the budget.
Future readers will believe NOOP is free and may misdiagnose why hard-rescue stops firing.
| * history repeatedly cannot shrink. NOOP is refunded because a too-small | |
| * bound to avoid spending one compression side-query on every send when | |
| * history repeatedly cannot shrink. NOOP counts toward the bound via the | |
| * post-compression guard (the prompt remains oversized); COMPRESSED resets | |
| * this counter unless the post-compression hard-limit guard still rejects | |
| * the send. |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
| ); | ||
| } else if (isHardTier) { | ||
| debugLogger.warn( | ||
| `[compaction] hard-tier rescue skipped after ${this.hardRescueFailureCount} failed attempts; relying on reactive overflow recovery. effectiveTokens=${effectiveTokens}, hard=${hard}.`, |
There was a problem hiding this comment.
[Suggestion] This "skipped" log omits prompt_id, while the "triggered" log two lines above includes it. When grepping logs for a specific prompt_id to trace a user's failing request, the skip event is invisible — you see "triggered" for prompts 1–3, then silence.
| `[compaction] hard-tier rescue skipped after ${this.hardRescueFailureCount} failed attempts; relying on reactive overflow recovery. effectiveTokens=${effectiveTokens}, hard=${hard}.`, | |
| `[compaction] hard-tier rescue skipped after ${this.hardRescueFailureCount} failed attempts; relying on reactive overflow recovery. prompt_id=${prompt_id}, effectiveTokens=${effectiveTokens}, hard=${hard}.`, |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
| ? this.getHistoryShallow() | ||
| : undefined; | ||
| const lastPromptTokenCountBeforeHardRescue = this.lastPromptTokenCount; | ||
| const hardRescueFailureCountBeforeHardRescue = |
There was a problem hiding this comment.
[Suggestion] This snapshot exists for a subtle reason: tryCompress resets hardRescueFailureCount to 0 on COMPRESSED success (line ~1410), but the post-compression guard below may still reject the send. Without the snapshot, the override would compute 0 + 1 = 1 instead of N + 1, losing accumulated progress.
This is the most subtle pattern in the PR and the easiest landmine for a future maintainer — someone might think the snapshot is redundant and use this.hardRescueFailureCount directly, breaking the COMPRESSED-but-oversized path.
Consider adding an inline comment:
// Snapshot before tryCompress: tryCompress resets hardRescueFailureCount
// to 0 on COMPRESSED success, but the post-compression guard below may
// still reject the send. Without this snapshot, the override would
// compute 0 + 1 instead of N + 1.
const hardRescueFailureCountBeforeHardRescue =
this.hardRescueFailureCount;— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
| ); | ||
| }); | ||
|
|
||
| it('rejects when compressed history is below hard but the pending user message pushes it over', async () => { |
There was a problem hiding this comment.
[Suggestion] This test verifies that after hard-rescue exhaustion, the next send proceeds to the model with a mocked success response. But the debug log at geminiChat.ts:1577 explicitly says "relying on reactive overflow recovery" — the designed fallback is that the API returns context-overflow, triggering reactive compression. No test exercises this end-to-end path.
Consider adding a test that: (1) exhausts hardRescueFailureCount, (2) sends one more message that enters the synthesized NOOP branch, (3) mocks the model to throw a context-overflow error, (4) asserts that reactive compression fires and consecutiveFailures increments.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
| ); | ||
| }); | ||
|
|
||
| it('stops hard-rescue after repeated compressed results are still oversized', async () => { |
There was a problem hiding this comment.
[Suggestion] No test verifies the recovery path: after accumulating hard-rescue failures, a subsequent successful COMPRESSED send should reset hardRescueFailureCount to 0 (line ~1410 in tryCompress). If the reset line were accidentally removed, a session with past failures would permanently lose hard-rescue capability — a silent regression.
Consider adding a test: (1) accumulate 1–2 hard-rescue failures (below MAX), (2) trigger a successful COMPRESSED send, (3) send again in hard-tier and assert compress is called with force=true (proving hard-rescue fired again, which requires counter < MAX).
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
System.Collections.Hashtable.body
DragonnZhang
left a comment
There was a problem hiding this comment.
No new findings beyond the existing inline comments. The hard-rescue retry bound is correctly implemented: hardRescueFailureCount is incremented only after shouldStopAfterHardRescue rejects (not pre-incremented), thrown tryCompress errors do not consume the budget, the snapshot/restore pattern preserves correct counter state across COMPRESSED-but-oversized paths, and the synthesized NOOP short-circuit avoids wasted compression calls when the bound is exhausted. Test coverage spans all five key paths (compressed-oversized, failed, noop, thrown, and reactive fallback). LGTM! ✅ — qwen3-coder-plus via Qwen Code /review
|
Merged latest Validation after merge:
|
|
@qwen-code /review |
| _Qwen Code review request accepted. Review is queued in [workflow run](https://github.com/QwenLM/qwen-code/actions/runs/27307216562)._ |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
[Suggestion] The "hard-tier rescue stopped oversized prompt" debug log at geminiChat.ts:1716 includes consecutiveFailures but omits hardRescueFailureCount. Since consecutiveFailures is never incremented in the force=true path, it's diagnostically uninformative here. Adding hardRescueFailureCount=${this.hardRescueFailureCount} to this log would make it self-contained for debugging hard-rescue bound exhaustion.
— qwen3.7-max via Qwen Code /review
| compressionInfo, | ||
| localPromptTokensAfterCompression, | ||
| ); | ||
| if (shouldForceFromHard) { |
There was a problem hiding this comment.
[Suggestion] This increment only runs when shouldStopAfterHardRescue returns true (i.e., tryCompress returned normally with an oversized result). When compress() throws (e.g., network error, revoked API key), the exception propagates before reaching this block, so hardRescueFailureCount is never incremented. The test "does not count thrown hard-rescue attempts toward the retry bound" explicitly validates this: MAX_CONSECUTIVE_FAILURES + 1 throws all call compress(), proving the throw path is unbounded.
In a permanently broken compression service, every subsequent send wastes one compression side-query with no upper limit. Consider either adding a throw-path increment (e.g., in a catch around tryCompress in the hard-rescue path) or documenting this as an intentional tradeoff (transient errors should not permanently disable hard-rescue).
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Leaving this behavior unchanged. Thrown hard-rescue compression attempts intentionally do not consume hardRescueFailureCount; the bound is for completed hard-rescue attempts whose result is still oversized or otherwise unusable. A thrown compression call still propagates as the original failure instead of hiding it behind the rescue bound.
This is covered by does not count thrown hard-rescue attempts toward the retry bound in geminiChat.test.ts.
|
Updated in Changes:
Validation:
|
Runtime verification — real TUI, tmux-driven (maintainer merge reference)Verdict: PASS — built PR head Method
Steps (PR build)
Baseline contrast (current
|
main (5854e28) |
PR (e718dbf) | |
|---|---|---|
| Rescue attempts when history can't shrink | unbounded (1 per send) | exactly 3 |
| Compression side-query cost per send after exhaustion | 1 per send, forever | 0 |
| Chat usability after repeated failure | stuck (every send rejected) | recovers (send proceeds; API-side overflow handled reactively) |
| Recovery after a successful compression | n/a | bound re-arms (hardRescueAttempt=1) |
Observations (non-blocking)
- The new
hardRescueAttempt=N/hardRescueFailureCount=Nlog fields made the runtime behavior precisely auditable — they're genuinely useful for support diagnostics, not just decoration. - The skip path deliberately lets an oversized request reach the API. In my run the (mock) API accepted it; when a real API rejects it, the reactive path recovers (step 6). Worth keeping in mind that after bound exhaustion users pay one doomed API round-trip per send instead of one compression side-query — that's the documented trade-off, and it's the better failure mode (progress is possible, and a successful reactive compression resets everything).
- Per the unit test, thrown rescue attempts (side-query network/5xx errors) intentionally don't count toward the bound — so a summarizer that permanently throws still retries on every send. That's a defensible trade-off (counting throws would let 3 transient blips disable rescue, and a throw means "infra failed", not "history can't shrink"); flagging it only so the behavior is a conscious choice at merge time.
- Pre-existing, unrelated to this PR:
compress()derivesnewTokenCountfrom the side-query's API-reported usage (original - (input-1000) + output), so a provider reporting tiny usage on the side-query can make a genuinely-shrinking compression read asCOMPRESSION_FAILED_INFLATED_TOKEN_COUNT. Only mentioning it because it bit my harness twice before I made the mock report realistic usage.
Build: npm install && npm run build on the PR worktree, clean. Tests were not re-run locally (CI already covers them on three OSes); this report is runtime evidence only.
中文版(验证报告)
运行时验证 — tmux 驱动真实 TUI(合并参考)
结论:PASS — 本地构建 PR head e718dbf,在 tmux 里驱动真实 TUI 走完 hard-rescue 循环。重试上限、确定性退出、reactive overflow 兜底、计数器重置全部符合 PR 描述;与当前 main 的对照完整复现了 #4346。
方法
- PR worktree 本地构建,真实 TUI 在 tmux 中运行,
$HOME完全隔离。 - 本地 mock OpenAI 兼容服务器使场景可控:
- chat 回复的
usage.prompt_tokens=125000被会话采纳为lastPromptTokenCount,配合contextWindowSize=8000(该分支阈值算法下hard=8000),让后续每次发送都处于 hard 层; - 压缩 side-query(按摘要器系统提示词识别)按开关返回空摘要(→
COMPRESSION_FAILED_EMPTY_SUMMARY)或有效摘要(→COMPRESSED); - 可选一次性 HTTP 400
prompt is too long触发 reactive overflow。
- chat 回复的
- 三路独立证据:tmux 屏幕快照、mock 请求日志(chat/compress 分类)、
~/.qwen/debug的[compaction]日志。
步骤(PR 构建)
- ✅ 发送 pre-release: fix ci #1 正常;footer 升至 >100% context。
- ✅ 发送 Where is the config saved? #2–Are you interested in AI Terminal? #4 每次被拒(
COMPRESSION_FAILED_EMPTY_SUMMARY),debug 日志精确计数:hardRescueAttempt=1→2→3,hardRescueFailureCount=1→2→3;mock 日志每次恰好一个 compress 请求(共 3 个)。 - ✅ 发送 TypeError in Authentication Selection Interface #5 上限生效:日志
hard-tier rescue skipped after 3 failed attempts; relying on reactive overflow recovery.——不再发出 compress 请求,chat 请求直达 API,回复正常渲染,会话恢复可用。 - ✅ 发送 OpenAI API Error: 401 Incorecct API Key provided #6–API Error: Streaming setup timeout after 45s #9(含 6KB 粘贴)稳态:零 compress side-query,每次发送烧一次压缩调用的问题消失。
- 🔍
/compress摘要器正常时成功(125000 → 6050 tokens),随后的超大发送日志显示hardRescueAttempt=1(此前是 3)——COMPRESSED 重置上限,且consecutiveFailures全程独立。 - 🔍 reactive overflow 兜底(全新会话):3 次失败耗尽上限后,下一次发送 →
rescue skipped→ API 返回 400prompt is too long→ reactive 压缩成功 → 自动重试成功;TUI 显示压缩横幅(125002 → 6052 tokens)。与单测 “falls back to reactive overflow recovery after the hard-rescue bound is exhausted” 端到端一致。 - 🔍 两次失败的手动
/compress(INFLATED)既不增加也不重置hardRescueFailureCount——增量仅限shouldForceFromHard、重置仅限COMPRESSED,与实现一致。
基线对照(当前 main @ 5854e28,同一 mock、同一场景)
5 次超大发送 → 5 次 rescue 触发(无 attempt 字段、无上限)→ 5 次被拒 → 烧掉 5 个压缩 side-query → 永远卡死,0 次 skip。TUI 每次发送都报同样的错误,没有任何出路——即 #4346 的无界循环。PR 构建下:3 次后恢复可用,reactive overflow 仍是安全网。
main(5854e28) |
PR(e718dbf) | |
|---|---|---|
| 历史无法缩小时的 rescue 次数 | 无界(每次发送 1 次) | 恰好 3 次 |
| 耗尽后每次发送的压缩调用开销 | 每次 1 个,永远 | 0 |
| 反复失败后的会话可用性 | 卡死(每次发送被拒) | 恢复(发送放行;API 侧溢出走 reactive) |
| 压缩成功后的恢复 | 不适用 | 上限重置(hardRescueAttempt=1) |
观察(不阻塞合并)
- 新增的
hardRescueAttempt/hardRescueFailureCount日志字段让运行时行为可精确审计,对线上排障是实打实的加分项。 - 上限耗尽后超大请求会真实到达 API:mock 接受了它;真实 API 拒绝时 reactive 路径能恢复(步骤 6)。耗尽后用户每次发送付出的是一次注定失败的 API 往返而非一次压缩 side-query——这是文档化的取舍,也是更好的失败模式(可推进,且 reactive 压缩成功后一切重置)。
- 按单测设计,抛错的 rescue 尝试(side-query 网络/5xx 错误)有意不计入上限——摘要器若永久性抛错,每次发送仍会重试。这个取舍是合理的(把抛错计数会让 3 次瞬时抖动就禁用 rescue,且抛错代表"基础设施故障"而非"历史无法缩小");提出来只是让合并时这一行为是个有意识的选择。
- 与本 PR 无关的既有行为:
compress()的newTokenCount由 side-query 的 API usage 推导(original - (input-1000) + output),provider 若在 side-query 上报极小 usage,真实缩小的压缩也会被判COMPRESSION_FAILED_INFLATED_TOKEN_COUNT。提它只是因为我的验证脚手架在把 mock 的 usage 调成真实比例前被坑了两次。
构建:PR worktree 上 npm install && npm run build 干净通过。本地未重跑测试(CI 已在三个平台覆盖);本报告只提供运行时证据。
|
Thanks for the detailed runtime verification. The tmux-driven before/after evidence is very helpful, especially the confirmation that the bound stops repeated compression side-queries while reactive overflow still recovers correctly. I agree with the noted tradeoff: after the hard-rescue bound is exhausted, the request may pay one provider round-trip instead of repeatedly burning compression calls, which is the intended safer failure mode here. |
What this PR does
Bounds hard-rescue compression retries so an oversized request cannot loop through repeated rescue attempts indefinitely.
Why it's needed
Hard-rescue compression needs a clear retry limit and deterministic exit path when the request remains too large.
Reviewer Test Plan
How to verify
Run the PR-targeted hard-rescue compression tests in packages/core, then run eslint/prettier/typecheck for the touched core files. The GitHub CI matrix also runs lint and tests on macOS, Windows, and Linux.
Evidence (Before & After)
Before: retry behavior could continue longer than intended when hard rescue stayed oversized. After: regression coverage pins bounded retry behavior and the terminal failure path. This is core request-size behavior, so TUI screenshots are N/A.
Tested on
Environment (optional)
Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.
Risk & Scope
Linked Issues
Fixes #4346
中文说明
这个 PR 做了什么
限制 hard-rescue 压缩重试次数,避免请求仍然超大时无限重试。
为什么需要
hard-rescue 需要明确的重试上限和可预测的退出路径。
Reviewer Test Plan
本地按 PR 相关 core 测试、lint、format check、typecheck 验证;macOS/Linux 由 GitHub CI 验证。
风险和范围
只影响 hard-rescue 压缩重试循环,不改普通压缩阈值。