fix(core): compress when usage metadata is missing by Jerry2003826 · Pull Request #4528 · QwenLM/qwen-code

Jerry2003826 · 2026-05-26T02:31:56Z

What this PR does

Allows chat compression to proceed safely when provider usage metadata is missing, while rejecting inflated local token deltas that would make the compression result unsafe.

Why it's needed

Some model responses omit usage metadata. The compression path still needs a conservative fallback instead of silently skipping or accepting an unsafe inflated estimate.

Reviewer Test Plan

How to verify

Run: npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|inflated local delta|cap-sized"; npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts; npx eslint packages/core/src/services/chatCompressionService.test.ts; npx prettier --check packages/core/src/services/chatCompressionService.test.ts; npm run typecheck --workspace=packages/core.

Evidence (Before & After)

Before: missing usage metadata could leave the compression path without coverage for inflated local estimates. After: regression tests cover missing-usage compression, inflated fallback rejection, and cap-sized safe fallback. This is core service behavior, so TUI screenshots are N/A.

Tested on

OS	Status
macOS	CI only
Windows	Tested locally
Linux	CI only

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.

Risk & Scope

Main risk or tradeoff: The fallback remains conservative and only affects compression decisions when provider usage metadata is absent.
Not validated / out of scope: No unrelated refactors, public API changes, UI redesigns, or behavior outside the linked issue scope.
Breaking changes / migration notes: None expected.

Linked Issues

Fixes #3282

中文说明

这个 PR 做了什么

在 provider 没有返回 usage metadata 时，仍然允许压缩走安全的保守路径，并拒绝膨胀的本地 token 估算。

为什么需要

部分模型响应没有 usage metadata，压缩逻辑需要有可靠 fallback，不能跳过或接受不安全估算。

Reviewer Test Plan

本地在 Windows 跑了 chatCompressionService 定向测试、完整文件测试、eslint、prettier check 和 core typecheck；macOS/Linux 依赖 GitHub CI。

风险和范围

只影响缺失 usage metadata 时的压缩判断，不改压缩公共接口。

longbinlai · 2026-05-26T12:50:08Z

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4531.

Key shared functions:

compress — packages/core/src/services/chatCompressionService.ts
estimateContentTokens — packages/core/src/services/tokenEstimation.ts

Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525.

This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

longbinlai · 2026-05-26T12:57:34Z

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4531.

Key shared functions:

compress — packages/core/src/services/chatCompressionService.ts
estimateContentTokens — packages/core/src/services/tokenEstimation.ts

Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525.

This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

wenshao

Suggestion (pre-existing code, not on diff line): The output-truncation guard at chatCompressionService.ts:560 (compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS) is a no-op when usage is undefined, because compressionOutputTokenCount is also undefined. A usage-omitting provider that returns a max-length truncated summary will have it accepted unconditionally by the new fallback path. Consider logging a warning when the fallback path is used and the truncation guard cannot fire, so the risk is at least observable.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T00:57:54Z

            (compressionInputTokenCount - 1000) +
            compressionOutputTokenCount,
        );
+      } else {


[Critical] estimateContentTokens(extraHistory) produces a char/4 lower-bound of only the visible post-compression messages (summary + ack + historyToKeep). But originalTokenCount is API-reported and includes system prompt + tool definitions (~15–25K tokens of invisible context). These are on fundamentally different scales.

Downstream, geminiChat.ts:1362 stores newTokenCount as lastPromptTokenCount. On the next send, estimatePromptTokens() takes the lastPromptTokenCount > 0 branch and returns lastPromptTokenCount + estimateContentTokens([userMessage]) — missing the system-prompt-and-tools delta entirely. The auto-compaction gate then undercounts by 15–25K tokens until the next API response corrects it. For providers that consistently omit usage, the undercount persists across compression cycles.

Additionally, the newTokenCount > originalTokenCount (INFLATED_TOKEN_COUNT) guard becomes effectively dead code: the char/4 estimate of visible content will almost always be less than the API-reported full-prompt count, so even a pathological summary that makes the history larger would pass this check.

Suggested fix — estimate the delta rather than the absolute post-compression size, so newTokenCount stays on the same scale as originalTokenCount:

} else { // Preserve the API-reported baseline (system prompt + tools) by // estimating the reduction from content replacement rather than // the absolute post-compression size. const estimatedRemoved = estimateContentTokens( historyToCompress, slimmingConfig.imageTokenEstimate, ); const syntheticMessages = extraHistory.slice( 0, extraHistory.length - historyToKeep.length, ); const estimatedAdded = estimateContentTokens( syntheticMessages, slimmingConfig.imageTokenEstimate, ); newTokenCount = Math.max( 0, originalTokenCount - estimatedRemoved + estimatedAdded, ); canCalculateNewTokenCount = newTokenCount > 0; }

This mirrors the primary path's structure (originalTokenCount - removed + added) and keeps both sides of the INFLATED guard comparison on the same measurement scale.

— qwen3.7-max via Qwen Code /review

Addressed in the current branch. The missing-usage path no longer replaces the API-reported total with a visible-history-only estimate. It computes the visible delta only, preserves the API-reported non-visible remainder (originalTokenCount - estimatedOriginalVisibleTokenCount), and adds the estimated post-compression visible history back on top.

Coverage:

should use estimated token count if usage metadata is missing

Validated:

npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts

npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

wenshao · 2026-05-27T00:57:54Z

        newTokenCount = Math.max(
          0,
          originalTokenCount -
            (compressionInputTokenCount - 1000) +


[Suggestion] This fallback branch silently switches measurement methodology with no log trace. Other paths in this file emit debug/warn messages (e.g., lines 494, 567). A 3 AM operator sees a chat_compression telemetry event with tokens_after populated but compression_input_token_count / compression_output_token_count absent — there is no breadcrumb explaining why.

Suggested change

(compressionInputTokenCount - 1000) +

} else {

config.getDebugLogger().debug(

`[chat-compression] usage metadata omitted by provider; ` +

`falling back to local token estimation.`,

);

newTokenCount = estimateContentTokens(

— qwen3.7-max via Qwen Code /review

Addressed in 6020b735b: the missing-usage fallback path now has an explicit debug breadcrumb. The test captures getDebugLogger().debug and asserts the usage metadata missing / preserved non-visible remainder message is emitted.

wenshao · 2026-05-27T00:57:54Z

    expect(result.info.originalTokenCount).toBe(800);
-    expect(result.info.newTokenCount).toBe(800);
-    expect(result.newHistory).toBeNull();
+    expect(result.info.newTokenCount).toBeLessThan(800);


[Suggestion] toBeLessThan(800) is too loose to verify the estimation fallback actually ran. With the test's tiny history ("Summary" + "Got it..." ack + a few kept messages ≈ 57 chars), estimateContentTokens returns approximately Math.ceil(57/4) = 15 tokens. The upper bound of 800 would pass even if the fallback branch were never entered and newTokenCount were accidentally close to originalTokenCount.

Consider a tighter assertion that actually proves the estimator ran:

Suggested change

expect(result.info.newTokenCount).toBeLessThan(800);

expect(result.info.newTokenCount).toBeGreaterThan(0);

expect(result.info.newTokenCount).toBeLessThan(50);

— qwen3.7-max via Qwen Code /review

Addressed in 6020b735b with a tighter assertion. I did not use <50 here because the current fallback intentionally preserves the API-reported non-visible remainder (5000 - 4000 = 1000) and only replaces the visible-history delta. The test now asserts newTokenCount > 1000 and < 1100, which verifies the fallback estimator ran while preserving the non-visible remainder.

wenshao · 2026-05-27T05:55:54Z

+    expect(result.info.newTokenCount).toBeLessThan(5_000);
+    expect(result.newHistory).not.toBeNull();
+    expect(result.newHistory![0].parts![0].text).toBe('Summary');
+  });


[Suggestion] Missing test coverage for the delta-based estimation path producing COMPRESSION_FAILED_INFLATED_TOKEN_COUNT. All existing INFLATED tests (lines 959 and 1171) use explicit promptTokenCount/candidatesTokenCount in mock usage, exercising the primary formula. No test covers the scenario where delta-based estimation (estimatedNewContentTokenCount > estimatedOriginalContentTokenCount) triggers the INFLATED guard — e.g., when the side-query returns a verbose summary larger than the compressed input in estimated terms. This branch at chatCompressionService.ts:723 is reachable via the delta path but untested.

Consider adding a test that constructs a small historyForSplit with a verbose summary (e.g., text: 'x'.repeat(40_000)) and usage: undefined, asserting COMPRESSION_FAILED_INFLATED_TOKEN_COUNT and newHistory === null.

— qwen3.7-max via Qwen Code /review

Covered in the current branch by should reject inflated local delta if usage metadata is missing. That test sets usage: undefined, exercises the local delta estimation path, and asserts COMPRESSION_FAILED_INFLATED_TOKEN_COUNT with newHistory === null when the estimated post-compression count is inflated.

Validated:

npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts

npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

wenshao

All R1-R2 findings addressed. Delta-based estimation approach is sound, tests cover the key scenarios (happy path, inflated rejection, cap guard). CI green (8/8). — qwen3.7-max via Qwen Code /review

wenshao · 2026-05-28T01:02:35Z

PR #4528 Verification Report

Branch: Jiarui/fix-compression-missing-usage
Test Environment: macOS Darwin 25.4.0

Build & Type Check

npm run build — PASSED
npx tsc --noEmit — PASSED

Unit Tests

npx vitest run packages/core/src/services/chatCompressionService.test.ts — 71 tests passed

New/Modified Test Coverage

Missing-usage compression succeeds with local estimation — when provider omits usage metadata, estimateContentTokens() fallback produces valid token counts and compression completes with COMPRESSED status
Inflated local delta rejected — 40K char summary on 800-token input correctly triggers COMPRESSION_FAILED_OUTPUT_TRUNCATED, preventing runaway estimation
Cap-sized summaries rejected — summaries hitting the output cap are properly rejected even when usage metadata is missing

Code Review Observations

Line ~546: compressionOutputTokenCount fallback uses estimateContentTokens(summary) with a warning log when usage metadata is absent — clean degradation path
Line ~659: New else branch for newTokenCount calculation estimates visible token delta locally while preserving the API-reported non-visible remainder (system/tool/prompt tokens) — avoids double-counting
Warning logs provide observability for when the fallback path activates
Guard rails (inflated delta rejection, cap-sized rejection) prevent local estimation from silently producing bad results

Verdict

Ready to merge. The fallback logic is well-bounded with proper rejection guards, and all 71 tests pass including the 3 new coverage scenarios.

— wenshao

BZ-D

PR 整体逻辑合理：delta-based 估算保留了 API 侧不可见 token 开销，仅对可见对话部分做本地 char/4 估算，方向安全（保守）。truncation guard 的本地补充也正确。发现一个会导致编译失败的 merge conflict。

BZ-D · 2026-05-31T11:34:03Z

        );
+      } else {
+        const estimatedOriginalVisibleTokenCount = estimateContentTokens(
+          historyForSplit,


[Critical — 合并冲突] historyForSplit 在当前 main 分支上已不存在。PR #4599 将 tail-preservation 压缩重构为 summary + restoration attachments 模式后，该变量被移除。GitHub 已将本 PR 标记为 CONFLICTING。

需要 rebase 到最新 main。rebase 后此处应改用 curatedHistory（全量对话历史），或根据新代码结构中 historyForCompose（已剥离 orphaned functionCall 的版本）来估算原始可见 token。语义上 curatedHistory 更合适，因为 token 估算需要覆盖压缩前的完整可见内容。

Fixed in 299e3d7.

I merged current origin/main into the PR branch, resolved the chatCompressionService.ts conflict, and changed the missing-usage fallback to estimate the original visible token baseline from curatedHistory instead of the removed historyForSplit path. The merge also keeps main's post-compact attachment token adjustment on the usage-metadata path.

Validation:

npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|missing usage|inflated|cap"

npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts

npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

npm run typecheck --workspace=packages/core

…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts

BZ-D · 2026-06-01T07:47:33Z

+    if (compressionOutputTokenCount === undefined && !isSummaryEmpty) {
+      compressionOutputTokenCount = estimateContentTokens(
+        [{ role: 'model', parts: [{ text: summary }] }],
+        slimmingConfig.imageTokenEstimate,


estimateContentTokens uses a fixed ~4 chars/token ratio (CHARS_PER_TOKEN). For CJK text the actual ratio is closer to ~0.67 chars/token (each CJK character ≈ 1.5 tokens). A CJK summary truncated at COMPACT_MAX_OUTPUT_TOKENS tokens would have only ~COMPACT_MAX_OUTPUT_TOKENS × 0.67 characters, yielding an estimate of ~COMPACT_MAX_OUTPUT_TOKENS × 0.17 — well below the threshold, so the truncation guard on line 413 would not fire.

This only affects providers that omit usage metadata AND produce CJK output, but for those providers a truncated summary could silently corrupt the compressed history.

One possible mitigation: supplement the char-based estimate with a length check against COMPACT_MAX_OUTPUT_TOKENS × CHARS_PER_TOKEN chars (the maximum the model could have written), or apply a safety margin.

Addressed in the current branch. The output-cap guard now uses a CJK-aware local estimate for summary output when provider usage is missing, so CJK summaries near the output-token cap are rejected as COMPRESSION_FAILED_OUTPUT_TRUNCATED instead of passing through via char/4 underestimation.

Coverage:

should reject CJK cap-sized summaries when usage metadata is missing

I also pushed 93b07ea08 to write that fixture as \u4e00, avoiding non-ASCII display/encoding noise while keeping the same CJK token-estimation case.

Validated:

npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts

npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

BZ-D

整体逻辑清晰，fallback 路径设计合理。留了一个 inline comment 关于 CJK 场景下 estimateContentTokens 的 char/token 比率偏差 —— 在无 usage metadata 的 provider + CJK 输出场景下，truncation guard 可能漏判。不 block，供参考。

wenshao

No new issues found. All R1-R3 findings addressed. The missing-usage fallback chain (totalTokenCount delta → local estimate → undefined) and the delta-based token math with non-visible remainder preservation are well-designed. Merge conflict resolution (historyForSplit → curatedHistory) is correct. LGTM ✅ — qwen-latest-series-invite-beta-v38 via Qwen Code /review

BZ-D

Reviewed the diff and ran the targeted compression tests; no blocking issues found.

…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts

Jerry2003826 · 2026-06-04T09:27:14Z

Merged latest origin/main in b39d03b8b and cleared the compression-service merge conflict.

Conflict resolution:

Kept this PR's local fallback path for missing compression usage metadata, including the CJK-aware estimateSummaryOutputTokens() helper.
Kept upstream's new MAX_HOOK_INSTRUCTIONS_CHARS cap and buildStateReminderParts import/path.
The two blocks are independent, so the resolution preserves both behaviors.

Validation on Windows:

npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "missing usage|usage metadata|CJK|local estimate|fallback|output cap|cap-sized|inflated"
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
npm run typecheck --workspace=packages/core
git diff --check

DragonnZhang

Review Summary

Verdict: APPROVE

No new findings. The fallback chain for missing usage metadata is well-designed and conservative throughout.

What was reviewed

chatCompressionService.ts: New estimateSummaryOutputTokens function with CJK awareness, local estimation fallback in the token math else branch, and the integration of local estimates into the existing cap guard and inflation guard.
chatCompressionService.test.ts: Four new/updated tests covering happy path, inflated delta rejection, cap-sized summary rejection, and CJK cap-sized summary rejection — all without usage metadata.

Assessment

The fallback chain (totalTokenCount delta → local char-based estimate → undefined) is sound and conservative.
The non-visible remainder preservation (originalTokenCount - estimatedOriginalVisibleTokenCount) correctly prevents the local estimator from replacing the API-authoritative total with a visible-history-only underestimate.
The CJK-aware output estimate takes max(generic, cjkAware), which is the safe (higher) direction — reducing false negatives in the cap guard.
The inflation guard (newTokenCount > originalTokenCount) correctly catches oversized summaries even without provider usage data.
Tests are comprehensive and validate both success and failure paths under missing usage metadata.
Deterministic checks (tsc, eslint): 0 findings.
CI: 14/14 checks passing.

— qwen-code via Qwen Code /review

Jerry2003826 · 2026-06-10T10:10:22Z

Addressed the review suggestions in 6020b735b.

Updates:

The missing-usage fallback test now asserts the post-compression estimate is tightly bounded (> 1000 and < 1100) instead of only checking that it is below the original 5000 tokens.
The same test now captures getDebugLogger().debug and asserts the fallback trace is emitted, including the preserved API-reported non-visible remainder.

Validated locally:

npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
npm run lint --workspace=packages/core
npm run typecheck --workspace=packages/core
npx prettier --check packages/core/src/services/chatCompressionService.test.ts

fix(core): compress when usage metadata is missing

0e4ea8d

github-actions Bot mentioned this pull request May 26, 2026

📊 AI CLI 工具社区动态日报 2026-05-26 ivanweng2077/big_model_radar#91

Open

This was referenced May 26, 2026

fix(core): guard oversized resumed history sends #4531

Merged

fix(core): include response tokens in prompt estimate #4525

Open

fix(core): bound hard rescue compression retries #4526

Open

wenshao requested changes May 27, 2026

View reviewed changes

fix(core): harden missing compression usage fallback

12769de

wenshao reviewed May 27, 2026

View reviewed changes

Jerry2003826 added 2 commits May 27, 2026 22:12

test(core): cover missing-usage inflated compression fallback

3f39d81

fix(core): clarify missing usage compression fallback

1680110

wenshao previously approved these changes May 27, 2026

View reviewed changes

BZ-D reviewed May 31, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into Jiarui/fix-compressio…

79b64b9

…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts

Jerry2003826 dismissed wenshao’s stale review via 299e3d7 June 1, 2026 07:14

BZ-D reviewed Jun 1, 2026

View reviewed changes

BZ-D previously approved these changes Jun 1, 2026

View reviewed changes

wenshao previously approved these changes Jun 1, 2026

View reviewed changes

test(core): align compression fallback assertion with summary trailer

9facc14

Jerry2003826 dismissed stale reviews from wenshao and BZ-D via 9facc14 June 1, 2026 17:31

Jerry2003826 force-pushed the Jiarui/fix-compression-missing-usage branch from 299e3d7 to 9facc14 Compare June 1, 2026 17:31

github-actions Bot mentioned this pull request Jun 2, 2026

📊 AI CLI 工具社区动态日报 2026-06-02 zx0828/big_model_radar#81

Open

BZ-D previously approved these changes Jun 2, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into Jiarui/fix-compressio…

b39d03b

…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts

Jerry2003826 dismissed BZ-D’s stale review via b39d03b June 4, 2026 09:27

wenshao previously approved these changes Jun 4, 2026

View reviewed changes

test(core): use ascii cjk fixture escape

93b07ea

Jerry2003826 dismissed wenshao’s stale review via 93b07ea June 8, 2026 09:10

DragonnZhang previously approved these changes Jun 8, 2026

View reviewed changes

test(core): tighten missing-usage compression assertions

6020b73

Jerry2003826 dismissed DragonnZhang’s stale review via 6020b73 June 10, 2026 10:10

wenshao approved these changes Jun 10, 2026

View reviewed changes

-            (compressionInputTokenCount - 1000) +
+      } else {
+        config.getDebugLogger().debug(
+          `[chat-compression] usage metadata omitted by provider; ` +
+            `falling back to local token estimation.`,
+        );
+        newTokenCount = estimateContentTokens(

	expect(result.info.newTokenCount).toBeLessThan(800);
	expect(result.info.newTokenCount).toBeGreaterThan(0);
	expect(result.info.newTokenCount).toBeLessThan(50);

Conversation

Jerry2003826 commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Why it's needed

Reviewer Test Plan

How to verify

Evidence (Before & After)

Tested on

Environment (optional)

Risk & Scope

Linked Issues

这个 PR 做了什么

为什么需要

Reviewer Test Plan

风险和范围

Uh oh!

longbinlai commented May 26, 2026

Uh oh!

longbinlai commented May 26, 2026

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

wenshao commented May 28, 2026

PR #4528 Verification Report

Build & Type Check

Unit Tests

New/Modified Test Coverage

Code Review Observations

Verdict

Uh oh!

BZ-D left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BZ-D left a comment

Choose a reason for hiding this comment

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

BZ-D left a comment

Choose a reason for hiding this comment

Uh oh!

Jerry2003826 commented Jun 4, 2026

Uh oh!

DragonnZhang left a comment

Choose a reason for hiding this comment

Review Summary

What was reviewed

Assessment

Uh oh!

Jerry2003826 commented Jun 10, 2026

Jerry2003826 commented May 26, 2026 •

edited

Loading