Skip to content

fix(core): compress when usage metadata is missing#4528

Open
Jerry2003826 wants to merge 9 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-compression-missing-usage
Open

fix(core): compress when usage metadata is missing#4528
Jerry2003826 wants to merge 9 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-compression-missing-usage

Conversation

@Jerry2003826

@Jerry2003826 Jerry2003826 commented May 26, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Allows chat compression to proceed safely when provider usage metadata is missing, while rejecting inflated local token deltas that would make the compression result unsafe.

Why it's needed

Some model responses omit usage metadata. The compression path still needs a conservative fallback instead of silently skipping or accepting an unsafe inflated estimate.

Reviewer Test Plan

How to verify

Run: npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|inflated local delta|cap-sized"; npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts; npx eslint packages/core/src/services/chatCompressionService.test.ts; npx prettier --check packages/core/src/services/chatCompressionService.test.ts; npm run typecheck --workspace=packages/core.

Evidence (Before & After)

Before: missing usage metadata could leave the compression path without coverage for inflated local estimates. After: regression tests cover missing-usage compression, inflated fallback rejection, and cap-sized safe fallback. This is core service behavior, so TUI screenshots are N/A.

Tested on

OS Status
macOS CI only
Windows Tested locally
Linux CI only

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.

Risk & Scope

  • Main risk or tradeoff: The fallback remains conservative and only affects compression decisions when provider usage metadata is absent.
  • Not validated / out of scope: No unrelated refactors, public API changes, UI redesigns, or behavior outside the linked issue scope.
  • Breaking changes / migration notes: None expected.

Linked Issues

Fixes #3282

中文说明

这个 PR 做了什么

在 provider 没有返回 usage metadata 时,仍然允许压缩走安全的保守路径,并拒绝膨胀的本地 token 估算。

为什么需要

部分模型响应没有 usage metadata,压缩逻辑需要有可靠 fallback,不能跳过或接受不安全估算。

Reviewer Test Plan

本地在 Windows 跑了 chatCompressionService 定向测试、完整文件测试、eslint、prettier check 和 core typecheck;macOS/Linux 依赖 GitHub CI。

风险和范围

只影响缺失 usage metadata 时的压缩判断,不改压缩公共接口。

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4531.

Key shared functions:

  • compresspackages/core/src/services/chatCompressionService.ts
  • estimateContentTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4525, PR #4526, and PR #4531.

Key shared functions:

  • compresspackages/core/src/services/chatCompressionService.ts
  • estimateContentTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (pre-existing code, not on diff line): The output-truncation guard at chatCompressionService.ts:560 (compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS) is a no-op when usage is undefined, because compressionOutputTokenCount is also undefined. A usage-omitting provider that returns a max-length truncated summary will have it accepted unconditionally by the new fallback path. Consider logging a warning when the fallback path is used and the truncation guard cannot fire, so the risk is at least observable.

— qwen3.7-max via Qwen Code /review

(compressionInputTokenCount - 1000) +
compressionOutputTokenCount,
);
} else {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] estimateContentTokens(extraHistory) produces a char/4 lower-bound of only the visible post-compression messages (summary + ack + historyToKeep). But originalTokenCount is API-reported and includes system prompt + tool definitions (~15–25K tokens of invisible context). These are on fundamentally different scales.

Downstream, geminiChat.ts:1362 stores newTokenCount as lastPromptTokenCount. On the next send, estimatePromptTokens() takes the lastPromptTokenCount > 0 branch and returns lastPromptTokenCount + estimateContentTokens([userMessage]) — missing the system-prompt-and-tools delta entirely. The auto-compaction gate then undercounts by 15–25K tokens until the next API response corrects it. For providers that consistently omit usage, the undercount persists across compression cycles.

Additionally, the newTokenCount > originalTokenCount (INFLATED_TOKEN_COUNT) guard becomes effectively dead code: the char/4 estimate of visible content will almost always be less than the API-reported full-prompt count, so even a pathological summary that makes the history larger would pass this check.

Suggested fix — estimate the delta rather than the absolute post-compression size, so newTokenCount stays on the same scale as originalTokenCount:

} else {
  // Preserve the API-reported baseline (system prompt + tools) by
  // estimating the reduction from content replacement rather than
  // the absolute post-compression size.
  const estimatedRemoved = estimateContentTokens(
    historyToCompress,
    slimmingConfig.imageTokenEstimate,
  );
  const syntheticMessages = extraHistory.slice(
    0,
    extraHistory.length - historyToKeep.length,
  );
  const estimatedAdded = estimateContentTokens(
    syntheticMessages,
    slimmingConfig.imageTokenEstimate,
  );
  newTokenCount = Math.max(
    0,
    originalTokenCount - estimatedRemoved + estimatedAdded,
  );
  canCalculateNewTokenCount = newTokenCount > 0;
}

This mirrors the primary path's structure (originalTokenCount - removed + added) and keeps both sides of the INFLATED guard comparison on the same measurement scale.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the current branch. The missing-usage path no longer replaces the API-reported total with a visible-history-only estimate. It computes the visible delta only, preserves the API-reported non-visible remainder (originalTokenCount - estimatedOriginalVisibleTokenCount), and adds the estimated post-compression visible history back on top.

Coverage:

  • should use estimated token count if usage metadata is missing

Validated:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

newTokenCount = Math.max(
0,
originalTokenCount -
(compressionInputTokenCount - 1000) +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This fallback branch silently switches measurement methodology with no log trace. Other paths in this file emit debug/warn messages (e.g., lines 494, 567). A 3 AM operator sees a chat_compression telemetry event with tokens_after populated but compression_input_token_count / compression_output_token_count absent — there is no breadcrumb explaining why.

Suggested change
(compressionInputTokenCount - 1000) +
} else {
config.getDebugLogger().debug(
`[chat-compression] usage metadata omitted by provider; ` +
`falling back to local token estimation.`,
);
newTokenCount = estimateContentTokens(

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6020b735b: the missing-usage fallback path now has an explicit debug breadcrumb. The test captures getDebugLogger().debug and asserts the usage metadata missing / preserved non-visible remainder message is emitted.

expect(result.info.originalTokenCount).toBe(800);
expect(result.info.newTokenCount).toBe(800);
expect(result.newHistory).toBeNull();
expect(result.info.newTokenCount).toBeLessThan(800);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] toBeLessThan(800) is too loose to verify the estimation fallback actually ran. With the test's tiny history ("Summary" + "Got it..." ack + a few kept messages ≈ 57 chars), estimateContentTokens returns approximately Math.ceil(57/4) = 15 tokens. The upper bound of 800 would pass even if the fallback branch were never entered and newTokenCount were accidentally close to originalTokenCount.

Consider a tighter assertion that actually proves the estimator ran:

Suggested change
expect(result.info.newTokenCount).toBeLessThan(800);
expect(result.info.newTokenCount).toBeGreaterThan(0);
expect(result.info.newTokenCount).toBeLessThan(50);

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6020b735b with a tighter assertion. I did not use <50 here because the current fallback intentionally preserves the API-reported non-visible remainder (5000 - 4000 = 1000) and only replaces the visible-history delta. The test now asserts newTokenCount > 1000 and < 1100, which verifies the fallback estimator ran while preserving the non-visible remainder.

expect(result.info.newTokenCount).toBeLessThan(5_000);
expect(result.newHistory).not.toBeNull();
expect(result.newHistory![0].parts![0].text).toBe('Summary');
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Missing test coverage for the delta-based estimation path producing COMPRESSION_FAILED_INFLATED_TOKEN_COUNT. All existing INFLATED tests (lines 959 and 1171) use explicit promptTokenCount/candidatesTokenCount in mock usage, exercising the primary formula. No test covers the scenario where delta-based estimation (estimatedNewContentTokenCount > estimatedOriginalContentTokenCount) triggers the INFLATED guard — e.g., when the side-query returns a verbose summary larger than the compressed input in estimated terms. This branch at chatCompressionService.ts:723 is reachable via the delta path but untested.

Consider adding a test that constructs a small historyForSplit with a verbose summary (e.g., text: 'x'.repeat(40_000)) and usage: undefined, asserting COMPRESSION_FAILED_INFLATED_TOKEN_COUNT and newHistory === null.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in the current branch by should reject inflated local delta if usage metadata is missing. That test sets usage: undefined, exercises the local delta estimation path, and asserts COMPRESSION_FAILED_INFLATED_TOKEN_COUNT with newHistory === null when the estimated post-compression count is inflated.

Validated:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

wenshao
wenshao previously approved these changes May 27, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All R1-R2 findings addressed. Delta-based estimation approach is sound, tests cover the key scenarios (happy path, inflated rejection, cap guard). CI green (8/8). — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 28, 2026

Copy link
Copy Markdown
Collaborator

PR #4528 Verification Report

Branch: Jiarui/fix-compression-missing-usage
Test Environment: macOS Darwin 25.4.0

Build & Type Check

  • npm run buildPASSED
  • npx tsc --noEmitPASSED

Unit Tests

  • npx vitest run packages/core/src/services/chatCompressionService.test.ts71 tests passed

New/Modified Test Coverage

  1. Missing-usage compression succeeds with local estimation — when provider omits usage metadata, estimateContentTokens() fallback produces valid token counts and compression completes with COMPRESSED status
  2. Inflated local delta rejected — 40K char summary on 800-token input correctly triggers COMPRESSION_FAILED_OUTPUT_TRUNCATED, preventing runaway estimation
  3. Cap-sized summaries rejected — summaries hitting the output cap are properly rejected even when usage metadata is missing

Code Review Observations

  • Line ~546: compressionOutputTokenCount fallback uses estimateContentTokens(summary) with a warning log when usage metadata is absent — clean degradation path
  • Line ~659: New else branch for newTokenCount calculation estimates visible token delta locally while preserving the API-reported non-visible remainder (system/tool/prompt tokens) — avoids double-counting
  • Warning logs provide observability for when the fallback path activates
  • Guard rails (inflated delta rejection, cap-sized rejection) prevent local estimation from silently producing bad results

Verdict

Ready to merge. The fallback logic is well-bounded with proper rejection guards, and all 71 tests pass including the 3 new coverage scenarios.

— wenshao

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR 整体逻辑合理:delta-based 估算保留了 API 侧不可见 token 开销,仅对可见对话部分做本地 char/4 估算,方向安全(保守)。truncation guard 的本地补充也正确。发现一个会导致编译失败的 merge conflict。

);
} else {
const estimatedOriginalVisibleTokenCount = estimateContentTokens(
historyForSplit,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical — 合并冲突] historyForSplit 在当前 main 分支上已不存在。PR #4599 将 tail-preservation 压缩重构为 summary + restoration attachments 模式后,该变量被移除。GitHub 已将本 PR 标记为 CONFLICTING。

需要 rebase 到最新 main。rebase 后此处应改用 curatedHistory(全量对话历史),或根据新代码结构中 historyForCompose(已剥离 orphaned functionCall 的版本)来估算原始可见 token。语义上 curatedHistory 更合适,因为 token 估算需要覆盖压缩前的完整可见内容。

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 299e3d7.

I merged current origin/main into the PR branch, resolved the chatCompressionService.ts conflict, and changed the missing-usage fallback to estimate the original visible token baseline from curatedHistory instead of the removed historyForSplit path. The merge also keeps main's post-compact attachment token adjustment on the usage-metadata path.

Validation:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|missing usage|inflated|cap"
  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npm run typecheck --workspace=packages/core

…n-missing-usage

# Conflicts:
#	packages/core/src/services/chatCompressionService.ts
if (compressionOutputTokenCount === undefined && !isSummaryEmpty) {
compressionOutputTokenCount = estimateContentTokens(
[{ role: 'model', parts: [{ text: summary }] }],
slimmingConfig.imageTokenEstimate,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

estimateContentTokens uses a fixed ~4 chars/token ratio (CHARS_PER_TOKEN). For CJK text the actual ratio is closer to ~0.67 chars/token (each CJK character ≈ 1.5 tokens). A CJK summary truncated at COMPACT_MAX_OUTPUT_TOKENS tokens would have only ~COMPACT_MAX_OUTPUT_TOKENS × 0.67 characters, yielding an estimate of ~COMPACT_MAX_OUTPUT_TOKENS × 0.17 — well below the threshold, so the truncation guard on line 413 would not fire.

This only affects providers that omit usage metadata AND produce CJK output, but for those providers a truncated summary could silently corrupt the compressed history.

One possible mitigation: supplement the char-based estimate with a length check against COMPACT_MAX_OUTPUT_TOKENS × CHARS_PER_TOKEN chars (the maximum the model could have written), or apply a safety margin.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the current branch. The output-cap guard now uses a CJK-aware local estimate for summary output when provider usage is missing, so CJK summaries near the output-token cap are rejected as COMPRESSION_FAILED_OUTPUT_TRUNCATED instead of passing through via char/4 underestimation.

Coverage:

  • should reject CJK cap-sized summaries when usage metadata is missing

I also pushed 93b07ea08 to write that fixture as \u4e00, avoiding non-ASCII display/encoding noise while keeping the same CJK token-estimation case.

Validated:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整体逻辑清晰,fallback 路径设计合理。留了一个 inline comment 关于 CJK 场景下 estimateContentTokens 的 char/token 比率偏差 —— 在无 usage metadata 的 provider + CJK 输出场景下,truncation guard 可能漏判。不 block,供参考。

BZ-D
BZ-D previously approved these changes Jun 1, 2026
wenshao
wenshao previously approved these changes Jun 1, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found. All R1-R3 findings addressed. The missing-usage fallback chain (totalTokenCount delta → local estimate → undefined) and the delta-based token math with non-visible remainder preservation are well-designed. Merge conflict resolution (historyForSplit → curatedHistory) is correct. LGTM ✅ — qwen-latest-series-invite-beta-v38 via Qwen Code /review

BZ-D
BZ-D previously approved these changes Jun 2, 2026

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the diff and ran the targeted compression tests; no blocking issues found.

…n-missing-usage

# Conflicts:
#	packages/core/src/services/chatCompressionService.ts
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Merged latest origin/main in b39d03b8b and cleared the compression-service merge conflict.

Conflict resolution:

  • Kept this PR's local fallback path for missing compression usage metadata, including the CJK-aware estimateSummaryOutputTokens() helper.
  • Kept upstream's new MAX_HOOK_INSTRUCTIONS_CHARS cap and buildStateReminderParts import/path.
  • The two blocks are independent, so the resolution preserves both behaviors.

Validation on Windows:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "missing usage|usage metadata|CJK|local estimate|fallback|output cap|cap-sized|inflated"
  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
  • npm run typecheck --workspace=packages/core
  • git diff --check

wenshao
wenshao previously approved these changes Jun 4, 2026
DragonnZhang
DragonnZhang previously approved these changes Jun 8, 2026

@DragonnZhang DragonnZhang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Verdict: APPROVE

No new findings. The fallback chain for missing usage metadata is well-designed and conservative throughout.

What was reviewed

  • chatCompressionService.ts: New estimateSummaryOutputTokens function with CJK awareness, local estimation fallback in the token math else branch, and the integration of local estimates into the existing cap guard and inflation guard.
  • chatCompressionService.test.ts: Four new/updated tests covering happy path, inflated delta rejection, cap-sized summary rejection, and CJK cap-sized summary rejection — all without usage metadata.

Assessment

  • The fallback chain (totalTokenCount delta → local char-based estimate → undefined) is sound and conservative.
  • The non-visible remainder preservation (originalTokenCount - estimatedOriginalVisibleTokenCount) correctly prevents the local estimator from replacing the API-authoritative total with a visible-history-only underestimate.
  • The CJK-aware output estimate takes max(generic, cjkAware), which is the safe (higher) direction — reducing false negatives in the cap guard.
  • The inflation guard (newTokenCount > originalTokenCount) correctly catches oversized summaries even without provider usage data.
  • Tests are comprehensive and validate both success and failure paths under missing usage metadata.
  • Deterministic checks (tsc, eslint): 0 findings.
  • CI: 14/14 checks passing.

— qwen-code via Qwen Code /review

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Addressed the review suggestions in 6020b735b.

Updates:

  • The missing-usage fallback test now asserts the post-compression estimate is tightly bounded (> 1000 and < 1100) instead of only checking that it is below the original 5000 tokens.
  • The same test now captures getDebugLogger().debug and asserts the fallback trace is emitted, including the preserved API-reported non-visible remainder.

Validated locally:

  • npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/services/chatCompressionService.test.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

使用 MiniMax-2.7 模型,上下文压缩报错

5 participants