fix(core): compress when usage metadata is missing#4528
Conversation
|
This PR shares code with PR #4525, PR #4526, and PR #4531. Key shared functions:
Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
|
This PR shares code with PR #4525, PR #4526, and PR #4531. Key shared functions:
Recommendation: This PR is smaller (2 functions) but shares compression service code with PR #4525. This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications. |
wenshao
left a comment
There was a problem hiding this comment.
Suggestion (pre-existing code, not on diff line): The output-truncation guard at chatCompressionService.ts:560 (compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS) is a no-op when usage is undefined, because compressionOutputTokenCount is also undefined. A usage-omitting provider that returns a max-length truncated summary will have it accepted unconditionally by the new fallback path. Consider logging a warning when the fallback path is used and the truncation guard cannot fire, so the risk is at least observable.
— qwen3.7-max via Qwen Code /review
| (compressionInputTokenCount - 1000) + | ||
| compressionOutputTokenCount, | ||
| ); | ||
| } else { |
There was a problem hiding this comment.
[Critical] estimateContentTokens(extraHistory) produces a char/4 lower-bound of only the visible post-compression messages (summary + ack + historyToKeep). But originalTokenCount is API-reported and includes system prompt + tool definitions (~15–25K tokens of invisible context). These are on fundamentally different scales.
Downstream, geminiChat.ts:1362 stores newTokenCount as lastPromptTokenCount. On the next send, estimatePromptTokens() takes the lastPromptTokenCount > 0 branch and returns lastPromptTokenCount + estimateContentTokens([userMessage]) — missing the system-prompt-and-tools delta entirely. The auto-compaction gate then undercounts by 15–25K tokens until the next API response corrects it. For providers that consistently omit usage, the undercount persists across compression cycles.
Additionally, the newTokenCount > originalTokenCount (INFLATED_TOKEN_COUNT) guard becomes effectively dead code: the char/4 estimate of visible content will almost always be less than the API-reported full-prompt count, so even a pathological summary that makes the history larger would pass this check.
Suggested fix — estimate the delta rather than the absolute post-compression size, so newTokenCount stays on the same scale as originalTokenCount:
} else {
// Preserve the API-reported baseline (system prompt + tools) by
// estimating the reduction from content replacement rather than
// the absolute post-compression size.
const estimatedRemoved = estimateContentTokens(
historyToCompress,
slimmingConfig.imageTokenEstimate,
);
const syntheticMessages = extraHistory.slice(
0,
extraHistory.length - historyToKeep.length,
);
const estimatedAdded = estimateContentTokens(
syntheticMessages,
slimmingConfig.imageTokenEstimate,
);
newTokenCount = Math.max(
0,
originalTokenCount - estimatedRemoved + estimatedAdded,
);
canCalculateNewTokenCount = newTokenCount > 0;
}This mirrors the primary path's structure (originalTokenCount - removed + added) and keeps both sides of the INFLATED guard comparison on the same measurement scale.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in the current branch. The missing-usage path no longer replaces the API-reported total with a visible-history-only estimate. It computes the visible delta only, preserves the API-reported non-visible remainder (originalTokenCount - estimatedOriginalVisibleTokenCount), and adds the estimated post-compression visible history back on top.
Coverage:
should use estimated token count if usage metadata is missing
Validated:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
| newTokenCount = Math.max( | ||
| 0, | ||
| originalTokenCount - | ||
| (compressionInputTokenCount - 1000) + |
There was a problem hiding this comment.
[Suggestion] This fallback branch silently switches measurement methodology with no log trace. Other paths in this file emit debug/warn messages (e.g., lines 494, 567). A 3 AM operator sees a chat_compression telemetry event with tokens_after populated but compression_input_token_count / compression_output_token_count absent — there is no breadcrumb explaining why.
| (compressionInputTokenCount - 1000) + | |
| } else { | |
| config.getDebugLogger().debug( | |
| `[chat-compression] usage metadata omitted by provider; ` + | |
| `falling back to local token estimation.`, | |
| ); | |
| newTokenCount = estimateContentTokens( |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in 6020b735b: the missing-usage fallback path now has an explicit debug breadcrumb. The test captures getDebugLogger().debug and asserts the usage metadata missing / preserved non-visible remainder message is emitted.
| expect(result.info.originalTokenCount).toBe(800); | ||
| expect(result.info.newTokenCount).toBe(800); | ||
| expect(result.newHistory).toBeNull(); | ||
| expect(result.info.newTokenCount).toBeLessThan(800); |
There was a problem hiding this comment.
[Suggestion] toBeLessThan(800) is too loose to verify the estimation fallback actually ran. With the test's tiny history ("Summary" + "Got it..." ack + a few kept messages ≈ 57 chars), estimateContentTokens returns approximately Math.ceil(57/4) = 15 tokens. The upper bound of 800 would pass even if the fallback branch were never entered and newTokenCount were accidentally close to originalTokenCount.
Consider a tighter assertion that actually proves the estimator ran:
| expect(result.info.newTokenCount).toBeLessThan(800); | |
| expect(result.info.newTokenCount).toBeGreaterThan(0); | |
| expect(result.info.newTokenCount).toBeLessThan(50); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Addressed in 6020b735b with a tighter assertion. I did not use <50 here because the current fallback intentionally preserves the API-reported non-visible remainder (5000 - 4000 = 1000) and only replaces the visible-history delta. The test now asserts newTokenCount > 1000 and < 1100, which verifies the fallback estimator ran while preserving the non-visible remainder.
| expect(result.info.newTokenCount).toBeLessThan(5_000); | ||
| expect(result.newHistory).not.toBeNull(); | ||
| expect(result.newHistory![0].parts![0].text).toBe('Summary'); | ||
| }); |
There was a problem hiding this comment.
[Suggestion] Missing test coverage for the delta-based estimation path producing COMPRESSION_FAILED_INFLATED_TOKEN_COUNT. All existing INFLATED tests (lines 959 and 1171) use explicit promptTokenCount/candidatesTokenCount in mock usage, exercising the primary formula. No test covers the scenario where delta-based estimation (estimatedNewContentTokenCount > estimatedOriginalContentTokenCount) triggers the INFLATED guard — e.g., when the side-query returns a verbose summary larger than the compressed input in estimated terms. This branch at chatCompressionService.ts:723 is reachable via the delta path but untested.
Consider adding a test that constructs a small historyForSplit with a verbose summary (e.g., text: 'x'.repeat(40_000)) and usage: undefined, asserting COMPRESSION_FAILED_INFLATED_TOKEN_COUNT and newHistory === null.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Covered in the current branch by should reject inflated local delta if usage metadata is missing. That test sets usage: undefined, exercises the local delta estimation path, and asserts COMPRESSION_FAILED_INFLATED_TOKEN_COUNT with newHistory === null when the estimated post-compression count is inflated.
Validated:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
wenshao
left a comment
There was a problem hiding this comment.
All R1-R2 findings addressed. Delta-based estimation approach is sound, tests cover the key scenarios (happy path, inflated rejection, cap guard). CI green (8/8). — qwen3.7-max via Qwen Code /review
PR #4528 Verification ReportBranch: Build & Type Check
Unit Tests
New/Modified Test Coverage
Code Review Observations
VerdictReady to merge. The fallback logic is well-bounded with proper rejection guards, and all 71 tests pass including the 3 new coverage scenarios. — wenshao |
BZ-D
left a comment
There was a problem hiding this comment.
PR 整体逻辑合理:delta-based 估算保留了 API 侧不可见 token 开销,仅对可见对话部分做本地 char/4 估算,方向安全(保守)。truncation guard 的本地补充也正确。发现一个会导致编译失败的 merge conflict。
| ); | ||
| } else { | ||
| const estimatedOriginalVisibleTokenCount = estimateContentTokens( | ||
| historyForSplit, |
There was a problem hiding this comment.
[Critical — 合并冲突] historyForSplit 在当前 main 分支上已不存在。PR #4599 将 tail-preservation 压缩重构为 summary + restoration attachments 模式后,该变量被移除。GitHub 已将本 PR 标记为 CONFLICTING。
需要 rebase 到最新 main。rebase 后此处应改用 curatedHistory(全量对话历史),或根据新代码结构中 historyForCompose(已剥离 orphaned functionCall 的版本)来估算原始可见 token。语义上 curatedHistory 更合适,因为 token 估算需要覆盖压缩前的完整可见内容。
There was a problem hiding this comment.
Fixed in 299e3d7.
I merged current origin/main into the PR branch, resolved the chatCompressionService.ts conflict, and changed the missing-usage fallback to estimate the original visible token baseline from curatedHistory instead of the removed historyForSplit path. The merge also keeps main's post-compact attachment token adjustment on the usage-metadata path.
Validation:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|missing usage|inflated|cap"npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpm run typecheck --workspace=packages/core
…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts
| if (compressionOutputTokenCount === undefined && !isSummaryEmpty) { | ||
| compressionOutputTokenCount = estimateContentTokens( | ||
| [{ role: 'model', parts: [{ text: summary }] }], | ||
| slimmingConfig.imageTokenEstimate, |
There was a problem hiding this comment.
estimateContentTokens uses a fixed ~4 chars/token ratio (CHARS_PER_TOKEN). For CJK text the actual ratio is closer to ~0.67 chars/token (each CJK character ≈ 1.5 tokens). A CJK summary truncated at COMPACT_MAX_OUTPUT_TOKENS tokens would have only ~COMPACT_MAX_OUTPUT_TOKENS × 0.67 characters, yielding an estimate of ~COMPACT_MAX_OUTPUT_TOKENS × 0.17 — well below the threshold, so the truncation guard on line 413 would not fire.
This only affects providers that omit usage metadata AND produce CJK output, but for those providers a truncated summary could silently corrupt the compressed history.
One possible mitigation: supplement the char-based estimate with a length check against COMPACT_MAX_OUTPUT_TOKENS × CHARS_PER_TOKEN chars (the maximum the model could have written), or apply a safety margin.
There was a problem hiding this comment.
Addressed in the current branch. The output-cap guard now uses a CJK-aware local estimate for summary output when provider usage is missing, so CJK summaries near the output-token cap are rejected as COMPRESSION_FAILED_OUTPUT_TRUNCATED instead of passing through via char/4 underestimation.
Coverage:
should reject CJK cap-sized summaries when usage metadata is missing
I also pushed 93b07ea08 to write that fixture as \u4e00, avoiding non-ASCII display/encoding noise while keeping the same CJK token-estimation case.
Validated:
npm run test --workspace=packages/core -- src/services/chatCompressionService.test.tsnpx prettier --check packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.tsnpx eslint packages/core/src/services/chatCompressionService.ts packages/core/src/services/chatCompressionService.test.ts
BZ-D
left a comment
There was a problem hiding this comment.
整体逻辑清晰,fallback 路径设计合理。留了一个 inline comment 关于 CJK 场景下 estimateContentTokens 的 char/token 比率偏差 —— 在无 usage metadata 的 provider + CJK 输出场景下,truncation guard 可能漏判。不 block,供参考。
wenshao
left a comment
There was a problem hiding this comment.
No new issues found. All R1-R3 findings addressed. The missing-usage fallback chain (totalTokenCount delta → local estimate → undefined) and the delta-based token math with non-visible remainder preservation are well-designed. Merge conflict resolution (historyForSplit → curatedHistory) is correct. LGTM ✅ — qwen-latest-series-invite-beta-v38 via Qwen Code /review
299e3d7 to
9facc14
Compare
BZ-D
left a comment
There was a problem hiding this comment.
Reviewed the diff and ran the targeted compression tests; no blocking issues found.
…n-missing-usage # Conflicts: # packages/core/src/services/chatCompressionService.ts
|
Merged latest Conflict resolution:
Validation on Windows:
|
DragonnZhang
left a comment
There was a problem hiding this comment.
Review Summary
Verdict: APPROVE
No new findings. The fallback chain for missing usage metadata is well-designed and conservative throughout.
What was reviewed
chatCompressionService.ts: NewestimateSummaryOutputTokensfunction with CJK awareness, local estimation fallback in the token mathelsebranch, and the integration of local estimates into the existing cap guard and inflation guard.chatCompressionService.test.ts: Four new/updated tests covering happy path, inflated delta rejection, cap-sized summary rejection, and CJK cap-sized summary rejection — all without usage metadata.
Assessment
- The fallback chain (
totalTokenCountdelta → local char-based estimate →undefined) is sound and conservative. - The non-visible remainder preservation (
originalTokenCount - estimatedOriginalVisibleTokenCount) correctly prevents the local estimator from replacing the API-authoritative total with a visible-history-only underestimate. - The CJK-aware output estimate takes
max(generic, cjkAware), which is the safe (higher) direction — reducing false negatives in the cap guard. - The inflation guard (
newTokenCount > originalTokenCount) correctly catches oversized summaries even without provider usage data. - Tests are comprehensive and validate both success and failure paths under missing usage metadata.
- Deterministic checks (tsc, eslint): 0 findings.
- CI: 14/14 checks passing.
— qwen-code via Qwen Code /review
|
Addressed the review suggestions in Updates:
Validated locally:
|
What this PR does
Allows chat compression to proceed safely when provider usage metadata is missing, while rejecting inflated local token deltas that would make the compression result unsafe.
Why it's needed
Some model responses omit usage metadata. The compression path still needs a conservative fallback instead of silently skipping or accepting an unsafe inflated estimate.
Reviewer Test Plan
How to verify
Run: npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts -t "usage metadata|inflated local delta|cap-sized"; npm test --workspace=packages/core -- src/services/chatCompressionService.test.ts; npx eslint packages/core/src/services/chatCompressionService.test.ts; npx prettier --check packages/core/src/services/chatCompressionService.test.ts; npm run typecheck --workspace=packages/core.
Evidence (Before & After)
Before: missing usage metadata could leave the compression path without coverage for inflated local estimates. After: regression tests cover missing-usage compression, inflated fallback rejection, and cap-sized safe fallback. This is core service behavior, so TUI screenshots are N/A.
Tested on
Environment (optional)
Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.
Risk & Scope
Linked Issues
Fixes #3282
中文说明
这个 PR 做了什么
在 provider 没有返回 usage metadata 时,仍然允许压缩走安全的保守路径,并拒绝膨胀的本地 token 估算。
为什么需要
部分模型响应没有 usage metadata,压缩逻辑需要有可靠 fallback,不能跳过或接受不安全估算。
Reviewer Test Plan
本地在 Windows 跑了 chatCompressionService 定向测试、完整文件测试、eslint、prettier check 和 core typecheck;macOS/Linux 依赖 GitHub CI。
风险和范围
只影响缺失 usage metadata 时的压缩判断,不改压缩公共接口。