Skip to content

fix(core): include response tokens in prompt estimate#4525

Open
Jerry2003826 wants to merge 11 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-estimate-prompt-candidates-token-count
Open

fix(core): include response tokens in prompt estimate#4525
Jerry2003826 wants to merge 11 commits into
QwenLM:mainfrom
Jerry2003826:Jiarui/fix-estimate-prompt-candidates-token-count

Conversation

@Jerry2003826

@Jerry2003826 Jerry2003826 commented May 26, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Includes response-side tokens when estimating prompt candidates so request-size checks account for the full conversation payload more accurately.

Why it's needed

Ignoring response tokens can underestimate the size of candidate histories and allow oversized histories to advance too far before later guards reject them.

Reviewer Test Plan

How to verify

Run the targeted prompt-estimate tests in packages/core, then run eslint/prettier/typecheck for the touched core files. The GitHub CI matrix also runs lint and tests on macOS, Windows, and Linux.

Evidence (Before & After)

Before: prompt candidate estimates could be lower than the actual model-facing history. After: regression coverage verifies response tokens are included in the estimate. This is core estimation behavior, so TUI screenshots are N/A.

Tested on

OS Status
macOS CI only
Windows Tested locally
Linux CI only

Environment (optional)

Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included for PRs whose behavior is core, session, parser, or transport logic rather than a visible TUI state.

Risk & Scope

  • Main risk or tradeoff: Limited to token-size estimation for prompt candidates; no provider API or UI behavior changes.
  • Not validated / out of scope: No unrelated refactors, public API changes, UI redesigns, or behavior outside the linked issue scope.
  • Breaking changes / migration notes: None expected.

Linked Issues

Fixes #4349

中文说明

这个 PR 做了什么

估算 prompt candidates 时纳入 response 侧 tokens,使请求大小判断更接近真实上下文。

为什么需要

忽略 response tokens 会低估历史大小,让过大的历史走到更晚阶段才失败。

Reviewer Test Plan

本地按 PR 相关 core 测试、lint、format check、typecheck 验证;macOS/Linux 由 GitHub CI 验证。

风险和范围

只影响候选历史的 token 估算,不改 provider API 或 UI。

@wenshao wenshao added the type/bug Something isn't working as expected label May 26, 2026
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Updated the PR description to the latest template with Reviewer Test Plan, Evidence, Tested on, Risk & Scope, and Chinese summary.

No code changes were needed for this update; CI is already green across macOS, Windows, Linux, lint, and CodeQL.

Comment thread packages/core/src/core/geminiChat.ts Outdated
// Context usage tracks prompt size; output isn't in history yet.
const lastPromptTokenCount =
usageMetadata.promptTokenCount || usageMetadata.totalTokenCount;
this.lastCandidatesTokenCount =

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Two issues at this assignment site:

1. Double-counting via totalTokenCount fallback. When promptTokenCount is absent (line 2538), lastPromptTokenCount falls back to totalTokenCount, which already includes candidatesTokenCount per the Gemini API. Adding candidatesTokenCount again here means the next estimatePromptTokens call computes (prompt + candidates) + candidates + userEst — over-counting by exactly candidatesTokenCount. This path is reachable via OpenAI-compatible providers that only report total_tokens and completion_tokens.

Consider guarding so lastCandidatesTokenCount is only set when lastPromptTokenCount came from the authoritative promptTokenCount:

if (usageMetadata.promptTokenCount) {
  this.lastCandidatesTokenCount = usageMetadata.candidatesTokenCount ?? 0;
}

2. Missing thoughtsTokenCount. For thinking models (Gemini 2.5 Pro, etc.), thinking tokens are reported in a separate thoughtsTokenCount field that is NOT included in candidatesTokenCount. Thought parts are appended to history (line ~2686) and count toward the next promptTokenCount, but this field only captures visible output tokens. For a response with 15K thinking tokens and 500 visible output tokens, the estimate under-counts by 15K.

Consider including thinking tokens:

this.lastCandidatesTokenCount =
  (usageMetadata.candidatesTokenCount ?? 0) +
  (usageMetadata.thoughtsTokenCount ?? 0);

(Consider renaming the field to lastOutputTokenCount to match the broader semantics.)

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 846ccf4. The usage metadata path now only carries previous-response output tokens when promptTokenCount is present, so the totalTokenCount fallback is not double-counted. It also adds thoughtsTokenCount to the carried output token count for thinking-model responses.

Added regressions for both cases: one where totalTokenCount + candidatesTokenCount must stay below the hard-tier force threshold, and one where promptTokenCount + candidatesTokenCount + thoughtsTokenCount must cross it.

Comment thread packages/core/src/core/geminiChat.ts Outdated
*/
setLastPromptTokenCount(count: number): void {
this.lastPromptTokenCount = count;
this.lastCandidatesTokenCount = 0;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This line has a hidden side-effect: it zeroes lastCandidatesTokenCount. The JSDoc on setLastPromptTokenCount (lines 1320-1326) only mentions seeding prompt-token counts and doesn't document this reset. A future caller who uses this method to update the prompt count mid-session (not just at seeding time) will silently discard the candidates count from the last response.

Consider updating the JSDoc to mention the side-effect:

* Side-effect: also resets `lastCandidatesTokenCount` to 0, because
* externally-seeded prompt counts come from a different response than
* the one that populated `lastCandidatesTokenCount`.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in 846ccf4. The JSDoc now explicitly says setLastPromptTokenCount() clears the remembered previous-response output token count because externally seeded prompt counts come from a different chat instance and should not inherit this chat's last response size.

@Jerry2003826 Jerry2003826 force-pushed the Jiarui/fix-estimate-prompt-candidates-token-count branch from 1267375 to 846ccf4 Compare May 26, 2026 08:01
Comment thread packages/core/src/core/geminiChat.ts Outdated
usageMetadata.promptTokenCount ?? usageMetadata.totalTokenCount;
this.lastCandidatesTokenCount =
usageMetadata.promptTokenCount !== undefined
? (usageMetadata.candidatesTokenCount ?? 0) +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] For OpenAI-compatible providers, candidatesTokenCount already includes reasoning_tokens (the OpenAI converter at openaiContentGenerator/converter.ts sets candidatesTokenCount = completion_tokens, which includes reasoning_tokens, and separately sets thoughtsTokenCount = reasoning_tokens). Adding them here gives completion_tokens + reasoning_tokens — double-counting the reasoning portion.

Native Gemini is unaffected (candidatesTokenCount and thoughtsTokenCount are disjoint). But for OpenAI thinking models (e.g., o1/o3), this inflates lastCandidatesTokenCount by the size of reasoning_tokens (potentially 1K–10K per turn), making the hard-tier estimate more conservative than necessary.

Two possible fixes:

Option A — derive from totalTokenCount (provider-agnostic):

this.lastCandidatesTokenCount =
  usageMetadata.promptTokenCount !== undefined
    ? Math.max(0, (usageMetadata.totalTokenCount ?? 0) - usageMetadata.promptTokenCount)
    : 0;

Option B — fix the OpenAI converter to subtract reasoning from candidates:

candidatesTokenCount: Math.max(0, completionTokens - thinkingTokens),

Option A keeps the fix local; Option B aligns the converter with Gemini semantics (which may benefit other consumers too).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in 85668cd. The carried previous-response output token count now prefers totalTokenCount - promptTokenCount when totalTokenCount is available, so OpenAI-compatible completion_tokens that already include reasoning_tokens are not counted again via thoughtsTokenCount. Added a regression covering this exact OpenAI-compatible reasoning-token shape.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4526, PR #4528, and PR #4531 — all modifying core token estimation and prompt handling logic.

Key shared functions:

  • sendMessageStreampackages/core/src/core/client.ts, packages/core/src/core/geminiChat.ts
  • getLastPromptTokenCountpackages/core/src/agents/runtime/agent-interactive.ts, packages/core/src/core/geminiChat.ts
  • setLastPromptTokenCountpackages/core/src/core/geminiChat.ts, packages/core/src/telemetry/uiTelemetry.ts
  • stripTrailingSessionStartContextBlockpackages/core/src/core/geminiChat.ts
  • tryCompresspackages/core/src/core/geminiChat.ts
  • compresspackages/core/src/services/chatCompressionService.ts
  • estimateContentTokens, estimatePromptTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: All 4 PRs modify the same token estimation and compression pipeline. Review as a batch to ensure consistency in token counting logic.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@longbinlai

Copy link
Copy Markdown

⚠️ Cross-PR Conflict Detected — Token/Prompt Handling Group (4 PRs)

This PR shares code with PR #4526, PR #4528, and PR #4531 — all modifying core token estimation and prompt handling logic.

Key shared functions:

  • sendMessageStreampackages/core/src/core/client.ts, packages/core/src/core/geminiChat.ts
  • getLastPromptTokenCountpackages/core/src/agents/runtime/agent-interactive.ts, packages/core/src/core/geminiChat.ts
  • setLastPromptTokenCountpackages/core/src/core/geminiChat.ts, packages/core/src/telemetry/uiTelemetry.ts
  • stripTrailingSessionStartContextBlockpackages/core/src/core/geminiChat.ts
  • tryCompresspackages/core/src/core/geminiChat.ts
  • compresspackages/core/src/services/chatCompressionService.ts
  • estimateContentTokens, estimatePromptTokenspackages/core/src/services/tokenEstimation.ts

Recommendation: All 4 PRs modify the same token estimation and compression pipeline. Review as a batch to ensure consistency in token counting logic.


This conflict was detected automatically by CodeScope skills — structural analysis of call graphs and file modifications.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] lastCandidatesTokenCount is not persisted/restored on session resume (--continue). sessionService.getResumePromptTokenCount extracts only promptTokenCount, and setLastPromptTokenCount resets lastCandidatesTokenCount = 0. After resume, the first send's estimate is lower than it should be by the last response's output tokens. Consider extracting candidatesTokenCount + thoughtsTokenCount from the saved usageMetadata and threading it through a new setLastCandidatesTokenCount method.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
const lastPromptTokenCount =
usageMetadata.promptTokenCount || usageMetadata.totalTokenCount;
usageMetadata.promptTokenCount ?? usageMetadata.totalTokenCount;
this.lastCandidatesTokenCount =

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] lastCandidatesTokenCount uses promptTokenCount !== undefined as its guard, but lastPromptTokenCount (line 2547) uses a truthy check on the ?? result. When promptTokenCount === 0, lastCandidatesTokenCount IS updated but lastPromptTokenCount is NOT — the two counters become desynchronized, and the next estimatePromptTokens mixes a stale prompt count with a fresh candidate count.

Consider moving lastCandidatesTokenCount inside the if (lastPromptTokenCount) block so both counters update atomically:

if (lastPromptTokenCount) {
  this.lastPromptTokenCount = lastPromptTokenCount;
  this.lastCandidatesTokenCount =
    (usageMetadata.candidatesTokenCount ?? 0) +
    (usageMetadata.thoughtsTokenCount ?? 0);
  this.telemetryService?.setLastPromptTokenCount(lastPromptTokenCount);
}

Also: the field name lastCandidatesTokenCount is slightly misleading — it stores candidatesTokenCount + thoughtsTokenCount (all output tokens), not just candidates. Renaming to lastOutputTokenCount would prevent future maintainainers from accidentally double-counting thoughts. A debugLogger.debug at the assignment site would also help with compaction debugging.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 75afe0c. The previous-output counter now updates only inside the same if (lastPromptTokenCount) block as lastPromptTokenCount, so a zero prompt count cannot refresh one counter while leaving the other stale.

chat.getHistoryShallow(true),
pendingUserMessage,
originalTokenCount,
0,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The 0 here is lastCandidatesTokenCount. Consider adding an inline comment explaining why: the compression service lacks access to the chat's per-chat output token count, and the precomputedEffectiveTokens path (which already includes candidates) covers the common sendMessageStream caller.

Suggested change
0,
0, // lastCandidatesTokenCount: unavailable here; precomputedEffectiveTokens covers the common path

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in 75afe0c. Added the inline comment explaining that this path has no per-chat output-token state and that the common sendMessageStream caller uses precomputedEffectiveTokens, which already includes previous output tokens.

passedOpts.pendingUserMessage?.parts?.some(
(part) => part.text === userMessage,
),
).toBe(true);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Two test gaps:

  1. Compression-success reset: the existing test "resets previous response candidate tokens when seeding last prompt tokens externally" covers the setLastPromptTokenCount reset path (line 1332) but not the compression-success reset at line 1375. Add a test that primes lastCandidatesTokenCount, triggers successful compression, and asserts the follow-up's precomputedEffectiveTokens does NOT include pre-compression candidate tokens.

  2. Candidates ignored on fallback: no test verifies estimatePromptTokens(history, user, 0, 9999) ignores lastCandidatesTokenCount when lastPromptTokenCount is 0:

it('ignores lastCandidatesTokenCount when lastPromptTokenCount is 0', () => {
  const fullEst = estimateContentTokens([...history, user]);
  expect(estimatePromptTokens(history, user, 0, 9999)).toBe(fullEst);
});

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 75afe0c. New coverage verifies successful compression clears the previous-output token contribution before the next estimate, and estimatePromptTokens(..., 0, lastCandidatesTokenCount) ignores candidate tokens on the first-send fallback path.

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Handled in 1155387.

Resume now restores both sides of the prompt estimate:

  • sessionService extracts the prompt-token anchor and the previous assistant output token count from the latest saved usageMetadata.
  • GeminiClient.initialize() seeds the restored GeminiChat with both setLastPromptTokenCount() and the new setLastCandidatesTokenCount().
  • When resume falls back to totalTokenCount or compression metadata for the prompt anchor, the previous-output seed stays at 0 to avoid double-counting tokens already included in that fallback.

Validation:

  • npm run test --workspace=packages/core -- src/core/client.test.ts -t "seeds resumed chat"
  • npm run test --workspace=packages/core -- src/services/sessionService.test.ts -t "getResumePromptTokenCount"
  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hard-tier estimate|OpenAI-compatible reasoning"
  • npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/client.ts packages/core/src/core/client.test.ts packages/core/src/services/sessionService.ts packages/core/src/services/sessionService.test.ts
  • npm run typecheck --workspace=packages/core

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R4 review at 1155387. All R3 inline comments (guard mismatch, undocumented 0, compression-reset test gap) have been addressed — nice work. CI green (8/8), 437 tests pass. Two new Suggestions below: shared output-token formula to prevent drift, and a stream-path test for the candidates + thoughts fallback branch.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
usageMetadata.promptTokenCount !== undefined
? usageMetadata.totalTokenCount !== undefined
? Math.max(
0,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The nested ternary here duplicates the same three-branch output-token derivation as getUsageOutputTokenCountForPromptEstimate() in sessionService.ts (added in this same PR). Both implement: (1) no promptTokenCount → 0, (2) totalTokenCount present → total - prompt, (3) fallback → candidates + thoughts. When a new provider reports tokens differently, a maintainer may update one copy and miss the other, causing resume vs. steady-state to produce different lastCandidatesTokenCount values.

Suggested change
0,
this.lastCandidatesTokenCount =
getUsageOutputTokenCountForPromptEstimate(usageMetadata);

(Export getUsageOutputTokenCountForPromptEstimate from sessionService.ts or move it to tokenEstimation.ts so both call sites share one implementation.)

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in 9a60577. Moved the output-token derivation into shared getUsageOutputTokenCountForPromptEstimate() in tokenEstimation.ts; both steady-state stream handling and resume restoration now use the same helper.


it('resets previous response candidate tokens when seeding last prompt tokens externally', async () => {
const compressSpy = vi
.spyOn(ChatCompressionService.prototype, 'compress')

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The stream handler's candidatesTokenCount + thoughtsTokenCount fallback branch (geminiChat.ts:2564-2565 — fires when promptTokenCount is defined but totalTokenCount is undefined) is not exercised by any stream-path test. All three existing tests that provide promptTokenCount also provide totalTokenCount. The equivalent computation in getUsageOutputTokenCountForPromptEstimate IS tested for this scenario ("should restore disjoint candidate and thought output tokens when total is unavailable"), but the stream handler's duplicate is not.

Add a stream-path test modeled on "includes previous response thought tokens" but with totalTokenCount omitted from mock usageMetadata:

mockResolvedValueOnce(
  makeStreamResponse('first', {
    promptTokenCount: 176_000,
    candidatesTokenCount: 1_200,
    thoughtsTokenCount: 300,
    // totalTokenCount intentionally omitted
  }),
)

Then assert that precomputedEffectiveTokens on the follow-up call reflects 176_000 + 1_500 (prompt + candidates + thoughts).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 9a60577. The stream path now has coverage for usage metadata with promptTokenCount, candidatesTokenCount, and thoughtsTokenCount while totalTokenCount is omitted, and the follow-up hard-tier estimate includes candidates + thoughts.

setLastCandidatesTokenCount(count: number): void {
this.lastCandidatesTokenCount = Math.max(0, count);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] setLastCandidatesTokenCount uses Math.max(0, count), which does not guard against NaNMath.max(0, NaN) returns NaN. If a future caller passes NaN (e.g., from unchecked arithmetic on usage metadata), lastCandidatesTokenCount becomes NaN, propagating through estimatePromptTokens and making effectiveTokens >= hard evaluate to false — silently disabling the hard-tier rescue gate with no observable signal.

Suggested change
setLastCandidatesTokenCount(count: number): void {
this.lastCandidatesTokenCount = Number.isFinite(count) ? Math.max(0, count) : 0;
}

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11d064d. The old candidates setter was replaced by seedResumeTokenCounts(), which normalizes non-finite output counts to 0 before storing them.

* under-counting the next request near the hard compaction threshold.
*/
private lastCandidatesTokenCount = 0;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The field lastCandidatesTokenCount (and the setter/parameter names throughout) stores candidatesTokenCount + thoughtsTokenCount — the total non-prompt output — not just the Gemini API's candidatesTokenCount. A future maintainer reading the field name will assume it maps 1:1 to the API's candidatesTokenCount and may omit thoughtsTokenCount in new code paths, or double-count thoughts if added separately.

Consider renaming to lastOutputTokenCount across the field, getter/setter, and the estimatePromptTokens parameter to match the semantic (total output tokens from the previous response).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11d064d. The per-chat counter and estimatePromptTokens parameter are now named lastOutputTokenCount, matching the candidates+thoughts semantics instead of the raw Gemini field name.

Comment thread packages/core/src/core/client.ts Outdated
uiTelemetryService.getLastPromptTokenCount(),
);
if (resumeTokenCounts) {
chat.setLastCandidatesTokenCount(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The resume flow depends on a strict call order: setLastPromptTokenCount (which silently zeroes lastCandidatesTokenCount as a side-effect), then setLastCandidatesTokenCount (which restores the value). Nothing at the type or API level enforces this ordering — a future refactor that reorders or wraps these calls will silently zero the candidates count with no compile error or runtime error.

Consider introducing an atomic seeder method to make the coupling explicit:

Suggested change
chat.setLastCandidatesTokenCount(
if (resumeTokenCounts) {
chat.seedResumeTokenCounts(
resumeTokenCounts.promptTokenCount,
resumeTokenCounts.candidatesTokenCount,
);
}

Keep setLastPromptTokenCount (with its zeroing side-effect) for subagent/post-compression paths where zeroing is correct.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11d064d. Resume now seeds prompt/output counts through seedResumeTokenCounts(promptTokenCount, outputTokenCount), so the two restored counters are set atomically.

usageMetadata.promptTokenCount ?? usageMetadata.totalTokenCount;
if (lastPromptTokenCount) {
// Always update the per-chat counter so this chat (including
// subagents) can make its own compaction decisions.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The || to ?? change here is a behavioral change: when promptTokenCount === 0, the old || would fall through to totalTokenCount (entering the update block), while ?? keeps 0, causing if (lastPromptTokenCount) to be falsy and skipping both counter updates. No test exercises promptTokenCount: 0 to pin the intended behavior. Without a regression test, a future maintainer could revert to || thinking the change was accidental, reintroducing the original over-counting.

Consider adding a test with usageMetadata: { promptTokenCount: 0, totalTokenCount: 5000 } that documents the expected behavior (counters remain unchanged).

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11d064d. Added does not replace token counters when usage reports zero prompt tokens, which pins that promptTokenCount: 0 does not reseed counters from totalTokenCount.

Comment thread packages/core/src/core/geminiChat.ts Outdated
? Math.max(
0,
usageMetadata.totalTokenCount -
usageMetadata.promptTokenCount,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The candidatesTokenCount + thoughtsTokenCount fallback branch in getUsageOutputTokenCountForPromptEstimate (when promptTokenCount is defined but totalTokenCount is absent) is not tested through the stream handler path. Every test with promptTokenCount also provides totalTokenCount, so the ternary always takes the subtraction branch. The same function IS tested via sessionService.test.ts ("should restore disjoint candidate and thought output tokens when total is unavailable"), but the stream handler's invocation of this branch remains uncovered.

Providers that return promptTokenCount without totalTokenCount rely on this untested path for correct compaction thresholds.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in 9a60577 by includes disjoint candidate and thought tokens when total token count is unavailable; the stream-path fixture omits totalTokenCount and verifies the next hard-tier estimate includes candidates + thoughts.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R5 review at 9a60577 — incremental review of 2 new commits since R4. Both R4 Suggestions addressed: (1) output-token formula extracted into shared getUsageOutputTokenCountForPromptEstimate() in tokenEstimation.ts; (2) stream-path fallback branch now has test coverage. All R3 inline comments resolved (guard mismatch, undocumented zero, compression-reset test gap). Downgraded from Approve to Comment: CI still running. tsc clean, eslint clean, 234 targeted tests pass (geminiChat 155 + tokenEstimation 10 + chatCompressionService 69), 9432 full-suite tests pass. One low-confidence Suggestion (terminal-only, not posted inline): shared helper lacks direct unit tests in tokenEstimation.test.ts — integration tests cover the branches but direct tests would pin the contract. — qwen3.7-max via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-file observation: agent-core.ts:365 seeds lastPromptTokenCount on subagent fork chats but does not seed lastCandidatesTokenCount. Since setLastPromptTokenCount resets lastCandidatesTokenCount = 0, fork subagents will under-count the prompt estimate by the parent's previous response output tokens — the same under-counting this PR fixes for the main chat and session resume paths. Not a blocker for this PR (agent-core.ts is out of scope), but worth tracking as follow-up.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/client.ts Outdated
);
this.getChat().setLastPromptTokenCount(
uiTelemetryService.getLastPromptTokenCount(),
const resumeTokenCounts = getResumeTokenCounts(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] getResumeTokenCounts(conversation) is called twice for the same conversation object during resume: once inside replayUiTelemetryFromConversation() at line 228 (which was refactored in this PR to call getResumeTokenCounts internally), and again here. Both walk messages backward — an O(n) traversal done redundantly.

Consider having replayUiTelemetryFromConversation return the ResumeTokenCounts | undefined it already computed, and consuming the return value here instead of recomputing:

// sessionService.ts
export function replayUiTelemetryFromConversation(
  conversation: ConversationRecord,
): ResumeTokenCounts | undefined {
  const resumeTokenCounts = getResumeTokenCounts(conversation);
  if (resumeTokenCounts !== undefined) {
    uiTelemetryService.setLastPromptTokenCount(resumeTokenCounts.promptTokenCount);
  }
  return resumeTokenCounts;
}

// client.ts
const resumeTokenCounts = replayUiTelemetryFromConversation(
  resumedSessionData.conversation,
);

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4b27a34. replayUiTelemetryFromConversation() now returns the ResumeTokenCounts it computes, and the resume path consumes that return value instead of walking the conversation a second time.

Comment thread packages/core/src/core/client.test.ts Outdated
promptTokenCount: 200,
candidatesTokenCount: 60,
thoughtsTokenCount: 20,
totalTokenCount: 260,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The fixture has internally inconsistent token accounting: promptTokenCount: 200 + candidatesTokenCount: 60 + thoughtsTokenCount: 20 = 280, but totalTokenCount: 260. Because getUsageOutputTokenCountForPromptEstimate takes the totalTokenCount - promptTokenCount = 60 path when totalTokenCount is defined, it silently drops thoughtsTokenCount: 20 and the assertion toHaveBeenCalledWith(60) passes without verifying that thinking tokens are included.

Fix the fixture so totalTokenCount matches the sum (280), then assert toHaveBeenCalledWith(80) (candidates 60 + thoughts 20):

Suggested change
totalTokenCount: 260,
totalTokenCount: 280,
Suggested change
totalTokenCount: 260,
expect(setLastCandidatesTokenCountSpy).toHaveBeenCalledWith(80);

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4b27a34. The fixture now uses consistent accounting (promptTokenCount: 200, candidatesTokenCount: 60, thoughtsTokenCount: 20, totalTokenCount: 280) and asserts the resumed seed receives output token count 80.

if (usage.totalTokenCount !== undefined) {
return Math.max(0, usage.totalTokenCount - usage.promptTokenCount);
}
return (usage.candidatesTokenCount ?? 0) + (usage.thoughtsTokenCount ?? 0);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This disjoint fallback lacks a Math.max(0, ...) wrapper, unlike the sibling totalTokenCount - promptTokenCount branch on line 94. The streaming path in geminiChat.ts:2559-2560 assigns the return value directly to this.lastCandidatesTokenCount (bypassing the Math.max(0, count) guard in setLastCandidatesTokenCount()), so a provider that returns negative candidatesTokenCount or thoughtsTokenCount would propagate an unclamped value.

Wrap for consistency:

Suggested change
return (usage.candidatesTokenCount ?? 0) + (usage.thoughtsTokenCount ?? 0);
return Math.max(0, (usage.candidatesTokenCount ?? 0) + (usage.thoughtsTokenCount ?? 0));

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4b27a34. The disjoint candidates+thoughts fallback is clamped with Math.max(0, ...), and tokenEstimation.test.ts now covers negative disjoint output counts.

wenshao
wenshao previously approved these changes May 26, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R6 review at 4b27a34 — all R5 inline comments addressed (10 stale on old commits, 6 resolved with author replies on current commit). CI green (24/24), 373 tests pass (4 files). No high-confidence findings.

Low-confidence observations (for human review):

  1. chatCompressionService.ts:364 hardcodes 0 for lastOutputTokenCount because GeminiChat.lastOutputTokenCount is private with no getter. The comment documents this and the common sendMessageStream path is safe (uses precomputedEffectiveTokens), but a future caller of compress() without precomputed tokens would silently under-count. Consider adding a getLastOutputTokenCount() getter.

  2. tokenEstimation.test.ts covers only 1 of 3 branches in getUsageOutputTokenCountForPromptEstimate directly (the negative-disjoint clamp). The undefined-usage → 0 and total-prompt branches are indirectly tested via getResumeTokenCounts integration tests but lack direct unit coverage.

  3. sessionService.test.ts has no test for getResumeTokenCounts returning undefined (empty conversation or no usable records). Callers guard this correctly but the contract isn't asserted.

Cross-file observation: contextCommand.ts:321 (not in this PR) has a stale TODO referencing the old estimatePromptTokens(history, undefined, 0, imageTokenEstimate) 4-arg signature. After this PR, imageTokenEstimate moved to position 5. Anyone implementing that TODO would silently misplace the argument.

— qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 27, 2026

Copy link
Copy Markdown
Collaborator

Local verification report (merge reference)

Verified head SHA 4b27a344e (pr-4525 branch) on macOS via tmux session, fresh npm install against the PR-tip lockfile.

Reviewer Test Plan — all PASS

# Command Result
1 npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "disjoint candidate and thought|successful compression|zero prompt|ignores previous response|output tokens" ✅ 8 passed / 148 skipped
2 npm run test --workspace=packages/core -- src/core/client.test.ts -t "seeds resumed chat" ✅ 2 passed / 150 skipped
3 npm run test --workspace=packages/core -- src/services/sessionService.test.ts -t "getResumePromptTokenCount" ✅ 4 passed / 50 skipped
4 npm run test --workspace=packages/core -- src/services/tokenEstimation.test.ts -t "getUsageOutputTokenCountForPromptEstimate|estimatePromptTokens" ✅ 5 passed / 6 skipped
5 npx prettier --check (9 changed files) ✅ All matched files use Prettier code style
6 npm run typecheck --workspace=packages/core (tsc --noEmit) ✅ Clean

Wider regression sweep (also PASS)

Suite Result
Full files of all 4 affected test suites (no -t filter) ✅ 373 / 373 passed (4 files)
packages/core/src/services/chatCompressionService.test.ts (touched in this PR) ✅ 69 / 69 passed

CI on the PR head: 24/24 checks green (Test on macOS / Ubuntu / Windows × Node 22, Lint, CodeQL, Classify PR).

Review summary

  • Logic spot-check matches the PR description: lastOutputTokenCount is added on the lastPromptTokenCount > 0 branch only; cleared on setLastPromptTokenCount (forks/subagents) and after successful compression; ?? (not ||) keeps the zero-prompt-count behavior pinned; seedResumeTokenCounts clamps non-finite/negative inputs; resume restores prompt + output anchors atomically from the same assistant usage record (getResumeTokenCounts); getUsageOutputTokenCountForPromptEstimate prefers total - prompt and falls back to candidates + thoughts only when total is absent (avoids double-counting reasoning tokens).
  • Only one in-tree caller of replayUiTelemetryFromConversation (packages/core/src/core/client.ts); the new ResumeTokenCounts | undefined return is consumed correctly with the seedResumeTokenCounts / setLastPromptTokenCount fallback split.
  • Earlier review rounds (R1–R6) already addressed all inline comments; R6 approved at 4b27a344e.

Verdict

LGTM — verified locally; no blockers. Cleared for merge from my side.

— wenshao

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整体逻辑清晰,边界处理到位。发现一个参数错位的 bug,详见行内评论。

this.lastPromptTokenCount > 0 ? [] : this.getHistoryShallow(true),
userContent,
this.lastPromptTokenCount,
this.lastOutputTokenCount,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: estimatePromptTokens 的签名新增了第 4 个参数 lastOutputTokenCount(插在 imageTokenEstimate 之前),但同文件下方 localPromptTokensAfterCompression 的计算(原 L1603-1608)没有同步更新:

const localPromptTokensAfterCompression = shouldForceFromHard
  ? estimatePromptTokens(
      this.lastPromptTokenCount > 0 ? [] : this.getHistoryShallow(true),
      userContent,
      this.lastPromptTokenCount,
      imageTokenEstimate, // ← 现在被当作 lastOutputTokenCount
    )
  : 0;

imageTokenEstimate(通常 1600)被当成 lastOutputTokenCount 传入,而 imageTokenEstimate 则回退到默认值。恰好两个值目前相同所以不会出明显问题,但逻辑上是错的,用户自定义 imageTokenEstimate 时两个参数会同时出错。

建议加上 0

estimatePromptTokens(
    ...,
    this.lastPromptTokenCount,
    0,
    imageTokenEstimate,
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the current PR head (4b27a344e) and this appears to be stale against an older shape of the diff.

The localPromptTokensAfterCompression call site no longer exists (rg localPromptTokensAfterCompression returns no matches). The active geminiChat.ts call now passes both parameters in order:

estimatePromptTokens(
  this.lastPromptTokenCount > 0 ? [] : this.getHistoryShallow(true),
  userContent,
  this.lastPromptTokenCount,
  this.lastOutputTokenCount,
  imageTokenEstimate,
)

The compression-service fallback call site also uses the explicit unavailable-output-token value before the image estimate:

estimatePromptTokens(..., 0, slimmingConfig.imageTokenEstimate)

No code change made for this thread unless there is another current call site I missed.

}
if (usage.totalTokenCount !== undefined) {
return Math.max(0, usage.totalTokenCount - usage.promptTokenCount);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When totalTokenCount is unavailable, this sums candidatesTokenCount + thoughtsTokenCount assuming they are disjoint. Some OpenAI-compatible providers (e.g. certain DeepSeek-compatible endpoints) include thinking/reasoning tokens inside candidatesTokenCount, making them overlap. In that case this double-counts, inflating the prompt estimate by up to the full thoughtsTokenCount.

The totalTokenCount branch correctly avoids this, but this fallback branch does not. Over-estimating could trigger unnecessary hard-rescue compression one send earlier than needed.

Consider guarding against the overlap case, e.g.:

const candidates = usage.candidatesTokenCount ?? 0;
const thoughts = usage.thoughtsTokenCount ?? 0;
// If thoughts look like they are included in candidates, do not add them.
return Math.max(0, Math.max(candidates, candidates + thoughts));

or documenting the known over-count as acceptable conservatism.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in the current branch. When totalTokenCount is unavailable, getUsageOutputTokenCountForPromptEstimate() now treats thoughtsTokenCount as potentially overlapping only when candidatesTokenCount strictly dominates it, so DeepSeek-compatible shapes that include reasoning inside candidates avoid double-counting. Verified with npm run test --workspace=packages/core -- src/services/tokenEstimation.test.ts, Prettier, and ESLint.

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastOutputTokenCount 的引入和 ?? vs || 的修正都很好。留了一个 inline comment 关于 getUsageOutputTokenCountForPromptEstimatetotalTokenCount 缺失时假设 candidates 和 thoughts disjoint —— 部分 provider 会把 thinking tokens 包含在 candidates 里导致 double-count。实际影响是可能提前一轮触发 hard-rescue,不 block,供参考。

BZ-D
BZ-D previously approved these changes Jun 1, 2026
@wenshao

wenshao commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Follow-up to @BZ-D's param-mismatch finding — it's a merge-order trap, not a same-branch miss

@BZ-D is right that the post-compression estimatePromptTokens call ends up feeding imageTokenEstimate into the new lastOutputTokenCount slot, and the suggested fix is right. One clarification on where it lives, because it changes when and how to fix — and why none of our gates caught it.

That call does not exist in this branch yet. Head 4b27a344e is 47 commits behind main (merge-base e8b79d772). grep finds only the single estimatePromptTokens(...) call here — the effectiveTokens one, which is correctly updated. The localPromptTokensAfterCompression call BZ-D quoted was introduced on main by #4531 (7a31c80f0, "guard oversized resumed history sends"), which landed after this PR's merge-base. So there is nothing at "L1603-1608 in this file" to patch today — the buggy call site arrives on rebase.

Merge preview confirms it. git merge-tree origin/main <pr-head> merges cleanly, and the merged geminiChat.ts keeps the call in its old 4-arg form:

const localPromptTokensAfterCompression = shouldForceFromHard
  ? estimatePromptTokens(
      this.lastPromptTokenCount > 0 ? [] : this.getHistoryShallow(true),
      userContent,
      this.lastPromptTokenCount,
      imageTokenEstimate,   // ← lands on the new `lastOutputTokenCount` param
    )
  : 0;

This is silent on all three guards:

  • git — the signature change and the call site are in different regions of the file → no merge conflict.
  • tsc — both params are number → no type error.
  • CI — green here only because CI runs this 47-behind branch, which does not contain the call. The current green run does not exercise the merged code.

Impact — low-probability, fail-safe direction, but a real regression: shouldStopAfterHardRescue gates on localPromptTokensAfterCompression >= hardLimit. The mismatch inflates the post-compression estimate by imageTokenEstimate (default 1600), so a successful hard-rescue compression whose result lands in [hardLimit-1600, hardLimit) is misjudged as "still oversized" and the send is aborted. That's the exact path this PR set out to make more accurate.

Action items

  1. Rebase onto current main (now includes fix(core): guard oversized resumed history sends #4531).
  2. After rebase the localPromptTokensAfterCompression call appears — add the output-token arg. I'd suggest this.lastOutputTokenCount (consistent with the effectiveTokens call above). BZ-D's 0 is also safe, but on the NOOP path it re-introduces the very output-token under-count this PR fixes.
  3. Re-run CI on the rebased branch.

This is also the concrete instance of @longbinlai's cross-PR conflict warning (the #4531 overlap on estimatePromptTokens).

Dismissing my earlier R6 approval pending rebase + fix + a green CI run on the rebased branch, since the regression only surfaces post-merge. Good catch by @BZ-D either way.

@wenshao wenshao dismissed their stale review June 1, 2026 15:16

Dismissing pending rebase onto main (now includes #4531), the resulting estimatePromptTokens param fix at the localPromptTokensAfterCompression call site, and a green CI run on the rebased branch. This regression only surfaces post-merge — see my follow-up comment #4525 (comment) for the merge-tree evidence. Will re-review after rebase.

@Jerry2003826 Jerry2003826 force-pushed the Jiarui/fix-estimate-prompt-candidates-token-count branch 2 times, most recently from 47265fc to 01a805c Compare June 1, 2026 17:41

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Stale TODO in packages/cli/src/ui/commands/contextCommand.ts:329 references the old 4-arg estimatePromptTokens signature: estimatePromptTokens(history, undefined, 0, imageTokenEstimate). This PR inserts lastOutputTokenCount as the new 4th parameter, shifting imageTokenEstimate to 5th. A future implementor following this TODO would pass imageTokenEstimate as lastOutputTokenCount, producing a silently wrong estimate.

Suggested fix: update to estimatePromptTokens(history, undefined, 0, 0, imageTokenEstimate).

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
@@ -2632,11 +2669,13 @@ export class GeminiChat {
usageMetadata = chunk.usageMetadata;
// Context usage tracks prompt size; output isn't in history yet.
const lastPromptTokenCount =

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] seedResumeTokenCounts (line 1375) correctly guards with Number.isFinite() + Math.max(0, ...), proving awareness of malformed-input risk. This streaming path assigns promptTokenCount ?? totalTokenCount directly with no finite/non-negative guard. A negative value (e.g. promptTokenCount: -1000) passes the truthy check, sets a negative counter, and causes estimatePromptTokens to under-count — silently disabling the compaction gate. getUsageOutputTokenCountForPromptEstimate also lacks a finite guard on its return value. Low probability with production providers but inconsistent with the resume path.

Suggested change
const lastPromptTokenCount =
const rawPromptCount =
usageMetadata.promptTokenCount ?? usageMetadata.totalTokenCount;
const lastPromptTokenCount =
Number.isFinite(rawPromptCount) && rawPromptCount > 0
? rawPromptCount
: 0;

And for the output count:

Suggested change
const lastPromptTokenCount =
const rawOutput =
getUsageOutputTokenCountForPromptEstimate(usageMetadata);
this.lastOutputTokenCount = Number.isFinite(rawOutput)
? rawOutput
: 0;

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-checked the current branch: this is already covered. The streaming path now uses hasUsablePromptTokenCount / hasUsableTotalTokenCount with typeof === number, Number.isFinite(...), and >= 0, then stores the coerced count. Invalid, negative, or non-finite usage no longer reaches lastPromptTokenCount from this path.

return Math.max(0, usage.totalTokenCount - usage.promptTokenCount);
}
const candidates = Math.max(0, usage.candidatesTokenCount ?? 0);
const thoughts = Math.max(0, usage.thoughtsTokenCount ?? 0);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] At equality (candidatesTokenCount === thoughtsTokenCount, both non-zero), this returns only candidates, assuming overlap. But equality does not imply overlap — a provider could report equal but disjoint counts. Since this estimate is trigger-only (false-positive acceptable, false-negative is not), using > (strictly greater) would be safer: equality falls through to the additive branch.

Suggested change
const thoughts = Math.max(0, usage.thoughtsTokenCount ?? 0);
return candidates > thoughts ? candidates : candidates + thoughts;

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in the current branch. The overlap heuristic now uses candidates > thoughts, so equality is treated as ambiguous/disjoint and conservatively adds both counts. Added adds equal candidate and thought counts because equality does not prove overlap. Verified with npm run test --workspace=packages/core -- src/services/tokenEstimation.test.ts, Prettier, and ESLint.

BZ-D
BZ-D previously approved these changes Jun 2, 2026

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the diff and ran the targeted token accounting and resume tests; no blocking issues found.

@Jerry2003826 Jerry2003826 force-pushed the Jiarui/fix-estimate-prompt-candidates-token-count branch from 01a805c to f0702b0 Compare June 2, 2026 19:57
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Updated in f0702b0 for the latest TODO-signature review. The stale comment in contextCommand.ts now shows the current estimatePromptTokens(history, undefined, 0, 0, imageTokenEstimate) call shape, so imageTokenEstimate is documented as the fifth argument rather than the new lastOutputTokenCount slot.

Validation:

  • npx prettier --check packages/cli/src/ui/commands/contextCommand.ts
  • npx eslint packages/cli/src/ui/commands/contextCommand.ts
  • git diff --check

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Follow-up: merged current origin/main in d476846 and resolved the geminiChat.ts usage-metadata conflict.

Resolution notes:

  • Kept upstream hostile-provider usage coercion.
  • Preserved this PR's previous-output token accounting.
  • Preserved the zero-prompt-count behavior: promptTokenCount: 0 does not reseed counters.
  • When promptTokenCount is missing or hostile and we fall back to totalTokenCount, previous-output seed stays 0 to avoid double-counting the same tokens on the next estimate.
  • Kept the latest review fix in contextCommand.ts: the TODO now documents estimatePromptTokens(history, undefined, 0, 0, imageTokenEstimate).

Validation run locally on Windows:

  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts -t "hostile|output tokens|zero prompt|hard-tier estimate|double-count|image token"
  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts
  • npm run test --workspace=packages/core -- src/core/client.test.ts -t "seeds resumed chat"
  • npm run test --workspace=packages/core -- src/services/sessionService.test.ts -t "getResumePromptTokenCount|getResumeTokenCounts"
  • npm run test --workspace=packages/core -- src/services/tokenEstimation.test.ts
  • npx prettier --check packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts packages/core/src/services/tokenEstimation.ts packages/core/src/services/tokenEstimation.test.ts packages/cli/src/ui/commands/contextCommand.ts
  • npx eslint packages/core/src/core/geminiChat.ts packages/core/src/core/geminiChat.test.ts packages/core/src/services/tokenEstimation.ts packages/core/src/services/tokenEstimation.test.ts packages/cli/src/ui/commands/contextCommand.ts
  • npm run typecheck --workspace=packages/core
  • git diff --check

Note: when I initially ran several core Vitest commands in parallel on Windows, two runs hit the known coverage/.tmp ENOENT race after their tests had passed. I reran the affected checks serially, and they passed.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] [typecheck] packages/cli/src/ui/components/InputPrompt.test.tsx:205 — After the merge from main, BackgroundTaskViewState gained two new fields (livePanelFocused: boolean and livePanelSelectedIndex: number) but the mock at line 205 was not updated. tsc --noEmit fails with TS2345:

Argument of type '{ entries: never[]; selectedIndex: number; dialogMode: "closed"; dialogOpen: false; pillFocused: false; }' is not assignable to parameter of type 'BackgroundTaskViewState'.

Add the missing fields to the mock:

    mockedUseBackgroundTaskViewState.mockReturnValue({
      entries: [],
      selectedIndex: 0,
      dialogMode: 'closed',
      dialogOpen: false,
      pillFocused: false,
      livePanelFocused: false,
      livePanelSelectedIndex: 0,
    });

— qwen3.7-max via Qwen Code /review

…rompt-candidates-token-count

# Conflicts:
#	packages/core/src/core/geminiChat.ts
@Jerry2003826 Jerry2003826 force-pushed the Jiarui/fix-estimate-prompt-candidates-token-count branch from d476846 to 8a7921b Compare June 4, 2026 08:48
@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Updated in 8a7921b95 for the latest InputPrompt.test.tsx typecheck review.

Change:

  • Added livePanelFocused: false and livePanelSelectedIndex: 0 to the BackgroundTaskViewState mock.

Validation on Windows:

  • npx prettier --check packages/cli/src/ui/components/InputPrompt.test.tsx
  • npx eslint packages/cli/src/ui/components/InputPrompt.test.tsx
  • npm run test --workspace=packages/cli -- src/ui/components/InputPrompt.test.tsx (149 passed, 2 skipped)
  • git diff --check

Note: I also retried npm run typecheck --workspace=packages/cli. The original mock error no longer appears. The local run is currently blocked by unrelated workspace build artifacts / Windows path handling around packages such as web-templates, acp-bridge, and channel packages, so I am relying on CI for the full workspace typecheck signal.

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

Addressed the equality case in getUsageOutputTokenCountForPromptEstimate in 4dce3f4b2.

Change:

  • Treat candidatesTokenCount === thoughtsTokenCount as ambiguous/disjoint and count both values.
  • Only assume possible overlap when candidatesTokenCount strictly dominates thoughtsTokenCount.
  • Added a regression test for the equal-count case.

Validation:

  • npm run test --workspace=packages/core -- src/services/tokenEstimation.test.ts
  • npm run test --workspace=packages/core -- src/core/geminiChat.test.ts
  • npm run lint --workspace=packages/core
  • npm run typecheck --workspace=packages/core
  • npx prettier --check packages/core/src/services/tokenEstimation.ts packages/core/src/services/tokenEstimation.test.ts

Note: one earlier parallel Vitest attempt hit the known Windows coverage/.tmp race; the serial rerun of the full geminiChat.test.ts passed.

@wenshao

wenshao commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Verification Report

Reviewer: wenshao
Environment: macOS Darwin 25.4.0, Node.js v22.17.0
Branch: Jiarui/fix-estimate-prompt-candidates-token-count @ 4dce3f4

Build

Check Result
TypeScript (Core tsc --noEmit) ✅ Clean (pre-existing sdk.ts / converter.ts errors excluded — not touched by PR)
TypeScript (CLI tsc --noEmit) ✅ Clean
ESLint (all 6 PR-changed source files) ✅ Clean

Tests

Test File Tests Result
geminiChat.test.ts 168 ✅ All passed
client.test.ts 143 ✅ All passed
tokenEstimation.test.ts 15 ✅ All passed
sessionService.test.ts 54 ✅ All passed
chatCompressionService.test.ts 55 ✅ All passed
compactionInputSlimming.test.ts 32 ✅ All passed
postCompactAttachments.test.ts 56 ✅ All passed
InputPrompt.test.tsx (CLI) 150 ✅ All passed (1 skipped)
Total 673 All passed

Code Review

tokenEstimation.ts

  • estimatePromptTokens now accepts lastOutputTokenCount (default 0) — added to the steady-state branch (lastPromptTokenCount > 0) so the estimate accounts for the previous model response appended to history after prompt count was reported
  • getUsageOutputTokenCountForPromptEstimate handles the totalTokenCount-available vs fallback paths, and conservatively handles the candidates/thoughts overlap for OpenAI-compat providers (candidates > thoughts ? candidates : candidates + thoughts)

geminiChat.ts

  • New lastOutputTokenCount field tracks per-chat output tokens from previous response
  • setLastPromptTokenCount resets lastOutputTokenCount to 0 (correct for cross-chat seeding like subagents)
  • seedResumeTokenCounts sets both atomically without clearing (correct for resume path)
  • coerceUsageCount defends against hostile provider values (NaN/Infinity/negative → 0) with debug logging
  • Hard-rescue post-compression guard: if compressed prompt still exceeds hard limit, rolls back in-memory history and throws — deferred JSONL recording prevents invalid checkpoint
  • deferChatCompressionRecord option allows caller to delay checkpoint write until post-compression guards pass

sessionService.ts

  • getResumeTokenCounts returns both promptTokenCount and outputTokenCount
  • replayUiTelemetryFromConversation now returns ResumeTokenCounts for the caller

client.ts

  • Resume path uses seedResumeTokenCounts (atomic) instead of setLastPromptTokenCount (which would clear output count)
  • Removed unused TEST_ONLY export

chatCompressionService.ts

  • estimatePromptTokens call sites updated with 5th arg lastOutputTokenCount=0 (correct — output count unavailable in this context)
  • Screenshot-overflow trigger + triggerReason plumbing on COMPRESSED result

Verdict

✅ LGTM — ready to merge. Fixes the prompt estimate under-counting by including response tokens. Defensive coercion of hostile provider values, atomic resume seeding, and hard-rescue rollback are all correctly implemented. All 673 affected tests pass.

…rompt-candidates-token-count

# Conflicts:
#	packages/core/src/core/geminiChat.test.ts

@DragonnZhang DragonnZhang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Qwen Code Review — PR #4525

Verdict: Looks good. No new findings.

Summary

This PR adds lastOutputTokenCount to the prompt-candidate estimation, closing an accuracy gap where previous model-response output tokens were not counted in the next prompt's size estimate. The change is well-scoped, touches the right layers (shared helper, per-chat state, resume restoration, compression reset), and is backed by thorough test coverage.

What was reviewed

  • Deterministic checks: tsc and eslint clean (0 findings). 4 non-TS toolchains skipped (expected).
  • Semantic review: All 11 changed files read in full. Key design decisions verified:
    • getUsageOutputTokenCountForPromptEstimate correctly handles the totalTokenCount available/absent branches with appropriate clamping.
    • seedResumeTokenCounts validates inputs with Number.isFinite + Math.max(0, ...).
    • setLastPromptTokenCount correctly clears lastOutputTokenCount (documented in JSDoc).
    • Compression success resets lastOutputTokenCount to 0.
    • All call sites of estimatePromptTokens updated for the new 5-arg signature.
    • Subagent fork path (agent-core.ts:368) correctly uses setLastPromptTokenCount which clears inherited output token state.
    • chatCompressionService.ts passes 0 with an inline comment explaining the unavailability.

Prior review coverage

This PR has been extensively reviewed by @wenshao and @BZ-D over multiple rounds (R1–R6). All previously identified issues — double-counting via totalTokenCount fallback, hidden side-effect in setter, guard mismatch (|| vs ??), shared formula extraction, stale TODO signature, NaN propagation, resume call-order fragility, OpenAI reasoning-token overlap — have been addressed in the current commit. The remaining 12 stale inline comments are on old commits and already resolved.

— qwen-code via Qwen Code /review

@Jerry2003826

Copy link
Copy Markdown
Contributor Author

I re-checked the current branch. This is already covered in the latest diff: the streaming usage path no longer assigns promptTokenCount ?? totalTokenCount directly.

Current behavior:

  • hasUsablePromptTokenCount requires typeof === 'number', Number.isFinite(...), and >= 0.
  • hasUsableTotalTokenCount applies the same guard before using total tokens for output-token estimation.
  • The stored prompt count uses the guarded prompt count when usable, otherwise the coerced total count (coerceUsageCount(...) turns hostile values into 0).

So invalid / negative / non-finite prompt usage no longer reaches lastPromptTokenCount from the streaming path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

estimatePromptTokens: include previous turn's candidatesTokenCount in steady-state estimate

5 participants