feat(core)!: redesign auto-compaction thresholds with three-tier ladder #4168

Closed
LaZzyMan wants to merge 14 commits into main from lazzy/trusting-grothendieck-8a8501

@LaZzyMan LaZzyMan commented May 15, 2026

Collaborator

Summary

  • What changed: Replaces qwen-code's single 70% proportional auto-compaction threshold with a three-tier ladder (warn / auto / hard) that combines proportional fallback with absolute reservation. Also disables thinking + caps maxOutputTokens on the compression sideQuery, upgrades failure handling from a one-shot lock to a 3-strike circuit breaker, adds a local token estimator for the cheap-gate, plumbs a hard-tier rescue into sendMessageStream, rewires the /context command and tipRegistry tips around the new thresholds, and removes the chatCompression.contextPercentageThreshold setting.
  • Why it changed: The old 70% formula reserved 30% of the window unconditionally — on a 1M model that's 300K wasted. Aligning with claude-code's absolute-reservation design recovers ~267K on 1M and ~44K on 256K models while keeping proportional behaviour for small windows. Bundled in: failure recovery (1-shot lock made transient errors permanent), first-send / --continue coverage (lastPromptTokenCount = 0 previously bypassed all gates), and predictable buffer math across providers (thinking budget semantics vary).
  • Reviewer focus:
    • packages/core/src/services/chatCompressionService.ts — new computeThresholds(), tier constants, cheap-gate
    • packages/core/src/services/tokenEstimation.ts — local char/4 estimator
    • packages/core/src/core/geminiChat.ts — hard-tier rescue + consecutiveFailures breaker
    • packages/cli/src/ui/commands/contextCommand.ts — /context display
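
As a rough illustration of the local char/4 estimator listed under reviewer focus, the sketch below derives a token estimate from character counts. The function names, the `SketchPart` type and the shape of the input are hypothetical and are not taken from `tokenEstimation.ts`; only the divide-by-4 ratio and the 1600-token image default are mentioned in this thread.

```typescript
// Hypothetical sketch of a char/4 token estimator; not the PR's actual code.
const BYTES_PER_TOKEN = 4; // ratio named in the review thread
const DEFAULT_IMAGE_TOKEN_ESTIMATE = 1600; // default quoted in the review response

// Simplified stand-in for a message part: either text or an inline image.
type SketchPart = { text: string } | { image: true };

function estimateTextTokens(text: string): number {
  // Round up so any non-empty string costs at least one token.
  return Math.ceil(text.length / BYTES_PER_TOKEN);
}

function estimatePartsTokens(
  parts: SketchPart[],
  imageTokenEstimate: number = DEFAULT_IMAGE_TOKEN_ESTIMATE,
): number {
  let total = 0;
  for (const part of parts) {
    // Text is charged by length; images get a flat per-image estimate.
    total += 'text' in part ? estimateTextTokens(part.text) : imageTokenEstimate;
  }
  return total;
}
```

An estimate like this is cheap enough to run before every send, which is presumably what lets the gate work even when no provider-reported token count exists yet (the `lastPromptTokenCount = 0` case).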

Validation

  • Commands run:
    ```bash
    npm run typecheck # clean (4 workspaces)
    npm run lint # clean (project files; pre-existing e2e-testing/scripts/*.js untouched)
    cd packages/core && npx vitest run src/services src/core # 1930/1930 pass
    cd packages/cli && npx vitest run # 5995/5995 pass + 9 skipped
    ```
  • Quickest reviewer verification path:
    1. `computeThresholds(window)` is pure — read it and the 6 unit-test cases in `chatCompressionService.test.ts` (32K / 64K / 128K / 200K / 1M / 10K-extreme) to confirm the math.
    2. The cheap-gate / hard-rescue wiring is covered by the new describe blocks in `chatCompressionService.test.ts` ("cheap-gate uses computeThresholds.auto", "computeThresholds") and `geminiChat.test.ts` ("compression failure circuit breaker", "sendMessageStream hard-tier rescue", "sendMessageStream first-turn estimation").
    3. /context output covered by new `contextCommand.test.ts` tests (warn / auto / hard / safe tier classification + estimated-path fallback).
  • Evidence:
    • Threshold table across windows (matches design doc):

      | window | warn | auto | hard | dominant |
      | --- | --- | --- | --- | --- |
      | 32K | 19.2K (pct) | 22.4K (pct) | 22.4K (degenerate) | proportional |
      | 128K | 76.8K (pct) | 95K (abs) | 105K (abs) | mixed |
      | 200K | 147K (abs) | 167K (abs) | 177K (abs) | absolute |
      | 1M | 947K (abs) | 967K (abs) | 977K (abs) | absolute |
  • Not validated:
    • No live model run. All tests are unit/integration with mocked sideQuery; the new `maxOutputTokens=20K` cap and `includeThoughts=false` settings on the compression call were verified to be passed correctly but not yet exercised against a real provider. Worth running a few real auto-compactions on Dashscope / Anthropic / Gemini before merge to confirm summary quality holds.
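
The threshold table under Evidence is consistent with a simple `max(proportional, absolute)` rule per tier. The sketch below reproduces the four rows; it is not the PR's `computeThresholds()`. The buffer sizes (53K / 33K / 23K), the 0.6 warn ratio and the choice of K = 1000 are all inferred from the table and should be read as assumptions. Only `DEFAULT_PCT = 0.7` is stated in the PR description.

```typescript
// Sketch only: constants are reverse-engineered from the PR's threshold table,
// not copied from chatCompressionService.ts.
const K = 1000; // assumption: the table uses decimal thousands
const DEFAULT_PCT = 0.7; // proportional floor named in the PR description
const WARN_PCT = 0.6; // assumption: 19.2K / 32K in the table
const WARN_BUFFER = 53 * K; // assumption: 200K - 147K
const AUTO_BUFFER = 33 * K; // assumption: 200K - 167K
const HARD_BUFFER = 23 * K; // assumption: 200K - 177K

interface CompactionThresholds {
  warn: number;
  auto: number;
  hard: number;
}

function computeThresholds(windowSize: number): CompactionThresholds {
  // Each tier takes whichever is larger: the proportional value (wins on
  // small windows) or the window minus a fixed reservation (wins on large ones).
  const warn = Math.round(Math.max(WARN_PCT * windowSize, windowSize - WARN_BUFFER));
  const auto = Math.round(Math.max(DEFAULT_PCT * windowSize, windowSize - AUTO_BUFFER));
  // The hard tier never drops below auto, so on small windows it collapses onto auto.
  const hard = Math.max(auto, windowSize - HARD_BUFFER);
  return { warn, auto, hard };
}
```

Under these assumed constants, `computeThresholds(200_000)` yields warn 147,000, auto 167,000 and hard 177,000, matching the 200K row, and the ladder keeps `warn <= auto <= hard` for every window size in the table.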

Scope / Risk

  • Main risk: Tighter buffer on large windows. On a 1M model auto used to trigger at 700K and now triggers at 967K; that's intentional (recovers 267K of usable context) but compression has less headroom. The `maxOutputTokens=20K` cap mitigates output-side blowups; reactive overflow still latches as a safety net.
  • Breaking changes / migration notes:
    • `chatCompression.contextPercentageThreshold` setting removed. Settings files containing it log a one-line stderr deprecation warning at startup and the value is ignored. The proportional floor (`DEFAULT_PCT = 0.7`) is now an internal constant.
    • Public API surface expansion: `@qwen-code/qwen-code-core` now re-exports `computeThresholds` and `CompactionThresholds` (consumed by the CLI `/context` command and tip registry).
  • Not covered / not validated:
    • `packages/cli/src/ui/components/views/ContextUsage.tsx` still consumes `autocompactBuffer` (redefined as `contextWindowSize - thresholds.auto` for compat). Refactoring it to render the three-tier ladder directly is a recommended follow-up.
    • Telemetry-based calibration of the 20K `SUMMARY_RESERVE` against qwen workloads is not in scope; claude-code's p99.99 of 17K is the basis. Worth observing once shipped.
    • The `COMPACT_MAX_OUTPUT_TOKENS = 20K` cap could clip very long compaction summaries — a `finish_reason === MAX_TOKENS` NOOP guard in `compress()` would be a sensible follow-up.
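
For readers following the `/context` changes, the safe / warn / auto / hard classification and the compat `autocompactBuffer` value could be derived from the thresholds roughly as follows. `classifyTier`, `ContextTier` and the inclusive `>=` boundaries are assumptions made for illustration; only the `contextWindowSize - thresholds.auto` definition comes from the notes above.

```typescript
// Hypothetical sketch; the real contextCommand.ts may use different names
// and boundary semantics.
interface CompactionThresholds {
  warn: number;
  auto: number;
  hard: number;
}

type ContextTier = 'safe' | 'warn' | 'auto' | 'hard';

function classifyTier(promptTokens: number, t: CompactionThresholds): ContextTier {
  // Check the highest tier first so each usage value maps to exactly one label.
  if (promptTokens >= t.hard) return 'hard';
  if (promptTokens >= t.auto) return 'auto';
  if (promptTokens >= t.warn) return 'warn';
  return 'safe';
}

// Compat value still consumed by ContextUsage.tsx, per the PR notes.
function autocompactBuffer(contextWindowSize: number, t: CompactionThresholds): number {
  return contextWindowSize - t.auto;
}
```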

Testing Matrix

| | 🍏 | 🪟 | 🐧 |
| --- | --- | --- | --- |
| npm run | | ⚠️ | ⚠️ |
| npx | ⚠️ | ⚠️ | ⚠️ |
| Docker | ⚠️ | ⚠️ | ⚠️ |
| Podman | ⚠️ | N/A | N/A |
| Seatbelt | ⚠️ | N/A | N/A |

Testing matrix notes:

  • Implementation and verification ran on macOS (Apple Silicon). All other rows are unit-test coverage only — no platform-specific code was touched (this PR is pure TypeScript logic + tests), so cross-platform risk is low. Windows / Linux validation welcome before merge.

Design references

  • Design doc: `docs/design/auto-compaction-threshold-redesign.md`
  • Implementation plan: `docs/plans/2026-05-14-auto-compaction-threshold-redesign.md`

Both are committed in this PR so the rationale is visible alongside the code.

@github-actions

Contributor

📋 Review Summary

This PR redesigns qwen-code's auto-compaction threshold system from a single 70% proportional threshold to a three-tier ladder (warn/auto/hard) combining proportional fallback with absolute reservation. The implementation aligns with claude-code's design, recovers significant wasted context on large windows (~267K on 1M models), and bundles several related improvements: 3-strike failure circuit breaker, local token estimation for accurate threshold gating, and predictable output budget control via maxOutputTokens cap with thinking disabled.

🔍 General Feedback

  • Excellent design documentation: The design doc (docs/design/auto-compaction-threshold-redesign.md) provides clear rationale, mathematical formulas, empirical data tables, and implementation phases. This makes the review significantly easier.
  • Well-structured implementation: The code changes follow the design spec closely, with constants co-located, pure functions for threshold computation, and clear separation of concerns.
  • Comprehensive test coverage: New unit tests cover threshold computation across window sizes (32K/64K/128K/200K/1M/extreme), token estimation functions, cheap-gate behavior, and circuit breaker logic.
  • Thoughtful risk acknowledgment: The PR body explicitly identifies the main risk (tighter buffer on large windows), breaking changes, and what wasn't validated (live model runs).
  • Cross-file consistency: Constants and formulas are consistent across chatCompressionService.ts, geminiChat.ts, contextCommand.ts, and tipRegistry.ts.

🎯 Specific Feedback

🟡 High Priority Issues

  • File: packages/core/src/services/chatCompressionService.ts:374-380 - The runSideQuery call sets maxOutputTokens: COMPACT_MAX_OUTPUT_TOKENS and includeThoughts: false, but there's no guard for when finish_reason === 'MAX_TOKENS'. If the summary gets clipped at 20K, the code may persist a truncated summary. The design doc (risk #2) acknowledges this and suggests a follow-up, but given this is a fundamental change to compression behavior, consider adding at least a defensive check now:

    // After runSideQuery, check if summary was truncated
    if (summaryResult.usage?.candidatesTokenCount === COMPACT_MAX_OUTPUT_TOKENS) {
      // Log warning or treat as NOOP to avoid persisting truncated summary
      config.getDebugLogger().warn('Compression summary hit maxOutputTokens limit');
    }
  • File: packages/core/src/services/tokenEstimation.ts:34-40 - The estimateContentTokens function uses DEFAULT_IMAGE_TOKEN_ESTIMATE as a fallback when called without precomputed counts. The comment notes this is "a test-friendly default" but production callers "MUST pass precomputedCharCounts". This is a footgun — consider making the parameter required or using a clearer default that won't silently produce wrong estimates if a future caller forgets to pass it.

🟢 Medium Priority Issues

  • File: packages/core/src/services/chatCompressionService.ts:67-75 - The TOOL_ROUND_RETAIN_COUNT constant is exported but only used internally by findCompressSplitPoint and splitPointRetainingTrailingPairs. Unless there's a planned external consumer, this should be private to avoid polluting the public API surface.

  • File: packages/cli/src/ui/commands/contextCommand.ts:177-183 - The code still references config.getChatCompression()?.contextPercentageThreshold for backward compatibility, but this PR removes that field. The deprecation warning logic in config.ts should handle this, but the context command should migrate to using computeThresholds() directly for consistency with the new design.

  • File: packages/core/src/core/geminiChat.ts - The PR description mentions consecutiveFailures counter and hard-tier rescue wiring in sendMessageStream, but the diff shows these changes are in a file that already exists in the repo. Need to verify the consecutiveFailures state is properly initialized and reset across session boundaries (e.g., when chat is restored from disk via --continue).

🔵 Low Priority Suggestions

  • File: packages/core/src/services/chatCompressionService.ts:89-105 - The computeThresholds function is excellent and well-documented. Consider adding a JSDoc @example showing the threshold table from the design doc (32K/128K/200K/1M rows) so future maintainers can quickly verify the formula behavior without consulting external docs.

  • File: packages/core/src/services/tokenEstimation.ts - The BYTES_PER_TOKEN = 4 constant is used throughout, but BYTES_PER_TOKEN_JSON = 2 (mentioned in design doc) isn't actually used anywhere in the implementation. Either remove it from the code or add a comment explaining where it would apply (currently only estimateContentChars in compactionInputSlimming.ts uses it for functionCall/functionResponse).

  • File: packages/cli/src/services/tips/tipRegistry.ts - The PR description mentions rewriting three context-* tips to follow the new thresholds, but this file isn't in the changed files list. Verify the tip registry changes are included before merge.

  • File: packages/core/src/index.ts - The PR mentions exporting computeThresholds and CompactionThresholds from @qwen-code/qwen-code-core for CLI consumption. Ensure these exports are added to the public index to avoid breaking the CLI's import path.

✅ Highlights

  • Mathematically sound threshold design: The computeThresholds function elegantly handles both small windows (proportional fallback) and large windows (absolute reservation) via max() formulas. The test cases verify the ladder always satisfies warn <= auto <= hard.

  • Circuit breaker upgrade: Moving from "1 failure = permanent lock" to "3 consecutive failures = temporary circuit break" is a significant reliability improvement. The implementation correctly excludes force=true calls from the counter.

  • Token estimation closes critical gaps: The estimatePromptTokens function properly handles both the "lagging by one turn" and "first-turn zero" issues that previously caused threshold bypass scenarios.

  • Cross-provider consistency: Disabling thinking and capping maxOutputTokens addresses the inconsistent semantics across Anthropic/OpenAI/Gemini providers, making buffer predictions reliable.

  • Comprehensive validation: All 1930 core tests and 5995 CLI tests pass, plus typecheck and lint are clean. The test matrix in the PR body is transparent about platform coverage.
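
The 3-strike breaker praised in the highlights can be pictured with a small counter class. This is a sketch under stated assumptions, not the `GeminiChat` implementation: `CompressionBreaker`, `MAX_CONSECUTIVE_FAILURES` and the method names are hypothetical, and resetting the counter on success is inferred from the word "consecutive". The thread itself only confirms the three-failure limit, a counter starting at 0, and that `force=true` calls are excluded from the count.

```typescript
// Hypothetical sketch of a consecutive-failure circuit breaker.
const MAX_CONSECUTIVE_FAILURES = 3;

class CompressionBreaker {
  private consecutiveFailures = 0;

  // True once the failure limit has been reached.
  get isOpen(): boolean {
    return this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
  }

  // Automatic compaction is skipped while the breaker is open.
  // Assumption: forced (user-requested) compaction always runs.
  shouldAttempt(force: boolean): boolean {
    return force || !this.isOpen;
  }

  recordResult(succeeded: boolean, force: boolean): void {
    if (force) return; // forced calls never touch the counter
    // Assumption: any success clears the streak.
    this.consecutiveFailures = succeeded ? 0 : this.consecutiveFailures + 1;
  }
}
```

Because the counter lives on the instance, constructing a fresh object (as the review response says happens on `--continue`) starts again from zero.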

@LaZzyMan LaZzyMan force-pushed the lazzy/trusting-grothendieck-8a8501 branch from 1dcef8c to d270af0 May 15, 2026 08:09
LaZzyMan added a commit that referenced this pull request May 15, 2026
Adds a defensive guard in ChatCompressionService.compress() that detects
when the side-query summary hit COMPACT_MAX_OUTPUT_TOKENS (20K). In that
case the summary is likely truncated mid-content, so we drop it and
return NOOP rather than persist a half-summary. The next send re-tries;
reactive overflow still catches the catastrophic case where the API
rejects the next request as too large.

Documented in the design doc as risk #2; the bot reviewer on PR #4168
correctly pushed for it to land alongside the threshold redesign rather
than as a follow-up since the new 20K cap is what makes truncation
likely in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LaZzyMan

Collaborator Author

Review response — commit 6ce81e73c

Thanks for the thorough pass. Triage results per finding:

| # | Outcome | Notes |
| --- | --- | --- |
| 🟡 H1 — MAX_TOKENS guard | Fixed | Added a defensive check in compress() that NOOPs when compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS so a truncated summary isn't persisted. Includes a unit test asserting the path. Commit 6ce81e73c. |
| 🟡 H2 — estimateContentTokens footgun | ❌ Declined (false-positive) | The "MUST pass precomputedCharCounts" warning the comment references is on findCompressSplitPoint, not estimateContentTokens. The latter's imageTokenEstimate parameter has a benign default (DEFAULT_IMAGE_TOKEN_ESTIMATE = 1600) that matches the splitter's default, keeping the two estimators in sync. Different functions, different contracts. |
| 🟢 M1 — Hide TOOL_ROUND_RETAIN_COUNT | ❌ Declined (out of PR scope) | TOOL_ROUND_RETAIN_COUNT was already exported in chatCompressionService.ts before this PR. Reducing export visibility is a separate cleanup that I don't want to bundle into a threshold-redesign change. |
| 🟢 M2 — contextCommand.ts:177-183 still references the deprecated field | ❌ Declined (stale read) | This was actually rewritten by Task 11 of the redesign — contextCommand.ts now imports computeThresholds (line 28) and uses computeThresholds(contextWindowSize) (line 190). Grep contextPercentageThreshold in packages/cli/src/ui/commands/contextCommand.ts is empty. |
| 🟢 M3 — consecutiveFailures across --continue | ❌ Declined (works as intended) | consecutiveFailures is a private field on GeminiChat initialized to 0. --continue constructs a fresh GeminiChat (history is restored separately), so the counter naturally resets — which is the correct semantics (a restarted session should get a fresh 3-strike budget rather than inheriting a latched breaker from a previous run). |
| 🔵 L1 — JSDoc @example with threshold table | ❌ Declined (filter 3) | The same table lives in docs/design/auto-compaction-threshold-redesign.md (committed in this PR). Duplicating it in JSDoc creates two sources of truth that can drift independently when the constants are tuned. |
| 🔵 L2 — Missing BYTES_PER_TOKEN_JSON = 2 | ❌ Declined (not in code) | BYTES_PER_TOKEN_JSON doesn't exist in tokenEstimation.ts. The design doc only mentions it as a future possibility for json-dense content; the implementation deliberately uses a single BYTES_PER_TOKEN = 4 ratio (matching claude-code's approach). |
| 🔵 L3 / L4 — Verify files in PR | N/A | Both packages/cli/src/services/tips/tipRegistry.ts and packages/core/src/index.ts are in this PR (commit 28eb867a8); please re-check the changed-files view. |

Net: 1 fix accepted, 5 declined, 2 hallucinations dismissed. Force-pushed earlier (rebase onto main + the consecutiveFailures test fixup d270af030); this comment lands on top of the new merge-conflict-free branch tip 6ce81e73c.
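
For reference, the truncation guard accepted as H1 amounts to a single comparison. The sketch below shows the idea in isolation; `acceptSummary`, `SummaryResult` and the status strings are hypothetical, and 20K is taken here as 20,000 tokens, which is an assumption about the real constant's exact value.

```typescript
// Hypothetical sketch of the "drop a clipped summary" guard; not the code in compress().
const COMPACT_MAX_OUTPUT_TOKENS = 20_000; // assumption: "20K" means 20,000

type CompressionStatus = 'COMPRESSED' | 'NOOP';

interface SummaryResult {
  text: string;
  outputTokenCount: number; // provider-reported output tokens for the summary call
}

function acceptSummary(summary: SummaryResult): CompressionStatus {
  // Reaching the cap means the model was probably cut off mid-summary,
  // so the result is discarded instead of replacing the history.
  if (summary.outputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS) {
    return 'NOOP';
  }
  return 'COMPRESSED';
}
```

Using `>=` rather than `===` also covers providers that report a count slightly above the requested cap.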

🤖 Drafted with Claude Code using the review-response skill.

github-actions Bot commented May 15, 2026

Contributor

Code Coverage Summary

| Package | Lines | Statements | Functions | Branches |
| --- | --- | --- | --- | --- |
| CLI | 77.23% | 77.23% | 79.89% | 79.84% |
| Core | 79.59% | 79.59% | 82.19% | 82.85% |

CLI Package - Full Text Report
-------------------|---------|----------|---------|---------|-------------------
File               | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s 
-------------------|---------|----------|---------|---------|-------------------
All files          |   77.23 |    79.84 |   79.89 |   77.23 |                   
 src               |    75.9 |    69.11 |   80.55 |    75.9 |                   
  gemini.tsx       |   68.53 |     66.4 |   76.47 |   68.53 | ...29,946-949,957 
  ...ractiveCli.ts |   80.23 |     68.3 |   78.57 |   80.23 | ...1054,1092,1195 
  ...liCommands.ts |   74.51 |    73.17 |     100 |   74.51 | ...41-265,290,391 
  ...ActiveAuth.ts |     100 |     87.5 |     100 |     100 | 66-80             
 ...cp-integration |   61.97 |    65.24 |   78.12 |   61.97 |                   
  acpAgent.ts      |   63.32 |    65.35 |   83.05 |   63.32 | ...2112,2126-2134 
  authMethods.ts   |   12.19 |      100 |       0 |   12.19 | 11-31,34-38,41-50 
  errorCodes.ts    |       0 |        0 |       0 |       0 | 1-22              
  ...DirContext.ts |     100 |      100 |     100 |     100 |                   
 ...ration/service |   68.65 |    83.33 |   66.66 |   68.65 |                   
  filesystem.ts    |   68.65 |    83.33 |   66.66 |   68.65 | ...32,77-94,97-98 
 ...ration/session |   75.88 |    72.05 |   86.25 |   75.88 |                   
  ...ryReplayer.ts |   67.34 |     75.6 |   81.81 |   67.34 | ...54-269,282-283 
  Session.ts       |   74.93 |    70.81 |   88.46 |   74.93 | ...2658,2664-2667 
  ...entTracker.ts |   90.85 |    84.84 |      90 |   90.85 | ...35,199,251-260 
  index.ts         |       0 |        0 |       0 |       0 | 1-40              
  ...ssionUtils.ts |   84.21 |    77.77 |     100 |   84.21 | ...37-153,209-211 
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 ...ssion/emitters |   96.01 |    90.75 |    92.3 |   96.01 |                   
  BaseEmitter.ts   |   76.92 |    66.66 |      80 |   76.92 | 23-24,39-40,55-56 
  ...ageEmitter.ts |     100 |    89.47 |     100 |     100 | 109,111           
  PlanEmitter.ts   |     100 |      100 |     100 |     100 |                   
  ...allEmitter.ts |   98.06 |     92.3 |     100 |   98.06 | 227-228,327,335   
  index.ts         |       0 |        0 |       0 |       0 | 1-10              
 ...ession/rewrite |   90.36 |    87.83 |   94.11 |   90.36 |                   
  LlmRewriter.ts   |      81 |       84 |     100 |      81 | ...,88-89,155-159 
  ...Middleware.ts |   95.83 |    85.71 |     100 |   95.83 | 119,127-129       
  TurnBuffer.ts    |     100 |      100 |     100 |     100 |                   
  config.ts        |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 src/auth          |    97.7 |    94.81 |   95.45 |    97.7 |                   
  allProviders.ts  |     100 |      100 |     100 |     100 |                   
  ...iderConfig.ts |    97.6 |    95.04 |     100 |    97.6 | ...61,411,433-434 
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 src/auth/install  |   98.57 |    88.88 |     100 |   98.57 |                   
  ...nstallPlan.ts |   98.57 |    88.88 |     100 |   98.57 | 80,93             
 ...viders/alibaba |   96.96 |    66.66 |   66.66 |   96.96 |                   
  ...baStandard.ts |     100 |      100 |     100 |     100 |                   
  codingPlan.ts    |   93.67 |    66.66 |   66.66 |   93.67 | 83,87-89,94       
  tokenPlan.ts     |     100 |      100 |     100 |     100 |                   
 ...oviders/custom |     100 |      100 |     100 |     100 |                   
  ...omProvider.ts |     100 |      100 |     100 |     100 |                   
 ...roviders/oauth |    91.5 |    77.03 |   97.05 |    91.5 |                   
  openrouter.ts    |   84.37 |    33.33 |     100 |   84.37 | 43-48             
  ...outerOAuth.ts |    91.9 |    79.06 |   96.87 |    91.9 | ...53-655,699-701 
 ...ers/thirdParty |     100 |      100 |     100 |     100 |                   
  deepseek.ts      |     100 |      100 |     100 |     100 |                   
  idealab.ts       |     100 |      100 |     100 |     100 |                   
  minimax.ts       |     100 |      100 |     100 |     100 |                   
  modelscope.ts    |     100 |      100 |     100 |     100 |                   
  zai.ts           |     100 |      100 |     100 |     100 |                   
 src/commands      |   47.93 |    85.71 |   43.47 |   47.93 |                   
  auth.ts          |     100 |    83.33 |     100 |     100 | 11,14             
  channel.ts       |   56.66 |      100 |       0 |   56.66 | 15-19,27-34       
  extensions.tsx   |   96.55 |      100 |      50 |   96.55 | 37                
  hooks.tsx        |   66.66 |      100 |       0 |   66.66 | 20-24             
  mcp.ts           |   94.73 |      100 |      50 |   94.73 | 28                
  review.ts        |   51.85 |      100 |       0 |   51.85 | 24-35,38          
  serve.ts         |    7.74 |      100 |       0 |    7.74 | ...51-147,149-230 
 ...mmands/channel |   39.25 |    79.45 |      50 |   39.25 |                   
  ...l-registry.ts |    8.57 |      100 |       0 |    8.57 | 6-21,24-42        
  config-utils.ts  |      92 |      100 |   66.66 |      92 | 21-26             
  configure.ts     |    14.7 |      100 |       0 |    14.7 | 18-21,23-84       
  pairing.ts       |   26.31 |      100 |       0 |   26.31 | ...30,40-50,52-65 
  pidfile.ts       |   96.34 |    86.95 |     100 |   96.34 | 49,59,91          
  start.ts         |   30.98 |       52 |   69.23 |   30.98 | ...72-475,484-486 
  status.ts        |   17.85 |      100 |       0 |   17.85 | 15-26,32-76       
  stop.ts          |      20 |      100 |       0 |      20 | 14-48             
 ...nds/extensions |    84.5 |    88.95 |   81.81 |    84.5 |                   
  consent.ts       |   71.65 |    89.28 |   42.85 |   71.65 | ...85-141,156-162 
  disable.ts       |     100 |      100 |     100 |     100 |                   
  enable.ts        |     100 |      100 |     100 |     100 |                   
  install.ts       |    75.6 |    66.66 |   66.66 |    75.6 | ...39-142,145-153 
  link.ts          |     100 |      100 |     100 |     100 |                   
  list.ts          |     100 |      100 |     100 |     100 |                   
  new.ts           |     100 |      100 |     100 |     100 |                   
  settings.ts      |   99.15 |      100 |   83.33 |   99.15 | 151               
  uninstall.ts     |    37.5 |      100 |   33.33 |    37.5 | 23-45,57-64,67-70 
  update.ts        |   96.32 |      100 |     100 |   96.32 | 101-105           
  utils.ts         |   60.24 |    28.57 |     100 |   60.24 | ...81,83-87,89-93 
 ...les/mcp-server |       0 |        0 |       0 |       0 |                   
  example.ts       |       0 |        0 |       0 |       0 | 1-60              
 src/commands/mcp  |   92.29 |    86.08 |   88.88 |   92.29 |                   
  add.ts           |     100 |    98.03 |     100 |     100 | 293               
  list.ts          |   91.22 |    80.76 |      80 |   91.22 | ...19-121,146-147 
  reconnect.ts     |   76.72 |    71.42 |   85.71 |   76.72 | 35-48,153-175     
  remove.ts        |     100 |       80 |     100 |     100 | 21-25             
 ...ommands/review |   11.57 |      100 |       0 |   11.57 |                   
  cleanup.ts       |   17.94 |      100 |       0 |   17.94 | ...01-106,108-109 
  deterministic.ts |   13.75 |      100 |       0 |   13.75 | ...22-738,740-741 
  fetch-pr.ts      |   11.36 |      100 |       0 |   11.36 | ...80-201,203-204 
  load-rules.ts    |   11.32 |      100 |       0 |   11.32 | ...41-153,155-156 
  pr-context.ts    |    6.22 |      100 |       0 |    6.22 | ...97-312,314-315 
  presubmit.ts     |    9.35 |      100 |       0 |    9.35 | ...62-287,289-290 
 ...nds/review/lib |      30 |      100 |       0 |      30 |                   
  gh.ts            |   22.58 |      100 |       0 |   22.58 | ...49,53-54,62-69 
  git.ts           |   22.72 |      100 |       0 |   22.72 | 15-18,29-39,43-44 
  paths.ts         |   52.94 |      100 |       0 |   52.94 | ...26,37-38,42-43 
 src/config        |   92.79 |    84.88 |   88.09 |   92.79 |                   
  auth.ts          |   86.98 |    80.32 |     100 |   86.98 | ...26-227,243-244 
  config.ts        |   87.96 |    84.36 |      80 |   87.96 | ...1856,1858-1866 
  keyBindings.ts   |   96.55 |       50 |     100 |   96.55 | 193-196           
  ...idersScope.ts |      92 |       90 |     100 |      92 | 11-12             
  sandboxConfig.ts |   61.64 |    71.87 |   66.66 |   61.64 | ...54-68,73,77-89 
  settings.ts      |   85.76 |    87.25 |   89.18 |   85.76 | ...1148,1153-1156 
  ...ingsSchema.ts |     100 |      100 |     100 |     100 |                   
  ...tedFolders.ts |   96.22 |       94 |     100 |   96.22 | ...88-190,205-206 
 ...nfig/migration |   94.89 |    78.94 |   83.33 |   94.89 |                   
  index.ts         |   94.87 |    88.88 |     100 |   94.87 | 91-92             
  scheduler.ts     |   96.55 |    77.77 |     100 |   96.55 | 19-20             
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 ...ation/versions |   94.74 |       96 |     100 |   94.74 |                   
  ...-v2-shared.ts |     100 |      100 |     100 |     100 |                   
  v1-to-v2.ts      |   81.75 |    90.19 |     100 |   81.75 | ...28-229,231-247 
  v2-to-v3.ts      |     100 |      100 |     100 |     100 |                   
  v3-to-v4.ts      |     100 |      100 |     100 |     100 |                   
 src/core          |     100 |      100 |     100 |     100 |                   
  auth.ts          |     100 |      100 |     100 |     100 |                   
  initializer.ts   |     100 |      100 |     100 |     100 |                   
  theme.ts         |     100 |      100 |     100 |     100 |                   
 src/dualOutput    |   63.09 |    64.51 |   55.55 |   63.09 |                   
  ...tputBridge.ts |   62.94 |    65.51 |   56.25 |   62.94 | ...22-323,331-334 
  ...utContext.tsx |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-8               
 src/export        |       0 |        0 |       0 |       0 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-7               
 src/generated     |     100 |      100 |     100 |     100 |                   
  git-commit.ts    |     100 |      100 |     100 |     100 |                   
 src/i18n          |   81.47 |    75.94 |   65.71 |   81.47 |                   
  index.ts         |   63.68 |    69.56 |   53.84 |   63.68 | ...70-271,281-286 
  languages.ts     |   96.92 |    86.66 |     100 |   96.92 | 134-135,167,184   
  ...nslateKeys.ts |     100 |      100 |     100 |     100 |                   
  ...lationDict.ts |   93.33 |    66.66 |     100 |   93.33 | 15                
 src/i18n/locales  |     100 |      100 |     100 |     100 |                   
  ca.js            |     100 |      100 |     100 |     100 |                   
  de.js            |     100 |      100 |     100 |     100 |                   
  en.js            |     100 |      100 |     100 |     100 |                   
  fr.js            |     100 |      100 |     100 |     100 |                   
  ja.js            |     100 |      100 |     100 |     100 |                   
  pt.js            |     100 |      100 |     100 |     100 |                   
  ru.js            |     100 |      100 |     100 |     100 |                   
  zh-TW.js         |     100 |      100 |     100 |     100 |                   
  zh.js            |     100 |      100 |     100 |     100 |                   
 ...nonInteractive |   72.57 |    71.12 |   74.07 |   72.57 |                   
  session.ts       |   76.64 |     69.4 |   85.71 |   76.64 | ...23-824,833-843 
  types.ts         |    42.5 |      100 |   33.33 |    42.5 | ...80-581,584-585 
 ...active/control |   77.04 |    88.23 |      80 |   77.04 |                   
  ...rolContext.ts |    7.14 |        0 |       0 |    7.14 | 49-84             
  ...Dispatcher.ts |   91.66 |    91.83 |   88.88 |   91.66 | ...54-372,388,391 
  ...rolService.ts |       8 |        0 |       0 |       8 | 46-179            
 ...ol/controllers |    7.03 |       80 |   13.33 |    7.03 |                   
  ...Controller.ts |   19.32 |      100 |      60 |   19.32 | 81-118,127-210    
  ...Controller.ts |       0 |        0 |       0 |       0 | 1-56              
  ...Controller.ts |    3.94 |      100 |   11.11 |    3.94 | ...63-381,391-496 
  ...Controller.ts |   14.06 |      100 |       0 |   14.06 | ...82-117,130-133 
  ...Controller.ts |    5.21 |      100 |       0 |    5.21 | ...21-433,442-471 
 .../control/types |       0 |        0 |       0 |       0 |                   
  serviceAPIs.ts   |       0 |        0 |       0 |       0 | 1                 
 ...Interactive/io |   97.98 |     93.7 |   95.18 |   97.98 |                   
  ...putAdapter.ts |   97.89 |    92.82 |   98.07 |   97.89 | ...1303,1398-1399 
  ...putAdapter.ts |      96 |     90.9 |   85.71 |      96 | 51-52             
  ...nputReader.ts |     100 |    94.73 |     100 |     100 | 67                
  ...putAdapter.ts |   98.28 |      100 |      90 |   98.28 | 81-82,122-123     
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/patches       |       0 |        0 |       0 |       0 |                   
  is-in-ci.ts      |       0 |        0 |       0 |       0 | 1-17              
 src/remoteInput   |   86.98 |       75 |   85.71 |   86.98 |                   
  ...utContext.tsx |     100 |      100 |     100 |     100 |                   
  ...putWatcher.ts |   88.12 |    76.08 |   91.66 |   88.12 | ...21-222,233-236 
  index.ts         |       0 |        0 |       0 |       0 | 1-8               
 src/serve         |    79.3 |     78.8 |   92.85 |    79.3 |                   
  auth.ts          |   88.49 |    88.63 |     100 |   88.49 | ...49-150,153-155 
  capabilities.ts  |     100 |     90.9 |     100 |     100 | 264               
  ...usProvider.ts |   67.01 |    51.42 |     100 |   67.01 | ...40-245,278-286 
  debugMode.ts     |     100 |      100 |     100 |     100 |                   
  demo.ts          |     100 |      100 |     100 |     100 |                   
  envSnapshot.ts   |    92.3 |       84 |     100 |    92.3 | 108-111,170-177   
  eventBus.ts      |     100 |      100 |     100 |     100 |                   
  httpAcpBridge.ts |   79.62 |    78.84 |   96.38 |   79.62 | ...4246,4277-4318 
  ...oryChannel.ts |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-106             
  loopbackBinds.ts |     100 |      100 |     100 |     100 |                   
  runQwenServe.ts  |   73.98 |    87.83 |   55.55 |   73.98 | ...94-710,735-737 
  server.ts        |   86.18 |    82.94 |   90.62 |   86.18 | ...2478,2543-2552 
  status.ts        |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
  ...paceAgents.ts |   64.87 |    70.45 |    90.9 |   64.87 | ...1306,1316-1326 
  ...paceMemory.ts |   87.13 |    78.46 |     100 |   87.13 | ...54-361,421-428 
 src/serve/auth    |   86.54 |    78.75 |   93.75 |   86.54 |                   
  deviceFlow.ts    |   96.33 |    79.51 |    97.5 |   96.33 | ...1526,1630,1700 
  ...owProvider.ts |   45.23 |    74.07 |      75 |   45.23 | ...90-359,375,379 
 src/serve/fs      |   84.85 |    79.75 |     100 |   84.85 |                   
  audit.ts         |     100 |    96.15 |     100 |     100 | 201               
  errors.ts        |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  paths.ts         |   77.82 |    77.08 |     100 |   77.82 | ...64,493-497,510 
  policy.ts        |   90.32 |    89.18 |     100 |   90.32 | 142-150           
  ...FileSystem.ts |   83.55 |    76.22 |     100 |   83.55 | ...1859,1886-1887 
 src/serve/routes  |   89.41 |       70 |     100 |   89.41 |                   
  ...ceFileRead.ts |   94.41 |    76.92 |     100 |   94.41 | ...28-329,390-392 
  ...eFileWrite.ts |    82.1 |    60.52 |     100 |    82.1 | ...42-244,247-249 
 src/services      |   91.67 |    91.21 |   97.56 |   91.67 |                   
  ...mandLoader.ts |     100 |    93.75 |     100 |     100 | 93                
  ...killLoader.ts |     100 |    96.15 |     100 |     100 | 47                
  ...andService.ts |    98.7 |      100 |     100 |    98.7 | 107               
  ...mandLoader.ts |   86.83 |    83.87 |     100 |   86.83 | ...30-335,340-345 
  ...omptLoader.ts |   75.84 |    80.64 |   83.33 |   75.84 | ...10-211,277-278 
  ...mandLoader.ts |     100 |      100 |     100 |     100 |                   
  ...nd-factory.ts |   91.42 |    91.66 |     100 |   91.42 | 128,137-144       
  ...ation-tool.ts |     100 |    95.45 |     100 |     100 | 125               
  ...ndMetadata.ts |   98.21 |    96.66 |     100 |   98.21 | 83,87             
  commandUtils.ts  |      96 |     90.9 |     100 |      96 | 48                
  ...and-parser.ts |   90.69 |    85.71 |     100 |   90.69 | 63-66             
  ...ionService.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...ght/generators |    85.9 |    85.61 |   90.47 |    85.9 |                   
  DataProcessor.ts |   85.63 |     85.6 |   92.85 |   85.63 | ...1122,1126-1133 
  ...tGenerator.ts |   98.21 |    85.71 |     100 |   98.21 | 46                
  ...teRenderer.ts |   45.45 |      100 |       0 |   45.45 | 13-51             
 .../insight/types |       0 |       50 |      50 |       0 |                   
  ...sightTypes.ts |       0 |        0 |       0 |       0 |                   
  ...sightTypes.ts |       0 |        0 |       0 |       0 | 1                 
 ...mpt-processors |   97.27 |    94.04 |     100 |   97.27 |                   
  ...tProcessor.ts |     100 |      100 |     100 |     100 |                   
  ...eProcessor.ts |   94.52 |    84.21 |     100 |   94.52 | 46-47,93-94       
  ...tionParser.ts |     100 |      100 |     100 |     100 |                   
  ...lProcessor.ts |   97.41 |    95.65 |     100 |   97.41 | 95-98             
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/services/tips |   97.35 |    85.29 |     100 |   97.35 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  tipHistory.ts    |   92.45 |       70 |     100 |   92.45 | ...22,144,151,160 
  tipRegistry.ts   |     100 |      100 |     100 |     100 |                   
  tipScheduler.ts  |     100 |    91.66 |     100 |     100 | 55                
 src/test-utils    |   93.75 |    83.33 |      80 |   93.75 |                   
  ...omMatchers.ts |   69.69 |       50 |      50 |   69.69 | 32-35,37-39,45-47 
  ...andContext.ts |     100 |      100 |     100 |     100 |                   
  render.tsx       |     100 |      100 |     100 |     100 |                   
 src/ui            |   65.28 |    73.08 |   60.34 |   65.28 |                   
  App.tsx          |     100 |      100 |     100 |     100 |                   
  AppContainer.tsx |   63.38 |    64.68 |      50 |   63.38 | ...3156,3160-3164 
  ...tionNudge.tsx |    9.58 |      100 |       0 |    9.58 | 24-94             
  ...ackDialog.tsx |   29.23 |      100 |       0 |   29.23 | 25-75             
  ...tionNudge.tsx |    7.69 |      100 |       0 |    7.69 | 25-103            
  colors.ts        |      60 |      100 |   35.29 |      60 | ...52,54-55,60-61 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  keyMatchers.ts   |   95.91 |    97.05 |     100 |   95.91 | 25-26             
  ...tic-colors.ts |     100 |      100 |     100 |     100 |                   
  ...inePresets.ts |   98.17 |    88.88 |     100 |   98.17 | ...12,239,387-389 
  textConstants.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/ui/auth       |   55.06 |    51.13 |   35.48 |   55.06 |                   
  AuthDialog.tsx   |   64.26 |    44.44 |   16.66 |   64.26 | ...59,366-388,392 
  ...nProgress.tsx |       0 |        0 |       0 |       0 | 1-64              
  ...etupSteps.tsx |    39.5 |       32 |   38.46 |    39.5 | ...69,472,478,481 
  useAuth.ts       |   76.63 |    68.29 |     100 |   76.63 | ...48,493-499,560 
  ...rSetupFlow.ts |   44.61 |    33.33 |      50 |   44.61 | ...57-378,395-438 
 src/ui/commands   |   75.19 |    81.23 |   83.08 |   75.19 |                   
  aboutCommand.ts  |     100 |      100 |     100 |     100 |                   
  agentsCommand.ts |   83.78 |      100 |      60 |   83.78 | 30-32,42-44       
  ...odeCommand.ts |   89.04 |    81.25 |     100 |   89.04 | 91-92,94-99       
  arenaCommand.ts  |   62.81 |    58.73 |   65.21 |   62.81 | ...91-596,681-689 
  authCommand.ts   |     100 |      100 |     100 |     100 |                   
  branchCommand.ts |     100 |      100 |     100 |     100 |                   
  btwCommand.ts    |   95.59 |    71.42 |     100 |   95.59 | 72,154-159        
  bugCommand.ts    |   81.13 |    71.42 |     100 |   81.13 | 60-69             
  clearCommand.ts  |      92 |    76.47 |     100 |      92 | 43-44,72-73,91-92 
  ...essCommand.ts |    64.7 |       50 |      75 |    64.7 | ...48-149,163-166 
  ...extCommand.ts |   65.09 |    53.84 |   84.61 |   65.09 | ...66-601,612-613 
  copyCommand.ts   |   98.28 |    94.89 |     100 |   98.28 | ...80,280,321,327 
  deleteCommand.ts |     100 |      100 |     100 |     100 |                   
  diffCommand.ts   |     100 |     87.5 |     100 |     100 | ...61,224-225,238 
  ...ryCommand.tsx |   68.09 |    77.77 |   77.77 |   68.09 | ...56-261,315-323 
  docsCommand.ts   |     100 |    88.88 |     100 |     100 | 25                
  doctorCommand.ts |   95.06 |    88.28 |     100 |   95.06 | ...92-293,320-321 
  dreamCommand.ts  |      75 |    66.66 |   66.66 |      75 | 22-27,44-47       
  editorCommand.ts |     100 |      100 |     100 |     100 |                   
  exportCommand.ts |   98.25 |    91.02 |     100 |   98.25 | ...81,198-199,364 
  ...onsCommand.ts |   48.66 |     90.9 |   63.63 |   48.66 | ...05-109,159-211 
  forgetCommand.ts |   26.82 |      100 |      50 |   26.82 | 18-51             
  goalCommand.ts   |   91.25 |    83.33 |      90 |   91.25 | ...83-186,198-201 
  helpCommand.ts   |     100 |      100 |     100 |     100 |                   
  hooksCommand.ts  |    20.4 |       40 |      40 |    20.4 | ...48-180,204-205 
  ideCommand.ts    |   60.75 |    64.28 |   41.17 |   60.75 | ...05-306,310-324 
  initCommand.ts   |   84.33 |    72.72 |     100 |   84.33 | 68,82-87,89-94    
  ...ghtCommand.ts |   74.56 |    68.42 |     100 |   74.56 | ...31-245,250-273 
  ...ageCommand.ts |   92.17 |    82.69 |     100 |   92.17 | ...43,164,173-183 
  lspCommand.ts    |     100 |    86.95 |     100 |     100 | 31,101-102        
  ...elsCommand.ts |     100 |      100 |     100 |     100 |                   
  mcpCommand.ts    |     100 |      100 |     100 |     100 |                   
  memoryCommand.ts |     100 |      100 |     100 |     100 |                   
  modelCommand.ts  |   75.09 |    78.18 |      75 |   75.09 | ...20-225,262-267 
  ...onsCommand.ts |     100 |      100 |     100 |     100 |                   
  planCommand.ts   |   78.82 |    76.92 |     100 |   78.82 | 30-35,51-56,68-73 
  quitCommand.ts   |     100 |      100 |     100 |     100 |                   
  recapCommand.ts  |   21.81 |      100 |      50 |   21.81 | 24-73             
  ...berCommand.ts |   32.43 |      100 |      50 |   32.43 | 23-57             
  renameCommand.ts |   85.71 |    86.04 |     100 |   85.71 | ...02-209,216-221 
  ...oreCommand.ts |    92.3 |    87.87 |     100 |    92.3 | ...,83-88,129-130 
  resumeCommand.ts |     100 |      100 |     100 |     100 |                   
  rewindCommand.ts |      80 |      100 |      50 |      80 | 19-21             
  ...ngsCommand.ts |     100 |      100 |     100 |     100 |                   
  ...hubCommand.ts |   81.43 |    65.21 |      80 |   81.43 | ...70-173,176-179 
  skillsCommand.ts |   15.04 |      100 |      25 |   15.04 | ...90-106,109-136 
  statsCommand.ts  |   88.19 |    84.21 |     100 |   88.19 | ...,58-61,143-146 
  ...ineCommand.ts |     100 |      100 |     100 |     100 |                   
  ...aryCommand.ts |    6.46 |      100 |      50 |    6.46 | 31-329            
  tasksCommand.ts  |   77.22 |    72.13 |     100 |   77.22 | ...46-150,172-177 
  ...tupCommand.ts |     100 |      100 |     100 |     100 |                   
  themeCommand.ts  |     100 |      100 |     100 |     100 |                   
  toolsCommand.ts  |     100 |      100 |     100 |     100 |                   
  trustCommand.ts  |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
  vimCommand.ts    |   54.54 |      100 |      50 |   54.54 | 19-29             
 src/ui/components |   62.86 |    74.64 |   65.25 |   62.86 |                   
  AboutBox.tsx     |     100 |      100 |     100 |     100 |                   
  AnsiOutput.tsx   |   65.57 |      100 |      50 |   65.57 | 69-90             
  ApiKeyInput.tsx  |       0 |        0 |       0 |       0 | 1-97              
  AppHeader.tsx    |   89.39 |       75 |     100 |   89.39 | 35,37-42,44       
  ...odeDialog.tsx |     9.7 |      100 |       0 |     9.7 | 35-47,50-182      
  AsciiArt.ts      |     100 |      100 |     100 |     100 |                   
  ...Indicator.tsx |   13.04 |      100 |       0 |   13.04 | 18-61             
  ...TextInput.tsx |   77.01 |       76 |     100 |   77.01 | ...20,234-236,263 
  Composer.tsx     |    81.6 |     64.7 |     100 |    81.6 | ...90,108,160,173 
  ...entPrompt.tsx |     100 |      100 |     100 |     100 |                   
  ...ryDisplay.tsx |   75.89 |    62.06 |     100 |   75.89 | ...,88,93-108,113 
  ...geDisplay.tsx |   68.42 |    57.14 |     100 |   68.42 | 16-17,31-32,42-50 
  ...ification.tsx |   28.57 |      100 |       0 |   28.57 | 16-36             
  ...gProfiler.tsx |       0 |        0 |       0 |       0 | 1-36              
  ...ogManager.tsx |   11.99 |      100 |       0 |   11.99 | 66-517            
  DiffDialog.tsx   |    2.47 |      100 |       0 |    2.47 | 68-732            
  ...ngsDialog.tsx |    8.44 |      100 |       0 |    8.44 | 37-195            
  ExitWarning.tsx  |     100 |      100 |     100 |     100 |                   
  ...hProgress.tsx |    87.8 |    33.33 |     100 |    87.8 | 28-31,56          
  ...ustDialog.tsx |     100 |      100 |     100 |     100 |                   
  Footer.tsx       |   76.59 |    48.64 |     100 |   76.59 | ...35-136,175-180 
  ...ngSpinner.tsx |   68.42 |       80 |      50 |   68.42 | 35-52,73,80-81    
  GoalPill.tsx     |   76.19 |    81.81 |     100 |   76.19 | 24-30,46-50       
  Header.tsx       |   98.62 |    94.28 |     100 |   98.62 | 162,164           
  Help.tsx         |   98.32 |    89.88 |     100 |   98.32 | ...24,381,447-448 
  ...emDisplay.tsx |    61.7 |       36 |     100 |    61.7 | ...42,345,348-354 
  ...ngeDialog.tsx |     100 |      100 |     100 |     100 |                   
  InputPrompt.tsx  |   83.01 |    79.78 |   83.33 |   83.01 | ...1399,1531,1581 
  ...Shortcuts.tsx |   20.87 |      100 |       0 |   20.87 | ...6,49-51,67-125 
  ...Indicator.tsx |     100 |    91.42 |     100 |     100 | 65,74             
  ...firmation.tsx |   91.42 |      100 |      50 |   91.42 | 26-31             
  MainContent.tsx  |   81.75 |       75 |     100 |   81.75 | ...70-274,282-286 
  ...elsDialog.tsx |   71.05 |    69.11 |   72.72 |   71.05 | ...77,590,601-603 
  MemoryDialog.tsx |    55.1 |    54.54 |   57.14 |    55.1 | ...56,368,381-383 
  ...geDisplay.tsx |       0 |        0 |       0 |       0 | 1-41              
  ModelDialog.tsx  |   80.12 |    63.55 |     100 |   80.12 | ...39-555,612-616 
  ...tsDisplay.tsx |     100 |    97.22 |     100 |     100 | 270               
  ...fications.tsx |   18.18 |      100 |       0 |   18.18 | 15-58             
  ...onsDialog.tsx |    2.13 |      100 |       0 |    2.13 | 62-133,148-1004   
  ...ryDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...icePrompt.tsx |   92.64 |    85.71 |     100 |   92.64 | 102-106,134-139   
  PrepareLabel.tsx |   91.66 |    77.27 |     100 |   91.66 | 73-75,77-79,110   
  ...atePrompt.tsx |    8.57 |      100 |       0 |    8.57 | 24-55,58-134      
  ...geDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...ngDisplay.tsx |   21.42 |      100 |       0 |   21.42 | 13-39             
  ...hProgress.tsx |   85.25 |    88.46 |     100 |   85.25 | 121-147           
  ...dSelector.tsx |   41.26 |    61.53 |   71.42 |   41.26 | ...74-472,476-520 
  ...ionPicker.tsx |   83.66 |    72.13 |     100 |   83.66 | ...96,402,444-466 
  ...onPreview.tsx |   92.42 |    84.37 |     100 |   92.42 | ...,70-71,143-145 
  ...ryDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...putPrompt.tsx |   72.56 |       80 |      40 |   72.56 | ...06-109,114-117 
  ...ngsDialog.tsx |   66.27 |    71.16 |      75 |   66.27 | ...12-820,826-827 
  ...ionDialog.tsx |    87.8 |      100 |   33.33 |    87.8 | 36-39,44-51       
  ...putPrompt.tsx |    15.9 |      100 |       0 |    15.9 | 20-63             
  ...Indicator.tsx |   57.14 |      100 |       0 |   57.14 | 12-15             
  ...MoreLines.tsx |      28 |      100 |       0 |      28 | 18-40             
  ...ionPicker.tsx |   17.59 |      100 |       0 |   17.59 | 55-172            
  StatsDisplay.tsx |     100 |      100 |     100 |     100 |                   
  ...ineDialog.tsx |   93.69 |    83.92 |     100 |   93.69 | ...11,273,293-295 
  ...yTodoList.tsx |   94.17 |       80 |     100 |   94.17 | 56-57,131-134     
  ...nsDisplay.tsx |   87.25 |       64 |     100 |   87.25 | ...45-147,154-156 
  ThemeDialog.tsx  |   89.95 |    46.15 |      75 |   89.95 | ...71-173,243-245 
  Tips.tsx         |   93.54 |       75 |     100 |   93.54 | 39-40             
  TodoDisplay.tsx  |     100 |      100 |     100 |     100 |                   
  ...tsDisplay.tsx |     100 |     87.5 |     100 |     100 | 31-32             
  TrustDialog.tsx  |     100 |    81.81 |     100 |     100 | 71-86             
  ...ification.tsx |   36.36 |      100 |       0 |   36.36 | 15-22             
  ...ackDialog.tsx |    7.84 |      100 |       0 |    7.84 | 24-134            
  ...xitDialog.tsx |   80.36 |    43.47 |      60 |   80.36 | ...24-238,248-251 
 ...nts/agent-view |   38.33 |    70.83 |   36.36 |   38.33 |                   
  ...atContent.tsx |    8.79 |      100 |       0 |    8.79 | 53-265,271-273    
  ...tChatView.tsx |   21.05 |      100 |       0 |   21.05 | 21-39             
  ...tComposer.tsx |    9.95 |      100 |       0 |    9.95 | 57-308            
  AgentFooter.tsx  |   17.07 |      100 |       0 |   17.07 | 28-66             
  AgentHeader.tsx  |   15.38 |      100 |       0 |   15.38 | 27-64             
  AgentTabBar.tsx  |    87.8 |    27.27 |     100 |    87.8 | ...,85,98-106,124 
  ...oryAdapter.ts |     100 |    91.83 |     100 |     100 | 103,109-110,138   
  index.ts         |       0 |        0 |       0 |       0 | 1-12              
 ...mponents/arena |   45.72 |    70.53 |   60.86 |   45.72 |                   
  ArenaCards.tsx   |   73.06 |    71.79 |   85.71 |   73.06 | ...83-185,321-326 
  ...ectDialog.tsx |   83.48 |    69.86 |   88.88 |   83.48 | ...88-392,409-410 
  ...artDialog.tsx |   10.15 |      100 |       0 |   10.15 | 27-161            
  ...tusDialog.tsx |    5.63 |      100 |       0 |    5.63 | 33-75,80-288      
  ...topDialog.tsx |    6.17 |      100 |       0 |    6.17 | 33-213            
 ...ackground-view |   75.63 |    84.49 |   85.29 |   75.63 |                   
  ...sksDialog.tsx |   70.92 |    80.48 |   76.19 |   70.92 | ...1118,1194-1196 
  ...TasksPill.tsx |   63.75 |    86.95 |     100 |   63.75 | 44,86-106,114-122 
  ...gentPanel.tsx |   99.53 |    93.18 |     100 |   99.53 | 123               
 ...nts/extensions |   45.28 |    33.33 |      60 |   45.28 |                   
  ...gerDialog.tsx |   44.31 |    34.14 |      75 |   44.31 | ...71-480,483-488 
  index.ts         |       0 |        0 |       0 |       0 | 1-9               
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...tensions/steps |   54.88 |    94.23 |   66.66 |   54.88 |                   
  ...ctionStep.tsx |   95.12 |    92.85 |   85.71 |   95.12 | 84-86,89          
  ...etailStep.tsx |    6.18 |      100 |       0 |    6.18 | 17-128            
  ...nListStep.tsx |   88.43 |    94.73 |      80 |   88.43 | 52-53,59-72,106   
  ...electStep.tsx |   13.46 |      100 |       0 |   13.46 | 20-70             
  ...nfirmStep.tsx |   19.56 |      100 |       0 |   19.56 | 23-65             
  index.ts         |     100 |      100 |     100 |     100 |                   
 ...mponents/hooks |   68.67 |    69.07 |   69.56 |   68.67 |                   
  ...etailStep.tsx |   74.68 |    66.66 |   66.66 |   74.68 | ...71-184,188-201 
  ...etailStep.tsx |    87.4 |    73.68 |     100 |    87.4 | 41-42,99-113,119  
  ...abledStep.tsx |     100 |      100 |     100 |     100 |                   
  ...sListStep.tsx |     100 |      100 |     100 |     100 |                   
  ...entDialog.tsx |   34.51 |    47.05 |   42.85 |   34.51 | ...78,482-495,499 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-13              
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...components/mcp |   20.98 |    86.36 |   83.33 |   20.98 |                   
  ...ealthPill.tsx |   68.42 |    85.71 |     100 |   68.42 | 40-46             
  ...entDialog.tsx |    3.64 |      100 |       0 |    3.64 | 41-717            
  constants.ts     |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-30              
  types.ts         |     100 |      100 |     100 |     100 |                   
  utils.ts         |   95.83 |    88.88 |     100 |   95.83 | 16,20,109-110     
 ...ents/mcp/steps |   26.74 |    54.54 |   42.85 |   26.74 |                   
  ...icateStep.tsx |    5.88 |      100 |       0 |    5.88 | 40-55,58-296      
  ...electStep.tsx |   10.95 |      100 |       0 |   10.95 | 16-88             
  ...etailStep.tsx |    5.26 |      100 |       0 |    5.26 | 31-247            
  ...rListStep.tsx |   75.18 |    59.37 |     100 |   75.18 | ...53-158,169-173 
  ...etailStep.tsx |   10.41 |      100 |       0 |   10.41 | ...1,67-79,82-139 
  ToolListStep.tsx |   69.02 |       50 |     100 |   69.02 | ...22,125,134-143 
 ...nents/messages |   82.44 |    79.55 |    72.6 |   82.44 |                   
  ...ionDialog.tsx |   80.84 |     77.6 |    62.5 |   80.84 | ...98,516,534-536 
  BtwMessage.tsx   |     100 |      100 |     100 |     100 |                   
  ...upDisplay.tsx |   97.67 |    83.72 |     100 |   97.67 | 119,142,150       
  ...onMessage.tsx |   91.93 |    82.35 |     100 |   91.93 | 57-59,61,63       
  ...nMessages.tsx |   79.06 |      100 |      70 |   79.06 | ...51-264,268-280 
  DiffRenderer.tsx |   93.19 |    86.17 |     100 |   93.19 | ...09,237-238,304 
  ...tsDisplay.tsx |   97.82 |    77.27 |     100 |   97.82 | 87,89             
  ...usMessage.tsx |   76.31 |     42.1 |   66.66 |   76.31 | ...99,101,124,155 
  ...ssMessage.tsx |    12.5 |      100 |       0 |    12.5 | 18-59             
  ...edMessage.tsx |   16.66 |      100 |       0 |   16.66 | 22-38             
  ...sMessages.tsx |   55.67 |       40 |   28.57 |   55.67 | ...20-125,133-145 
  ...ryMessage.tsx |   14.28 |      100 |       0 |   14.28 | 23-62             
  ...onMessage.tsx |   81.02 |    69.23 |   33.33 |   81.02 | ...24-426,433-435 
  ...upMessage.tsx |      84 |    93.61 |     100 |      84 | ...56-383,405-420 
  ToolMessage.tsx  |   88.84 |    75.71 |    92.3 |   88.84 | ...44-749,776-778 
 ...ponents/shared |   85.36 |    78.48 |   95.77 |   85.36 |                   
  ...ctionList.tsx |   99.03 |    95.65 |     100 |   99.03 | 85                
  ...tonSelect.tsx |     100 |      100 |     100 |     100 |                   
  EnumSelector.tsx |     100 |    96.42 |     100 |     100 | 58                
  MaxSizedBox.tsx  |   83.01 |    86.25 |   88.88 |   83.01 | ...12-513,618-619 
  MultiSelect.tsx  |   84.31 |    74.19 |     100 |   84.31 | ...37,193-195,205 
  ...tonSelect.tsx |     100 |      100 |     100 |     100 |                   
  ...eSelector.tsx |     100 |       60 |     100 |     100 | 40-45             
  TextInput.tsx    |   77.01 |    48.78 |      80 |   77.01 | ...08-212,224-230 
  ...apsedTime.tsx |     100 |      100 |     100 |     100 |                   
  ...Indicator.tsx |     100 |      100 |     100 |     100 |                   
  text-buffer.ts   |   83.68 |    78.55 |   97.61 |   83.68 | ...2270-2272,2368 
  ...er-actions.ts |   86.71 |    67.79 |     100 |   86.71 | ...07-608,809-811 
 ...ents/subagents |   30.87 |        0 |       0 |   30.87 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  index.ts         |       0 |        0 |       0 |       0 | 1-11              
  reducers.tsx     |    12.1 |      100 |       0 |    12.1 | 33-190            
  types.ts         |     100 |      100 |     100 |     100 |                   
  utils.ts         |   10.95 |      100 |       0 |   10.95 | ...1,56-57,60-102 
 ...bagents/create |    9.13 |      100 |       0 |    9.13 |                   
  ...ionWizard.tsx |    7.28 |      100 |       0 |    7.28 | 34-299            
  ...rSelector.tsx |   14.75 |      100 |       0 |   14.75 | 26-85             
  ...onSummary.tsx |    4.26 |      100 |       0 |    4.26 | 27-331            
  ...tionInput.tsx |    8.63 |      100 |       0 |    8.63 | 23-177            
  ...dSelector.tsx |   33.33 |      100 |       0 |   33.33 | 20-21,26-27,36-63 
  ...nSelector.tsx |    37.5 |      100 |       0 |    37.5 | 20-21,26-27,36-58 
  ...EntryStep.tsx |   12.76 |      100 |       0 |   12.76 | 34-78             
  ToolSelector.tsx |    4.16 |      100 |       0 |    4.16 | 31-253            
 ...bagents/manage |   21.51 |    59.52 |   27.27 |   21.51 |                   
  ...ctionStep.tsx |   10.25 |      100 |       0 |   10.25 | 21-103            
  ...eleteStep.tsx |   20.93 |      100 |       0 |   20.93 | 23-62             
  ...tEditStep.tsx |   25.53 |      100 |       0 |   25.53 | ...2,37-38,51-124 
  ...ctionStep.tsx |   35.42 |    59.52 |     100 |   35.42 | ...20-432,437-439 
  ...iewerStep.tsx |   13.72 |      100 |       0 |   13.72 | 18-73             
  ...gerDialog.tsx |    6.74 |      100 |       0 |    6.74 | 35-341            
 ...mponents/views |   70.11 |       68 |    64.7 |   70.11 |                   
  ContextUsage.tsx |   70.67 |    65.71 |      80 |   70.67 | ...16-422,459-553 
  DoctorReport.tsx |     9.8 |      100 |       0 |     9.8 | 25-54,57-131      
  ...sionsList.tsx |   87.69 |    73.68 |     100 |   87.69 | 65-72             
  McpStatus.tsx    |   89.53 |    60.52 |     100 |   89.53 | ...72,175-177,262 
  SkillsList.tsx   |   27.27 |      100 |       0 |   27.27 | 18-35             
  ToolsList.tsx    |     100 |      100 |     100 |     100 |                   
 src/ui/contexts   |   77.34 |    78.06 |   80.35 |   77.34 |                   
  ...ewContext.tsx |    64.7 |    85.71 |      50 |    64.7 | ...22-225,231-241 
  AppContext.tsx   |      80 |       50 |     100 |      80 | 19-20             
  ...ewContext.tsx |   95.18 |    67.56 |      50 |   95.18 | ...94-195,222-226 
  ...deContext.tsx |     100 |      100 |     100 |     100 |                   
  ...igContext.tsx |   81.81 |       50 |     100 |   81.81 | 15-16             
  ...ssContext.tsx |   82.31 |    82.84 |     100 |   82.31 | ...1153,1159-1161 
  ...owContext.tsx |   89.28 |       80 |   66.66 |   89.28 | 34,47-48,60-62    
  ...deContext.tsx |     100 |      100 |      50 |     100 |                   
  ...onContext.tsx |   43.28 |     62.5 |    62.5 |   43.28 | ...56-259,263-266 
  ...gsContext.tsx |   83.33 |       50 |     100 |   83.33 | 17-18             
  ...usContext.tsx |     100 |      100 |     100 |     100 |                   
  ...ngContext.tsx |   71.42 |       50 |     100 |   71.42 | 17-20             
  ...utContext.tsx |   85.71 |      100 |   66.66 |   85.71 | 13-14             
  ...nsContext.tsx |   88.23 |       50 |     100 |   88.23 | 120-121           
  ...teContext.tsx |   86.66 |       50 |     100 |   86.66 | 195-196           
  ...deContext.tsx |   76.08 |    72.72 |     100 |   76.08 | 47-48,52-59,77-78 
 src/ui/daemon     |   90.76 |    73.73 |   95.45 |   90.76 |                   
  ...TuiAdapter.ts |   90.76 |    73.73 |   95.45 |   90.76 | ...53,771-772,858 
 src/ui/editors    |   93.33 |    85.71 |   66.66 |   93.33 |                   
  ...ngsManager.ts |   93.33 |    85.71 |   66.66 |   93.33 | 49,63-64          
 src/ui/hooks      |   82.14 |    82.17 |   86.69 |   82.14 |                   
  ...dProcessor.ts |   83.12 |    82.56 |     100 |   83.12 | ...88-389,408-435 
  keyToAnsi.ts     |    3.92 |      100 |       0 |    3.92 | 19-77             
  ...dProcessor.ts |    94.8 |    70.58 |     100 |    94.8 | ...76-277,282-283 
  ...dProcessor.ts |   75.59 |    62.58 |   61.53 |   75.59 | ...88,912,931-935 
  ...amingState.ts |   12.22 |      100 |       0 |   12.22 | 54-157            
  ...agerDialog.ts |   88.23 |      100 |     100 |   88.23 | 20,24             
  ...ationFrame.ts |      32 |       60 |     100 |      32 | 42-44,51-90       
  ...odeCommand.ts |   58.82 |      100 |     100 |   58.82 | 28,33-48          
  ...enaCommand.ts |      85 |      100 |     100 |      85 | 23-24,29          
  ...aInProcess.ts |   19.81 |    66.66 |      25 |   19.81 | 57-175            
  ...Completion.ts |   92.77 |    89.09 |     100 |   92.77 | ...86-187,220-223 
  ...ifications.ts |   92.07 |    96.29 |     100 |   92.07 | 116-124           
  ...tIndicator.ts |   83.49 |    70.96 |     100 |   83.49 | ...60,168,170-178 
  ...waySummary.ts |   96.22 |    69.69 |     100 |   96.22 | 125-127,169       
  ...ndTaskView.ts |   94.21 |    76.08 |     100 |   94.21 | 122-126,213,219   
  ...ketedPaste.ts |    23.8 |      100 |       0 |    23.8 | 19-37             
  ...nchCommand.ts |   94.36 |    74.35 |     100 |   94.36 | ...60,168-169,209 
  ...ompletion.tsx |   95.95 |    82.75 |     100 |   95.95 | ...22-223,225-226 
  ...dMigration.ts |   90.62 |       75 |     100 |   90.62 | 38-40             
  useCompletion.ts |    92.4 |     87.5 |     100 |    92.4 | 68-69,93-94,98-99 
  ...nitMessage.ts |     100 |      100 |     100 |     100 |                   
  ...extualTips.ts |   77.27 |       50 |     100 |   77.27 | ...2,75-79,93-101 
  ...eteCommand.ts |   78.53 |    88.57 |     100 |   78.53 | ...96-104,112-113 
  ...ialogClose.ts |   13.33 |      100 |     100 |   13.33 | 91-182            
  useDiffData.ts   |   11.62 |      100 |       0 |   11.62 | 44-87             
  ...oublePress.ts |   53.12 |       75 |     100 |   53.12 | 33-35,41-54       
  ...orSettings.ts |     100 |      100 |     100 |     100 |                   
  ...Completion.ts |   99.12 |     97.7 |     100 |   99.12 | 182-183           
  ...ionUpdates.ts |   93.45 |     92.3 |     100 |   93.45 | ...83-287,300-306 
  ...agerDialog.ts |   88.88 |      100 |     100 |   88.88 | 21,25             
  ...backDialog.ts |   54.47 |       50 |   33.33 |   54.47 | ...69-171,193-194 
  useFocus.ts      |     100 |      100 |     100 |     100 |                   
  ...olderTrust.ts |     100 |      100 |     100 |     100 |                   
  ...ggestions.tsx |   89.15 |     62.5 |      50 |   89.15 | ...22-124,149-150 
  ...miniStream.ts |    77.7 |    74.93 |   91.66 |    77.7 | ...2497,2510-2518 
  ...BranchName.ts |    90.9 |     92.3 |     100 |    90.9 | 19-20,55-58       
  ...oryManager.ts |   93.15 |    93.75 |     100 |   93.15 | 44,107-110        
  ...ooksDialog.ts |    87.5 |      100 |     100 |    87.5 | 19,23             
  ...stListener.ts |     100 |      100 |     100 |     100 |                   
  ...nAuthError.ts |   76.19 |       50 |     100 |   76.19 | 39-40,43-45       
  ...putHistory.ts |   92.59 |    85.71 |     100 |   92.59 | 63-64,72,94-96    
  ...storyStore.ts |     100 |    94.11 |     100 |     100 | 69                
  useKeypress.ts   |     100 |      100 |     100 |     100 |                   
  ...rdProtocol.ts |   36.36 |      100 |       0 |   36.36 | 24-31             
  ...unchEditor.ts |    9.67 |      100 |       0 |    9.67 | 11-32,39-90       
  ...gIndicator.ts |     100 |      100 |     100 |     100 |                   
  useLogger.ts     |   21.05 |      100 |       0 |   21.05 | 15-37             
  useMCPHealth.ts  |   63.15 |       75 |      50 |   63.15 | 42-52,64-67       
  ...elsCommand.ts |     100 |      100 |     100 |     100 |                   
  useMcpDialog.ts  |    87.5 |      100 |     100 |    87.5 | 19,23             
  ...moryDialog.ts |    87.5 |      100 |     100 |    87.5 | 19,23             
  ...oryMonitor.ts |     100 |      100 |     100 |     100 |                   
  ...ssageQueue.ts |     100 |      100 |     100 |     100 |                   
  ...delCommand.ts |     100 |       75 |     100 |     100 | 22                
  ...raseCycler.ts |   84.74 |    76.47 |     100 |   84.74 | ...49,52-53,69-71 
  ...derUpdates.ts |   86.38 |    77.19 |     100 |   86.38 | ...22,281-293,341 
  useQwenAuth.ts   |     100 |      100 |     100 |     100 |                   
  ...lScheduler.ts |    84.7 |    93.33 |     100 |    84.7 | ...71-276,372-382 
  ...oryCommand.ts |       0 |        0 |       0 |       0 | 1-7               
  ...umeCommand.ts |   97.08 |    83.33 |     100 |   97.08 | 103-104,133       
  ...ompletion.tsx |   90.59 |    83.33 |     100 |   90.59 | ...01,104,137-140 
  ...ectionList.ts |   96.98 |    95.69 |     100 |   96.98 | ...83-184,238-241 
  ...sionPicker.ts |   92.87 |    90.35 |     100 |   92.87 | ...99-501,503-505 
  ...earchInput.ts |     100 |      100 |     100 |     100 |                   
  ...ngsCommand.ts |   18.75 |      100 |       0 |   18.75 | 10-25             
  ...ellHistory.ts |   91.74 |    79.41 |     100 |   91.74 | ...74,122-123,133 
  ...oryCommand.ts |       0 |        0 |       0 |       0 | 1-73              
  ...Completion.ts |   82.67 |    85.41 |   94.73 |   82.67 | ...68-670,678-714 
  ...tateAndRef.ts |     100 |      100 |     100 |     100 |                   
  useStatusLine.ts |   96.09 |    90.37 |     100 |   96.09 | ...62-365,450-457 
  ...eateDialog.ts |   88.23 |      100 |     100 |   88.23 | 14,18             
  ...tification.ts |     100 |    85.71 |     100 |     100 | 47                
  ...alProgress.ts |   53.06 |       50 |   66.66 |   53.06 | ...53,61-68,79-85 
  ...rminalSize.ts |   76.19 |      100 |      50 |   76.19 | 21-25             
  ...emeCommand.ts |   67.01 |    29.41 |     100 |   67.01 | ...10-111,115-116 
  useTimer.ts      |   88.09 |    85.71 |     100 |   88.09 | 44-45,51-53       
  ...lMigration.ts |       0 |        0 |       0 |       0 |                   
  ...rustModify.ts |     100 |      100 |     100 |     100 |                   
  useTurnDiffs.ts  |   95.12 |    78.57 |     100 |   95.12 | 133-134,156-157   
  ...elcomeBack.ts |   87.36 |     90.9 |     100 |   87.36 | ...,94-96,114-115 
  ...reeSession.ts |   93.75 |       75 |     100 |   93.75 | 44-45,87          
  vim.ts           |   83.77 |    80.31 |     100 |   83.77 | ...55,759-767,776 
 src/ui/layouts    |   89.72 |     87.5 |     100 |   89.72 |                   
  ...AppLayout.tsx |   89.88 |     87.5 |     100 |   89.88 | 51-53,93-98       
  ...AppLayout.tsx |   89.47 |     87.5 |     100 |   89.47 | 58-63             
 ...i/manageModels |   93.61 |       48 |     100 |   93.61 |                   
  manageModels.ts  |   93.61 |       48 |     100 |   93.61 | ...63-166,179,209 
 src/ui/models     |   80.24 |    79.16 |   71.42 |   80.24 |                   
  ...ableModels.ts |   80.24 |    79.16 |   71.42 |   80.24 | ...,61-71,123-125 
 ...noninteractive |     100 |      100 |   14.28 |     100 |                   
  ...eractiveUi.ts |     100 |      100 |   14.28 |     100 |                   
 src/ui/state      |   94.91 |    81.81 |     100 |   94.91 |                   
  extensions.ts    |   94.91 |    81.81 |     100 |   94.91 | 68-69,88          
 src/ui/themes     |   98.53 |    70.58 |     100 |   98.53 |                   
  ansi-light.ts    |     100 |      100 |     100 |     100 |                   
  ansi.ts          |     100 |      100 |     100 |     100 |                   
  atom-one-dark.ts |     100 |      100 |     100 |     100 |                   
  ayu-light.ts     |     100 |      100 |     100 |     100 |                   
  ayu.ts           |     100 |      100 |     100 |     100 |                   
  color-utils.ts   |     100 |      100 |     100 |     100 |                   
  default-light.ts |     100 |      100 |     100 |     100 |                   
  default.ts       |     100 |      100 |     100 |     100 |                   
  ...inal-theme.ts |   88.59 |    85.96 |     100 |   88.59 | ...57-261,266-270 
  dracula.ts       |     100 |      100 |     100 |     100 |                   
  github-dark.ts   |     100 |      100 |     100 |     100 |                   
  github-light.ts  |     100 |      100 |     100 |     100 |                   
  googlecode.ts    |     100 |      100 |     100 |     100 |                   
  no-color.ts      |     100 |      100 |     100 |     100 |                   
  qwen-dark.ts     |     100 |      100 |     100 |     100 |                   
  qwen-light.ts    |     100 |      100 |     100 |     100 |                   
  ...tic-tokens.ts |     100 |      100 |     100 |     100 |                   
  ...-of-purple.ts |     100 |      100 |     100 |     100 |                   
  theme-manager.ts |   87.98 |    82.89 |     100 |   87.98 | ...48-357,362-363 
  theme.ts         |     100 |    38.02 |     100 |     100 | ...34-449,457-461 
  xcode.ts         |     100 |      100 |     100 |     100 |                   
 src/ui/utils      |   83.98 |    82.97 |   92.61 |   83.98 |                   
  ...Colorizer.tsx |   79.53 |    83.78 |     100 |   79.53 | ...51-152,249-275 
  ...nRenderer.tsx |   68.83 |    70.14 |      50 |   68.83 | ...52-254,274-293 
  ...wnDisplay.tsx |   86.01 |    87.41 |     100 |   86.01 | ...87,704,729-754 
  ...idDiagram.tsx |   87.79 |    95.34 |     100 |   87.79 | 156-179           
  ...eRenderer.tsx |   92.08 |    80.45 |      95 |   92.08 | ...76-679,723-728 
  ...dWorkUtils.ts |     100 |      100 |     100 |     100 |                   
  ...boardUtils.ts |   59.61 |    58.82 |     100 |   59.61 | ...,86-88,107-149 
  commandUtils.ts  |    95.9 |    88.42 |     100 |    95.9 | ...62,164-165,289 
  computeStats.ts  |     100 |      100 |     100 |     100 |                   
  customBanner.ts  |   90.68 |    91.22 |     100 |   90.68 | ...13,324-327,334 
  displayUtils.ts  |   88.37 |    72.22 |     100 |   88.37 | 23,25,29,31,33    
  formatters.ts    |   95.23 |    98.27 |     100 |   95.23 | 117-120           
  gradientUtils.ts |     100 |      100 |     100 |     100 |                   
  highlight.ts     |     100 |      100 |     100 |     100 |                   
  ...oryMapping.ts |     100 |    94.28 |     100 |     100 | 35,57             
  historyUtils.ts  |   94.11 |       94 |     100 |   94.11 | 94-97             
  isNarrowWidth.ts |     100 |      100 |     100 |     100 |                   
  ...olDetector.ts |    8.23 |      100 |       0 |    8.23 | ...31-132,135-136 
  latexRenderer.ts |   94.95 |     73.8 |     100 |   94.95 | ...76-178,184-187 
  layoutUtils.ts   |     100 |      100 |     100 |     100 |                   
  ...ightLoader.ts |     100 |    89.47 |     100 |     100 | 81,110            
  ...nUtilities.ts |   69.84 |    85.71 |     100 |   69.84 | 75-91,100-101     
  ...ToolGroups.ts |   98.66 |    96.77 |     100 |   98.66 | 48-49             
  ...geRenderer.ts |   86.23 |    69.06 |   95.12 |   86.23 | ...1284,1324-1330 
  ...alRenderer.ts |   86.69 |     71.9 |     100 |   86.69 | ...1476,1513-1519 
  ...lsBySource.ts |     100 |    95.23 |     100 |     100 | 84                
  osc8.ts          |   94.71 |    87.41 |     100 |   94.71 | ...43,428,432-433 
  ...mConstants.ts |     100 |      100 |     100 |     100 |                   
  restoreGoal.ts   |   98.98 |    97.05 |     100 |   98.98 | 98                
  ...storyUtils.ts |   61.89 |    69.87 |      90 |   61.89 | ...76,424,429-451 
  ...ickerUtils.ts |     100 |      100 |     100 |     100 |                   
  ...izedOutput.ts |   94.94 |      100 |   88.88 |   94.94 | 112-117           
  ...wOptimizer.ts |     100 |    96.77 |     100 |     100 | 69                
  terminalSetup.ts |    4.37 |      100 |       0 |    4.37 | 44-393            
  textUtils.ts     |   97.61 |    94.84 |   92.85 |   97.61 | ...50-251,386-387 
  todoSnapshot.ts  |   89.11 |    93.33 |     100 |   89.11 | ...,66-78,180-181 
  updateCheck.ts   |     100 |    80.95 |     100 |     100 | 30-42             
 ...i/utils/export |   56.77 |     40.8 |   79.41 |   56.77 |                   
  collect.ts       |   55.92 |    50.58 |   86.36 |   55.92 | ...25-640,642-647 
  index.ts         |     100 |      100 |     100 |     100 |                   
  normalize.ts     |   57.47 |    20.51 |      80 |   57.47 | ...09-310,324-359 
  types.ts         |       0 |        0 |       0 |       0 | 1                 
  utils.ts         |      40 |      100 |       0 |      40 | 11-13             
 ...ort/formatters |    3.38 |      100 |       0 |    3.38 |                   
  html.ts          |    9.61 |      100 |       0 |    9.61 | ...28,34-76,82-84 
  json.ts          |      50 |      100 |       0 |      50 | 14-15             
  jsonl.ts         |     3.5 |      100 |       0 |     3.5 | 14-76             
  markdown.ts      |    0.94 |      100 |       0 |    0.94 | 13-295            
 src/utils         |   76.06 |    89.51 |   93.82 |   76.06 |                   
  acpModelUtils.ts |     100 |      100 |     100 |     100 |                   
  apiPreconnect.ts |   96.72 |    97.14 |     100 |   96.72 | 165-168           
  checks.ts        |   33.33 |      100 |       0 |   33.33 | 23-28             
  cleanup.ts       |   84.12 |    93.33 |      80 |   84.12 | 75,106-115        
  commands.ts      |     100 |      100 |     100 |     100 |                   
  commentJson.ts   |   87.17 |     90.9 |     100 |   87.17 | 64-73             
  ...Calculator.ts |     100 |      100 |     100 |     100 |                   
  deepMerge.ts     |     100 |       90 |     100 |     100 | 41-43,49          
  ...ScopeUtils.ts |   97.56 |    88.88 |     100 |   97.56 | 67                
  doctorChecks.ts  |   71.06 |       75 |     100 |   71.06 | ...95-301,325-341 
  ...putCapture.ts |   90.65 |    86.17 |     100 |   90.65 | ...72,370,372-373 
  ...arResolver.ts |   94.28 |       88 |     100 |   94.28 | 28-29,125-126     
  errors.ts        |   98.67 |    96.36 |     100 |   98.67 | 67-68             
  events.ts        |     100 |      100 |     100 |     100 |                   
  gitUtils.ts      |   91.91 |    84.61 |     100 |   91.91 | 78-81,124-127     
  ...AutoUpdate.ts |   90.76 |    93.33 |   88.88 |   90.76 | 103-114           
  ...lationInfo.ts |     100 |      100 |     100 |     100 |                   
  languageUtils.ts |   97.89 |    96.42 |     100 |   97.89 | 132-133           
  math.ts          |       0 |        0 |       0 |       0 | 1-15              
  ...iagnostics.ts |   94.57 |    83.01 |   88.88 |   94.57 | ...05,311,315-317 
  ...onfigUtils.ts |     100 |      100 |     100 |     100 |                   
  ...iveHelpers.ts |   96.79 |    93.28 |     100 |   96.79 | ...76-477,575,588 
  osc.ts           |    97.5 |      100 |   88.88 |    97.5 | 195-196           
  package.ts       |   88.88 |       80 |     100 |   88.88 | 33-34             
  processUtils.ts  |     100 |      100 |     100 |     100 |                   
  readStdin.ts     |   79.62 |       90 |      80 |   79.62 | 33-40,52-54       
  relaunch.ts      |   98.07 |    76.92 |     100 |   98.07 | 70                
  resolvePath.ts   |   66.66 |       25 |     100 |   66.66 | 12-13,16,18-19    
  sandbox.ts       |       0 |        0 |       0 |       0 | 1-1047            
  settingsUtils.ts |   82.89 |    90.67 |   89.47 |   82.89 | ...52-663,670-678 
  spawnWrapper.ts  |     100 |      100 |     100 |     100 |                   
  ...upProfiler.ts |   98.46 |    94.52 |     100 |   98.46 | 130-131,305       
  ...upWarnings.ts |     100 |      100 |     100 |     100 |                   
  stdioHelpers.ts  |     100 |       60 |     100 |     100 | 23,32             
  systemInfo.ts    |   95.12 |    89.06 |     100 |   95.12 | ...43-244,249-253 
  ...InfoFields.ts |   87.61 |       65 |     100 |   87.61 | ...22-123,144-145 
  ...iffPreview.ts |   94.11 |    83.33 |     100 |   94.11 | 13                
  ...entEmitter.ts |     100 |      100 |     100 |     100 |                   
  ...upWarnings.ts |   91.17 |    82.35 |     100 |   91.17 | 67-68,73-74,77-78 
  version.ts       |     100 |       50 |     100 |     100 | 11                
  windowTitle.ts   |     100 |      100 |     100 |     100 |                   
  ...WithBackup.ts |   63.15 |    81.25 |     100 |   63.15 | 93,118-157        
-------------------|---------|----------|---------|---------|-------------------
Core Package - Full Text Report
-------------------|---------|----------|---------|---------|-------------------
File               | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s 
-------------------|---------|----------|---------|---------|-------------------
All files          |   79.59 |    82.85 |   82.19 |   79.59 |                   
 src               |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/__mocks__/fs  |       0 |        0 |       0 |       0 |                   
  promises.ts      |       0 |        0 |       0 |       0 | 1-48              
 src/agents        |   87.58 |    78.93 |   91.76 |   87.58 |                   
  ...transcript.ts |   92.25 |    85.71 |     100 |   92.25 | ...87,306-307,438 
  ...ent-resume.ts |   82.53 |    71.28 |   77.41 |   82.53 | ...1045-1049,1052 
  ...ound-tasks.ts |    95.4 |    86.48 |     100 |    95.4 | ...55-756,827-828 
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/agents/arena  |   76.54 |    66.87 |   78.72 |   76.54 |                   
  ...gentClient.ts |   79.47 |    88.88 |   81.81 |   79.47 | ...68-183,189-204 
  ArenaManager.ts  |   75.37 |    63.37 |   78.26 |   75.37 | ...1860,1866-1867 
  arena-events.ts  |   64.44 |      100 |      50 |   64.44 | ...71-175,178-183 
  diff-summary.ts  |    87.5 |    72.34 |     100 |    87.5 | ...32-133,137-138 
  index.ts         |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...gents/backends |   76.29 |    86.15 |   73.04 |   76.29 |                   
  ITermBackend.ts  |   97.97 |    93.93 |     100 |   97.97 | ...78-180,255,307 
  ...essBackend.ts |   91.25 |    90.62 |   86.66 |   91.25 | ...94,249-269,328 
  TmuxBackend.ts   |    90.7 |    76.55 |   97.36 |    90.7 | ...87,697,743-747 
  detect.ts        |   31.25 |      100 |       0 |   31.25 | 34-88             
  index.ts         |     100 |      100 |     100 |     100 |                   
  iterm-it2.ts     |     100 |     92.1 |     100 |     100 | 37-38,106         
  tmux-commands.ts |    6.64 |      100 |    3.03 |    6.64 | ...93-363,386-503 
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...agents/runtime |   81.14 |     76.7 |   71.42 |   81.14 |                   
  agent-context.ts |     100 |      100 |     100 |     100 |                   
  agent-core.ts    |   76.49 |    72.35 |   60.86 |   76.49 | ...1608,1635-1682 
  agent-events.ts  |     100 |      100 |     100 |     100 |                   
  ...t-headless.ts |   81.19 |    71.73 |   60.86 |   81.19 | ...98-399,402-403 
  ...nteractive.ts |   79.71 |    79.62 |      75 |   79.71 | ...54,456,458,461 
  ...statistics.ts |   98.19 |    82.35 |     100 |   98.19 | 127,151,192,225   
  agent-types.ts   |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/agents/tasks  |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/config        |   78.38 |    82.01 |   65.78 |   78.38 |                   
  config.ts        |   76.22 |     80.8 |   61.16 |   76.22 | ...3748,3759-3771 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  models.ts        |     100 |      100 |     100 |     100 |                   
  storage.ts       |   95.01 |     90.9 |   90.47 |   95.01 | ...71-372,375-376 
 ...nfirmation-bus |   98.29 |    97.14 |     100 |   98.29 |                   
  message-bus.ts   |   98.14 |    97.05 |     100 |   98.14 | 42-43             
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/core          |   86.36 |    83.04 |   90.09 |   86.36 |                   
  baseLlmClient.ts |   87.24 |    76.47 |    87.5 |   87.24 | ...82,484-494,503 
  client.ts        |   87.56 |     81.3 |   86.11 |   87.56 | ...1925,1964-1967 
  ...tGenerator.ts |    72.1 |    61.11 |     100 |    72.1 | ...63,365,372-375 
  ...lScheduler.ts |   80.33 |     80.9 |   93.47 |   80.33 | ...2559,2611-2615 
  geminiChat.ts    |   90.65 |    86.85 |   91.66 |   90.65 | ...1779,1846-1847 
  geminiRequest.ts |     100 |      100 |     100 |     100 |                   
  ...htProtocol.ts |    9.09 |      100 |       0 |    9.09 | 34-42,45-49,52-87 
  logger.ts        |   87.33 |    87.02 |     100 |   87.33 | ...61-565,611-625 
  ...tyDefaults.ts |     100 |      100 |     100 |     100 |                   
  ...olExecutor.ts |   92.59 |       75 |      50 |   92.59 | 41-42             
  ...on-helpers.ts |   85.71 |    70.58 |     100 |   85.71 | ...90-191,205-214 
  ...issionFlow.ts |   98.59 |    94.73 |     100 |   98.59 | 93                
  prompts.ts       |   89.36 |    86.41 |   76.92 |   89.36 | ...-977,1180-1181 
  tokenLimits.ts   |     100 |    89.47 |     100 |     100 | 51-52             
  ...okTriggers.ts |   99.31 |    90.41 |     100 |   99.31 | 124,135           
  turn.ts          |   96.46 |    88.88 |     100 |   96.46 | ...19,432-433,481 
 ...ntentGenerator |   94.92 |    82.59 |   93.87 |   94.92 |                   
  ...tGenerator.ts |   96.48 |    84.28 |   92.59 |   96.48 | ...01,919-923,963 
  converter.ts     |   94.51 |    80.72 |     100 |   94.51 | ...06-607,617,823 
  index.ts         |       0 |        0 |       0 |       0 | 1-21              
  usage.ts         |     100 |      100 |     100 |     100 |                   
 ...ntentGenerator |   91.53 |    71.64 |   93.33 |   91.53 |                   
  ...tGenerator.ts |      90 |    70.96 |   92.85 |      90 | ...80-286,304-305 
  index.ts         |     100 |       80 |     100 |     100 | 50                
 ...ntentGenerator |   93.34 |    80.28 |   90.32 |   93.34 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...tGenerator.ts |   93.32 |    80.28 |   90.32 |   93.32 | ...01,911-912,940 
 ...ntentGenerator |   81.66 |    84.08 |    90.9 |   81.66 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  converter.ts     |   76.88 |    82.25 |    87.5 |   76.88 | ...1589,1610-1616 
  errorHandler.ts  |     100 |      100 |     100 |     100 |                   
  index.ts         |   52.38 |    44.44 |      50 |   52.38 | ...77,81-85,89-93 
  ...tGenerator.ts |    66.4 |    70.58 |   88.88 |    66.4 | ...51-157,168-169 
  pipeline.ts      |   93.67 |     84.9 |     100 |   93.67 | ...80-481,489,554 
  ...ureContext.ts |     100 |      100 |     100 |     100 |                   
  ...ingOptions.ts |       0 |        0 |       0 |       0 | 1                 
  ...CallParser.ts |   90.66 |    88.57 |     100 |   90.66 | ...15-319,349-350 
  ...kingParser.ts |     100 |    96.87 |     100 |     100 | 42                
  types.ts         |       0 |        0 |       0 |       0 | 1                 
 ...rator/provider |   96.83 |    89.55 |   95.65 |   96.83 |                   
  dashscope.ts     |   97.29 |    89.77 |   93.33 |   97.29 | ...81-282,358-359 
  deepseek.ts      |   95.55 |    90.56 |     100 |   95.55 | ...31-132,145-146 
  default.ts       |   95.79 |    89.65 |   88.88 |   95.79 | 122-123,193-195   
  index.ts         |     100 |      100 |     100 |     100 |                   
  minimax.ts       |     100 |      100 |     100 |     100 |                   
  mistral.ts       |   96.07 |    73.33 |     100 |   96.07 | 32-33             
  modelscope.ts    |     100 |      100 |     100 |     100 |                   
  openrouter.ts    |     100 |      100 |     100 |     100 |                   
  types.ts         |       0 |        0 |       0 |       0 |                   
 src/extension     |   60.56 |    79.46 |    78.4 |   60.56 |                   
  ...-converter.ts |   62.35 |    47.82 |      90 |   62.35 | ...90-791,800-832 
  ...ionManager.ts |   47.04 |    82.06 |    65.9 |   47.04 | ...1398,1408-1427 
  ...onSettings.ts |   93.46 |    93.05 |     100 |   93.46 | ...17-221,228-232 
  ...-converter.ts |   54.88 |    94.44 |      60 |   54.88 | ...35-146,158-192 
  github.ts        |   44.94 |    88.52 |      60 |   44.94 | ...53-359,398-451 
  index.ts         |     100 |      100 |     100 |     100 |                   
  marketplace.ts   |   97.29 |    93.75 |     100 |   97.29 | ...64,184-185,274 
  npm.ts           |   48.66 |    76.08 |      75 |   48.66 | ...18-420,427-431 
  override.ts      |   94.11 |    88.88 |     100 |   94.11 | 63-64,81-82       
  settings.ts      |   66.26 |      100 |      50 |   66.26 | 81-108,143-149    
  storage.ts       |     100 |      100 |     100 |     100 |                   
  ...ableSchema.ts |     100 |      100 |     100 |     100 |                   
  variables.ts     |   88.75 |    83.33 |     100 |   88.75 | ...28-231,234-237 
 src/followup      |   55.57 |    84.14 |   81.25 |   55.57 |                   
  followupState.ts |      96 |    89.74 |     100 |      96 | 159-161,218-219   
  index.ts         |     100 |      100 |     100 |     100 |                   
  overlayFs.ts     |   95.06 |       84 |     100 |   95.06 | 78,108,122,133    
  speculation.ts   |   13.02 |      100 |   16.66 |   13.02 | 89-464,524-575    
  ...onToolGate.ts |     100 |    96.42 |     100 |     100 | 94                
  ...nGenerator.ts |    71.6 |    72.13 |   83.33 |    71.6 | ...88-246,316-318 
 src/generated     |       0 |        0 |       0 |       0 |                   
  git-commit.ts    |       0 |        0 |       0 |       0 | 1-10              
 src/goals         |   89.57 |    83.45 |   94.44 |   89.57 |                   
  ...eGoalStore.ts |    85.1 |    95.45 |   84.61 |    85.1 | ...63-166,174-182 
  goalHook.ts      |   97.26 |    91.48 |     100 |   97.26 | 100-105           
  goalJudge.ts     |   84.33 |    74.28 |     100 |   84.33 | ...57-358,366-368 
  index.ts         |     100 |      100 |     100 |     100 |                   
 src/hooks         |   83.48 |    84.87 |   86.83 |   83.48 |                   
  ...okRegistry.ts |   86.48 |    77.08 |     100 |   86.48 | ...41-344,362-369 
  ...bortSignal.ts |     100 |      100 |     100 |     100 |                   
  ...terpolator.ts |   96.66 |    93.33 |     100 |   96.66 | 66-67             
  ...HookRunner.ts |   96.68 |    87.23 |     100 |   96.68 | 110-112,231-233   
  ...Aggregator.ts |    96.4 |    90.78 |     100 |    96.4 | ...91,293-294,367 
  ...entHandler.ts |   94.56 |    83.78 |   93.33 |   94.56 | ...38,795-796,806 
  hookPlanner.ts   |   84.13 |    76.59 |      90 |   84.13 | ...38,144,162-173 
  hookRegistry.ts  |   90.17 |    83.33 |     100 |   90.17 | ...33,352,356,360 
  hookRunner.ts    |   58.56 |    71.26 |   66.66 |   58.56 | ...48-749,758-759 
  hookSystem.ts    |   84.57 |      100 |   65.85 |   84.57 | ...21-622,628-629 
  ...HookRunner.ts |   75.51 |     61.9 |      80 |   75.51 | ...05-406,424-425 
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...HookRunner.ts |   93.63 |    89.47 |      90 |   93.63 | ...45-353,427-428 
  ...SkillHooks.ts |   78.75 |       75 |   66.66 |   78.75 | 62-66,137-152     
  ...oksManager.ts |   96.66 |    91.66 |     100 |   96.66 | ...90,209-210,223 
  ssrfGuard.ts     |   77.22 |    85.36 |     100 |   77.22 | ...57,261-267,273 
  stopHookCap.ts   |     100 |      100 |     100 |     100 |                   
  trustedHooks.ts  |       0 |        0 |       0 |       0 | 1-124             
  types.ts         |   91.21 |    92.04 |   85.71 |   91.21 | ...40-441,501-505 
  urlValidator.ts  |     100 |      100 |     100 |     100 |                   
 src/ide           |   74.28 |    83.39 |   78.33 |   74.28 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  detect-ide.ts    |     100 |      100 |     100 |     100 |                   
  ide-client.ts    |    64.2 |    81.48 |   66.66 |    64.2 | ...9-970,999-1007 
  ide-installer.ts |   89.06 |    79.31 |     100 |   89.06 | ...36,143-147,160 
  ideContext.ts    |     100 |      100 |     100 |     100 |                   
  process-utils.ts |   84.84 |    71.79 |     100 |   84.84 | ...37,151,193-194 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/lsp           |   41.24 |    52.14 |   51.42 |   41.24 |                   
  ...nfigLoader.ts |   70.27 |    35.89 |   94.73 |   70.27 | ...20-422,426-432 
  ...ionFactory.ts |   42.69 |    79.16 |      50 |   42.69 | ...62-413,419-436 
  ...Normalizer.ts |   23.09 |    13.72 |   30.43 |   23.09 | ...04-905,909-924 
  ...verManager.ts |   25.31 |    62.06 |   41.66 |   25.31 | ...85-704,710-740 
  ...eLspClient.ts |   32.77 |       80 |   17.64 |   32.77 | ...84-288,294-295 
  ...LspService.ts |   48.49 |    67.16 |   65.71 |   48.49 | ...1352,1369-1379 
  constants.ts     |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/mcp           |   78.69 |    75.34 |   75.92 |   78.69 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  ...h-provider.ts |   86.95 |      100 |   33.33 |   86.95 | ...,93,97,101-102 
  ...h-provider.ts |   73.82 |    53.92 |     100 |   73.82 | ...88-895,902-904 
  ...en-storage.ts |   98.62 |    97.72 |     100 |   98.62 | 87-88             
  oauth-utils.ts   |   70.58 |    85.29 |    90.9 |   70.58 | ...70-290,315-344 
  ...n-provider.ts |   89.83 |    95.83 |   45.45 |   89.83 | ...43,147,151-152 
 .../token-storage |   79.52 |    86.66 |   86.36 |   79.52 |                   
  ...en-storage.ts |     100 |      100 |     100 |     100 |                   
  ...en-storage.ts |   82.87 |    82.35 |   92.85 |   82.87 | ...63-173,181-182 
  ...en-storage.ts |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...en-storage.ts |   68.14 |    82.35 |   64.28 |   68.14 | ...81-295,298-314 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/memory        |      68 |    76.57 |   66.66 |      68 |                   
  const.ts         |     100 |      100 |     100 |     100 |                   
  dream.ts         |   65.65 |    73.33 |      50 |   65.65 | 50,107-148        
  ...entPlanner.ts |   57.84 |    72.72 |   33.33 |   57.84 | ...35,140-147,152 
  entries.ts       |   63.77 |    79.16 |      50 |   63.77 | ...72-180,183-189 
  extract.ts       |    95.2 |    79.16 |     100 |    95.2 | 81-86,125         
  ...entPlanner.ts |   63.08 |    65.71 |   41.17 |   63.08 | ...17,222-223,332 
  ...ionPlanner.ts |       0 |        0 |       0 |       0 | 1                 
  forget.ts        |    45.8 |    61.53 |   44.44 |    45.8 | ...04,211,214-346 
  indexer.ts       |   83.87 |    45.45 |     100 |   83.87 | ...50,56-57,69-70 
  manager.ts       |   75.31 |    81.04 |    75.6 |   75.31 | ...1278,1291-1293 
  memoryAge.ts     |   90.47 |    77.77 |     100 |   90.47 | 50-51             
  paths.ts         |   55.47 |    89.47 |   85.71 |   55.47 | ...,89-90,106-114 
  prompt.ts        |   93.36 |    71.42 |     100 |   93.36 | ...58,161,228-229 
  recall.ts        |   77.54 |    69.38 |   88.88 |   77.54 | ...53-258,282-293 
  ...ceSelector.ts |   91.86 |    77.27 |     100 |   91.86 | ...15,117-118,126 
  scan.ts          |   87.91 |    68.42 |     100 |   87.91 | ...47-48,58,82-87 
  ...entPlanner.ts |    11.5 |      100 |       0 |    11.5 | ...57-192,210-298 
  status.ts        |   10.52 |      100 |       0 |   10.52 | 41-98             
  store.ts         |   94.44 |    83.33 |     100 |   94.44 | 56-57,92-93       
  types.ts         |     100 |      100 |     100 |     100 |                   
  ...ontextFile.ts |   79.38 |    81.03 |   81.81 |   79.38 | ...58-272,286-291 
 src/mocks         |       0 |        0 |       0 |       0 |                   
  msw.ts           |       0 |        0 |       0 |       0 | 1-9               
 src/models        |   89.35 |    85.67 |    87.5 |   89.35 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  ...tor-config.ts |   90.24 |    91.42 |     100 |   90.24 | 142,148,151-160   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...nfigErrors.ts |   74.22 |       44 |   84.61 |   74.22 | ...,67-74,106-117 
  ...igResolver.ts |   98.66 |    92.85 |     100 |   98.66 | 162,324,330       
  modelRegistry.ts |     100 |    98.59 |     100 |     100 | 222               
  modelsConfig.ts  |   84.57 |    82.14 |   81.57 |   84.57 | ...1223,1252-1253 
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/output        |     100 |      100 |     100 |     100 |                   
  ...-formatter.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/permissions   |   74.28 |    88.55 |   57.55 |   74.28 |                   
  autoMode.ts      |   61.59 |    93.54 |   83.33 |   61.59 | ...00-238,340-356 
  ...transcript.ts |      98 |       84 |     100 |      98 | 200-201           
  classifier.ts    |   92.89 |     87.5 |     100 |   92.89 | 146-153,333-337   
  ...erousRules.ts |     100 |    83.87 |     100 |     100 | 101,113,137-143   
  ...alTracking.ts |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...on-manager.ts |   78.26 |    85.24 |   82.14 |   78.26 | ...-916,1022-1026 
  rule-parser.ts   |   95.99 |    93.22 |     100 |   95.99 | ...-864,1013-1015 
  ...-semantics.ts |   58.28 |    85.27 |    30.2 |   58.28 | ...1604-1614,1643 
  types.ts         |     100 |      100 |     100 |     100 |                   
 ...sifier-prompts |   98.18 |       90 |     100 |   98.18 |                   
  system-prompt.ts |   98.18 |       90 |     100 |   98.18 | 150               
 src/prompts       |   83.63 |      100 |    87.5 |   83.63 |                   
  mcp-prompts.ts   |   18.18 |      100 |       0 |   18.18 | 11-19             
  ...t-registry.ts |     100 |      100 |     100 |     100 |                   
 src/qwen          |   83.87 |    77.23 |   95.83 |   83.87 |                   
  ...tGenerator.ts |   98.64 |    98.18 |     100 |   98.64 | 105-106           
  qwenOAuth2.ts    |   80.85 |    70.27 |   90.32 |   80.85 | ...1169-1185,1215 
  ...kenManager.ts |   83.76 |    76.22 |     100 |   83.76 | ...62-767,788-793 
 src/services      |   85.68 |    83.51 |   91.11 |   85.68 |                   
  ...ionTrailer.ts |     100 |      100 |     100 |     100 |                   
  ...llRegistry.ts |   98.44 |    91.83 |     100 |   98.44 | 268-269           
  ...ionService.ts |   98.03 |    96.42 |   85.71 |   98.03 | ...98,700-704,837 
  ...ingService.ts |   83.88 |    83.33 |   83.33 |   83.88 | ...1266,1283-1284 
  ...ttribution.ts |   91.73 |    87.71 |      90 |   91.73 | ...80-685,826-827 
  ...utSlimming.ts |     100 |    96.77 |     100 |     100 | 141,190           
  cronScheduler.ts |   97.56 |    92.98 |     100 |   97.56 | 62-63,77,155      
  ...eryService.ts |   80.43 |    95.45 |      75 |   80.43 | ...19-134,140-141 
  ...oryService.ts |   86.18 |    76.76 |   91.17 |   86.18 | ...1150,1191-1194 
  fileReadCache.ts |     100 |      100 |     100 |     100 |                   
  ...temService.ts |   91.27 |    82.69 |    90.9 |   91.27 | ...94,196,294-301 
  ...ratedFiles.ts |      96 |    88.23 |     100 |      96 | 119-120,146-147   
  gitInit.ts       |     100 |      100 |     100 |     100 |                   
  gitService.ts    |   68.75 |     92.3 |   55.55 |   68.75 | ...12-122,125-129 
  ...reeService.ts |   73.83 |    69.31 |    97.5 |   73.83 | ...1460,1488-1489 
  ...ionService.ts |   98.13 |     97.8 |   95.45 |   98.13 | ...32-333,380-381 
  ...orRegistry.ts |   96.54 |    91.73 |     100 |   96.54 | ...70-471,622-623 
  sessionRecap.ts  |   12.65 |      100 |       0 |   12.65 | 44-150            
  ...ionService.ts |   90.23 |     78.8 |   96.77 |   90.23 | ...1294,1298-1299 
  sessionTitle.ts  |   93.87 |    71.15 |     100 |   93.87 | ...33-236,267-268 
  ...ionService.ts |   81.07 |    77.92 |   89.28 |   81.07 | ...1923,1929-1934 
  ...Estimation.ts |     100 |      100 |     100 |     100 |                   
  ...UseSummary.ts |   94.63 |    88.46 |     100 |   94.63 | ...62-164,214-215 
  ...reeCleanup.ts |   14.56 |      100 |   33.33 |   14.56 | 58-185            
  ...ionService.ts |   84.21 |    79.41 |     100 |   84.21 | ...22-223,239-240 
 ...icrocompaction |   98.05 |     91.8 |     100 |   98.05 |                   
  microcompact.ts  |   98.05 |     91.8 |     100 |   98.05 | ...19,289,293,391 
 src/skills        |    87.5 |    83.86 |   94.23 |    87.5 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...activation.ts |     100 |     93.1 |     100 |     100 | 93,112            
  skill-load.ts    |   92.94 |    81.63 |     100 |   92.94 | ...06,226,238-240 
  skill-manager.ts |   83.31 |    79.66 |   90.32 |   83.31 | ...1120,1127-1131 
  skill-paths.ts   |   86.74 |    77.77 |     100 |   86.74 | ...00-101,106-107 
  symlinkScope.ts  |     100 |      100 |     100 |     100 |                   
  types.ts         |     100 |      100 |     100 |     100 |                   
 src/subagents     |   82.61 |    78.89 |   95.23 |   82.61 |                   
  ...tin-agents.ts |     100 |      100 |     100 |     100 |                   
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...nt-manager.ts |   77.15 |    71.36 |    93.1 |   77.15 | ...1178,1200-1201 
  types.ts         |     100 |      100 |     100 |     100 |                   
  validation.ts    |   92.46 |    95.18 |     100 |   92.46 | 51-56,69-74,78-83 
 src/telemetry     |   74.72 |    86.01 |   78.85 |   74.72 |                   
  config.ts        |     100 |      100 |     100 |     100 |                   
  constants.ts     |     100 |      100 |     100 |     100 |                   
  ...attributes.ts |   98.13 |       88 |     100 |   98.13 | 185-187           
  ...-exporters.ts |   46.37 |      100 |   44.44 |   46.37 | ...85,88-89,92-93 
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...t.circular.ts |       0 |        0 |       0 |       0 | 1-111             
  ...-processor.ts |   93.93 |    90.21 |   94.11 |   93.93 | ...75-280,299-300 
  ...t.circular.ts |       0 |        0 |       0 |       0 | 1-128             
  loggers.ts       |    51.9 |       64 |   57.77 |    51.9 | ...1214,1231-1251 
  metrics.ts       |    74.9 |    82.95 |   74.54 |    74.9 | ...58-978,981-992 
  sanitize.ts      |      80 |    83.33 |     100 |      80 | 35-36,41-42       
  sdk.ts           |   90.45 |    83.56 |   76.92 |   90.45 | ...17-318,338-342 
  ...on-context.ts |     100 |      100 |     100 |     100 |                   
  ...on-tracing.ts |   92.24 |    88.77 |     100 |   92.24 | ...21-424,522-525 
  ...etry-utils.ts |     100 |      100 |     100 |     100 |                   
  ...l-decision.ts |     100 |      100 |     100 |     100 |                   
  ...e-id-utils.ts |     100 |      100 |     100 |     100 |                   
  tracer.ts        |   98.61 |    89.36 |     100 |   98.61 | 53,108            
  types.ts         |   79.17 |    85.83 |   83.33 |   79.17 | ...1149,1152-1181 
  uiTelemetry.ts   |   92.97 |    96.96 |   81.25 |   92.97 | ...93-194,200-207 
 ...ry/qwen-logger |   68.24 |    79.56 |   64.91 |   68.24 |                   
  event-types.ts   |       0 |        0 |       0 |       0 |                   
  qwen-logger.ts   |   68.24 |    79.34 |   64.28 |   68.24 | ...1055,1093-1094 
 src/test-utils    |   93.16 |    95.91 |   76.47 |   93.16 |                   
  config.ts        |     100 |      100 |     100 |     100 |                   
  ...st-helpers.ts |   94.11 |       90 |     100 |   94.11 | 69-70             
  index.ts         |     100 |      100 |     100 |     100 |                   
  mock-tool.ts     |   91.19 |    97.14 |   72.41 |   91.19 | ...38,202-203,216 
  ...aceContext.ts |     100 |      100 |     100 |     100 |                   
 src/tools         |   78.53 |     81.6 |   85.98 |   78.53 |                   
  ...erQuestion.ts |   88.93 |    76.74 |    90.9 |   88.93 | ...39-340,347-348 
  cron-create.ts   |   88.11 |    88.88 |    62.5 |   88.11 | ...,43-44,165-172 
  cron-delete.ts   |   96.82 |      100 |   83.33 |   96.82 | 26-27             
  cron-list.ts     |   96.66 |      100 |   83.33 |   96.66 | 25-26             
  diffOptions.ts   |     100 |      100 |     100 |     100 |                   
  edit.ts          |   81.02 |    84.07 |      75 |   81.02 | ...15-716,826-876 
  ...r-worktree.ts |   82.95 |    67.56 |    87.5 |   82.95 | ...82-185,276-277 
  exit-worktree.ts |   84.23 |    85.96 |   91.66 |   84.23 | ...92-293,298-312 
  exitPlanMode.ts  |   85.09 |    85.71 |     100 |   85.09 | ...60-163,177-189 
  glob.ts          |   90.63 |    88.33 |   84.61 |   90.63 | ...28,171,302,305 
  grep.ts          |   79.19 |    85.71 |   78.94 |   79.19 | ...20,560,569-576 
  ls.ts            |   96.74 |    90.27 |     100 |   96.74 | 176-181,212,216   
  lsp.ts           |   72.77 |    60.09 |   90.32 |   72.77 | ...1211,1213-1214 
  ...nt-manager.ts |   84.36 |    82.74 |   84.21 |   84.36 | ...2099-2103,2142 
  mcp-client.ts    |   33.18 |    77.65 |   66.66 |   33.18 | ...1490,1494-1497 
  mcp-tool.ts      |   90.98 |    88.88 |   96.42 |   90.98 | ...95-596,646-647 
  memory-config.ts |       0 |        0 |       0 |       0 | 1-47              
  ...iable-tool.ts |     100 |    84.61 |     100 |     100 | 102,109           
  monitor.ts       |   91.36 |    83.94 |   88.46 |   91.36 | ...61,574,770-775 
  ...nforcement.ts |   82.44 |       90 |     100 |   82.44 | 174-185,234-247   
  read-file.ts     |   95.09 |    88.75 |      90 |   95.09 | ...99,293-296,299 
  ripGrep.ts       |   94.59 |    85.71 |   93.33 |   94.59 | ...60,463,541-542 
  ...-transport.ts |    6.34 |        0 |       0 |    6.34 | 47-145            
  send-message.ts  |   84.68 |    91.66 |    62.5 |   84.68 | ...,82-90,167-170 
  shell.ts         |   73.05 |    79.66 |   91.42 |   73.05 | ...4216,4265-4271 
  skill-utils.ts   |     100 |      100 |     100 |     100 |                   
  skill.ts         |   88.35 |    91.42 |   86.66 |   88.35 | ...12,416,439-461 
  ...eticOutput.ts |   95.12 |      100 |      80 |   95.12 | 87-88             
  task-stop.ts     |   93.14 |    96.15 |   85.71 |   93.14 | 39-40,54-64       
  todoWrite.ts     |   89.17 |    82.05 |   92.85 |   89.17 | ...41-546,568-569 
  tool-error.ts    |     100 |      100 |     100 |     100 |                   
  tool-names.ts    |     100 |      100 |     100 |     100 |                   
  tool-registry.ts |   74.85 |    76.85 |   80.95 |   74.85 | ...30-831,839-840 
  tool-search.ts   |   95.19 |    86.48 |    92.3 |   95.19 | ...47-153,208-213 
  tools.ts         |   90.49 |    90.19 |   84.21 |   90.49 | ...78-479,495-501 
  web-fetch.ts     |   88.84 |       80 |   92.85 |   88.84 | ...12-313,315-316 
  write-file.ts    |   82.65 |    80.45 |   84.61 |   82.65 | ...65-668,696-731 
 src/tools/agent   |   74.64 |    81.34 |   73.61 |   74.64 |                   
  agent.ts         |    74.9 |     81.6 |   74.24 |    74.9 | ...2390,2399-2402 
  fork-subagent.ts |   69.62 |    71.42 |   66.66 |   69.62 | ...04-105,140-151 
 src/utils         |   88.99 |    87.67 |    93.6 |   88.99 |                   
  LruCache.ts      |       0 |        0 |       0 |       0 | 1-41              
  ...ssageQueue.ts |     100 |      100 |     100 |     100 |                   
  ...cFileWrite.ts |   77.96 |    80.48 |     100 |   77.96 | ...35,156,173-176 
  bareMode.ts      |   27.27 |      100 |       0 |   27.27 | 9-15,18-19        
  browser.ts       |    7.69 |      100 |       0 |    7.69 | 17-56             
  bundlePaths.ts   |     100 |      100 |     100 |     100 |                   
  ...igResolver.ts |     100 |      100 |     100 |     100 |                   
  ...engthError.ts |   89.11 |    87.23 |     100 |   89.11 | ...28-129,132-133 
  cronDisplay.ts   |   42.85 |    23.07 |     100 |   42.85 | 26-31,33-45,47-54 
  cronParser.ts    |   89.74 |    85.71 |     100 |   89.74 | ...,63-64,183-186 
  debugLogger.ts   |    95.9 |    93.84 |   94.73 |    95.9 | 106-107,214-218   
  editHelper.ts    |   93.63 |    83.52 |     100 |   93.63 | ...28-429,463-464 
  editor.ts        |   97.61 |    95.71 |     100 |   97.61 | ...70-271,273-274 
  ...arResolver.ts |   94.28 |    88.88 |     100 |   94.28 | 28-29,125-126     
  ...entContext.ts |     100 |    95.45 |     100 |     100 | 83                
  errorParsing.ts  |    97.7 |    97.05 |     100 |    97.7 | 72-73             
  ...rReporting.ts |   88.46 |       90 |     100 |   88.46 | 69-74             
  errors.ts        |   70.92 |    79.59 |   53.33 |   70.92 | ...03-219,223-229 
  fetch.ts         |   70.18 |    71.42 |   71.42 |   70.18 | ...42,148,161,186 
  fileUtils.ts     |   91.46 |    86.19 |   95.23 |   91.46 | ...1188,1192-1198 
  forkedAgent.ts   |   80.68 |    78.12 |   83.33 |   80.68 | ...39-545,550-556 
  formatters.ts    |   81.81 |       75 |     100 |   81.81 | 15-16             
  ...eUtilities.ts |   89.21 |    86.66 |     100 |   89.21 | 16-17,49-55,65-66 
  ...rStructure.ts |   94.36 |    94.28 |     100 |   94.36 | ...17-120,330-335 
  getPty.ts        |    12.5 |      100 |       0 |    12.5 | 21-34             
  gitDiff.ts       |   92.36 |    79.53 |     100 |   92.36 | ...55-856,928-929 
  ...noreParser.ts |    92.3 |    89.36 |     100 |    92.3 | ...15-116,186-187 
  gitUtils.ts      |   56.66 |    85.71 |      75 |   56.66 | ...2,72-73,97-148 
  iconvHelper.ts   |     100 |      100 |     100 |     100 |                   
  ...rePatterns.ts |     100 |      100 |     100 |     100 |                   
  ...ionManager.ts |     100 |     90.9 |     100 |     100 | 26                
  ...lPromptIds.ts |     100 |      100 |     100 |     100 |                   
  jsonl-utils.ts   |    74.1 |    90.76 |   58.33 |    74.1 | ...23-326,336-342 
  ...-detection.ts |     100 |      100 |     100 |     100 |                   
  ...iagnostics.ts |   96.87 |    91.83 |     100 |   96.87 | 214-219,272       
  ...yDiscovery.ts |    83.9 |    79.36 |     100 |    83.9 | ...16,319,411-414 
  ...tProcessor.ts |   93.63 |       90 |     100 |   93.63 | ...96-302,384-385 
  ...Inspectors.ts |   61.53 |      100 |      50 |   61.53 | 18-23             
  modelId.ts       |   98.95 |    98.18 |     100 |   98.95 | 148               
  ...kerChecker.ts |   88.75 |    85.71 |     100 |   88.75 | 69-70,87-93       
  notebook.ts      |   94.35 |    84.78 |     100 |   94.35 | ...10,122,174-176 
  openaiLogger.ts  |   88.05 |    84.09 |     100 |   88.05 | ...44-146,169-174 
  partUtils.ts     |     100 |    98.61 |     100 |     100 | 206               
  pathReader.ts    |     100 |      100 |     100 |     100 |                   
  paths.ts         |   93.21 |    91.86 |     100 |   93.21 | ...89-390,392-394 
  pdf.ts           |   93.68 |    87.05 |     100 |   93.68 | ...96-297,321-325 
  projectPath.ts   |     100 |      100 |     100 |     100 |                   
  ...ectSummary.ts |   89.39 |    72.41 |     100 |   89.39 | ...37-142,193-196 
  ...tIdContext.ts |     100 |      100 |     100 |     100 |                   
  proxyUtils.ts    |     100 |      100 |     100 |     100 |                   
  ...rDetection.ts |   58.57 |       76 |     100 |   58.57 | ...4,88-89,95-100 
  ...noreParser.ts |   85.45 |    85.18 |     100 |   85.45 | ...59,65-66,72-73 
  rateLimit.ts     |   92.55 |    85.92 |     100 |   92.55 | ...70-272,309-310 
  readManyFiles.ts |   87.96 |    86.95 |     100 |   87.96 | ...05-207,223-234 
  retry.ts         |   89.81 |    88.05 |     100 |   89.81 | ...29,350,357-358 
  ripgrepUtils.ts  |   46.79 |    84.37 |   66.66 |   46.79 | ...45-246,258-335 
  ...sDiscovery.ts |   97.42 |    92.85 |     100 |   97.42 | ...04,182-183,202 
  ...tchOptions.ts |   81.72 |    85.04 |   95.23 |   81.72 | ...11,536,565-574 
  runtimeStatus.ts |    97.5 |    88.57 |     100 |    97.5 | 167-168           
  safeJsonParse.ts |   74.07 |    83.33 |     100 |   74.07 | 40-46             
  ...nStringify.ts |     100 |      100 |     100 |     100 |                   
  ...aConverter.ts |   90.78 |    88.23 |     100 |   90.78 | ...41-42,93,95-96 
  ...aValidator.ts |   94.57 |    80.26 |     100 |   94.57 | ...04,213-216,270 
  ...r-launcher.ts |   76.92 |     91.3 |   66.66 |   76.92 | ...34,136,157-195 
  ...orageUtils.ts |   96.89 |    85.84 |     100 |   96.89 | ...51,367,447,466 
  shell-utils.ts   |   82.93 |    89.89 |     100 |   82.93 | ...1522,1529-1533 
  ...lAstParser.ts |   95.58 |    85.79 |     100 |   95.58 | ...1059-1061,1071 
  ...nlyChecker.ts |   95.75 |    92.39 |     100 |   95.75 | ...00-301,313-314 
  sideQuery.ts     |   98.71 |    97.14 |     100 |   98.71 | 110               
  ...pEventSink.ts |     100 |       80 |     100 |     100 | 61                
  ...tGenerator.ts |     100 |      100 |     100 |     100 |                   
  ...ameContext.ts |     100 |      100 |     100 |     100 |                   
  symlink.ts       |   77.77 |       50 |     100 |   77.77 | 44,54-59          
  ...emEncoding.ts |   96.36 |    91.17 |     100 |   96.36 | 59-60,124-125     
  terminalSafe.ts  |     100 |      100 |     100 |     100 |                   
  ...Serializer.ts |   98.72 |       90 |     100 |   98.72 | 42-43,134,201-203 
  testUtils.ts     |   53.33 |      100 |   33.33 |   53.33 | ...53,59-64,70-72 
  textUtils.ts     |      60 |      100 |   66.66 |      60 | 36-55             
  thoughtUtils.ts  |     100 |    92.85 |     100 |     100 | 71                
  ...-converter.ts |   94.59 |    85.71 |     100 |   94.59 | 35-36             
  tool-utils.ts    |    93.6 |     91.3 |     100 |    93.6 | ...58-159,162-163 
  truncation.ts    |     100 |       92 |     100 |     100 | 52,71             
  windowsPath.ts   |   89.47 |    79.31 |     100 |   89.47 | ...57-58,62,90-91 
  ...aceContext.ts |   93.71 |    89.28 |   93.33 |   93.71 | ...24-225,249-251 
  xml.ts           |     100 |      100 |     100 |     100 |                   
  yaml-parser.ts   |      92 |    84.61 |     100 |      92 | 49-53,65-69       
 ...ils/filesearch |   86.21 |    81.61 |   96.42 |   86.21 |                   
  crawlCache.ts    |     100 |      100 |     100 |     100 |                   
  crawler.ts       |   82.84 |    77.49 |   94.82 |   82.84 | ...1451,1485-1486 
  fileSearch.ts    |   93.58 |    87.32 |     100 |   93.58 | ...46-247,249-250 
  ignore.ts        |     100 |      100 |     100 |     100 |                   
  result-cache.ts  |     100 |     92.3 |     100 |     100 | 46                
 ...uest-tokenizer |   56.63 |    74.52 |   74.19 |   56.63 |                   
  ...eTokenizer.ts |   41.86 |    76.47 |   69.23 |   41.86 | ...70-443,453-507 
  index.ts         |     100 |      100 |     100 |     100 |                   
  ...tTokenizer.ts |   68.39 |    69.49 |    90.9 |   68.39 | ...24-325,327-328 
  ...ageFormats.ts |      76 |      100 |   33.33 |      76 | 45-48,55-56       
  textTokenizer.ts |     100 |      100 |     100 |     100 |                   
  types.ts         |       0 |        0 |       0 |       0 | 1                 
-------------------|---------|----------|---------|---------|-------------------

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.

LaZzyMan added a commit that referenced this pull request May 15, 2026
The Task 11 redesign updated the non-interactive text formatter
(formatContextUsageText) but left ContextUsage.tsx — the interactive
React component that real /context users see — unchanged. As a result
the TUI still showed the old single "Autocompact buffer" line and none
of the new warn/auto/hard ladder.

Adds a "Compaction thresholds" section after the per-category breakdown:
  - Effective window
  - Warn / Auto / Hard threshold rows with a ▶ marker on the row the
    current usage has crossed
  - Current tier label coloured by severity (safe→green, warn/auto→
    yellow, hard→red)

The existing progress bar legend (Used / Free / Autocompact buffer)
is preserved because it's tied to the three-segment progress bar
visualisation; the new section adds the absolute numbers + tier badge
on top of that.

Caught by the tmux e2e test (PR #4168 ci-monitor follow-up). Pre-fix,
the asserted 'Compaction thresholds' text was missing entirely from the
TUI; post-fix the new section renders correctly for fresh and live
sessions on 1M / 200K / 128K windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LaZzyMan LaZzyMan marked this pull request as ready for review May 15, 2026 09:59
@LaZzyMan

Collaborator Author

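E2E Test Report

Following up on review and CI feedback, I ran several additional rounds of E2E verification against the core functionality. Final coverage matrix:

✅ Real-model E2E (most important)

Configuration: qwen3.6-plus (1M window) · tmux interactive mode · 3 conversation turns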

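Stage | Used tokens | Notes
------|-------------|------
Auto threshold | 967,000 | computeThresholds(1M).auto
After turn 1 (read ~3.5MB of TS source inline) | 908,128 (90.8%) | Close to the threshold but not over it
Turn 2 sent (loads more content) | Estimate exceeds 967K → compaction triggered | cheap-gate works correctly
TUI compression banner | compressed from: 908128 to 20742 tokens | Real reduction
/context after turn 2 | 20,742 (2.1%) · tier=safe | History really did shrink
Turn 3 (any prompt) | Model answers normally | Conversation intact after compression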

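Measured context reduction: 97.7%. All 6 assertions passed:

  1. Used crosses the Auto threshold (precondition met)
  2. Compression event fires (UI banner appears)
  3. Y < X and Y << X (not a fake compression)
  4. Used after compression is far below auto
  5. tier returns to safe
  6. Subsequent turns respond normally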

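✅ TUI /context three-tier threshold display

The tmux e2e run initially found that the ContextUsage.tsx React component had not been updated along with Task 11: the TUI still showed the old single-line Autocompact buffer. Fixed in commit 378635550. After the fix the TUI renders: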

```
Compaction thresholds
Effective window 980.0k tokens
Warn threshold 947.0k tokens
Auto threshold 967.0k tokens
Hard threshold 977.0k tokens
Current tier safe
```

A ▶ marker flags the threshold that current usage has crossed, and the tier label is coloured by severity (safe→green / warn, auto→yellow / hard→red).
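
The four values in the TUI block are consistent with subtracting fixed reservations from the context window. A minimal sketch under that assumption follows. The function names, constant names, and the 20K / 33K / 13K / 3K offsets are inferred from the 1M display only and are hypothetical; the proportional fallback for small windows is omitted, and this is not the PR's actual `computeThresholds()`.

```typescript
// Hypothetical reconstruction: offsets inferred from the 1M-window display
// (980.0k / 947.0k / 967.0k / 977.0k), not copied from the PR source.
type Tier = 'safe' | 'warn' | 'auto' | 'hard';

interface Thresholds {
  effectiveWindow: number;
  warn: number;
  auto: number;
  hard: number;
}

const OUTPUT_RESERVE = 20_000; // assumed: room kept for model output
const WARN_BUFFER = 33_000; // assumed distance below the effective window
const AUTO_BUFFER = 13_000; // assumed
const HARD_BUFFER = 3_000; // assumed

function computeThresholdsSketch(contextWindow: number): Thresholds {
  const effectiveWindow = contextWindow - OUTPUT_RESERVE;
  return {
    effectiveWindow,
    warn: effectiveWindow - WARN_BUFFER,
    auto: effectiveWindow - AUTO_BUFFER,
    hard: effectiveWindow - HARD_BUFFER,
  };
}

// Highest tier whose threshold the current usage has reached.
// Using >= for warn and auto is an assumption; the report only states it for hard.
function classifyTier(usedTokens: number, t: Thresholds): Tier {
  if (usedTokens >= t.hard) return 'hard';
  if (usedTokens >= t.auto) return 'auto';
  if (usedTokens >= t.warn) return 'warn';
  return 'safe';
}

const t = computeThresholdsSketch(1_000_000);
console.log(t, classifyTier(908_128, t), classifyTier(20_742, t));
```

For a 1,000,000-token window the sketch reproduces the displayed 980,000 / 947,000 / 967,000 / 977,000 values, which is the only data point it has been checked against.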

✅ Headless smoke

node dist/cli.js "..." --approval-mode yolo --output-format json runs normally and emits the complete JSON stream with type: system/assistant/result.

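⚠️ Covered by unit tests only; no real-model E2E run

The following scenarios have no E2E run yet because reproducing the real trigger conditions is expensive (making a real sideQuery fail repeatedly, deliberately filling the context up to hard, and so on), but unit-test coverage is in place:

  • Failure circuit breaker: once consecutiveFailures reaches MAX_CONSECUTIVE_FAILURES=3, the cheap-gate returns NOOP. Three cases in geminiChat.test.ts verify this directly (tolerates 2 failures, forced attempts are not counted, success resets the counter)
  • Hard-tier rescue: when effectiveTokens >= hard, compression runs with force=true and the breaker is reset. Three cases in geminiChat.test.ts (force triggers, counter resets, no trigger below hard)
  • MAX_TOKENS guard: when the sideQuery output reaches the 20K cap, the result is a NOOP so that a truncated summary is not persisted. chatCompressionService.test.ts directly asserts that status is NOOP and that a warn log is emitted
  • v4 nested deprecation warning: when a user sets the deprecated field at model.chatCompression.contextPercentageThreshold, a warning is printed to stderr at startup. Covered by 3 cases in config.test.ts (present warns / absent no-warn / other field no-warn)

If any reviewer feels real-scenario E2E is needed for these, I can continue; but unit tests already cover them strictly and the cost/benefit shows diminishing returns.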
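
The breaker behaviour listed above (tolerate two failures, do not count forced attempts, reset on success) can be sketched as a small counter. This is an illustration only: the class and method names are hypothetical, and only `MAX_CONSECUTIVE_FAILURES = 3` is taken from the report.

```typescript
// Hypothetical sketch of a consecutive-failure breaker.
// Only the constant's name and value come from the report above.
const MAX_CONSECUTIVE_FAILURES = 3;

class CompactionBreaker {
  private consecutiveFailures = 0;

  // Unforced attempts are skipped once the breaker has tripped;
  // forced attempts (for example a hard-tier rescue) always go through.
  shouldAttempt(force: boolean): boolean {
    return force || this.consecutiveFailures < MAX_CONSECUTIVE_FAILURES;
  }

  // Record the outcome of one compaction attempt.
  record(succeeded: boolean, force: boolean): void {
    if (succeeded) {
      this.consecutiveFailures = 0; // any success closes the breaker
    } else if (!force) {
      this.consecutiveFailures += 1; // assumed: forced failures are not counted
    }
  }

  get failures(): number {
    return this.consecutiveFailures;
  }
}

const breaker = new CompactionBreaker();
breaker.record(false, false);
breaker.record(false, false);
console.log('after 2 failures, unforced allowed:', breaker.shouldAttempt(false));
breaker.record(false, false);
console.log('after 3 failures, unforced allowed:', breaker.shouldAttempt(false));
console.log('after 3 failures, forced allowed:', breaker.shouldAttempt(true));
```

Counting instead of latching is what lets a transient error recover: two bad attempts leave auto-compaction enabled, and a single success clears the history.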

🤖 Generated with Claude Code using the `e2e-testing` skill.

);
return {
newHistory: null,
info: {

Collaborator

[Suggestion] Truncated summary returns NOOP — circuit breaker never trips

When compression output hits the 20K COMPACT_MAX_OUTPUT_TOKENS cap, compress() returns CompressionStatus.NOOP. Since isCompressionFailureStatus() does not match NOOP, consecutiveFailures is never incremented. If the model consistently produces max-length summaries, every subsequent send wastes an API call on a compaction attempt that will be dropped.

Consider treating the MAX_TOKENS truncation as a recoverable failure (increment the counter without locking) so the breaker can trip after repeated occurrences:

Suggested change
info: {
config
.getDebugLogger()
.warn(
`[chat-compression] summary output reached the ` +
`COMPACT_MAX_OUTPUT_TOKENS cap (${COMPACT_MAX_OUTPUT_TOKENS}); ` +
`dropping potentially-truncated result.`,
);
return {
newHistory: null,
info: {
originalTokenCount,
newTokenCount: originalTokenCount,
compressionStatus: CompressionStatus.FAILURE,
},
};

— glm-5.1 via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c — MAX_TOKENS truncation guard now returns COMPRESSION_FAILED_EMPTY_SUMMARY (a failure status), so the consecutive-failure breaker ticks on repeated truncations instead of wasting an API call per send. Unit test updated to assert the new status.

Comment thread packages/core/src/core/geminiChat.ts Outdated
// suppress further auto-compaction since the chat clearly
// can't shrink — trip the breaker to its NOOP threshold so
// subsequent unforced sends short-circuit at the cheap-gate.
self.consecutiveFailures = MAX_CONSECUTIVE_FAILURES;

Collaborator

[Suggestion] Reactive failure permanently disables auto-compaction on a single transient error

self.consecutiveFailures = MAX_CONSECUTIVE_FAILURES directly sets the counter to the maximum rather than incrementing. A single transient network error during reactive compression permanently disables auto-compaction until the next hard-threshold crossing resets it. While the comment explains the rationale (forced compression already failed), consider incrementing instead so that N distinct failures are required:

Suggested change
self.consecutiveFailures = MAX_CONSECUTIVE_FAILURES;
self.consecutiveFailures += 1;

Or, if the current behavior is intentional, add a short comment noting that hard-tier rescue is the designated recovery path.

— glm-5.1 via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c — reactive failure now does self.consecutiveFailures += 1 instead of = MAX. Comment notes hard-tier rescue as the designated recovery path. Test marks failed reactive compression attempts... updated to assert counter == 1 (not MAX).

Comment thread packages/core/src/core/geminiChat.ts Outdated
DEFAULT_TOKEN_LIMIT;
const { hard } = computeThresholds(contextLimit);
const effectiveTokens = estimatePromptTokens(
this.getHistory(true),

Collaborator

[Suggestion] Redundant deep clones + token estimation on every send

The hard-tier rescue calls estimatePromptTokens(this.getHistory(true), ...) (deep clone + full history walk). Then tryCompress → cheap-gate calls estimatePromptTokens(chat.getHistory(true), ...) again (second deep clone + walk). If the cheap-gate passes, compress() calls chat.getHistory(true) a third time.

Every sendMessageStream pays for 2–3 full-history clones and 2 estimation traversals even when no compaction is needed. Consider computing the effective token count once here and passing it into tryCompress as a pre-computed value:

Suggested change
this.getHistory(true),
const effectiveTokens = estimatePromptTokens(
this.getHistory(true),
userContent,
this.lastPromptTokenCount,
resolveSlimmingConfig(chatCompressionSettings).imageTokenEstimate,
);

This also fixes a minor inconsistency: this call uses the default imageTokenEstimate (1600) while the cheap-gate inside tryCompress uses the user's configured value.

— glm-5.1 via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c — sendMessageStream now computes effectiveTokens once and threads it through TryCompressOptions.precomputedEffectiveTokens; service.compress skips its own estimation pass when supplied. Also uses resolveSlimmingConfig(chatCompressionSettings).imageTokenEstimate so the rescue and cheap-gate paths see the same value. Steady-state path (count>0) skips the costly getHistory(true) clone since estimatePromptTokens only needs the user message in that branch — drops the per-send clone count from 2–3 to 1.
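
A rough sketch of the "compute once, pass it down" idea described in this thread. The `Message` type, the thunk-based history access, and `cheapGate` are simplifications invented for illustration; only the names `estimatePromptTokens`, `estimateContentTokens`, and `precomputedEffectiveTokens` echo the discussion, and their real signatures may differ.

```typescript
// Simplified stand-in: real content objects carry parts, images, tool calls.
interface Message {
  text: string;
}

const CHARS_PER_TOKEN = 4;

function estimateContentTokens(messages: Message[]): number {
  const totalChars = messages.reduce((n, m) => n + m.text.length, 0);
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}

// getHistory is a thunk (an assumption of this sketch) so the expensive clone
// only happens on the first-send branch, where no API-reported count exists.
function estimatePromptTokensSketch(
  getHistory: () => Message[],
  userMessage: Message,
  lastPromptTokenCount: number,
): number {
  if (lastPromptTokenCount > 0) {
    // Steady state: trust the last API count and add only the new message.
    return lastPromptTokenCount + estimateContentTokens([userMessage]);
  }
  return estimateContentTokens([...getHistory(), userMessage]);
}

interface TryCompressOptionsSketch {
  precomputedEffectiveTokens?: number;
}

// The gate reuses the caller's value and only falls back to its own estimate
// when none was supplied.
function cheapGate(
  autoThreshold: number,
  opts: TryCompressOptionsSketch,
  fallbackEstimate: () => number,
): boolean {
  const effective = opts.precomputedEffectiveTokens ?? fallbackEstimate();
  return effective >= autoThreshold;
}

let cloneCount = 0;
const stored: Message[] = [{ text: 'x'.repeat(400) }];
const cloneHistory = () => {
  cloneCount += 1;
  return stored.map((m) => ({ ...m }));
};

const effective = estimatePromptTokensSketch(cloneHistory, { text: 'hello' }, 1_000);
const shouldCompact = cheapGate(967_000, { precomputedEffectiveTokens: effective }, () => 0);
console.log({ effective, shouldCompact, cloneCount });
```

In the steady-state call above the history thunk is never invoked, which is the saving the reply describes.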

const pendingUserMessage = opts.pendingUserMessage;
const effectiveTokens = pendingUserMessage
? estimatePromptTokens(
chat.getHistory(true),

Collaborator

[Suggestion] computeThresholds() called twice per send + redundant getHistory(true)

computeThresholds(contextLimit) is called here in the cheap-gate, and again in geminiChat.ts hard-tier rescue for the same contextLimit. Similarly, estimatePromptTokens(chat.getHistory(true), ...) clones the full history even when originalTokenCount > 0 (only estimateContentTokens([userMessage]) is needed in that case — the history is unused).

Consider:

  1. Moving computeThresholds into a lazy field on the service (or passing the result from the caller)
  2. Guarding the getHistory(true) call behind the originalTokenCount === 0 branch to avoid the wasted clone
Suggested change
chat.getHistory(true),
const thresholds = computeThresholds(contextLimit);
const effectiveTokens =
originalTokenCount > 0
? estimateContentTokens([pendingUserMessage ?? []])
: estimatePromptTokens(
chat.getHistory(true),
pendingUserMessage,
0,
slimmingConfig.imageTokenEstimate,
);

— glm-5.1 via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c alongside the geminiChat-side change — precomputedEffectiveTokens lets sendMessageStream pass the value computed there directly, so the service's cheap-gate no longer recomputes computeThresholds + getHistory(true) + estimatePromptTokens. The steady-state branch (lastPromptTokenCount > 0) also skips history cloning entirely now.

id: 'context-critical',
content:
'Context is almost full! Run /compress now or start /new to continue.',
'Context near hard limit — auto-compact will force on next send. Consider /clear if you want to start fresh.',

Collaborator

[Suggestion] Dead code: getContextUsagePercent has zero callers

The three context-* tips were all rewired to use ctx.thresholds directly (as visible here — the getContextUsagePercent(ctx) >= 95 call on the old L39 is replaced by ctx.thresholds.hard). However, getContextUsagePercent itself (defined at L41) is no longer called anywhere in the codebase but remains exported from ./index.ts. Consider removing it.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c: getContextUsagePercent and its index.ts re-export removed. Confirmed no callers in source (grep clean across packages/cli/src and packages/core/src).

* `breakdown.thresholds` + `breakdown.currentTier`, which the context command
* derives from `computeThresholds()` in core.
*/
const CompactionThresholds: React.FC<{

Collaborator

[Suggestion] New CompactionThresholds/ThresholdRow components have no tests

The new ~60 lines of JSX (ThresholdRow at L149, CompactionThresholds at L202, tierColor switch at ~L158) that render the three-tier ladder with color-coded tier labels and isCurrent arrow indicators are untested. No test file exists for ContextUsage.tsx, and this PR adds none. While the underlying data calculations are covered in contextCommand.test.ts, the rendering behavior (tier color mapping, arrow positioning, conditional visibility) is not verified in CI.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Collaborator Author

✅ Partially fixed in 181393c — added packages/cli/src/ui/components/views/ContextUsage.test.tsx with 4 ink-testing-library cases covering the new Compaction-thresholds section: header + 4 threshold rows render, ▶ marker placement per current tier (safe/warn/hard), and the colored tier label. Skipped a true snapshot because the precise frame layout drifts with terminal width — string-match assertions on labels/marker presence are more durable.

@wenshao wenshao left a comment

Collaborator

Two additional observations that don't map to specific diff lines:

  1. Missing test: contextPercentageThreshold setting is silently ignored. The config.ts deprecation warns on stderr, but no test verifies the behavioral change — that setting contextPercentageThreshold: 0 (which previously disabled auto-compaction) now has no effect. Consider adding a test pinning this.

  2. tierTokens = 0 when isEstimated shows misleading "safe" tier. In contextCommand.ts, when no API data exists yet (first render, --continue), tierTokens = 0 makes currentTier always 'safe' even for large inherited history. Consider using estimated overhead as the tier input when isEstimated.

— mimo-v2.5-pro via Qwen Code /review

* Average bytes-per-token for char-based token estimation.
* Matches claude-code's roughTokenCountEstimation default (tokens.ts).
*/
export const BYTES_PER_TOKEN = 4;

Collaborator

[Suggestion] BYTES_PER_TOKEN is a misleading name: the value divides character counts (from estimateContentChars → string.length), not byte counts. For CJK text (3 bytes/char UTF-8), the name actively misleads. The adjacent module compactionInputSlimming.ts correctly names the identical ratio TOKEN_TO_CHAR_RATIO = 4.

Suggested change
export const BYTES_PER_TOKEN = 4;
/**
* Average characters-per-token for char-based token estimation.
* Matches the inverse of TOKEN_TO_CHAR_RATIO in compactionInputSlimming.ts.
*/
export const CHARS_PER_TOKEN = 4;

Then update the two usages (Math.ceil(totalChars / CHARS_PER_TOKEN) at line 39 and the return at line 66).

— mimo-v2.5-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c: renamed BYTES_PER_TOKEN → CHARS_PER_TOKEN (the inputs are character counts from string.length, not byte counts; the old name misleads on CJK). Doc updated to reference TOKEN_TO_CHAR_RATIO in compactionInputSlimming.ts as the inverse.
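
To make the chars-versus-bytes distinction concrete, here is a small sketch that compares `string.length` with the UTF-8 byte length for ASCII and CJK text. `estimateTokensFromChars` is a hypothetical helper written for this illustration, not the PR's estimator.

```typescript
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: divides the JavaScript string length (UTF-16 code
// units, which equals the character count for BMP text) by the ratio.
function estimateTokensFromChars(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

const samples = ['compaction', '压缩上下文'];
for (const s of samples) {
  const utf8Bytes = new TextEncoder().encode(s).length;
  console.log(s, {
    chars: s.length,
    utf8Bytes,
    estimatedTokens: estimateTokensFromChars(s),
  });
}
```

For the ASCII sample the two lengths agree, while the CJK sample has three times as many UTF-8 bytes as characters, so a constant named after bytes would describe a different quantity than the one actually being divided.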

if (
!isSummaryEmpty &&
typeof compressionOutputTokenCount === 'number' &&
compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS

Collaborator

[Suggestion] Off-by-one in truncation guard: >= COMPACT_MAX_OUTPUT_TOKENS rejects summaries that are exactly 20,000 tokens. Since maxOutputTokens is set to the same value, a model that produces a valid summary at exactly the cap limit gets dropped. The > operator is more appropriate — it catches outputs that exceeded the cap (impossible with the API budget, but defensive), while allowing outputs that landed exactly at the limit.

Suggested change
compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS
compressionOutputTokenCount > COMPACT_MAX_OUTPUT_TOKENS

Note: the dropped NOOP does NOT trip the circuit breaker (NOOP is not a failure status in isCompressionFailureStatus), so the worst case is a retry loop rather than permanent disablement. Still worth fixing to avoid unnecessary retry cycles.

— mimo-v2.5-pro via Qwen Code /review

Collaborator Author

❌ Declined — the >= operator is intentionally conservative: with maxOutputTokens=20K, p99.99 of real summaries is ~17K (per claude-code data), so landing exactly at the cap is far more likely truncation than a clean stop. > would make the guard effectively dead (the API can't return more than the cap). With R1.1 now in place (this batch), persistent truncation trips the breaker after MAX_CONSECUTIVE_FAILURES, so the worst-case wasted-retry cost is bounded. The proper long-term fix is plumbing finish_reason through runSideQuery, which is out of scope here.
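The disagreement comes down to how an output of exactly the cap is classified. The sketch below isolates that one decision; the 20,000 cap is the value quoted in the thread, while looksTruncated and GuardOperator are hypothetical names and the real guard lives inside ChatCompressionService.compress().

```typescript
// Illustrative sketch of the truncation heuristic under both operators.
const COMPACT_MAX_OUTPUT_TOKENS = 20_000;

type GuardOperator = ">=" | ">";

function looksTruncated(
  outputTokens: number | undefined,
  operator: GuardOperator,
): boolean {
  // Without a usage count there is nothing to compare against.
  if (typeof outputTokens !== "number") return false;
  return operator === ">="
    ? outputTokens >= COMPACT_MAX_OUTPUT_TOKENS
    : outputTokens > COMPACT_MAX_OUTPUT_TOKENS;
}

const atCapConservative = looksTruncated(20_000, ">="); // true
const atCapLenient = looksTruncated(20_000, ">"); // false
const typicalSummary = looksTruncated(17_000, ">="); // false
```

Because maxOutputTokens is set to the same cap, the `>` variant can only fire if the API returns more than it was allowed to, which is the "effectively dead" concern raised in the reply above.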

estimateContentTokens([userMessage], imageTokenEstimate)
);
}
return estimateContentTokens([...history, userMessage], imageTokenEstimate);

Collaborator

[Suggestion] When lastPromptTokenCount === 0 (first send after --continue or inherited history), the fallback estimates only history + userMessage. It misses system prompt (~8-15K tokens), tool definitions (~5K), skill content, and cached content. This underestimates by ~15-20K tokens, which could cause the hard-tier rescue to not fire when it should.

The docstring correctly warns "using it to SKIP compaction is not [safe]" and the reactive overflow is the safety net, but closing the gap would improve first-send behavior. Consider adding a configurable or estimated overhead baseline:

Suggested change
return estimateContentTokens([...history, userMessage], imageTokenEstimate);
// Fallback: estimate from history + user message. Note this underestimates
// by ~15-20K tokens (system prompt, tool definitions, skills) — the reactive
// overflow handler is the safety net if the hard-tier rescue misses.
return estimateContentTokens([...history, userMessage], imageTokenEstimate);

— mimo-v2.5-pro via Qwen Code /review

Collaborator Author

✅ Partially fixed in 181393c — added an explanatory comment on the fallback branch documenting the ~15-20K under-estimate (system prompt + tool definitions + skills + cache headers) and that reactive overflow is the safety net. Skipped adding a magic overhead constant because the actual overhead is per-config (depends on tools loaded, skills active) and a fixed value would be a different kind of guess.
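A simplified sketch of the two branches under discussion follows. It is an assumption-laden stand-in: messages are plain strings rather than Content objects, image estimates are omitted, and the steady-state formula (API count plus the pending message) is inferred from the code fragment quoted above rather than copied from the source.

```typescript
// Illustrative sketch, not the real estimatePromptTokens.
const CHARS_PER_TOKEN = 4;

function estimateTextTokens(messages: readonly string[]): number {
  const totalChars = messages.reduce((sum, m) => sum + m.length, 0);
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}

function estimatePromptTokensSketch(
  history: readonly string[],
  userMessage: string,
  lastPromptTokenCount: number,
): number {
  if (lastPromptTokenCount > 0) {
    // Steady state: the API-reported count already includes history,
    // system prompt, and tool definitions; add only the pending message.
    return lastPromptTokenCount + estimateTextTokens([userMessage]);
  }
  // First send (for example after --continue): no API count exists yet, so
  // only history + user message are estimated. System prompt, tools, and
  // skills are NOT counted here, hence the under-estimate.
  return estimateTextTokens([...history, userMessage]);
}

const inherited = ["a".repeat(400_000)];
const firstSend = estimatePromptTokensSketch(inherited, "hi", 0);
const steadyState = estimatePromptTokensSketch(inherited, "hi", 50_000);
```

In the first-send case the result reflects message text only, so any fixed overhead has to be caught either by a later API-reported count or by the reactive overflow path.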

});

it('clears hasFailedCompressionAttempt after a forced successful compression', async () => {
it('forwards the pending user message to the compression cheap-gate', async () => {

Collaborator

[Suggestion] This test mocks compress entirely, so the real estimatePromptTokens(history, pendingUserMessage, 0) full-history fallback path is never exercised end-to-end. The "first send after --continue / sub-agent inherited history" scenario — where lastPromptTokenCount === 0 and only the full-history estimate can cross the auto threshold — has no integration test.

Consider adding a test that uses the real ChatCompressionService (not a mock) with a GeminiChat seeded with large inherited history and lastPromptTokenCount = 0, then asserts that sendMessageStream triggers compaction.

— mimo-v2.5-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c: added the test "triggers compaction end-to-end through the real ChatCompressionService when lastPromptTokenCount === 0 and inherited history is large" to geminiChat.test.ts. It uses the real service (no vi.spyOn on compress.prototype), seeds 400K chars of history, sets lastPromptTokenCount=0, mocks runSideQuery at the baseLlmClient layer, and asserts the stream emits a COMPRESSED event.

expect(estimateContentTokens([c], 1600)).toBe(1600);
});

it('estimates functionCall (json-dense) contributes some positive count', () => {

Collaborator

[Suggestion] estimateContentTokens has a test for functionCall but no test for functionResponse, which has a distinct branch in estimateContentChars (nested parts walk, 64-char floor for wrapper metadata). Tool-heavy conversations are the exact scenario where context grows fastest.

Suggested change
it('estimates functionCall (json-dense) contributes some positive count', () => {
it('estimates functionResponse (json-dense) contributes some positive count', () => {
const c: Content = {
role: 'user',
parts: [{ functionResponse: { name: 'tool', response: { result: 'data'.repeat(100) } } }],
};
const result = estimateContentTokens([c]);
expect(result).toBeGreaterThan(0);
});
it('estimates functionCall (json-dense) contributes some positive count', () => {

— mimo-v2.5-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 181393c: added the test "estimates functionResponse (nested parts) contributes some positive count" to tokenEstimation.test.ts. Tool-heavy conversations were the gap.

LaZzyMan added a commit that referenced this pull request May 18, 2026
Adds a defensive guard in ChatCompressionService.compress() that detects
when the side-query summary hit COMPACT_MAX_OUTPUT_TOKENS (20K). In that
case the summary is likely truncated mid-content, so we drop it and
return NOOP rather than persist a half-summary. The next send re-tries;
reactive overflow still catches the catastrophic case where the API
rejects the next request as too large.

Documented in the design doc as risk #2; the bot reviewer on PR #4168
correctly pushed for it to land alongside the threshold redesign rather
than as a follow-up since the new 20K cap is what makes truncation
likely in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 18, 2026
The Task 11 redesign updated the non-interactive text formatter
(formatContextUsageText) but left ContextUsage.tsx — the interactive
React component that real /context users see — unchanged. As a result
the TUI still showed the old single "Autocompact buffer" line and none
of the new warn/auto/hard ladder.

Adds a "Compaction thresholds" section after the per-category breakdown:
  - Effective window
  - Warn / Auto / Hard threshold rows with a ▶ marker on the row the
    current usage has crossed
  - Current tier label coloured by severity (safe→green, warn/auto→
    yellow, hard→red)

The existing progress bar legend (Used / Free / Autocompact buffer)
is preserved because it's tied to the three-segment progress bar
visualisation; the new section adds the absolute numbers + tier badge
on top of that.

Caught by the tmux e2e test (PR #4168 ci-monitor follow-up). Pre-fix,
the asserted 'Compaction thresholds' text was missing entirely from the TUI;
post-fix the new section renders correctly for fresh and live sessions
on 1M / 200K / 128K windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 18, 2026
Behavior fixes:
- MAX_TOKENS truncation guard now returns COMPRESSION_FAILED_EMPTY_SUMMARY
  instead of NOOP so the consecutive-failure breaker actually trips after
  repeated max-length summaries (R1.1).
- Reactive overflow failure increments consecutiveFailures by 1 instead
  of latching to MAX in one shot, so a transient network blip doesn't
  permanently disable auto-compaction. The hard-tier rescue resets the
  counter, which remains the designated recovery path (R1.2).
- /context current-tier classification uses rawOverhead (system + tools +
  memory + skills) as the tier input when API data is not yet available,
  rather than 0 — large inherited contexts no longer silently show 'safe'
  (R2.2).

Performance:
- sendMessageStream computes effectiveTokens ONCE and passes it through
  TryCompressOptions.precomputedEffectiveTokens, so the cheap-gate inside
  service.compress doesn't redo the estimation. Also fixes the
  imageTokenEstimate inconsistency between the rescue and cheap-gate
  paths (R1.3 + R1.4).
- Steady-state path (lastPromptTokenCount > 0) skips the costly
  getHistory(true) clone — estimatePromptTokens only needs the user
  message in that branch.

Code hygiene:
- BYTES_PER_TOKEN → CHARS_PER_TOKEN (inputs are char counts, not byte
  counts; CJK text would mislead under the old name) (R3.1).
- Drop dead getContextUsagePercent helper + index re-export — no callers
  in source after the threshold rewire (R1.5).
- Add a comment on estimatePromptTokens' first-send fallback documenting
  the ~15-20K under-estimate (system prompt + tools + skills) and that
  reactive overflow is the safety net (R3.3).

Tests:
- New CLI ContextUsage.test.tsx exercises the React renderer for the
  three-tier section: section presence, ▶ marker placement per tier,
  current-tier label coloring (R1.6).
- New chatCompressionService.test.ts case pins that a stale
  contextPercentageThreshold: 0 value in user settings no longer
  short-circuits compaction (R2.1).
- New tokenEstimation.test.ts case covers functionResponse (distinct
  nested-parts branch from functionCall) (R3.5).
- New geminiChat.test.ts integration test exercises the real
  ChatCompressionService — not a mock — for the first-send-after-
  inherited-history scenario where lastPromptTokenCount=0 and only the
  full-history estimate can cross the auto threshold (R3.4).

Declined: R3.2 (change `>=` to `>` on the MAX_TOKENS guard). The current
operator catches the at-cap case as suspicious, which is intentional —
landing exactly at the output cap is far more likely truncation than
clean stop given p99.99 ≈ 17K. With R1.1 in place, persistent truncations
trip the breaker after MAX_CONSECUTIVE_FAILURES so the worst case is
bounded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LaZzyMan LaZzyMan force-pushed the lazzy/trusting-grothendieck-8a8501 branch from 3786355 to 181393c Compare May 18, 2026 02:55
@LaZzyMan

Collaborator Author

Review batch 4 — commit 181393c49

Rebased onto current main (3 conflict files resolved) and addressed the new wenshao-channel review.

Outcomes per finding

| # | Outcome | One-line |
| --- | --- | --- |
| R1.1 truncation→NOOP doesn't trip breaker | ✅ Fixed | guard returns COMPRESSION_FAILED_EMPTY_SUMMARY so counter ticks |
| R1.2 reactive failure latches breaker | ✅ Fixed | consecutiveFailures += 1 instead of = MAX |
| R1.3 redundant deep clones + double estimation | ✅ Fixed | precomputedEffectiveTokens threaded through opts; steady-state path skips getHistory(true) |
| R1.4 computeThresholds × 2 + redundant getHistory | ✅ Fixed | same change as R1.3 + resolved imageTokenEstimate consistency |
| R1.5 getContextUsagePercent dead code | ✅ Fixed | removed, no callers in source |
| R1.6 new TUI components untested | ✅ Partially | added ContextUsage.test.tsx with 4 ink-testing-library cases (header, tier markers, color) |
| R2.1 missing test for deprecated field ignored | ✅ Fixed | added unit test asserting contextPercentageThreshold: 0 no longer disables compaction |
| R2.2 isEstimated → misleading 'safe' | ✅ Fixed | tier classification uses rawOverhead (not 0) when API data absent |
| R3.1 BYTES_PER_TOKEN misleading on CJK | ✅ Fixed | renamed to CHARS_PER_TOKEN; doc updated |
| R3.2 >= should be > | ❌ Declined | guard intentionally conservative; at-cap is more often truncation than clean stop; R1.1 bounds worst case |
| R3.3 first-send fallback underestimates | ✅ Partially | added comment documenting the ~15-20K under-estimate + reactive safety net; declined the magic-overhead constant |
| R3.4 first-send-after-continue not integration-tested | ✅ Fixed | added real-service integration test in geminiChat.test.ts |
| R3.5 functionResponse not tested | ✅ Fixed | added estimateContentTokens test for nested-parts branch |

Verification

  • Affected unit/integration tests: 172/172 pass (chatCompressionService.test.ts 68 · geminiChat.test.ts 96 · tokenEstimation.test.ts 8)
  • New ContextUsage.test.tsx: 4/4 pass
  • npm run typecheck clean across all workspaces
  • npm run lint clean (the 11 errors on .claude/skills/e2e-testing/scripts/*.js are pre-existing and unrelated to this PR)
  • Rebased onto current main (1c529e4f0); 3 conflict files (geminiChat.ts, chatCompressionService.ts/test.ts) resolved to merge main's bypassTokenThreshold heap-pressure path with this PR's consecutiveFailures breaker — both mechanisms now coexist (heap-pressure bypass overrides the breaker carve-out)

11 inline threads have per-thread gh api replies pointing at this commit. Ready for next pass.

🤖 Triaged via the review-response skill.

@wenshao wenshao left a comment

Collaborator

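Overall assessment

Reviewed all 22 changed files, focusing on chatCompressionService.ts / geminiChat.ts / tokenEstimation.ts / contextCommand.ts / tipRegistry.ts / ContextUsage.tsx and their tests and docs.

High quality; ready to merge. computeThresholds is a pure function with unit cases covering six boundary windows (32K / 64K / 128K / 200K / 1M / a 10K extreme), so the math is clear and checkable. The max(proportional, absolute) combination lets small windows fall back to the proportional branch automatically while large windows are governed entirely by the absolute branch, and the hard = max(rawHard, auto) collapse-to-auto rule catches the small-window case where hard would otherwise fall below auto. The estimatePromptTokens comment is candid about the ~15-20K under-estimate in the first-send fallback and about reactive overflow acting as the safety net, so design, code, and comments agree. On the steady-state path, passing [] to estimatePromptTokens when lastPromptTokenCount > 0 and skipping the getHistory(true) clone is a reasonable hot-path optimization.

The highlight is R3.4's end-to-end integration test: real sendMessageStream → tryCompress → real ChatCompressionService → real cheap-gate → splitter → mocked baseLlmClient → persistence. It exercises the whole chain and directly covers the lastPromptTokenCount === 0 branch, historically the most bug-prone one.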

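Findings (annotated as inline comments)

  1. Suggestion: the tier classification at contextCommand.ts:313 uses rawOverhead, which excludes messagesTokens. A session that restored 100K of history messages via --continue will still show "safe" in /context, yet the very next send triggers compression at the cheap-gate, so the UI and the runtime disagree. Either fetch the history from the chat and reuse estimatePromptTokens, or narrow the scope of the comment.
  2. Suggestion: the >= COMPACT_MAX_OUTPUT_TOKENS truncation check at chatCompressionService.ts:553 is a heuristic; a legitimate summary of exactly 20K tokens is misclassified as truncated. Reusing COMPRESSION_FAILED_EMPTY_SUMMARY on that path also leaves telemetry unable to tell prompt-quality problems from capacity problems. Suggested: add a TODO(finish_reason) and a COMPRESSION_FAILED_OUTPUT_TRUNCATED sub-status.
  3. Suggestion: the hard-rescue at geminiChat.ts:752 resets the counter before tryCompress(force=true), and the force=true path skips the increment in the service's failure branch. As a result hard-rescue failures never accumulate in the counter and only reactive overflow backstops them. The semantics are reasonable but the field name is misleading; add a line to the consecutiveFailures field JSDoc.
  4. Should fix: CompressOptions.hasFailedCompressionAttempt: boolean → consecutiveFailures: number is an SDK breaking change that the PR description / release notes do not currently list.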

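Risk audit statement

I audited the following from the opposite direction:

  • Possible infinite loop when hard-rescue fails: ✅ no; after a failure the API request is still sent and reactive overflow takes over;
  • Possibility that compaction permanently fails under COMPACT_MAX_OUTPUT_TOKENS = 20K: ⚠️ three false-positive truncations trip the breaker; see finding #2;
  • Cross-provider consistency: ✅ disabling thinking and pinning maxOutputTokens minimizes the uncertainty (Anthropic thinking budget, OpenAI reasoning tokens, and Gemini model differences are all sidestepped);
  • Accumulation boundaries of consecutiveFailures across the force / heap-pressure / reactive paths: ✅ I read the R1.2 / R1.4 comments and the logic is closed, but it needs the comment from finding #3;
  • Whether the /context UI and the cheap-gate share a source: ⚠️ finding #1;
  • Compatibility: ⚠️ finding #4.

I also noticed the 1752-line docs/plans/2026-05-14-auto-compaction-threshold-redesign.md in the PR. Stylistically, the plan doc is on the large side to keep in the tree, but since it never enters the runtime it does not block the merge.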
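The two rules praised in this review, max(proportional, absolute) per tier and hard = max(rawHard, auto), can be sketched as follows. This is not the real computeThresholds: every percentage and reservation below is a placeholder assumption, and treating rawHard as absolute-only is also an assumption. The 33K auto reservation was picked only because it reproduces the roughly 267K of recovered headroom on a 1M window quoted in the PR summary (967K versus the old 700K). The field names follow the CompactionThresholds shape mentioned later in the thread.

```typescript
// Illustrative sketch with placeholder constants.
interface CompactionThresholds {
  effectiveWindow: number;
  warn: number;
  auto: number;
  hard: number;
}

// ASSUMPTION: all five values are hypothetical.
const WARN_PERCENT = 60;
const AUTO_PERCENT = 70; // the old single threshold from the PR summary
const WARN_RESERVE = 53_000;
const AUTO_RESERVE = 33_000;
const HARD_RESERVE = 13_000;

/** max(proportional, absolute): small windows use the ratio, large ones the reserve. */
function tierThreshold(window: number, percent: number, reserve: number): number {
  const proportional = Math.floor((window * percent) / 100);
  const absolute = window - reserve;
  return Math.max(proportional, absolute);
}

function computeThresholdsSketch(effectiveWindow: number): CompactionThresholds {
  const warn = tierThreshold(effectiveWindow, WARN_PERCENT, WARN_RESERVE);
  const auto = tierThreshold(effectiveWindow, AUTO_PERCENT, AUTO_RESERVE);
  const rawHard = effectiveWindow - HARD_RESERVE;
  // Collapse-to-auto: on small windows rawHard can fall below auto.
  const hard = Math.max(rawHard, auto);
  return { effectiveWindow, warn, auto, hard };
}

const largeWindow = computeThresholdsSketch(1_000_000);
const tinyWindow = computeThresholdsSketch(10_000);
```

With these placeholder values the 1M window is governed by the absolute branch, while the 10K window falls back to the proportional branch and its hard tier collapses onto auto.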

// should not silently show "safe" just because the API hasn't been hit.
// The estimate is a lower bound (excludes message body until first turn)
// so the tier may under-classify, but never over-classifies. (R2.2)
const tierTokens = isEstimated ? rawOverhead : apiTotalTokens;

Collaborator

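The /context estimation branch may still misreport "safe": this comment states the goal as "restored session with 800K of inherited history should not silently show safe", but rawOverhead only covers system prompt / builtin tools / mcp tools / memory / skills and excludes messagesTokens.

Practical consequence: a session that restored 100K of history messages via --continue (not memory/skills overhead) still shows safe when /context is run before the first send, but the next send immediately triggers compression at the cheap-gate, so the UI and the runtime disagree.

The cheap-gate catches this correctly only through estimatePromptTokens(history, pendingUserMessage, lastPromptTokenCount=0) in chatCompressionService.ts; the rawOverhead used here is an entirely different data source.

Suggest one of two options:

  • (a) Pass the chat history through to collectContextData and call estimatePromptTokens(history, undefined, 0, imageTokenEstimate) in the estimation branch, sharing a source with the cheap-gate;
  • (b) Tighten the comment to state explicitly that "this fix only covers overhead-heavy scenarios (memory / skills / MCP); message-heavy scenarios (--continue restores) are corrected only by the cheap-gate on the first send".

R3.4's end-to-end test corroborates this: that case uses chat.setHistory([a 400K-character user message, ...]) with lastPromptTokenCount === 0, and the cheap-gate correctly triggers compression, yet /context would show safe at that same moment.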

Collaborator Author

✅ Partially fixed in 681b705 — went with reviewer's option (b): tightened the inline comment to scope the fix to overhead-heavy sessions, and added a TODO to plumb chat history into collectContextData for same-source-of-truth as the cheap-gate. Option (a) requires adding a parameter to collectContextData (Config doesn't expose the active chat today), which is a bigger plumbing change deferred.
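The UI/runtime mismatch can be reproduced with a small sketch. ContextTier matches the type quoted later in the thread; classifyTier, the `>=` comparisons, and all token figures are hypothetical and chosen only to illustrate a message-heavy restored session.

```typescript
// Illustrative sketch of the tier-input mismatch before the first send.
type ContextTier = "safe" | "warn" | "auto" | "hard";

interface TierThresholds {
  warn: number;
  auto: number;
  hard: number;
}

function classifyTier(tokens: number, t: TierThresholds): ContextTier {
  if (tokens >= t.hard) return "hard";
  if (tokens >= t.auto) return "auto";
  if (tokens >= t.warn) return "warn";
  return "safe";
}

// Hypothetical numbers for a session restored via --continue.
const sampleThresholds: TierThresholds = { warn: 76_800, auto: 95_000, hard: 115_000 };
const rawOverhead = 20_000; // system prompt + tools + memory + skills
const messagesTokens = 100_000; // inherited history, not part of rawOverhead

// /context estimation branch: tier input is rawOverhead only.
const shownByContext = classifyTier(rawOverhead, sampleThresholds); // "safe"

// Cheap-gate on first send: the estimate is built from the message history.
const seenByCheapGate = classifyTier(messagesTokens, sampleThresholds); // "auto"
```

Option (a) would close this gap by feeding both call sites the same estimate; option (b), the one taken, documents it instead.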

// perspective a truncated summary is unusable just like an empty
// one. `isCompressionFailureStatus()` returns true for this enum,
// so non-force callers will tick the consecutive-failure counter.
compressionStatus: CompressionStatus.COMPRESSION_FAILED_EMPTY_SUMMARY,

Collaborator

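The output truncation guard can be strengthened in two ways:

(1) Base the check on finish_reason rather than >= 20K (L553): compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS is a heuristic. A legitimate summary of exactly 20K tokens is also judged truncated, counted as a failure, and trips the breaker after three occurrences. The PR description itself concedes that "A finish_reason === MAX_TOKENS NOOP guard would be a sensible follow-up". At minimum, add a TODO(finish_reason) comment here to anchor the follow-up: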

// TODO: switch to finish_reason === MAX_TOKENS when sideQuery surfaces it
// — the current >= cap heuristic false-positives on legitimate 20K summaries.

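(2) Reusing COMPRESSION_FAILED_EMPTY_SUMMARY blurs telemetry (L572): an empty summary (a prompt-quality problem that may call for tuning the prompt or the split point) and a truncated one (a capacity problem that calls for tuning the cap or the splitter) are two entirely different operational signals. Once they are merged into one enum value, logs and telemetry cannot tell which kind occurred.

Suggest adding COMPRESSION_FAILED_OUTPUT_TRUNCATED, for which isCompressionFailureStatus() also returns true (equivalent for the persistence layer), while logs and telemetry can distinguish the two failure modes.

The comment at L562 already says "Reuse the empty-summary status: from the persistence layer's perspective a truncated summary is unusable just like an empty one". That is a reasonable approximation, but it costs observability.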

Collaborator Author

✅ Fixed in 681b705 (both parts): (1) added TODO(finish_reason) comment at the truncation guard documenting that >= cap is a heuristic awaiting runSideQuery to surface finish_reason; (2) added CompressionStatus.COMPRESSION_FAILED_OUTPUT_TRUNCATED enum value distinct from EMPTY_SUMMARY so telemetry can separate prompt-quality failures from capacity failures. isCompressionFailureStatus() returns true for both, so persistence/breaker behaviour is unchanged. Updated the truncation test to expect the new status.
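The effect of the split can be sketched as below: both statuses count as failures, yet a tally keyed by status keeps them apart. A string union stands in for the real CompressionStatus enum here (an assumption, as is the reduced set of members), and the Sketch-suffixed names and tallyByStatus are hypothetical.

```typescript
// Illustrative sketch; only the statuses named in this thread are listed.
type CompressionStatusSketch =
  | "COMPRESSED"
  | "NOOP"
  | "COMPRESSION_FAILED_EMPTY_SUMMARY"
  | "COMPRESSION_FAILED_OUTPUT_TRUNCATED";

/** Both failure flavours feed the breaker identically. */
function isCompressionFailureStatusSketch(status: CompressionStatusSketch): boolean {
  return (
    status === "COMPRESSION_FAILED_EMPTY_SUMMARY" ||
    status === "COMPRESSION_FAILED_OUTPUT_TRUNCATED"
  );
}

/** Telemetry-style tally: distinct statuses stay distinguishable. */
function tallyByStatus(
  events: readonly CompressionStatusSketch[],
): Map<CompressionStatusSketch, number> {
  const counts = new Map<CompressionStatusSketch, number>();
  for (const status of events) {
    counts.set(status, (counts.get(status) ?? 0) + 1);
  }
  return counts;
}

const tally = tallyByStatus([
  "COMPRESSION_FAILED_OUTPUT_TRUNCATED",
  "COMPRESSION_FAILED_EMPTY_SUMMARY",
  "COMPRESSION_FAILED_OUTPUT_TRUNCATED",
  "NOOP",
]);
```

An operator reading this tally can see two capacity failures and one prompt-quality failure, which a single shared status would have reported as three identical events.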

);
const shouldForceFromHard = effectiveTokens >= hard;
if (shouldForceFromHard) {
this.consecutiveFailures = 0;

Collaborator

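The actual semantics of hard-rescue and the breaker counter need to be spelled out: the counter is cleared to 0 here before tryCompress(force=true) is called, and on failure the force=true path skips this.consecutiveFailures += 1 (the if (!force) guard on the chatCompressionService.ts side).

The combined effect: when hard-rescue fails repeatedly, consecutiveFailures never accumulates and the breaker is completely ineffective for this path. Only reactive overflow backstops it, and reactive overflow itself adds just 1 per failure (that code is also force=true, and compensates with an explicit self.consecutiveFailures += 1).

This is a reasonable fail-open design (hard has already predicted an overflow, so having the breaker keep yielding makes the most sense), but the literal meaning of the consecutiveFailures field does not match its actual semantics: it really counts "consecutive non-force, non-hard-rescue failures", not all failures as the name implies.

Suggest adding one or two comment lines, either in the JSDoc of GeminiChat.consecutiveFailures (near geminiChat.ts:459) or at the reset here, that say so explicitly: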

// Hard-rescue is a fail-open exception to the breaker: we reset the counter
// because the runtime decided overflow is imminent regardless of recent
// failure history. Combined with `if (!force)` in the service's failure
// branch, hard-rescue failures never accumulate — reactive overflow is the
// real safety net for this path (it explicitly bumps the counter by 1).
this.consecutiveFailures = 0;

That way, anyone later debugging "why does hard keep firing while the counter is 0" will not be confused.

Collaborator Author

✅ Fixed in 681b705 — expanded the consecutiveFailures field JSDoc on GeminiChat to spell out the real semantics: it tracks "non-force, non-hard-rescue consecutive failures". Listed each path's interaction with the counter (auto +1, manual /compress skipped, hard-rescue resets BEFORE force=true), and called out reactive overflow as the actual safety net (it explicitly bumps the counter by +1). Future debug-time confusion about "why is hard-rescue firing but counter is 0" now has a one-line answer in the field doc.
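The per-path behaviour described in this thread can be condensed into a small state sketch. BreakerSketch, CompressionPath, and record are hypothetical names; MAX_CONSECUTIVE_FAILURES = 3 follows the "3-strike" wording in the PR summary, and resetting on success is an assumption rather than something this thread states for every path.

```typescript
// Illustrative sketch of which paths touch consecutiveFailures.
const MAX_CONSECUTIVE_FAILURES = 3;

type CompressionPath = "auto" | "manual" | "hard-rescue" | "reactive";

class BreakerSketch {
  consecutiveFailures = 0;

  record(path: CompressionPath, succeeded: boolean): void {
    // Hard-rescue resets the counter BEFORE its forced attempt.
    if (path === "hard-rescue") this.consecutiveFailures = 0;
    if (succeeded) {
      this.consecutiveFailures = 0; // ASSUMPTION: any success clears the count
      return;
    }
    // force=true paths (manual /compress, hard-rescue) skip the increment;
    // reactive overflow is also forced but bumps the counter explicitly.
    if (path === "auto" || path === "reactive") this.consecutiveFailures += 1;
  }

  get autoCompactionDisabled(): boolean {
    return this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
  }
}

const breaker = new BreakerSketch();
breaker.record("auto", false);
breaker.record("auto", false);
breaker.record("auto", false);
const trippedAfterThreeAutoFailures = breaker.autoCompactionDisabled; // true
breaker.record("hard-rescue", false);
const countAfterFailedRescue = breaker.consecutiveFailures; // 0
```

The last line is the "hard keeps firing but the counter is 0" situation: a failed rescue leaves the counter at zero because the reset happens first and the forced failure is never counted.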

* force=true call resets it.
*/
hasFailedCompressionAttempt: boolean;
consecutiveFailures: number;

Collaborator

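The SDK breaking change needs to go into the release notes: hasFailedCompressionAttempt: boolean → consecutiveFailures: number on CompressOptions changes both the field name and the type. CompressOptions is indirectly exposed on the public surface of @qwen-code/qwen-code-core through ChatCompressionService, so downstream SDK code that calls service.compress({ ..., hasFailedCompressionAttempt: true }) directly will get a TS compile error, and the semantics change as well (a boolean flag versus an accumulating count; the default 0 is not equivalent to false at some boundaries).

The "Breaking changes / migration notes" section of the PR description lists the removal of contextPercentageThreshold but does not mention this. Suggest adding a line (and mirroring it in the release notes / CHANGELOG):

  • CompressOptions.hasFailedCompressionAttempt: boolean is renamed to consecutiveFailures: number. SDK consumers need to switch from "pass true to signal a prior failure" to "pass the current accumulated failure count (normally maintained by GeminiChat)". Semantic change: true used to mean "permanently disable auto"; the new equivalent is >= MAX_CONSECUTIVE_FAILURES.

I audited the usage surface of this change: inside the repository only GeminiChat.tryCompress passes this field, so the internal migration risk is low; but the core package's d.ts is published, so externally it is still breaking.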

Collaborator Author

✅ Fixed in 681b705: added a dedicated SDK-Breaking-change subsection to the design doc (docs/design/auto-compaction-threshold-redesign.md) covering the CompressOptions.hasFailedCompressionAttempt: boolean → consecutiveFailures: number rename with a side-by-side semantics table and migration guide (true → MAX_CONSECUTIVE_FAILURES, false → 0). The PR description's release-notes block is sourced from the design doc, so this propagates.
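For downstream callers, the mapping from the old flag to the new count could look like the sketch below. The two interfaces are reduced to the single field under discussion, migrateCompressOptions is a hypothetical helper that does not exist in the package, and MAX_CONSECUTIVE_FAILURES = 3 is taken from the "3-strike" wording in the PR summary.

```typescript
// Illustrative migration sketch for SDK consumers.
const MAX_CONSECUTIVE_FAILURES = 3;

interface LegacyCompressOptionsSketch {
  hasFailedCompressionAttempt: boolean;
}

interface NewCompressOptionsSketch {
  consecutiveFailures: number;
}

function migrateCompressOptions(
  legacy: LegacyCompressOptionsSketch,
): NewCompressOptionsSketch {
  return {
    // true used to mean "auto-compaction is disabled", which now corresponds
    // to a count that has reached the breaker limit; false maps to 0.
    consecutiveFailures: legacy.hasFailedCompressionAttempt
      ? MAX_CONSECUTIVE_FAILURES
      : 0,
  };
}

const migratedFailed = migrateCompressOptions({ hasFailedCompressionAttempt: true });
const migratedClean = migrateCompressOptions({ hasFailedCompressionAttempt: false });
```

Callers that maintain their own state would instead track the running count directly, since intermediate values (1 or 2) have no boolean equivalent.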

LaZzyMan added a commit that referenced this pull request May 18, 2026
- R5.1: tighten /context tier comment + TODO. The rawOverhead-based fix
  doesn't cover `--continue` restores with many history messages (since
  rawOverhead excludes messagesTokens). UI may still show 'safe' for one
  render until the first send. Documented inline and added a TODO to plumb
  chat history into collectContextData for same-source-of-truth as the
  cheap-gate.
- R5.2a: add TODO(finish_reason) at the truncation guard. The `>= cap`
  heuristic false-positives on legitimate at-cap summaries; the proper
  signal is finish_reason which runSideQuery doesn't surface today.
- R5.2b: split telemetry — new CompressionStatus.COMPRESSION_FAILED_OUTPUT_TRUNCATED
  enum value. Distinct from EMPTY_SUMMARY so logs/telemetry can tell
  prompt-quality failures (tune prompt / splitter) from capacity failures
  (raise cap / shrink splitter input). isCompressionFailureStatus()
  treats both as failures so the breaker behavior is unchanged.
- R5.3: expand consecutiveFailures JSDoc to clarify it tracks
  "non-force, non-hard-rescue consecutive failures" — hard-rescue resets
  the counter and force=true skips increments, so the counter is the
  "regular path" health signal only; reactive overflow is the real
  safety net for the force-only paths.
- R5.4: document the CompressOptions field rename
  (hasFailedCompressionAttempt: boolean → consecutiveFailures: number)
  as an SDK breaking change in the design doc with migration guide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LaZzyMan

Collaborator Author

Review batch 5 — commit 681b70501

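Thanks for the high-quality review. The "High quality; ready to merge" verdict on batch 4 is a great encouragement. All 4 findings in batch 5 have been addressed (none declined).

| # | Outcome | One-line |
| --- | --- | --- |
| R5.1 /context tier misses messagesTokens | ✅ Partial | chose path (b): tightened comment + TODO; path (a) requires changing the Config interface to expose the chat instance, a larger change |
| R5.2a >= heuristic should be based on finish_reason | ✅ Fixed | TODO(finish_reason) anchors the follow-up |
| R5.2b truncation and empty summary blur telemetry | ✅ Fixed | added the CompressionStatus.COMPRESSION_FAILED_OUTPUT_TRUNCATED enum value |
| R5.3 hard-rescue counter semantics inconsistent | ✅ Fixed | expanded the consecutiveFailures JSDoc to list the accumulate/reset semantics of every path |
| R5.4 SDK breaking change not listed | ✅ Fixed | completed the design doc's Breaking-change section with `hasFailedCompressionAttempt` → `consecutiveFailures`, including migration guidance |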

Verification

  • typecheck clean across 4 workspaces
  • core tests 172/172 pass (chatCompressionService.test.ts · geminiChat.test.ts · tokenEstimation.test.ts)
  • cli tests 15/15 pass (contextCommand.test.ts · ContextUsage.test.tsx · tipRegistry.test.ts)
  • the new COMPRESSION_FAILED_OUTPUT_TRUNCATED enum value is appended after NOOP (existing enum values do not shift; binary compatible)

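Response to the reviewer's risk audit

Regarding the ⚠️ items in the reviewer's overall assessment:

  • Finding #2, breaker risk from three false positives: R5.2b's enum split now lets operators separate "prompt quality" from "capacity" in telemetry; once the finish_reason path ships, R5.2a's TODO can eliminate the false positives in one pass
  • Finding #1, /context UI vs cheap-gate shared source: the comment and TODO now state that only overhead-heavy scenarios are covered; message-heavy ones are corrected by the cheap-gate on the first send (≤1 render lag)
  • Finding #4, SDK compatibility: the design doc now includes a migration guide, and GeminiChat.tryCompress is the only internal caller in the repository, so external impact is small

Plan doc size

The reviewer noted that docs/plans/2026-05-14-auto-compaction-threshold-redesign.md, at 1752 lines, is on the large side and does not enter the runtime. Agreed that it is a one-off artifact serving only as this PR's implementation record. If you think it should move to an archive location such as .github/closed-PRs/, that can be raised as a separate follow-up.

All 5 inline comments have been answered individually in their original threads, pointing at 681b70501.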

🤖 Triaged via the review-response skill.

@wenshao wenshao left a comment

Collaborator

Test coverage gaps (no specific diff line to anchor):

  • [Suggestion] compress() precomputedEffectiveTokens path (path 1 — skip estimation) has no unit test at the service level. If future refactoring breaks the priority logic, the regression is invisible at this layer. (chatCompressionService.test.ts)
  • [Suggestion] Hard-tier rescue tests verify force + pendingUserMessage are passed but never assert precomputedEffectiveTokens is forwarded. The estimation-reuse optimization (a core perf win of this PR) is unprotected against regression. (geminiChat.test.ts)
  • [Suggestion] COMPRESSION_FAILED_OUTPUT_TRUNCATED is included in isCompressionFailureStatus() but never exercised as a failure variant in any geminiChat-level circuit breaker test. Only INFLATED and EMPTY_SUMMARY are tested. (geminiChat.test.ts)
  • [Suggestion] The originalTokenCount === 0 + pendingUserMessage path (which falls through to the first-send estimation fallback) has no dedicated test case. The existing "estimated tokens exceed threshold" test uses originalTokenCount = 160_000 (non-zero), so the zero-path branch is uncovered at the service level. (chatCompressionService.test.ts)

— DeepSeek/deepseek-v4-pro via Qwen Code /review
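The priority logic that the first two bullets want pinned down can be sketched as follows, together with the kind of call-count assertion a test could make. precomputedEffectiveTokens is the option named in the thread; resolveEffectiveTokens, the options interface, and the counting estimator are hypothetical stand-ins for the service internals.

```typescript
// Illustrative sketch of "use the precomputed value, else estimate".
interface CheapGateOptionsSketch {
  /** Caller-supplied estimate; when present the gate must not re-estimate. */
  precomputedEffectiveTokens?: number;
}

function resolveEffectiveTokens(
  opts: CheapGateOptionsSketch,
  estimate: () => number,
): number {
  // ?? rather than || so that a legitimate precomputed value of 0 is honoured.
  return opts.precomputedEffectiveTokens ?? estimate();
}

// A counting estimator makes the "estimation was skipped" property observable.
let estimatorCalls = 0;
const countingEstimator = (): number => {
  estimatorCalls += 1;
  return 120_000;
};

const reusedValue = resolveEffectiveTokens(
  { precomputedEffectiveTokens: 95_000 },
  countingEstimator,
);
const callsAfterReuse = estimatorCalls; // 0: the estimator was not invoked

const fallbackValue = resolveEffectiveTokens({}, countingEstimator);
const callsAfterFallback = estimatorCalls; // 1: the estimator ran exactly once
```

Asserting on the call count, rather than only on the returned number, is what would protect the estimation-reuse optimization against a silent regression.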

consecutiveFailures >= MAX_CONSECUTIVE_FAILURES &&
!force &&
!bypassTokenThreshold
) {

Collaborator

[Critical] Circuit breaker NOOP is completely silent. When consecutiveFailures >= MAX_CONSECUTIVE_FAILURES causes compress() to return CompressionStatus.NOOP, there is zero log output. The only observable signal that auto-compaction has stopped is the absence of compression — a nightmare to debug at 3 AM when the only visible symptom is "API context overflow."

Suggested change
) {
this.config
.getDebugLogger()
.warn(
`Auto-compaction breaker tripped: consecutiveFailures=${opts.consecutiveFailures} >= MAX=${MAX_CONSECUTIVE_FAILURES}. Use /compress to reset.`,
);
return { compressionStatus: CompressionStatus.NOOP };

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — added a warn-level log on the breaker NOOP path quoting consecutiveFailures, MAX, and recovery instruction (/compress). Was: silent. Now: shows up in logs so absence-of-compaction is no longer invisible.

// trying after MAX_CONSECUTIVE_FAILURES strikes rather than burn an API
// call on every send. Reactive overflow still catches the catastrophic
// case. See docs/design/auto-compaction-threshold-redesign.md risk #2.
//

Collaborator

[Suggestion] The truncation guard compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS (>=) can false-positive on a legitimate summary that happens to land exactly at 20K tokens. Each false-positive counts as a compression failure, incrementing the circuit breaker — 3 false-positives permanently disable auto-compaction. The code's own TODO acknowledges the correct approach is finish_reason === 'length' / MAX_TOKENS. Until runSideQuery surfaces finish_reason, consider using > instead of >= to shrink the false-positive window (a model producing exactly 20K is far less likely than one exceeding 20K).

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557: >= changed to >. With R5.2b's new COMPRESSION_FAILED_OUTPUT_TRUNCATED status now counting toward the breaker, the false-positive cost (3 strikes → permanent disable) was too high. The proper finish_reason path is still TODO'd; > will essentially never fire today (API enforces hard cap) but is the right semantics once finish_reason lands.

Comment thread packages/core/src/core/geminiChat.ts Outdated
this.lastPromptTokenCount,
imageTokenEstimate,
);
const shouldForceFromHard = effectiveTokens >= hard;

Collaborator

[Suggestion] Hard-tier rescue resets consecutiveFailures = 0 before calling tryCompress(force=true), and force=true causes tryCompress's failure branch to skip the consecutiveFailures += 1 increment. This means repeated hard-rescue compression failures never trip the breaker — each failing send burns one doomed compression API call with no limit. The opposite extreme of the proactive path's breaker; consider either saving/restoring the pre-call counter value on failure, or adding a separate hard-rescue cooldown counter (e.g., skip hard rescue for N turns after M consecutive hard-rescue failures).

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — added a dedicated hardRescueFailureCount field on GeminiChat, bounded by MAX_CONSECUTIVE_FAILURES. After that many consecutive hard-rescue failures, the rescue stops firing and reactive overflow takes over. Resets on any compression success. Documented in JSDoc + observable via the new warn-log when the rescue fails.
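
A minimal sketch of a bounded hard-rescue gate, assuming the behaviour described in this reply. `RescueGate`, `shouldRescue`, and `recordResult` are hypothetical names for illustration; only `hardRescueFailureCount` and `MAX_CONSECUTIVE_FAILURES` come from the thread, and the value 3 is taken from the "3-strike" wording in the PR summary.

```typescript
// Assumed value, based on the "3-strike circuit breaker" in the PR summary.
const MAX_CONSECUTIVE_FAILURES = 3;

// Hypothetical wrapper; in the PR the counter is a field on GeminiChat.
class RescueGate {
  hardRescueFailureCount = 0;

  // True when the hard tier is crossed and the rescue budget is not exhausted.
  shouldRescue(effectiveTokens: number, hard: number): boolean {
    return (
      effectiveTokens >= hard &&
      this.hardRescueFailureCount < MAX_CONSECUTIVE_FAILURES
    );
  }

  // Any compression success resets the counter; a failure adds one strike.
  recordResult(compressed: boolean): void {
    this.hardRescueFailureCount = compressed
      ? 0
      : this.hardRescueFailureCount + 1;
  }
}
```

Once the gate returns false, the sketch simply stops attempting the rescue; in the PR, reactive overflow is described as the next layer of defence at that point.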

Comment thread packages/cli/src/ui/types.ts Outdated

export type ContextTier = 'safe' | 'warn' | 'auto' | 'hard';

export interface ContextThresholds {

Collaborator

[Suggestion] ContextThresholds (CLI package) has the exact same shape as CompactionThresholds (core package) — four identically-typed fields (effectiveWindow, warn, auto, hard). Maintaining a duplicate type definition across packages creates a silent drift risk: if core's shape changes, the CLI type becomes stale with no compiler error (because the runtime value still flows through untyped boundaries). Consider re-exporting CompactionThresholds from core's public API and referencing it here, or using Pick<CompactionThresholds, ...> to make the relationship explicit.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557: ContextThresholds reduced to a type alias of core's CompactionThresholds (`export type ContextThresholds = CompactionThresholds;`). The four fields now have a single source of truth; the previously silent drift risk is gone.
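
To illustrate why the alias removes the drift risk, here is a small self-contained sketch. The interface below is a local stand-in for core's CompactionThresholds (the four field names come from the review comment), `formatThresholds` is a hypothetical consumer, and the numbers in the usage are made up.

```typescript
// Local stand-in for the core type; field names are the ones listed in the review.
interface CompactionThresholds {
  effectiveWindow: number;
  warn: number;
  auto: number;
  hard: number;
}

// Alias rather than a second interface (the real code exports it): any change
// to CompactionThresholds is immediately visible to every ContextThresholds user.
type ContextThresholds = CompactionThresholds;

// Hypothetical CLI-side consumer, for illustration only.
function formatThresholds(t: ContextThresholds): string {
  return `warn=${t.warn} auto=${t.auto} hard=${t.hard} of ${t.effectiveWindow}`;
}
```

If a field were renamed in the core type, `formatThresholds` would stop compiling instead of silently reading a stale shape.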

self.hasFailedCompressionAttempt = true;
// Reactive compression is force=true so tryCompress's
// failure branch did not increment the counter. Count it
// explicitly as one strike — a single transient error

Collaborator

[Suggestion] The reactive overflow path (self.consecutiveFailures += 1 at L984) shares the same consecutiveFailures counter as the proactive cheap-gate path. This means reactive compression failures (transient model issues producing poor summaries) disable proactive auto-compaction. These two paths have different failure semantics — reactive is a last-resort recovery, proactive is a performance optimization — and sharing a breaker undermines fault isolation. Consider a separate reactiveConsecutiveFailures counter, or at minimum document the coupling in the field's JSDoc.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Collaborator Author

❌ Declined — coupling is documented in R5.3 JSDoc and the related R1.2 fix already softened reactive's impact: reactive failure increments by +1, not =MAX, so it takes MAX_CONSECUTIVE_FAILURES reactive failures to disable proactive. That's the correct outcome for a chat where reactive consistently fails — splitting the counter would just delay the bound without changing the steady-state behavior. R6.7 (this batch) additionally fixed reactive's missing increment in the catch block so the bound is now uniformly enforced across both reactive failure modes.
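
The coupling being debated can be pictured with a tiny shared breaker. This is a sketch, not the PR's code: `SharedBreaker` and its method names are hypothetical, and the limit of 3 is assumed from the "3-strike" description in the PR summary.

```typescript
// Assumed value, based on the "3-strike circuit breaker" in the PR summary.
const MAX_CONSECUTIVE_FAILURES = 3;

// Hypothetical model of one counter shared by the proactive and reactive paths.
class SharedBreaker {
  consecutiveFailures = 0;

  // Either path reports a failure as exactly one strike (+1, not =MAX).
  recordFailure(): void {
    this.consecutiveFailures += 1;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Proactive auto-compaction is skipped once the breaker is open.
  proactiveAllowed(): boolean {
    return this.consecutiveFailures < MAX_CONSECUTIVE_FAILURES;
  }
}
```

Under this model a single reactive failure leaves proactive compaction enabled; only three failures in a row, from either path, open the breaker.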

@wenshao wenshao left a comment

Collaborator

Additional observation (not tied to a diff line): docs/users/configuration/settings.md still documents model.chatCompression.contextPercentageThreshold as a live setting (including "Use 0 to disable compression entirely"), but this field has been removed from ChatCompressionSettings and the value is silently ignored. The settings docs should be updated to reflect the removal.

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

imageTokenEstimate,
);
const shouldForceFromHard = effectiveTokens >= hard;
if (shouldForceFromHard) {

Collaborator

[Critical] Hard-tier rescue fires with zero logging

When shouldForceFromHard is true, the code silently resets consecutiveFailures = 0 and passes force=true to tryCompress. There is no debugLogger.info or debugLogger.warn at this trigger point.

The JSDoc at L469 says "If you're debugging 'why is hard-rescue firing but the counter is 0', that's by design" — but there's nothing in the log to confirm that IS what's happening. An oncall engineer seeing repeated forced compaction with counter stuck at 0 cannot distinguish "hard-tier rescue by design" from "something is resetting the counter in a loop" without reading source code.

Suggested change
- if (shouldForceFromHard) {
+ if (shouldForceFromHard) {
+   debugLogger.info(
+     `[compaction] hard-tier rescue: effectiveTokens=${effectiveTokens} >= hard=${hard}, forcing compaction (consecutiveFailures ${this.consecutiveFailures} → 0)`,
+   );
+   this.consecutiveFailures = 0;
+ }

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — added a debugLogger.info call at the rescue trigger point capturing effectiveTokens, the hard threshold, and the counter transition (consecutiveFailures N → 0). Now the field JSDoc's claim that "hard-rescue resets the counter by design" has a matching observable signal.

// the intended recovery path. (review #4168 R1.2)
self.consecutiveFailures += 1;
}
} catch (compressionError) {

Collaborator

[Critical] Reactive overflow catch block doesn't increment consecutiveFailures

The status-based failure path at L984 correctly increments self.consecutiveFailures += 1, but thrown exceptions (network errors, model 500s, timeouts) in this catch block bypass the increment entirely. If reactive compression consistently throws rather than returning a failure status, the circuit breaker never trips — the system burns a failed reactive compression API call on every overflow indefinitely.

Suggested change
- } catch (compressionError) {
+ } catch (compressionError) {
+   if (
+     params.config?.abortSignal?.aborted ||
+     isAbortError(compressionError)
+   ) {
+     throw compressionError;
+   }
+   debugLogger.warn('Reactive compression failed.', compressionError);
+   self.consecutiveFailures += 1;

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557: `self.consecutiveFailures += 1` added inside the catch block (previously only the status-based path incremented). Thrown exceptions (network errors, model 5xx, timeouts) now also count toward the breaker, matching the status-based pattern.
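
A simplified sketch of counting both failure shapes, written synchronously for brevity (the real path is presumably async). `runReactiveCompression`, `BreakerState`, and this `isAbortError` are hypothetical stand-ins, and resetting the counter on success is an assumption rather than something this thread confirms for the reactive path.

```typescript
type Status = 'COMPRESSED' | 'FAILED';

interface BreakerState {
  consecutiveFailures: number;
}

// Hypothetical helper; the real code has its own isAbortError.
function isAbortError(err: unknown): boolean {
  return err instanceof Error && err.name === 'AbortError';
}

function runReactiveCompression(
  state: BreakerState,
  attempt: () => Status,
): Status | 'THREW' {
  try {
    const status = attempt();
    if (status === 'COMPRESSED') {
      state.consecutiveFailures = 0; // assumed reset on success
    } else {
      state.consecutiveFailures += 1; // status-based failure
    }
    return status;
  } catch (err) {
    if (isAbortError(err)) throw err; // user cancellation is not a strike
    state.consecutiveFailures += 1; // thrown failure now counts too
    return 'THREW';
  }
}
```

The point of the sketch is that a provider error and a failure status end up on the same counter, while an abort is rethrown without touching it.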

id: 'context-critical',
content:
'Context is almost full! Run /compress now or start /new to continue.',
'Context near hard limit — auto-compact will force on next send. Consider /clear if you want to start fresh.',

Collaborator

[Suggestion] Tip text says "will force on next send" but hard-tier rescue already ran on current send

The isRelevant check fires when lastPromptTokenCount >= thresholds.hard — meaning the hard-tier rescue in sendMessageStream has already forced compaction on the send that just produced this response. Users are told compaction "will force on next send" when it already forced on this send.

Suggested change
- 'Context near hard limit — auto-compact will force on next send. Consider /clear if you want to start fresh.',
+ 'Context near hard limit — auto-compact was forced on this turn. Consider /clear if context remains tight.',

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — tip text now reads "Context near hard limit — auto-compact was forced on this turn. Consider /clear if context remains tight." Tense matches the actual timing (rescue already ran by the time the tip renders).

) {
// eslint-disable-next-line no-console
console.warn(
'[qwen-code] chatCompression.contextPercentageThreshold has been removed ' +

Collaborator

[Suggestion] contextPercentageThreshold: 0 disable mechanism removed with no replacement

Users who set contextPercentageThreshold: 0 to disable auto-compaction (e.g., for debugging compression issues, or for sessions that must retain full history) now have no migration path. The one-time console.warn at startup is easy to miss, and doesn't inform users that compaction can no longer be disabled at all.

Consider either: (1) adding a chatCompression.enabled boolean as a replacement escape hatch, (2) adding an env var like QWEN_AUTO_COMPACT=false, or (3) at minimum, updating the warning to explicitly state that auto-compaction cannot currently be disabled.

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Partially fixed in 1030557 — went with reviewer's option (3): the deprecation warning now explicitly states "auto-compaction cannot currently be disabled" and points users to /clear / open an issue if they need a replacement. Did not add a new enabled flag or env var — those would re-introduce the disable mechanism this PR removed. The choice trades one round of forced-on auto-compact for the cleaner threshold semantics; the warning makes that trade-off visible to affected users.

label: string;
tokens: number;
isCurrent?: boolean;
hint?: string;

Collaborator

[Suggestion] Dead code: ThresholdRow declares hint?: string prop but no caller passes it

The hint prop is declared, destructured, and rendered, but none of the ThresholdRow invocations in CompactionThresholds ever pass a hint value. Remove the prop until a caller actually needs it.

Suggested change
hint?: string;
const ThresholdRow: React.FC<{
label: string;
tokens: number;
isCurrent?: boolean;
}> = ({ label, tokens, isCurrent }) => {

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557: the hint prop was removed from ThresholdRow. The only caller that ever passed it was dropped in an earlier round (the "Effective window" hint was moved out to avoid wrapping). The props now match actual usage.

// single render — that resolves the moment any send happens.
//
// TODO: plumb the chat history into collectContextData and use
// estimatePromptTokens(history, undefined, 0, imageTokenEstimate) here

Collaborator

[Suggestion] TODO references a type-incorrect function call

The TODO proposes estimatePromptTokens(history, undefined, 0, imageTokenEstimate), but userMessage: Content is a non-optional parameter in estimatePromptTokens. A future developer following this TODO will hit a compile error.

Update the TODO to show a type-correct call, or note that userMessage needs to be made optional first.

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — rewrote the TODO with a 3-step implementation sketch that calls out the prerequisite signature change (estimatePromptTokens.userMessage needs to be made optional first) and the plumbing path (add a chat?: GeminiChat parameter to collectContextData). A future developer following the TODO won't hit the compile error you flagged.

*/
const DEFAULT_COMPRESSION_THRESHOLD = 0.7;
function currentTier(
tokens: number,

Collaborator

[Suggestion] hard === auto makes the 'auto' tier unreachable for small windows

computeThresholds collapses hard to auto for windows below ~82.5K tokens. Since this function checks >= hard first, when hard === auto the first branch always wins — the function returns 'hard' and the 'auto' tier is never displayed. Users with smaller-window models see the display jump from 'warn' directly to 'hard'.

Consider checking auto before hard when they're equal, or displaying 'auto/hard' when the thresholds are identical.

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — currentTier() now only returns 'hard' when thresholds.hard > thresholds.auto. For small windows where computeThresholds collapses them, the function returns 'auto' instead, so the tier label is reachable for all windows.
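
One way the described tier classification could be written is sketched below. The `Thresholds` shape is trimmed to the three fields the function needs, the exact comparison operators are an assumption based on the `>= hard` check mentioned in the review, and the threshold numbers used with it are illustrative only.

```typescript
type ContextTier = 'safe' | 'warn' | 'auto' | 'hard';

// Trimmed stand-in for the thresholds object; only the fields used here.
interface Thresholds {
  warn: number;
  auto: number;
  hard: number;
}

function currentTier(tokens: number, t: Thresholds): ContextTier {
  // Only report 'hard' when it is a distinct tier above 'auto'.
  if (t.hard > t.auto && tokens >= t.hard) return 'hard';
  if (tokens >= t.auto) return 'auto';
  if (tokens >= t.warn) return 'warn';
  return 'safe';
}
```

When `hard` and `auto` are equal, the first branch can never match, so the label falls through to 'auto' as the reply describes.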

breakdown.autocompactBuffer,
contextWindowSize,
),
` Effective window: ${formatNum(breakdown.thresholds.effectiveWindow)} (window − 20K reserve)`,

Collaborator

[Suggestion] formatContextUsageText hardcodes English labels and a stale-prone magic constant

The text formatter embeds threshold labels in English ("Warn threshold", "Auto threshold", etc.) and includes a literal "(window − 20K reserve)" string. The interactive ContextUsage.tsx component uses t() for i18n. Additionally, "20K" is a hard-coded reference to SUMMARY_RESERVE — if that constant changes, this string silently goes stale.

Consider using t() for labels and deriving the reserve text from the constant.

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — formatContextUsageText labels now go through t() (matching the interactive ContextUsage.tsx). "20K reserve" is now derived from SUMMARY_RESERVE (exported from core) via Math.round(SUMMARY_RESERVE / 1000) + "K", so it stays in sync if the constant ever changes.
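
The label derivation quoted in this reply can be tried in isolation. In this sketch `SUMMARY_RESERVE` is a local stand-in set to 20,000 (inferred from the "20K reserve" text in the thread), and `reserveLabel` is a hypothetical wrapper around the quoted expression.

```typescript
// Local stand-in; the real constant is exported from core.
const SUMMARY_RESERVE = 20_000;

// Hypothetical wrapper around the expression quoted in the reply.
function reserveLabel(reserve: number): string {
  return Math.round(reserve / 1000) + 'K';
}

const reserveText = `(window − ${reserveLabel(SUMMARY_RESERVE)} reserve)`;
```

Because the label is computed from the constant, changing the reserve changes the displayed text without touching the formatter.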

const effectiveTokens =
opts.precomputedEffectiveTokens !== undefined
? opts.precomputedEffectiveTokens
: pendingUserMessage

Collaborator

[Suggestion] Cheap-gate path #2 (pendingUserMessage without precomputedEffectiveTokens) is dead code

The cheap-gate has three estimation paths, but no current caller ever passes pendingUserMessage without also passing precomputedEffectiveTokens: sendMessageStream passes both, tryCompressChat (manual /compress) passes neither, and heap-pressure bypass passes neither. This path is the only one that calls chat.getHistory(true) inside the cheap-gate — an expensive clone that is never actually reached.

Removing this branch (or adding an assertion) would simplify the gate and eliminate the latent clone risk if a future caller accidentally hits it.

— qwen-latest-series-invite-beta-v28 via Qwen Code /review

Collaborator Author

✅ Fixed in 1030557 — removed the dead pendingUserMessage-only branch. The cheap-gate now uses opts.precomputedEffectiveTokens ?? originalTokenCount. Production callers (sendMessageStream) always pass precomputed; direct service callers fall back to originalTokenCount without cloning history. Eliminates the latent double-clone risk.
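
The simplified gate input can be sketched as follows. `gateTokens` and `crossesAuto` are hypothetical helpers, the options interface is trimmed to the one field discussed, and comparing against the auto threshold with `>=` is an assumption about how the gate uses the value.

```typescript
// Trimmed stand-in for the options object; only the field discussed here.
interface CompressOptions {
  precomputedEffectiveTokens?: number;
}

// Prefer the caller's precomputed estimate; otherwise use the known count.
function gateTokens(opts: CompressOptions, originalTokenCount: number): number {
  return opts.precomputedEffectiveTokens ?? originalTokenCount;
}

// Assumed gate comparison, for illustration only.
function crossesAuto(
  opts: CompressOptions,
  originalTokenCount: number,
  autoThreshold: number,
): boolean {
  return gateTokens(opts, originalTokenCount) >= autoThreshold;
}
```

Note that `??` only falls back on `undefined` or `null`, so a precomputed value of 0 is honoured rather than replaced, which `||` would not do.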

LaZzyMan added a commit that referenced this pull request May 19, 2026
Adds a defensive guard in ChatCompressionService.compress() that detects
when the side-query summary hit COMPACT_MAX_OUTPUT_TOKENS (20K). In that
case the summary is likely truncated mid-content, so we drop it and
return NOOP rather than persist a half-summary. The next send re-tries;
reactive overflow still catches the catastrophic case where the API
rejects the next request as too large.

Documented in the design doc as risk #2; the bot reviewer on PR #4168
correctly pushed for it to land alongside the threshold redesign rather
than as a follow-up since the new 20K cap is what makes truncation
likely in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 19, 2026
The Task 11 redesign updated the non-interactive text formatter
(formatContextUsageText) but left ContextUsage.tsx — the interactive
React component that real /context users see — unchanged. As a result
the TUI still showed the old single "Autocompact buffer" line and none
of the new warn/auto/hard ladder.

Adds a "Compaction thresholds" section after the per-category breakdown:
  - Effective window
  - Warn / Auto / Hard threshold rows with a ▶ marker on the row the
    current usage has crossed
  - Current tier label coloured by severity (safe→green, warn/auto→
    yellow, hard→red)

The existing progress bar legend (Used / Free / Autocompact buffer)
is preserved because it's tied to the three-segment progress bar
visualisation; the new section adds the absolute numbers + tier badge
on top of that.

Caught by the tmux e2e test (PR #4168 ci-monitor follow-up). Pre-fix,
the 'Compaction thresholds' section was missing from the TUI entirely
and the assertion failed; post-fix, the new section renders correctly
for fresh and live sessions on 1M / 200K / 128K windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 19, 2026
Behavior fixes:
- MAX_TOKENS truncation guard now returns COMPRESSION_FAILED_EMPTY_SUMMARY
  instead of NOOP so the consecutive-failure breaker actually trips after
  repeated max-length summaries (R1.1).
- Reactive overflow failure increments consecutiveFailures by 1 instead
  of latching to MAX in one shot, so a transient network blip doesn't
  permanently disable auto-compaction. The hard-tier rescue resets the
  counter, which remains the designated recovery path (R1.2).
- /context current-tier classification uses rawOverhead (system + tools +
  memory + skills) as the tier input when API data is not yet available,
  rather than 0 — large inherited contexts no longer silently show 'safe'
  (R2.2).

Performance:
- sendMessageStream computes effectiveTokens ONCE and passes it through
  TryCompressOptions.precomputedEffectiveTokens, so the cheap-gate inside
  service.compress doesn't redo the estimation. Also fixes the
  imageTokenEstimate inconsistency between the rescue and cheap-gate
  paths (R1.3 + R1.4).
- Steady-state path (lastPromptTokenCount > 0) skips the costly
  getHistory(true) clone — estimatePromptTokens only needs the user
  message in that branch.

Code hygiene:
- BYTES_PER_TOKEN → CHARS_PER_TOKEN (inputs are char counts, not byte
  counts; CJK text would mislead under the old name) (R3.1).
- Drop dead getContextUsagePercent helper + index re-export — no callers
  in source after the threshold rewire (R1.5).
- Add a comment on estimatePromptTokens' first-send fallback documenting
  the ~15-20K under-estimate (system prompt + tools + skills) and that
  reactive overflow is the safety net (R3.3).

Tests:
- New CLI ContextUsage.test.tsx exercises the React renderer for the
  three-tier section: section presence, ▶ marker placement per tier,
  current-tier label coloring (R1.6).
- New chatCompressionService.test.ts case pins that a stale
  contextPercentageThreshold: 0 value in user settings no longer
  short-circuits compaction (R2.1).
- New tokenEstimation.test.ts case covers functionResponse (distinct
  nested-parts branch from functionCall) (R3.5).
- New geminiChat.test.ts integration test exercises the real
  ChatCompressionService — not a mock — for the first-send-after-
  inherited-history scenario where lastPromptTokenCount=0 and only the
  full-history estimate can cross the auto threshold (R3.4).

Declined: R3.2 (change `>=` to `>` on the MAX_TOKENS guard). The current
operator catches the at-cap case as suspicious, which is intentional —
landing exactly at the output cap is far more likely truncation than
clean stop given p99.99 ≈ 17K. With R1.1 in place, persistent truncations
trip the breaker after MAX_CONSECUTIVE_FAILURES so the worst case is
bounded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 19, 2026
- R5.1: tighten /context tier comment + TODO. The rawOverhead-based fix
  doesn't cover `--continue` restores with many history messages (since
  rawOverhead excludes messagesTokens). UI may still show 'safe' for one
  render until the first send. Documented inline and added a TODO to plumb
  chat history into collectContextData for same-source-of-truth as the
  cheap-gate.
- R5.2a: add TODO(finish_reason) at the truncation guard. The `>= cap`
  heuristic false-positives on legitimate at-cap summaries; the proper
  signal is finish_reason which runSideQuery doesn't surface today.
- R5.2b: split telemetry — new CompressionStatus.COMPRESSION_FAILED_OUTPUT_TRUNCATED
  enum value. Distinct from EMPTY_SUMMARY so logs/telemetry can tell
  prompt-quality failures (tune prompt / splitter) from capacity failures
  (raise cap / shrink splitter input). isCompressionFailureStatus()
  treats both as failures so the breaker behavior is unchanged.
- R5.3: expand consecutiveFailures JSDoc to clarify it tracks
  "non-force, non-hard-rescue consecutive failures" — hard-rescue resets
  the counter and force=true skips increments, so the counter is the
  "regular path" health signal only; reactive overflow is the real
  safety net for the force-only paths.
- R5.4: document the CompressOptions field rename
  (hasFailedCompressionAttempt: boolean → consecutiveFailures: number)
  as an SDK breaking change in the design doc with migration guide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 20, 2026
Observability (R6.1 + R6.6):
- chatCompressionService.compress() now warn-logs when the breaker trips
  the NOOP path; previously the only signal was the absence of compaction
- sendMessageStream info-logs hard-tier rescue trigger + warn-logs on
  rescue failure so debugging matches the consecutiveFailures JSDoc

Counter accounting (R6.3 + R6.7):
- New `hardRescueFailureCount` field on GeminiChat bounds hard-rescue
  retries to MAX_CONSECUTIVE_FAILURES — without it a chat whose history
  can't shrink would burn an API call per send forever (force=true
  skipped the regular increment AND the rescue's pre-call reset wiped
  state). After MAX failures, hard rescue stops firing and reactive
  overflow takes over as the next defense layer. Reset on any
  compression success.
- Reactive overflow catch block now increments consecutiveFailures so
  thrown exceptions (network, 5xx, timeouts) also count toward the
  breaker — previously only status-based reactive failures incremented.

UI corrections (R6.8 + R6.9 + R6.12):
- context-critical tip: tense corrected from "will force on next send"
  to "was forced on this turn" — the rescue already ran by the time the
  tip renders
- Deprecation warning explicitly states auto-compaction can no longer
  be disabled (no replacement for `contextPercentageThreshold: 0`)
- currentTier() returns 'auto' (not 'hard') when hard collapses to
  auto on small windows — previously the 'auto' tier was unreachable
  for those sessions

Code hygiene (R6.2 / R6.4 / R6.10 / R6.11 / R6.13 / R6.14):
- Truncation guard `>=` → `>`: legitimate at-cap summaries no longer
  treated as truncation (was particularly costly because R5.2b made
  these count toward the breaker)
- ContextThresholds reduced to a type alias of core's CompactionThresholds
  to eliminate silent-drift risk
- Removed dead `hint` prop on ThresholdRow (no caller after R5 refactor)
- TODO at contextCommand.ts now shows a type-correct call sketch
- formatContextUsageText uses t() for labels; "20K" derived from
  SUMMARY_RESERVE constant (exported from core)
- cheap-gate dead branch removed: production callers always pass
  precomputedEffectiveTokens; direct service callers fall back to
  originalTokenCount instead of double-cloning history

Tests (R6.15):
- New: COMPRESSION_FAILED_OUTPUT_TRUNCATED counts toward the breaker
- New: precomputedEffectiveTokens path skips estimation work
- New: cheap-gate falls back to originalTokenCount when no precomputed
- Hard-rescue test now asserts precomputedEffectiveTokens is forwarded

Docs (R6.16):
- docs/users/configuration/settings.md table entry for
  `model.chatCompression.contextPercentageThreshold` updated to mark
  the field REMOVED with link to PR rationale

Declined: R6.5 (separate reactive/proactive counter). The R5.3 JSDoc
already documents the coupling intentionally; R1.2 reduced reactive's
weight to +1 (not =MAX), so it takes MAX_CONSECUTIVE_FAILURES reactive
failures to disable proactive — which is the correct outcome for a
chat where reactive consistently fails. A separate counter would add
state without changing observable behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 20, 2026
R7.1 critical (scratchpad data-retention): with includeThoughts=false,
the compression model emits its private <scratchpad> reasoning as plain
text alongside <state_snapshot>, and the entire concatenation was being
persisted as the chat's compressed memory — leaking sensitive tool
output (API keys, paths, file fragments) into every subsequent turn.
Extract just the <state_snapshot> envelope from the response; surface a
no-match as COMPRESSION_FAILED_EMPTY_SUMMARY so the breaker reacts to
prompt-format drift.

R7.2 / R7.3 critical (hard-rescue counter accounting): pessimistic
increment pattern. The previous post-call accounting silently leaked
two failure shapes:
  - throw (provider 5xx / abort): post-handler unreachable, counter
    stuck → infinite re-fire on every send.
  - NOOP (history too small to split): neither failure-status nor
    COMPRESSED branch matched → same infinite re-fire.
Increment hardRescueFailureCount BEFORE tryCompress(force=true); rely on
the existing success-branch reset in tryCompress to refund the strike
on COMPRESSED. Throws, NOOPs, and failure statuses all keep the strike
uniformly.

R7.4 critical (constant coupling): lifted TOKEN_TO_CHAR_RATIO to the
single declaration in compactionInputSlimming.ts; tokenEstimation.ts's
CHARS_PER_TOKEN is now a re-export. Silent-drift risk between splitter
sizing and gate sizing is gone.

R7.5: removed dead `pendingUserMessage` field from CompressOptions /
TryCompressOptions — unused since R6.14 collapsed its consumer.

R7.6: breaker-NOOP path returns the caller's `originalTokenCount`
rather than 0 so telemetry sees real session token counts on the trip
event, not a misleading zero.

R7.7: log at warn level when hard-rescue is skipped due to budget
exhaustion (hardRescueFailureCount >= MAX). Closes the "why isn't
rescue firing" oncall blind spot.

R7.8: reverted R6.2's `>` back to `>=` on the truncation guard. With
the API hard-capping output at COMPACT_MAX_OUTPUT_TOKENS, `>` could
never fire — making the guard dead code that silently persisted
truncated summaries. `>=` catches exact-at-cap (almost always
truncated); the breaker caps the cost at 3 strikes. Declined the reviewer's
alternative `>= cap * 0.95` heuristic — broadens false positives into
the p99-realistic range (~19K) without addressing the root cause
(finish_reason plumbing, still TODO'd).

R7.9: throttle the breaker warn log via a `breakerWarningEmitted` flag
on GeminiChat. Fires once when the breaker first trips, resets when
consecutiveFailures returns to 0. Service stays stateless.

R7.10: neutral tip wording — "Run /compress or /clear to free space"
is correct whether hard-rescue ran, failed, or was budget-suppressed.
Previous past-tense ("was forced on this turn") was wrong in the
budget-exhausted case.

R7.11: 4 new test cases pinning the hardRescueFailureCount + reactive
overflow counter contracts (budget exhaustion via failures, via NOOPs,
via thrown exceptions; reactive throw increments consecutiveFailures).

Tests: packages/core 205 passing in changed files (chatCompression +
geminiChat + tokenEstimation + compactionInputSlimming);
packages/cli 33 passing (tips + ContextUsage + contextCommand). Pre-
existing serve/* breakage and timeout-flaky utils/filesearch tests
unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 20, 2026
…er + throttle)

The R7.1 <state_snapshot> extraction shipped in round 7 turned out to
be incomplete on three fronts that the round-8 review caught:

R8.1 critical (format-violation diagnostic): when the model produced
non-empty raw output but no <state_snapshot> tags, the path silently
classified as COMPRESSION_FAILED_EMPTY_SUMMARY — indistinguishable
from a model that genuinely returned nothing, and three such sends
trip the breaker with no actionable signal. Added a warn-level log
on the !isRawEmpty && isSummaryEmpty branch that includes length and
the first 200 chars of the raw output, so an oncall can distinguish
"prompt drift / model misbehaviour" from "provider error".

R8.6 (regex bypass): the non-greedy `<state_snapshot>[\s\S]*?</...>`
match captured from the FIRST occurrence of the opening tag. Because
the compression prompt instructs the model to "generate the
<state_snapshot>", the scratchpad is plausibly going to mention the
tag literally — and the match would then start at the scratchpad
mention and capture the scratchpad's reasoning through to the real
closing tag, defeating the data-retention fix. Anchored on the LAST
opening tag via `[\s\S]*<state_snapshot>([\s\S]*?)</state_snapshot>`
plus `<state_snapshot>${...}</state_snapshot>` reconstruction.
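
A minimal sketch of the last-opening-tag anchoring described in R8.6. The helper name `extractSnapshot` and the sample text are hypothetical and are not taken from `chatCompressionService.ts`; only the regex shape comes from the commit message.

```typescript
// Hypothetical helper illustrating R8.6; not the PR's actual code.
// The leading greedy [\s\S]* consumes as much input as possible, so the
// engine backtracks to the LAST "<state_snapshot>" that is still followed
// by a closing tag.
const SNAPSHOT_PATTERN =
  /[\s\S]*<state_snapshot>([\s\S]*?)<\/state_snapshot>/;

function extractSnapshot(raw: string): string | null {
  const match = SNAPSHOT_PATTERN.exec(raw);
  if (match === null) {
    return null; // a caller would treat this as an empty-summary failure
  }
  // Rebuild the envelope from the captured body only.
  return `<state_snapshot>${match[1]}</state_snapshot>`;
}

// The scratchpad mentions the tag literally before the real envelope.
const rawModelOutput = [
  '<scratchpad>I must generate the <state_snapshot> next. key=abc123</scratchpad>',
  '<state_snapshot>goal: ship the fix</state_snapshot>',
].join('\n');

console.log(extractSnapshot(rawModelOutput));
```

Because the capture starts at the last opening tag, the scratchpad text (including the sample `key=abc123`) never reaches the persisted summary in this sketch.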

R8.7 (token math): the persisted history contains only the snapshot
envelope, but newTokenCount used the raw API `candidatesTokenCount`
which counts scratchpad+snapshot. Scaling by `summary.length /
rawSummaryText.length` while keeping the API count as the base
preserves tokenizer fidelity for the snapshot portion. Test scenario
of ~3x scratchpad vs snapshot drops the bookkeeping from 1024 → ~260,
which is materially closer to what the next cheap-gate actually sees.

R8.4 (throttle asymmetry): the R7.7 budget-exhausted warn fired on
every send when a session stayed above the hard threshold —
asymmetric with R7.9's `breakerWarningEmitted`. Added matching
`budgetExhaustedWarningEmitted` flag, cleared in the same COMPRESSED
success branch as the other resets.

R8.2 / R8.3 / R8.5 (test coverage gaps): added 7 tests pinning
contracts the previous rounds left unverified:

  - exact-cap (20_000) truncation guard (R7.8 regression guard)
  - scratchpad-strip end-to-end persistence assertion (R7.1)
  - format-violation EMPTY_SUMMARY + warn (R8.1/R8.3b combined)
  - breaker-tripped NOOP returns originalTokenCount (R7.6 telemetry)
  - hardRescueFailureCount recovery after COMPRESSED success (R8.5)
  - regex-anchor on literal scratchpad mention (R8.6)
  - newTokenCount accounts for only persisted snapshot (R8.7)

Phase 5 ordering: R8.1, R8.6, R8.7 were written test-first
(RED → fix → GREEN); R8.4 mirrors R7.9 structurally. Phase 6
self-review checklist run and documented in the PR reply.

All 2126 core tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaZzyMan added a commit that referenced this pull request May 20, 2026
R9.1 (telemetry assertion): pre-existing breaker-NOOP test only
checked status — added explicit token-count assertions so a
regression to `0/0` would surface instead of silently corrupting
trip-event telemetry.

R9.2 critical (NOOP refund): the R7.2/R7.3 pessimistic increment was
overcautious for the NOOP case. A forced rescue NOOPs when the
compressible slice is too small to split this turn — not because
the compression mechanism is broken. Refund the strike on NOOP so
a session whose first few turns happen to be too small doesn't
permanently disable hard-rescue. Throws and failure statuses still
cost a strike. Flipped the R7.11 NOOP test to assert the new
contract (budget does NOT exhaust on NOOPs).
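
The strike accounting after R9.2 can be sketched as follows. This is a simplified, synchronous model: the class name, the status union, and the budget value of 3 are assumptions for illustration, and the real code resets the counter inside `tryCompress` rather than in the caller.

```typescript
// Simplified sketch of hard-rescue strike accounting after R9.2.
// Only the field name hardRescueFailureCount comes from the commit text.
type RescueStatus = 'COMPRESSED' | 'NOOP' | 'FAILED';

const MAX_HARD_RESCUE_FAILURES = 3; // assumed budget, for illustration only

class HardRescueBudget {
  hardRescueFailureCount = 0;

  attempt(compress: () => RescueStatus): RescueStatus | 'SKIPPED' {
    if (this.hardRescueFailureCount >= MAX_HARD_RESCUE_FAILURES) {
      return 'SKIPPED'; // budget exhausted, rescue is not attempted
    }
    // Pessimistic: charge the strike BEFORE the call, so a throw
    // cannot leave the counter untouched.
    this.hardRescueFailureCount += 1;
    const result = compress(); // a throw propagates and keeps the strike
    if (result === 'COMPRESSED') {
      this.hardRescueFailureCount = 0; // success resets the budget
    } else if (result === 'NOOP') {
      this.hardRescueFailureCount -= 1; // R9.2: refund, nothing was broken
    }
    return result; // 'FAILED' keeps the strike
  }
}
```

Under this model a run of NOOPs never exhausts the budget, while throws and failure statuses each cost one strike.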

R9.3 critical (cross-file silent coupling): the `<state_snapshot>`
tag name was hard-coded in both `prompts.ts` (literal XML in the
template) and `chatCompressionService.ts` (extraction regex). A
rename in one without the other was a silent failure mode (every
compaction → EMPTY_SUMMARY → breaker trips after 3 sends → auto-
compaction permanently off, looking like "model can't follow
format"). Lifted `COMPRESSION_SNAPSHOT_TAG = 'state_snapshot'` as a
shared constant; prompt template uses it via template literal, regex
constructs from it via `new RegExp`.

R9.4 (stale breaker flag): hard-rescue resets `consecutiveFailures
= 0` in the pre-call path but pre-R9.4 left `breakerWarningEmitted`
true. After a session sequence "breaker trips → warn emitted →
hard-rescue resets counter → counter re-trips", the second trip
emitted no warn. Clear the flag alongside the counter in the rescue
pre-call path.
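
A rough sketch of the warn-once flag and the R9.4 reset. The class and method names are hypothetical and the `warnings` array stands in for the debug logger; `consecutiveFailures`, `breakerWarningEmitted`, and the default of 3 are the only names taken from the PR.

```typescript
const MAX_CONSECUTIVE_FAILURES = 3;

class BreakerState {
  consecutiveFailures = 0;
  breakerWarningEmitted = false;
  readonly warnings: string[] = []; // stand-in for a warn-level logger

  recordFailure(): void {
    this.consecutiveFailures += 1;
    const tripped = this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
    if (tripped && !this.breakerWarningEmitted) {
      this.warnings.push('auto-compaction breaker tripped');
      this.breakerWarningEmitted = true; // R7.9: warn once per trip
    }
  }

  resetBeforeHardRescue(): void {
    this.consecutiveFailures = 0;
    // R9.4: clear the flag together with the counter, otherwise a
    // second trip in the same session would be silent.
    this.breakerWarningEmitted = false;
  }
}
```

With the flag cleared alongside the counter, the sequence "trip, reset, trip again" produces two warnings instead of one.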

R9.5 (tip small-window collapse): the `context-critical` tip fired
at `>= thresholds.hard`, but on small windows (32K) `computeThresholds`
collapses hard to equal auto — the tip would claim "near hard limit"
when there is no distinct hard limit. Mirror the `currentTier` guard
(`hard > auto`) so the `context-high` band `[auto, hard)` handles
small windows cleanly.

R9.6 declined as filter-1 false-positive: the cited inflation was
fixed in R8.7 (current code scales `compressionOutputTokenCount` by
the snapshot/raw char ratio). Reviewer was reading a stale snapshot.

R9.7 (preserve valid snapshots): the truncation guard fired whenever
`compressionOutputTokenCount >= COMPACT_MAX_OUTPUT_TOKENS` regardless
of extraction success. When the model emits a complete
`<state_snapshot>...</state_snapshot>` envelope and the cap was
consumed by scratchpad, dropping the snapshot throws away a valid
result. Gated the guard on `!snapshotMatch` so it now only fires
when the envelope is incomplete (no closing tag) — strong evidence
of mid-snapshot truncation. Existing R7.8/R8.2 truncation tests
updated to use no-closing-tag mocks (the actual shape of mid-
snapshot truncation); added new test for the "complete envelope +
cap hit → preserved" contract.

Phase 5 ordering: R9.2 / R9.4 / R9.7 were RED-first (the R7.11 NOOP
test flip is the explicit RED for R9.2; R9.4 has a fresh
internals-peek test; R9.7 has a fresh test that fails against the
pre-R9.7 code which would return TRUNCATED instead of COMPRESSED).
R9.3 is a constant-lift with no behavior change. R9.5 has a new
small-window-collapse test.

Tests: 2128 core + 24 CLI all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LaZzyMan LaZzyMan force-pushed the lazzy/trusting-grothendieck-8a8501 branch from 9cbcfd2 to 27200f7 Compare May 20, 2026 02:34
LaZzyMan added a commit that referenced this pull request May 20, 2026
R11.1 critical (NaN propagation): R10.1's `?? 0` only catches
null/undefined; NaN passes through and poisons every subsequent
`lastPromptTokenCount + NaN + ...` arithmetic — `NaN >= hard` is
always false, silently disabling hard-tier rescue for the session.
Guard with `Number.isFinite` so NaN / Infinity / non-numbers coerce
to 0. RED-first via hostile-NaN-payload test.

R11.2 (self-inflicted regression from R9.5): adding `hard > auto`
to context-critical left context-high's `[auto, hard)` band empty
when hard === auto (small windows 32K/64K). Users at the auto
threshold lost ALL contextual tips. Accept `>= auto` in
context-high when hard === auto so there's always exactly one tip
in the high-utilization range. RED-first via collapsed-window test.
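
The combined R9.5 + R11.2 tip selection might look like the sketch below. The function name, the `TipThresholds` shape, and the omission of any warn-tier tip are simplifying assumptions; only the tip ids and the `hard > auto` guard come from the commit text.

```typescript
type ContextTip = 'context-critical' | 'context-high' | null;

interface TipThresholds {
  auto: number;
  hard: number;
}

function selectContextTip(tokens: number, t: TipThresholds): ContextTip {
  // On small windows hard collapses to auto, so there is no distinct
  // hard limit to warn about.
  const hasDistinctHard = t.hard > t.auto;
  if (hasDistinctHard && tokens >= t.hard) {
    return 'context-critical';
  }
  // Covers [auto, hard) normally, and everything >= auto when collapsed.
  if (tokens >= t.auto) {
    return 'context-high';
  }
  return null;
}
```

Checking the critical tier first and falling through to `>= auto` leaves exactly one tip active anywhere at or above the auto threshold.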

R11.3 critical (per-strike observability): pre-R11.3, proactive
auto-compaction failures produced ZERO logs until the breaker
tripped on strike 3. An oncall investigating "auto-compaction
stopped" couldn't distinguish EMPTY_SUMMARY / OUTPUT_TRUNCATED /
INFLATED / TOKEN_COUNT_ERROR without source-diving. Added
info-level per-strike log citing status and strike-of-MAX. Declined
the second half of the suggestion (promote breaker/budget warns to
console.warn for user visibility) — that's UI noise; users without
DEBUG=QWEN_CODE_CHAT enabled see reactive overflow recovery
working, which is the intended UX.

R11.4 critical (disable escape hatch restored): the removal of
`contextPercentageThreshold: 0` was scope-collateral, not intent.
Users with compliance / debugging / audit-trail needs require a
way to opt out of auto-compaction entirely. Added
`chatCompression.disabled: boolean` field. Service-level cheap-gate
gates `!force && !bypassTokenThreshold` (proactive only); hard-
rescue gated at SOURCE in sendMessageStream since force=true would
bypass the service gate. Manual /compress (user-initiated force=true
via tryCompressChat) and reactive overflow (API-layer safety net)
remain active — matching the old contextPercentageThreshold=0
semantics that only gated the proactive path.
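
A hedged sketch of the two gates R11.4 describes. The function names are hypothetical, and the real `wantHardRescue` computation presumably also considers the rescue budget, which is left out here.

```typescript
interface ChatCompressionSettings {
  disabled?: boolean;
}

// Service-level gate: only the proactive path (no force, no bypass)
// is skipped when auto-compaction is disabled.
function serviceGateSkips(
  settings: ChatCompressionSettings | undefined,
  force: boolean,
  bypassTokenThreshold: boolean,
): boolean {
  return Boolean(settings?.disabled) && !force && !bypassTokenThreshold;
}

// Source-level gate in the send path: hard-rescue calls the service with
// force=true, which would sidestep the gate above, so it is checked here.
function wantHardRescue(
  settings: ChatCompressionSettings | undefined,
  effectiveTokens: number,
  hardThreshold: number,
): boolean {
  const autoCompactionDisabled = settings?.disabled === true;
  return !autoCompactionDisabled && effectiveTokens >= hardThreshold;
}
```

Manual `/compress` passes `force=true` and reactive overflow bypasses the threshold, so neither is affected by the service gate in this model.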

R11.5 declined-design: the counter asymmetry between
`consecutiveFailures` (proactive cheap-gate health) and
`hardRescueFailureCount` (rescue-budget pessimistic) is intentional
and documented in the JSDoc — they track different mechanisms with
legitimately different reset semantics. The "regular breaker
reports healthy while every compression fails" scenario the
reviewer describes IS the design: a flaky hard-rescue eventually
exhausts its own budget, then the proactive cheap-gate accumulates
strikes, then the cheap-gate breaker latches. Reactive overflow
catches the actual API failure throughout. The save/restore pattern
suggested would complicate the state machine without changing the
recovery shape.

R11.6 (sensitive content in warn log): R8.1's `slice(0, 200)` of
raw model output captured exactly the window where scratchpad's
sensitive content (quoted API keys, paths from tool output) is
most likely to appear. Length-only message preserves the
operationally actionable distinction ("model returned content but
no tags" vs "model returned nothing") without the leak risk.
Actual content is recoverable from provider-side logging.

R11.7 (regex hoist): the snapshot extraction regex depends only on
the immutable `COMPRESSION_SNAPSHOT_TAG` constant. Hoisted to
module-scope `SNAPSHOT_REGEX` — removes per-call `new RegExp()`
overhead and signals to readers that the pattern is a fixed
contract, not parameterised.

R11.8 (i18n hygiene): `breakdown.currentTier` value was interpolated
raw at 2 sites (contextCommand text formatter + ContextUsage Ink
component). Wrapped in `t()` so non-English locales don't see
mixed-language output. Sibling sweep via grep confirmed exactly 2
unwrapped render sites; the other `currentTier` references are
code comparisons against tier-name string literals (not user-facing
strings).

2361 core + 35 CLI tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/services/chatCompressionService.ts
Comment thread packages/core/src/services/chatCompressionService.ts
Comment thread packages/core/src/services/chatCompressionService.ts
Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/config/config.ts Outdated
Comment thread packages/core/src/services/compactionInputSlimming.ts
Comment thread packages/cli/src/ui/commands/contextCommand.ts
Comment thread packages/core/src/config/config.ts
Comment thread docs/users/configuration/settings.md Outdated
@LaZzyMan LaZzyMan force-pushed the lazzy/trusting-grothendieck-8a8501 branch from 27200f7 to e861d07 Compare May 20, 2026 03:11
R12.1 critical (sibling sweep of R11.1): R11.1 added Number.isFinite
to `lastCandidatesTokenCount`, but `lastPromptTokenCount` (assigned
3 lines above) and `cachedContentTokenCount` had no guard. Also,
Number.isFinite(-1) is true — a negative value would still poison
arithmetic. Factored `coerceUsageCount(value)` enforcing
(finite ∧ >= 0) and routed all 4 API-value capture sites through it.
RED-first via Infinity/NaN/-1/-1e9 injection test.
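
One plausible shape for the `coerceUsageCount(value)` helper named above, assuming it returns 0 for anything that is not a finite, non-negative number. The actual implementation in the PR may differ.

```typescript
// Assumed behaviour: accept only finite, non-negative numbers.
function coerceUsageCount(value: unknown): number {
  if (typeof value !== 'number') {
    return 0; // undefined, null, strings, objects
  }
  if (!Number.isFinite(value) || value < 0) {
    return 0; // NaN, +/-Infinity, negative counts
  }
  return value;
}

// Number.isFinite alone is not enough: it accepts negative values.
console.log(Number.isFinite(-1), coerceUsageCount(-1));
```

Routing every API-reported count through one helper keeps the guard identical across all capture sites.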

R12.2 critical (computeThresholds NaN propagation): a provider
returning `"context_window": null` surfaces as `contextWindowSize:
NaN`. Pre-fix, NaN propagated to all 4 thresholds, every downstream
`tokens >= NaN` comparison evaluated false, and the entire three-tier
gate silently disabled. Guard with `!Number.isFinite || <= 0` →
return Infinity thresholds (gate falls through to NOOP) + 0
effectiveWindow. RED-first against NaN/0/-1/-Inf inputs.
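
The R12.2 guard can be sketched in isolation. The ladder formula itself is not reproduced here: it is passed in as a parameter, and the wrapper name and the exact field set of the thresholds type are assumptions.

```typescript
// Assumed field set; the PR's CompactionThresholds may carry more.
interface ThresholdsSketch {
  warn: number;
  auto: number;
  hard: number;
  effectiveWindow: number;
}

function guardedThresholds(
  windowSize: number,
  ladder: (validWindow: number) => ThresholdsSketch,
): ThresholdsSketch {
  if (!Number.isFinite(windowSize) || windowSize <= 0) {
    // Infinity thresholds make every `tokens >= threshold` check false
    // for finite token counts, so the gate falls through to NOOP.
    return {
      warn: Infinity,
      auto: Infinity,
      hard: Infinity,
      effectiveWindow: 0,
    };
  }
  return ladder(windowSize);
}
```

The difference from the unguarded case is that the NOOP is now deliberate: the returned `effectiveWindow` of 0 gives callers something to detect, whereas NaN thresholds failed every comparison silently.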

R12.3 critical (R8.7 self-inflicted undercount): pure scaling
collapses on extreme scratchpad/snapshot ratios. Example: 200K
scratchpad + 5K snapshot with 15K API tokens scaled to ~375 tokens.
Floor by `estimateContentTokens` on the persisted summary — `Math.max(
scaledApi, charBased)` keeps API tokenizer fidelity when scratchpad is
reasonable, clamps when it isn't. RED-first via 200K/5K extreme test.
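
The scale-then-floor bookkeeping from R8.7 and R12.3 can be sketched as below. The function name is hypothetical, and a plain char/4 estimate stands in for `estimateContentTokens`, whose exact rounding is not shown in this PR excerpt.

```typescript
const CHARS_PER_TOKEN = 4; // stand-in for the shared ratio constant

function persistedSummaryTokens(
  apiOutputTokens: number, // API count covering scratchpad + snapshot
  rawSummaryText: string, // full model output
  summary: string, // extracted snapshot envelope only
): number {
  const ratio =
    rawSummaryText.length > 0 ? summary.length / rawSummaryText.length : 0;
  // R8.7: scale the API count down to the persisted portion.
  const scaledApi = Math.round(apiOutputTokens * ratio);
  // R12.3: floor by a character-based estimate of what is persisted.
  const charBased = Math.ceil(summary.length / CHARS_PER_TOKEN);
  return Math.max(scaledApi, charBased);
}
```

With a 205,000-character raw output, a 5,000-character snapshot, and 15,000 API tokens, pure scaling gives about 366 while the character-based floor gives 1,250, so the sketch returns 1,250. With a 3:1 scratchpad-to-snapshot ratio and 1,024 API tokens, scaling gives 256 and wins over the floor of 250.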

R12.4 critical (disabled NOOP observability): the R11.4 disable-knob
NOOP returned silently, leaving oncall unable to distinguish "user
disabled" from "system broken". Added once-per-process warn (module-
level flag because `ChatCompressionService` is per-call). Symmetric
with R7.9 `breakerWarningEmitted` / R8.4 `budgetExhaustedWarningEmitted`.

R12.5 critical (test gap for R11.4 source gate): R11.4's hard-rescue
source-level disable check had no regression guard. Added test
mocking `getChatCompression: { disabled: true }` + lastPromptTokenCount
above hard threshold; asserts no force=true call to tryCompress.
Test passes against current code — pins the contract against future
refactor removing the source gate.

R12.6 (deprecation text contradiction): the R11.4 commit added
`disabled: true` but left the deprecation warning saying
"auto-compaction cannot currently be disabled". Updated to mention
the new field.

R12.7 declined-design: `imageTokenEstimate: 0` silently clamping to
100 violates user intent on a user-configurable knob. The reviewer's
concern (user accidentally disabling image weight) is real but the
fix is wrong shape — silent override of explicit values is filter-5
defensive bloat. Users explicitly setting 0 are signaling intent;
config-validation warnings at load are a future enhancement if
real-world complaints surface.

R12.8 (locale baseline): the 8+ new t() keys in /context output
(`Compaction thresholds`, `Effective window`, `Warn/Auto/Hard
threshold`, `Current tier`, tier names, `window − {{reserve}}
reserve`) had no entries in en.js. Added as baseline; other locales
fall back to the literal key (existing Used/Free behavior).
Not flagged in mustTranslateKeys.ts — would force breaking-CI on
locale maintainers; same precedent as existing Used/Free which also
aren't flagged.

R12.9 + R12.10 (discoverability): added `model.chatCompression.disabled`
and `model.chatCompression.imageTokenEstimate` rows to settings.md;
updated the REMOVED row for `contextPercentageThreshold` to mention
the new `disabled: true` migration path per gpt-5.5's exact suggested
text. Schema entry in settingsSchema.ts deliberately NOT changed —
adding nested sub-properties for chatCompression would require
rewriting the schema design for ALL existing sub-fields
(imageTokenEstimate) and is out of scope for this round; TypeScript's
ChatCompressionSettings interface already provides IDE-side autocomplete.

2405 core + 43 CLI tests in touched files passing. Pre-existing
serve/* import resolution failures in CLI workspace unaffected.
) {
// eslint-disable-next-line no-console
console.warn(
'[qwen-code] chatCompression.contextPercentageThreshold has been removed ' +

[Critical] Deprecation warning contradicts the PR's own replacement mechanism

The console.warn message says "auto-compaction cannot currently be disabled" and tells users to open an issue "so we can consider a replacement." But this same PR adds ChatCompressionSettings.disabled as the first-class replacement for the removed contextPercentageThreshold: 0 escape hatch. Users who see this warning will be misled into thinking no disable mechanism exists and may file unnecessary issues or resort to unsafe workarounds.

Suggested change
'[qwen-code] chatCompression.contextPercentageThreshold has been removed ' +
'Note: the "contextPercentageThreshold" setting is removed. ' +
'To disable auto-compaction, set "chatCompression": { "disabled": true } instead.'

— DeepSeek/deepseek-v4-pro via Qwen Code /review

// disable gate catches the proactive cheap-gate (force=false);
// hard-rescue uses force=true to bypass the breaker, so it would
// otherwise sidestep that gate. Skip at the source.
const autoCompactionDisabled =

[Critical] No test coverage for hard-rescue suppression when chatCompression.disabled === true

The sendMessageStream hard-rescue gate checks !autoCompactionDisabled before computing wantHardRescue. This is the sole protection against hard-rescue (which uses force=true) bypassing the service-layer disabled gate. Only the service-layer disable check is tested (chatCompressionService.test.ts "honors chatCompression.disabled"); the geminiChat guard has zero coverage. A regression that drops this guard would cause hard-rescue to silently ignore disabled: true.

Suggested change
const autoCompactionDisabled =
// Add a test in the "sendMessageStream hard-tier rescue" describe block:
// set mockConfig.getChatCompression().mockReturnValue({ disabled: true }),
// seed lastPromptTokenCount above hard threshold,
// assert compressSpy.mock.calls[0][1].force is false

— DeepSeek/deepseek-v4-pro via Qwen Code /review

imageTokenEstimate: number = DEFAULT_IMAGE_TOKEN_ESTIMATE,
): number {
let totalChars = 0;
for (const content of contents) {

[Critical] char/4 token estimation severely underestimates for CJK text (2-4x error)

The code comment claims "char/4 is a conservative lower bound (real tokenizers vary ±30%)," but for Chinese/Japanese/Korean the opposite is true — most tokenizers encode CJK at 1-2 chars per token, not 4. This means for CJK-heavy conversations the cheap-gate triggers much later than it should, hard-tier rescue misses the window before API overflow, and the /context display shows misleadingly low token counts. For CJK users this is a systematic correctness issue that silently undermines the entire three-tier ladder.

Consider applying a language-aware correction factor, or at minimum lowering the ratio for safety. The current CHARS_PER_TOKEN=4 is only "conservative" for Latin-alphabet languages.

— DeepSeek/deepseek-v4-pro via Qwen Code /review
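
One way the suggested language-aware correction could look, as a sketch only. The 1-token-per-CJK-character weight is an assumption chosen to err on the high side, the function name is hypothetical, and the character ranges cover only the common kana, CJK ideograph, and Hangul blocks.

```typescript
const CHARS_PER_TOKEN = 4; // ratio named in the review comment
const CJK_TOKENS_PER_CHAR = 1; // assumption: pessimistic 1 char per token

// Hiragana/Katakana, CJK Unified Ideographs (plus Extension A),
// and Hangul syllables. All ranges are inside the BMP.
const CJK_CHAR = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function estimateTextTokens(text: string): number {
  let cjkChars = 0;
  let otherChars = 0;
  for (let i = 0; i < text.length; i++) {
    if (CJK_CHAR.test(text.charAt(i))) {
      cjkChars += 1;
    } else {
      otherChars += 1;
    }
  }
  return Math.ceil(
    cjkChars * CJK_TOKENS_PER_CHAR + otherChars / CHARS_PER_TOKEN,
  );
}

console.log(estimateTextTokens('abcd'), estimateTextTokens('你好世界'));
```

A per-character weight keeps the estimator local and cheap, at the cost of overestimating for tokenizers that pack two CJK characters into one token.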

// the API layer is still the last-ditch safety net. Replaces the
// removed `contextPercentageThreshold: 0` escape hatch.
if (chatCompressionSettings?.disabled && !force && !bypassTokenThreshold) {
return {

[Critical] The disabled NOOP path returns with zero observability

When chatCompressionSettings.disabled === true, the early-return NOOP is entirely silent — no debug log, no warn, no telemetry event. This replaces the old contextPercentageThreshold: 0 escape hatch, but unlike the old path which at least produced NOOP telemetry, this leaves zero audit trail. An oncall investigating "why isn't compression running?" has no signal to find the disabled flag short of inspecting runtime config.

Suggested change
return {
if (chatCompressionSettings?.disabled && !force && !bypassTokenThreshold) {
debugLogger.debug(
'[compaction] NOOP: chatCompression.disabled is true',
);
return { ... };
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review

* (autoCompact.ts:30) which is based on p99.99 of real compaction outputs.
*/
export const COMPACT_MAX_OUTPUT_TOKENS = 20_000;

[Suggestion] SUMMARY_RESERVE and COMPACT_MAX_OUTPUT_TOKENS are both hardcoded to 20_000 with a prose comment requiring them to stay equal — but there is no compile-time enforcement

If someone changes one constant without the other, computeThresholds computes effectiveWindow = window - SUMMARY_RESERVE while the actual output cap is COMPACT_MAX_OUTPUT_TOKENS. The threshold ladder silently drifts from reality. This is a classic "works by coincidence" coupling with no compiler error, no runtime exception, and no test failure to catch the drift.

Suggested change
export const COMPACT_MAX_OUTPUT_TOKENS = 20_000;
export const SUMMARY_RESERVE = COMPACT_MAX_OUTPUT_TOKENS;

— DeepSeek/deepseek-v4-pro via Qwen Code /review

/** Token count at which UI warn tier triggers. */
readonly warn: number;
/** Token count at which auto-compaction triggers. */
readonly auto: number;

[Suggestion] computeThresholds(window) has no input validation — NaN/Infinity input silently disables all compression

If called with NaN (e.g., contextWindowSize misread from config), NaN - SUMMARY_RESERVE = NaN, Math.max(NaN, 0) = NaN, and all thresholds become NaN. Since NaN >= anything is always false, the cheap-gate always passes (NOOP) and hard-rescue never fires. Compression is silently disabled with no error. Extremely unlikely but catastrophic when it happens.

Suggested change
readonly auto: number;
if (!Number.isFinite(window) || window <= 0) {
const infinite = Number.MAX_SAFE_INTEGER;
return { warn: infinite, auto: infinite, hard: infinite };
}

— DeepSeek/deepseek-v4-pro via Qwen Code /review

* the steady-state estimate lags by one response (typically 500–5000
* tokens) and the hard-tier rescue (which sits only HARD_BUFFER ≈ 3K
* from the window edge) fires late, costing a doomed API round-trip
* before reactive recovery catches the overflow.

[Suggestion] estimatePromptTokens accepts a history parameter that is unused in the steady-state path — leaky abstraction

The primary caller (sendMessageStream) passes [] for history to avoid a getHistory(true) clone. In the steady-state branch (lastPromptTokenCount > 0), the history argument is never read. This optimization leaks into the callee's API surface: future callers may not realize they can pass [], or may mistakenly pass [] on the cold-start path, silently getting a severe underestimate.

Consider splitting into two functions or using an options object so the cold-start case is explicitly opt-in:

Suggested change
* before reactive recovery catches the overflow.
export function estimateSteadyStateTokens(
lastPromptTokenCount: number,
lastCandidatesTokenCount: number,
userMessage: Content,
imageTokenEstimate?: number,
): number { ... }
export function estimateColdStartTokens(
history: Content[],
userMessage: Content,
imageTokenEstimate?: number,
): number { ... }

— DeepSeek/deepseek-v4-pro via Qwen Code /review

/**
* Number of consecutive auto-compaction failures for this chat. The
* cheap-gate NOOPs once this reaches MAX_CONSECUTIVE_FAILURES (default 3)
* until a successful compress (forced or not) resets it to 0. Replaces the

[Suggestion] Inconsistent naming between two adjacent failure counters: consecutiveFailures vs hardRescueFailureCount

One uses bare "Failures" without a count suffix, the other uses "FailureCount". Additionally, consecutiveFailures doesn't specify what fails — consecutive what? Without reading the JSDoc, it's ambiguous. Both track related-but-different failure domains within the same class.

Consider renaming both for symmetry and clarity, e.g., autoCompactionConsecutiveFailures and hardRescueConsecutiveFailures.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

*/
function tierColor(tier: ContextTier): string {
switch (tier) {
case 'safe':

[Suggestion] warn and auto tiers share the same color (theme.status.warning), making them visually indistinguishable

The three-tier ladder has four states (safe, warn, auto, hard) but only three distinct colors — the warn-to-auto escalation has no visual signal. A user seeing /context output cannot tell from color alone whether they're in the "fire a tip" tier (warn) or the "actually compacting" tier (auto).

Use a distinct color for auto (e.g., a lighter orange distinct from the warn yellow) so each of the four states has a visually unique badge.

— DeepSeek/deepseek-v4-pro via Qwen Code /review

@wenshao wenshao left a comment

⚠️ CI note: Lint check is failing and Windows test is still pending on this commit.

Comment thread packages/core/src/config/config.ts Outdated
'[qwen-code] chatCompression.contextPercentageThreshold has been removed ' +
'and is now controlled by built-in thresholds. Setting will be ignored. ' +
'Note: auto-compaction cannot currently be disabled — the old ' +
'"set threshold to 0 to disable" escape hatch is gone. If you need ' +

[Critical] Deprecation warning contradicts disabled field

This warning states "auto-compaction cannot currently be disabled" but the same PR introduces chatCompression.disabled: boolean (config.ts:281) which does exactly that. Users migrating from contextPercentageThreshold: 0 will believe there is no replacement.

The same false claim appears in docs/users/configuration/settings.md:148 ("There is currently no replacement to disable auto-compaction").

Suggested change
'"set threshold to 0 to disable" escape hatch is gone. If you need ' +
'Note: to disable auto-compaction, set chatCompression.disabled to true instead. ' +
'If you need to retain full history for other reasons, use /clear between conversations.',

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

// would silently disable hard-tier rescue for the rest of
// the session.
this.lastCandidatesTokenCount = Number.isFinite(
usageMetadata.candidatesTokenCount,

[Critical] Negative candidatesTokenCount not clamped

The R11.1 Number.isFinite guard catches NaN and Infinity but NOT negative finite values. Number.isFinite(-100_000) returns true, so a hostile/buggy provider reporting candidatesTokenCount: -100000 passes through and is added to the estimate in estimatePromptTokens, effectively subtracting from it.

Concretely: lastPromptTokenCount=176_999 + lastCandidatesTokenCount=-100_000 + small user message → effectiveTokens ≈ 77K, far below any hard threshold. Hard-tier rescue never fires regardless of actual prompt size.

Suggested change
usageMetadata.candidatesTokenCount,
this.lastCandidatesTokenCount = Number.isFinite(
usageMetadata.candidatesTokenCount,
)
? Math.max(0, usageMetadata.candidatesTokenCount as number)
: 0;

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

@@ -1375,6 +1667,23 @@ export class GeminiChat {
// Always update the per-chat counter so this chat (including

[Critical] totalTokenCount fallback double-counts candidates tokens

When promptTokenCount is falsy (0 or undefined), this falls back to totalTokenCount which already includes candidates tokens. Then estimatePromptTokens (tokenEstimation.ts:82-84) adds lastCandidatesTokenCount on top, producing promptTokens + 2×candidatesTokens.

On providers that omit promptTokenCount (some OpenAI-compatible endpoints), this over-estimates by 500–5000 tokens, causing false-positive hard-tier rescue triggers (the HARD_BUFFER is only 3K from the window edge).

Suggested change
// Always update the per-chat counter so this chat (including
usageMetadata.promptTokenCount ?? usageMetadata.totalTokenCount;

Additionally, estimatePromptTokens should only add lastCandidatesTokenCount when lastPromptTokenCount was sourced from promptTokenCount (not totalTokenCount). Consider exposing a flag or restructuring the estimator.
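
One way to picture the "expose a flag" option is the sketch below. All type and function names here (`UsageLike`, `LastUsage`, `recordUsage`, `estimateNextPrompt`) are hypothetical and only illustrate the idea; the real estimator in `tokenEstimation.ts` may be shaped differently.

```typescript
// Hypothetical shapes, for illustration only.
interface UsageLike {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
}

interface LastUsage {
  promptTokens: number;
  candidatesTokens: number;
  // Remembers which field promptTokens was read from.
  source: 'prompt' | 'total';
}

function recordUsage(u: UsageLike): LastUsage {
  const candidatesTokens = u.candidatesTokenCount ?? 0;
  if (u.promptTokenCount !== undefined) {
    return { promptTokens: u.promptTokenCount, candidatesTokens, source: 'prompt' };
  }
  // totalTokenCount already includes the candidates tokens.
  return { promptTokens: u.totalTokenCount ?? 0, candidatesTokens, source: 'total' };
}

function estimateNextPrompt(last: LastUsage, pendingUserTokens: number): number {
  // Add the previous reply only when it is not already inside promptTokens.
  const carried = last.source === 'prompt' ? last.candidatesTokens : 0;
  return last.promptTokens + carried + pendingUserTokens;
}

// Same turn reported by two providers: both estimates agree.
const full = recordUsage({ promptTokenCount: 1000, candidatesTokenCount: 200, totalTokenCount: 1200 });
const totalOnly = recordUsage({ candidatesTokenCount: 200, totalTokenCount: 1200 });
const estimateFull = estimateNextPrompt(full, 50);
const estimateTotalOnly = estimateNextPrompt(totalOnly, 50);
```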

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
// sent on THIS turn.
//
// R11.1: use Number.isFinite so a hostile / buggy provider
// payload (NaN, Infinity, non-number) coerces to 0 instead

[Critical] lastPromptTokenCount missing Number.isFinite guard

Unlike lastCandidatesTokenCount (guarded 18 lines below with the R11.1 Number.isFinite check), lastPromptTokenCount is assigned directly without validation. The R11.1 comment explicitly warns that "NaN >= hard is always false" would "silently disable hard-tier rescue" — yet the same class of bug exists on the far more impactful primary field.

A promptTokenCount of -Infinity from a provider passes the if (lastPromptTokenCount) truthiness check and is stored. -Infinity >= auto is always false → cheap-gate never fires → auto-compaction silently disabled.

Suggested change
// payload (NaN, Infinity, non-number) coerces to 0 instead
const rawPromptTokens =
usageMetadata.promptTokenCount ?? usageMetadata.totalTokenCount;
if (rawPromptTokens) {
this.lastPromptTokenCount = Number.isFinite(rawPromptTokens)
? (rawPromptTokens as number)
: this.lastPromptTokenCount;
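
The comparison behaviour this comment relies on can be checked in isolation. In the sketch below the `auto` value is an assumed threshold for a 200K window, and `nextPromptTokenCount` is a hypothetical helper that also rejects non-positive values, which goes slightly beyond the suggestion above.

```typescript
// Assumed auto threshold, for illustration only.
const auto = 167_000;

const nanReachesAuto = NaN >= auto; // false
const negInfReachesAuto = -Infinity >= auto; // false
const negInfIsTruthy = Boolean(-Infinity); // true, so a bare truthiness check stores it
const nanIsTruthy = Boolean(NaN); // false

// Hypothetical guard: keep the previous count unless the new one is usable.
function nextPromptTokenCount(previous: number, raw: unknown): number {
  return typeof raw === 'number' && Number.isFinite(raw) && raw > 0
    ? raw
    : previous;
}
```

With a guard like this, a malformed payload leaves the last good count in place instead of poisoning every later threshold comparison.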

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

// hard-rescue uses force=true to bypass the breaker, so it would
// otherwise sidestep that gate. Skip at the source.
const autoCompactionDisabled =
this.config.getChatCompression()?.disabled === true;

[Critical] autoCompactionDisabled hard-rescue guard has no integration test

The guard suppressing hard-tier rescue when chatCompression.disabled === true has no test at the sendMessageStream level. The service-level cheap-gate NOOP test exists (chatCompressionService.test.ts:675), but nothing verifies that shouldForceFromHard stays false when disabled.

Without this test, a regression removing !autoCompactionDisabled from the wantHardRescue expression would silently force-compress sessions whose user explicitly opted out.

Suggested test: set mockConfig.getChatCompression to return { disabled: true }, set lastPromptTokenCount above the hard threshold, send a message, and assert compressSpy.mock.calls[0][1].force === false.
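
The property the suggested test would pin can be sketched as a pure function. The actual `wantHardRescue` expression in `geminiChat.ts` is not shown in this thread, so the function below is an assumed shape that borrows the names `shouldForceFromHard` and `autoCompactionDisabled` from the comment.

```typescript
// Assumed inputs; the real code reads these from chat state and config.
interface RescueInputs {
  effectiveTokens: number;
  hardThreshold: number;
  autoCompactionDisabled: boolean;
}

// Hard-tier rescue fires only when the user has not opted out.
function shouldForceFromHard(i: RescueInputs): boolean {
  return !i.autoCompactionDisabled && i.effectiveTokens >= i.hardThreshold;
}

const aboveHard = { effectiveTokens: 180_000, hardThreshold: 177_000 };
const rescueWhenEnabled = shouldForceFromHard({ ...aboveHard, autoCompactionDisabled: false });
const rescueWhenDisabled = shouldForceFromHard({ ...aboveHard, autoCompactionDisabled: true });
```

An integration test at the `sendMessageStream` level would assert the same two outcomes through the compress spy rather than through a helper like this.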

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

// Services
// ============================================================================

export {

[Critical] MAX_CONSECUTIVE_FAILURES not exported from barrel

The design doc's migration guide instructs SDK consumers to "change true to MAX_CONSECUTIVE_FAILURES", but this constant is not re-exported from the package barrel. Consumers following the documented path cannot import { MAX_CONSECUTIVE_FAILURES } from '@qwen-code/qwen-code-core'.

Suggested change
export {
export {
computeThresholds,
MAX_CONSECUTIVE_FAILURES,
SUMMARY_RESERVE,
type CompactionThresholds,
} from './services/chatCompressionService.js';

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

persistedOutputTokens,
);
}
}

[Suggestion] Hook contract change: firePostCompactEvent now receives XML-wrapped summary

The summary parameter passed to hooks is now <state_snapshot>...</state_snapshot> (constructed at line 598), whereas previously it was raw model text (thoughts filtered). External hook consumers that parse, display, or store compact_summary will now see XML envelope tags they did not expect.

Consider passing snapshotMatch[1] (the inner content without wrapper tags), or documenting the format change in the hook-event type definition.

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

);
lines.push('');
lines.push(`**${t('Compaction thresholds')}**`);
// R6.13: i18n the labels + derive the reserve hint from the

[Suggestion] Inconsistent i18n: new labels use t(), existing category labels do not

New threshold labels are wrapped in t() (e.g., t('Effective window'), t('Warn threshold')), but adjacent category labels remain raw strings: fmtCategoryRow('System prompt', ...), fmtCategoryRow('Built-in tools', ...), etc.

Non-English locales will see a mixed-language render. Either wrap all labels in t() or add a // TODO(i18n): wrap remaining category labels comment to track the gap.

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

* runtime would treat the session as sitting in.
*/
const DEFAULT_COMPRESSION_THRESHOLD = 0.7;
function currentTier(

[Suggestion] currentTier small-window degradation branch untested

When thresholds.hard === thresholds.auto (e.g., 32K window: both are 22,400), the guard thresholds.hard > thresholds.auto prevents misclassifying the tier as 'hard'. All /context tests use a 200K window where hard > auto, so this branch is never exercised.

Suggested test: use makeMockConfig(32_000) with lastPromptTokenCount = 25_000, assert currentTier === 'auto' (not 'hard').
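
A minimal sketch of the guarded classification, assuming the guard works as described in this comment. The `warn` value is a made-up placeholder; only the 22,400 figure for `auto` and `hard` on a 32K window comes from the comment.

```typescript
type Tier = 'safe' | 'warn' | 'auto' | 'hard';

interface Thresholds {
  warn: number;
  auto: number;
  hard: number;
}

function currentTier(tokens: number, t: Thresholds): Tier {
  // The hard > auto guard keeps a collapsed ladder from reporting 'hard'.
  if (t.hard > t.auto && tokens >= t.hard) return 'hard';
  if (tokens >= t.auto) return 'auto';
  if (tokens >= t.warn) return 'warn';
  return 'safe';
}

// 32K window: hard and auto collapse to the same value.
const collapsed: Thresholds = { warn: 19_200 /* placeholder */, auto: 22_400, hard: 22_400 };
const tierOnSmallWindow = currentTier(25_000, collapsed);

// Wider window where the hard band exists.
const wide: Thresholds = { warn: 140_000 /* placeholder */, auto: 167_000, hard: 177_000 };
const tierOnWideWindow = currentTier(180_000, wide);
```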

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

* design discussion.
*/
const SNAPSHOT_REGEX = new RegExp(
`[\\s\\S]*<${COMPRESSION_SNAPSHOT_TAG}>([\\s\\S]*?)</${COMPRESSION_SNAPSHOT_TAG}>`,

[Suggestion] Truncation guard false-positive on partial closing tag

SNAPSHOT_REGEX requires the full closing </state_snapshot> tag. When the model writes a complete snapshot body but hits COMPACT_MAX_OUTPUT_TOKENS mid-closing-tag (e.g., emits </state_snap), the regex returns no match and a semantically-complete summary is discarded as COMPRESSION_FAILED_OUTPUT_TRUNCATED.

After 3 such events the breaker trips and auto-compaction stops. Consider accepting partial closing tags or verifying the captured content contains expected child tags (<overall_goal>, <current_plan>) as a more robust completeness check.
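
A rough sketch of the more tolerant check, combining both ideas from this comment: accept a cut-off closing tag, but only when the body contains the expected child sections. `extractSnapshot` and `hasRequiredChildren` are hypothetical names, and the regex here is a simplified stand-in for `SNAPSHOT_REGEX`.

```typescript
const TAG = 'state_snapshot';
const STRICT = new RegExp(`<${TAG}>([\\s\\S]*?)</${TAG}>`);

// Completeness heuristic: both child sections must be closed.
function hasRequiredChildren(inner: string): boolean {
  return inner.includes('</overall_goal>') && inner.includes('</current_plan>');
}

function extractSnapshot(text: string): string | null {
  const strict = STRICT.exec(text);
  if (strict) return strict[1] ?? null;

  const open = text.indexOf(`<${TAG}>`);
  if (open === -1) return null;
  const body = text.slice(open + TAG.length + 2);

  // Accept a tail that is a proper prefix of the closing tag ("</", "</st", ...).
  const closing = `</${TAG}>`;
  for (let n = closing.length - 1; n >= 2; n--) {
    if (body.endsWith(closing.slice(0, n))) {
      const inner = body.slice(0, body.length - n);
      return hasRequiredChildren(inner) ? inner : null;
    }
  }
  return null;
}

const body = '<overall_goal>g</overall_goal><current_plan>p</current_plan>';
const complete = extractSnapshot(`<${TAG}>${body}</${TAG}>`);
const cutInClosingTag = extractSnapshot(`<${TAG}>${body}</state_snap`);
const cutInBody = extractSnapshot(`<${TAG}><overall_goal>g</overall_goal><current_plan>p</`);
```

The child-tag check is what keeps a summary truncated mid-body from being accepted just because it happens to end in `</`.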

— qwen-latest-series-invite-beta-v34 via Qwen Code /review

@LaZzyMan

Closing in favor of #4345 — a clean re-cut of the original spec + early-discovered real bugs (R1-R5, 6 commits, 4288 LOC, no AI-driven scope creep).

This PR ran 12 review rounds and accumulated ~2700 LOC of review-driven additions, much of it self-inflicted regression chains and AI-reviewer scope creep. The PR became un-reviewable.

Substantive review-driven refinements (hard-rescue counter, scratchpad envelope extraction, disable escape hatch, hostile-provider hardening, etc.) are being split into focused follow-up issues for independent review.

Branch state preserved at tag pr-4168-archive-pre-revert (already pushed to origin) for archaeology.

The triage retrospective that drove this revert is also being folded into the project's pr-triage skill: it adds a round-weighted decline bar (from round 5 onward, Suggestions, test gaps, docs, and observability items default to "overthinking"), which review-response lacked.

@LaZzyMan LaZzyMan closed this May 20, 2026
LaZzyMan added a commit that referenced this pull request May 21, 2026
…istoryShallow)

Main landed #4286 (replace structuredClone with shallow copy) which:
  - Reverted #4186's heap-pressure auto-compaction safety net (#4286
    removed HEAP_PRESSURE_COMPRESSION_RATIO because the underlying OOM
    cause was fixed by the shallow-copy refactor)
  - Reverted #4168's consecutiveFailures ladder back to single-shot
    hasFailedCompressionAttempt
  - Introduced getHistoryShallow() / peekLastHistoryEntry() to replace
    structuredClone-based history access
  - Added a Chinese-language design doc draft for this exact redesign

Resolution strategy:
  - Take OUR redesign everywhere it conflicts: three-tier threshold
    ladder, consecutiveFailures circuit breaker, hard-rescue, token
    estimator, hard-rescue debug log, CompressOptions plumbing for
    pendingUserMessage / precomputedEffectiveTokens / trigger.
  - DROP all bypassTokenThreshold / heapPressureCompressionCooldownUntil /
    HEAP_PRESSURE_* / mockGetHeapStatistics / mockHeapPressure code
    (heap-pressure mechanism is gone on main; we're not reviving it).
  - Use main's new getHistoryShallow(true) in chatCompressionService and
    in the hard-tier rescue estimator path (was getHistory(true) before
    main's refactor; the shallow path is what other compaction call
    sites now use).
  - For chatCompressionService.test.ts inline mockChat objects, alias
    getHistoryShallow to the same vi.fn() as getHistory so existing
    .mockReturnValue() calls drive both methods.
  - For the design doc, keep our resolved Open Question 2 closure
    rationale and prepend the round-2 blockquote clarifying the
    Background section describes pre-redesign behavior; take main's
    slightly more thorough SUMMARY_RESERVE paragraph where it explains
    both with/without-thinking cases.
  - Replace the round-2 test that asserted "hard-rescue forwards
    consecutiveFailures=3" with a test compatible with the post-merge
    history-access shape (now using getHistoryShallow).

346 core tests passing; CLI typecheck clean for affected files.
Pre-existing provider-config typecheck errors from main's #4287
refactor are unrelated to this PR and not touched here.
LaZzyMan added a commit that referenced this pull request May 25, 2026
…er (#4345)

* feat(core)!: redesign auto-compaction thresholds with three-tier ladder

Replaces the single 70% proportional threshold with a three-tier ladder
(warn/auto/hard) that combines proportional fallback with absolute
reservation. Large-window models (>=128K) now reserve ~33K instead of
30% of the window, freeing tens of thousands of context tokens that the
old formula wasted.
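
The ladder arithmetic can be approximated with the sketch below. It is reconstructed from figures quoted in this thread (20K summary reserve, roughly 33K total reserve at the auto tier, a 3K hard buffer, 70% proportional fallback) and is not the PR's `computeThresholds()`. `AUTO_BUFFER` is inferred, and both warn constants are placeholders.

```typescript
const SUMMARY_RESERVE = 20_000;
const AUTO_BUFFER = 13_000; // inferred: ~33K total reserve minus the 20K summary reserve
const HARD_BUFFER = 3_000;
const WARN_BUFFER = 33_000; // placeholder, not from the PR
const PROPORTIONAL = 0.7;
const WARN_PROPORTIONAL = 0.6; // placeholder, not from the PR

interface ThresholdsSketch {
  effectiveWindow: number;
  warn: number;
  auto: number;
  hard: number;
}

function computeThresholdsSketch(window: number): ThresholdsSketch {
  const raw = window - SUMMARY_RESERVE; // may go negative on tiny windows
  const prop = Math.round(window * PROPORTIONAL);
  return {
    effectiveWindow: Math.max(0, raw),
    // Each tier takes the larger of the proportional and absolute branches.
    warn: Math.max(Math.round(window * WARN_PROPORTIONAL), raw - WARN_BUFFER),
    auto: Math.max(prop, raw - AUTO_BUFFER),
    hard: Math.max(prop, raw - HARD_BUFFER),
  };
}

const oneMillion = computeThresholdsSketch(1_000_000);
const recoveredOn1M = oneMillion.auto - Math.round(1_000_000 * 0.7); // 267_000
const small = computeThresholdsSketch(32_000); // auto and hard collapse to 22_400
```

Under these assumptions a 200K window yields a hard threshold of 177,000, which lines up with the 176,999 example used in the review comments.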

Other improvements bundled in the same redesign:

- Compression sideQuery now disables thinking and caps maxOutputTokens
  at 20K, matching claude-code so the buffer math is predictable across
  providers (Anthropic/OpenAI/Gemini handle thinking budgets
  inconsistently)
- Failure handling upgraded from one-shot permanent lock to a 3-strike
  circuit breaker; reactive overflow still latches immediately
- New estimatePromptTokens helper closes the lag-by-one-turn and
  first-send-is-0 gaps in lastPromptTokenCount
- Hard-tier rescue pulls reactive overflow recovery forward to before
  the API call, saving an oversized round-trip
- /context command displays the three-tier ladder + current tier
- tipRegistry's context-* tips track the new thresholds instead of
  fixed 50/80/95 percentages
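
The 3-strike breaker from the list above can be pictured roughly as follows. `CompactionBreakerSketch` is a hypothetical class: the PR tracks a `consecutiveFailures` counter on the chat object, and later commits in this thread adjust how reactive overflow feeds it, so only the regular path is modelled here.

```typescript
const MAX_CONSECUTIVE_FAILURES = 3;

type Outcome = 'compressed' | 'failed' | 'noop';

class CompactionBreakerSketch {
  private consecutiveFailures = 0;

  // force=true (manual /compress, hard-tier rescue) bypasses the breaker.
  canAttempt(force = false): boolean {
    return force || this.consecutiveFailures < MAX_CONSECUTIVE_FAILURES;
  }

  record(outcome: Outcome, force = false): void {
    if (outcome === 'compressed') {
      this.consecutiveFailures = 0; // any success closes the breaker again
    } else if (outcome === 'failed' && !force) {
      this.consecutiveFailures += 1; // forced attempts do not count as strikes
    }
  }

  get failures(): number {
    return this.consecutiveFailures;
  }
}

const breaker = new CompactionBreakerSketch();
breaker.record('failed');
breaker.record('failed');
const openAfterTwo = breaker.canAttempt(); // true
breaker.record('failed');
const openAfterThree = breaker.canAttempt(); // false
```

Compared with the old one-shot lock, two transient failures leave auto-compaction available, and a later success resets the count.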

BREAKING CHANGE: chatCompression.contextPercentageThreshold setting is
removed. Settings files containing the field log a one-line deprecation
warning at startup and the value is ignored; behaviour is now controlled
by built-in thresholds via the new computeThresholds() function.

Design: docs/design/auto-compaction-threshold-redesign.md
Plan: docs/plans/2026-05-14-auto-compaction-threshold-redesign.md

* test(core): fix leftover hasFailedCompressionAttempt option in compress test

A pre-existing test case at chatCompressionService.test.ts:678 still
passed `hasFailedCompressionAttempt: false` in the CompressOptions
shape; rebasing onto current main surfaced this as a typecheck error
because the field was renamed to `consecutiveFailures` (Task 7 of the
three-tier ladder migration). Update to `consecutiveFailures: 0` —
semantically equivalent, the test asserts the side-query is called
when `force: true`, no other behaviour change.

* fix(core): drop compaction summary when output hits maxOutputTokens cap

Adds a defensive guard in ChatCompressionService.compress() that detects
when the side-query summary hit COMPACT_MAX_OUTPUT_TOKENS (20K). In that
case the summary is likely truncated mid-content, so we drop it and
return NOOP rather than persist a half-summary. The next send re-tries;
reactive overflow still catches the catastrophic case where the API
rejects the next request as too large.

Documented in the design doc as risk #2; the bot reviewer on PR #4168
correctly pushed for it to land alongside the threshold redesign rather
than as a follow-up since the new 20K cap is what makes truncation
likely in the first place.

* fix(cli): render three-tier thresholds in /context TUI view

The Task 11 redesign updated the non-interactive text formatter
(formatContextUsageText) but left ContextUsage.tsx — the interactive
React component that real /context users see — unchanged. As a result
the TUI still showed the old single "Autocompact buffer" line and none
of the new warn/auto/hard ladder.

Adds a "Compaction thresholds" section after the per-category breakdown:
  - Effective window
  - Warn / Auto / Hard threshold rows with a ▶ marker on the row the
    current usage has crossed
  - Current tier label coloured by severity (safe→green, warn/auto→
    yellow, hard→red)

The existing progress bar legend (Used / Free / Autocompact buffer)
is preserved because it's tied to the three-segment progress bar
visualisation; the new section adds the absolute numbers + tier badge
on top of that.

Caught by the tmux e2e test (PR #4168 ci-monitor follow-up). Pre-fix, the
asserted 'Compaction thresholds' text was missing entirely from the TUI;
post-fix the new section renders correctly for fresh and live sessions
on 1M / 200K / 128K windows.

* fix(core,cli): address PR #4168 review batch 4

Behavior fixes:
- MAX_TOKENS truncation guard now returns COMPRESSION_FAILED_EMPTY_SUMMARY
  instead of NOOP so the consecutive-failure breaker actually trips after
  repeated max-length summaries (R1.1).
- Reactive overflow failure increments consecutiveFailures by 1 instead
  of latching to MAX in one shot, so a transient network blip doesn't
  permanently disable auto-compaction. The hard-tier rescue resets the
  counter, which remains the designated recovery path (R1.2).
- /context current-tier classification uses rawOverhead (system + tools +
  memory + skills) as the tier input when API data is not yet available,
  rather than 0 — large inherited contexts no longer silently show 'safe'
  (R2.2).

Performance:
- sendMessageStream computes effectiveTokens ONCE and passes it through
  TryCompressOptions.precomputedEffectiveTokens, so the cheap-gate inside
  service.compress doesn't redo the estimation. Also fixes the
  imageTokenEstimate inconsistency between the rescue and cheap-gate
  paths (R1.3 + R1.4).
- Steady-state path (lastPromptTokenCount > 0) skips the costly
  getHistory(true) clone — estimatePromptTokens only needs the user
  message in that branch.

Code hygiene:
- BYTES_PER_TOKEN → CHARS_PER_TOKEN (inputs are char counts, not byte
  counts; CJK text would mislead under the old name) (R3.1).
- Drop dead getContextUsagePercent helper + index re-export — no callers
  in source after the threshold rewire (R1.5).
- Add a comment on estimatePromptTokens' first-send fallback documenting
  the ~15-20K under-estimate (system prompt + tools + skills) and that
  reactive overflow is the safety net (R3.3).

Tests:
- New CLI ContextUsage.test.tsx exercises the React renderer for the
  three-tier section: section presence, ▶ marker placement per tier,
  current-tier label coloring (R1.6).
- New chatCompressionService.test.ts case pins that a stale
  contextPercentageThreshold: 0 value in user settings no longer
  short-circuits compaction (R2.1).
- New tokenEstimation.test.ts case covers functionResponse (distinct
  nested-parts branch from functionCall) (R3.5).
- New geminiChat.test.ts integration test exercises the real
  ChatCompressionService — not a mock — for the first-send-after-
  inherited-history scenario where lastPromptTokenCount=0 and only the
  full-history estimate can cross the auto threshold (R3.4).

Declined: R3.2 (change `>=` to `>` on the MAX_TOKENS guard). The current
operator catches the at-cap case as suspicious, which is intentional —
landing exactly at the output cap is far more likely truncation than
clean stop given p99.99 ≈ 17K. With R1.1 in place, persistent truncations
trip the breaker after MAX_CONSECUTIVE_FAILURES so the worst case is
bounded.

* fix(core,cli): address PR #4168 review batch 5

- R5.1: tighten /context tier comment + TODO. The rawOverhead-based fix
  doesn't cover `--continue` restores with many history messages (since
  rawOverhead excludes messagesTokens). UI may still show 'safe' for one
  render until the first send. Documented inline and added a TODO to plumb
  chat history into collectContextData for same-source-of-truth as the
  cheap-gate.
- R5.2a: add TODO(finish_reason) at the truncation guard. The `>= cap`
  heuristic false-positives on legitimate at-cap summaries; the proper
  signal is finish_reason which runSideQuery doesn't surface today.
- R5.2b: split telemetry — new CompressionStatus.COMPRESSION_FAILED_OUTPUT_TRUNCATED
  enum value. Distinct from EMPTY_SUMMARY so logs/telemetry can tell
  prompt-quality failures (tune prompt / splitter) from capacity failures
  (raise cap / shrink splitter input). isCompressionFailureStatus()
  treats both as failures so the breaker behavior is unchanged.
- R5.3: expand consecutiveFailures JSDoc to clarify it tracks
  "non-force, non-hard-rescue consecutive failures" — hard-rescue resets
  the counter and force=true skips increments, so the counter is the
  "regular path" health signal only; reactive overflow is the real
  safety net for the force-only paths.
- R5.4: document the CompressOptions field rename
  (hasFailedCompressionAttempt: boolean → consecutiveFailures: number)
  as an SDK breaking change in the design doc with migration guide.

* fix(core): disambiguate hard-rescue from manual /compress orphan-strip

Self-review (dual reviewer / pr-triage round 1) caught a correctness
regression in the hard-rescue path:

`sendMessageStream` calls `tryCompress(force=true)` from inside the
pre-push window when `effectiveTokens >= hard`. The service's
orphan-strip predicate at `chatCompressionService.ts:426-429` gated on
`force` alone, which conflated two distinct call shapes:

  - manual `/compress` (force=true, trigger='manual'): user-initiated
    between turns; trailing model funcCall IS orphaned because no
    funcResponse is coming
  - hard-rescue (force=true, trigger='auto'): automatic mid-turn;
    trailing model funcCall is ACTIVE because its matching funcResponse
    is sitting in the pending `userContent` waiting to be pushed

The strip fired for both, so a hard-rescue triggered mid tool-use loop
would drop the active funcCall. After compression returned and
`userContent` (the funcResponse) was pushed, the next API request
carried tool_result with no matching tool_use → provider validation
error.

The in-code comment at L422-424 already documented this exact
constraint for the auto-compress case (`force=false`), but reusing
`force=true` for hard-rescue silently violated the same constraint.

Fix:
- Gate `hasOrphanedFuncCall` on `compactTrigger === 'manual'` instead
  of `force`. The trigger field already disambiguates intent.
- `sendMessageStream` hard-rescue now passes `trigger: 'auto'`
  explicitly (without it, `force=true` defaults to `trigger='manual'`
  via the `?? (force ? 'manual' : 'auto')` resolver).
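
The fix can be summarised in a small sketch. The resolver expression is quoted from this commit message; the surrounding types and the `shouldStripTrailingFuncCall` helper are hypothetical stand-ins for the service's real predicate.

```typescript
type CompactTrigger = 'manual' | 'auto';

interface CompressCallShape {
  force: boolean;
  trigger?: CompactTrigger;
}

// Resolver as described above: force=true defaults to 'manual'.
function resolveTrigger(o: CompressCallShape): CompactTrigger {
  return o.trigger ?? (o.force ? 'manual' : 'auto');
}

// Strip only when the user asked for compaction between turns.
function shouldStripTrailingFuncCall(
  o: CompressCallShape,
  endsWithModelFuncCall: boolean,
): boolean {
  return endsWithModelFuncCall && resolveTrigger(o) === 'manual';
}

const manualCompress = shouldStripTrailingFuncCall({ force: true }, true); // true
const hardRescue = shouldStripTrailingFuncCall({ force: true, trigger: 'auto' }, true); // false
const regularAuto = shouldStripTrailingFuncCall({ force: false }, true); // false
```

Gating on the resolved trigger rather than on `force` is what lets hard-rescue keep the in-flight funcCall while still bypassing the breaker.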

Sibling audit for "force=true non-manual callsites":
- `GeminiClient.tryCompressChat` (manual /compress): correct — manual
- `sendMessageStream` hard-rescue: fixed in this commit
- `sendMessageStream` reactive overflow catch: already passes
  trigger='auto'; runs AFTER API call (userContent in history), so if
  it observes a trailing funcCall it IS orphaned but findCompressSplitPoint
  handles the case without needing the strip

RED-first regression test added:
`preserves trailing model+funcCall under hard-rescue (force=true + trigger=auto)`
in `chatCompressionService.test.ts`. Failed against pre-fix code (the
strip dropped the funcCall); passes against the fix.

Adjacent fixes from the same triage round:

- `docs/users/configuration/settings.md`: the
  `chatCompression.contextPercentageThreshold` row still said "use 0
  to disable compression entirely" — code has ignored the value since
  the removal commit. Marked the row REMOVED with migration guidance
  pointing at the design doc.
- `packages/core/src/config/config.ts`: the deprecation warning now
  tells users how to silence it (remove the key) and where to read
  current behavior, instead of just announcing the removal.
- `docs/design/auto-compaction-threshold-redesign.md`: closed Open
  Question 2 (small-window hard/auto collapse) — decision is to NOT
  annotate `/context`, with rationale on file.

Tests: 2395 core tests passing, typecheck clean.

* docs(core): fix tier-collapse direction in auto-compaction design doc

Self-review on the 50bac97 commit caught a direction error in the
M2a Open Question 2 closure note: said `currentTier` skips `'hard'`
and goes to `'auto'` on collapsed windows, which is backwards.

`contextCommand.ts:43-44` checks `tokens >= thresholds.hard` first
(no `hard > auto` guard — that fix lives in a separate follow-up), so
when `hard === auto` the `'hard'` branch matches first and the
`'auto'` band is the empty one. Updated the rationale to describe the
actual collapse direction and cite the source-of-truth file:line.

Conclusion of the open question (don't annotate `/context`) is
unchanged — only the explanation is corrected.

* refactor(core): extract shared in-flight funcCall fixture in compression tests

The auto-compress and hard-rescue tests for "trailing funcCall is
active, not orphaned" shared a byte-identical 4-message history and
mock setup. Pull both into setupInFlightFuncCallFixture() inside the
describe block so each test only contains the scenario name, the
compress() call shape, and its own assertions.

Net -29 LOC, no behavior change.

* fix(core,cli): address PR #4345 round-2 review feedback

- geminiChat: remove pre-call consecutiveFailures reset in hard-rescue.
  force=true already bypasses the breaker check in chatCompressionService;
  the pre-reset was redundant on success (post-call L614 already handles it)
  and *broke* the breaker on failure paths — hard-rescue failures don't
  increment via tryCompress (force=true skips that branch), only the
  reactive overflow path at L992 explicitly increments. With the pre-reset
  the counter oscillated 0↔1 every send and MAX_CONSECUTIVE_FAILURES=3 was
  unreachable. Wrote a RED test asserting the forwarded counter is the
  latched value, not zero; the test failed against the old code and passes
  with the reset removed.

- geminiChat: log hard-tier-rescue triggers via debugLogger.warn including
  effectiveTokens, hard, and the current consecutiveFailures so operators
  debugging "compaction stopped working" have a breadcrumb.

- chatCompressionService: clamp effectiveWindow to >= 0 in computeThresholds
  so the value surfaced in /context stays meaningful for tiny windows
  (window < SUMMARY_RESERVE). auto/warn/hard outputs are unaffected because
  each is Math.max(proportional, absolute) and the proportional branch
  dominates whenever the absolute branch goes negative.

- turn.ts: rewrite COMPRESSION_FAILED_OUTPUT_TRUNCATED docstring. Drop the
  misleading "compression succeeded" framing (the summary is dropped and
  isCompressionFailureStatus returns true) and reference the full enum name
  COMPRESSION_FAILED_EMPTY_SUMMARY instead of the abbreviation.

- contextCommand.test.ts: reword the no-API-data-session test comment.
  collectContextData classifies estimated sessions against rawOverhead;
  with default fixtures rawOverhead lands in `safe`, but heavy
  system-prompt / skill / MCP loads can push it into warn/auto/hard.

- design doc Background: prepend a blockquote clarifying the section
  describes pre-redesign behavior and that the inline file:line references
  point at code before PR #4345 (which removes them).

- ui/types: replace the duplicated ContextThresholds interface with a
  type alias to the core's CompactionThresholds. Field-by-field copy in
  contextCommand.ts becomes a direct spread. ContextUsage.tsx keeps its
  CompactionThresholds React component name — the alias avoids the
  collision a direct import would have caused.

- contextCommand: interpolate the actual reserve value into the
  "(window − 20K reserve)" annotation so SUMMARY_RESERVE retuning doesn't
  leave the text stale.

* fix(core): address PR #4345 round-3 + round-4 review feedback

R3-1: rewrite the stale "Hard-tier rescue resets the counter" comment in
the reactive-overflow path. The R2 commit removed the pre-call reset
from hard-rescue; the only counter-reset path is now the post-call
COMPRESSED branch in tryCompress. Two contradicting comments in the
same file would mislead a future maintainer tracing the lifecycle.

R3-2: rewrite the JSDoc on CompactionThresholds.hard. The "(resets
failure counter)" phrasing was true under the pre-R2 design; after R2
the hard threshold force-triggers compaction and bypasses the breaker,
but does not reset the counter (which only happens on COMPRESSED
success via the post-call branch). The type is consumed by both
geminiChat and the CLI UI (via ContextThresholds alias), so the
authoritative description had to match the actual contract.

R3-3: add a Step 3 to the hard-rescue regression test. The test title
claims "success recovers via the post-call branch" but the original
Steps 1-2 only verified the latched counter was forwarded INTO the
call. Step 3 follows up with a below-hard send and asserts the
forwarded counter is 0 — proving geminiChat.ts:614 ran on the
COMPRESSED result.

R3-4: assert effectiveWindow === 0 on the existing extreme-small-window
test and add a separate zero-window edge case. The Math.max(0, ...)
clamp from R2 was previously unasserted; a regression that removed
the clamp would go undetected.

R4-1: forward originalTokenCount on the breaker-NOOP path in
chatCompressionService.compress() to match the adjacent
threshold-NOOP path (L368-369). Returning {originalTokenCount: 0,
newTokenCount: 0} masked "breaker tripped at N tokens" as
"empty session" in telemetry dashboards.

R4-2a: add debugLogger.warn at the two consecutiveFailures increment
sites (cheap-gate path L586 and reactive-overflow path L955) when
the counter reaches MAX_CONSECUTIVE_FAILURES. The breaker is one of
the PR's headline safety features but, prior to this round, had zero
observability when it tripped. Required importing MAX_CONSECUTIVE_FAILURES
into geminiChat.ts.

R4-3: programmatically link tokenEstimation.ts's CHARS_PER_TOKEN to
compactionInputSlimming.ts's TOKEN_TO_CHAR_RATIO. Both are 4 today
and represent the same generic char/token conversion. Exporting from
compactionInputSlimming and aliasing in tokenEstimation eliminates
the silent-drift hazard the JSDoc already warned about.
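
For orientation, a char/4 estimator of the kind discussed here might look like the sketch below. The function names, the use of `Math.ceil`, and the plain-string history are all assumptions; the real `estimatePromptTokens` works on message parts and is not reproduced in this thread.

```typescript
// Single shared ratio, so the estimator and the slimming code cannot drift apart.
const TOKEN_TO_CHAR_RATIO = 4;
const CHARS_PER_TOKEN = TOKEN_TO_CHAR_RATIO;

function estimateTextTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function estimatePromptTokensSketch(
  lastPromptTokenCount: number,
  lastCandidatesTokenCount: number,
  pendingUserText: string,
  historyTexts: string[],
): number {
  const pending = estimateTextTokens(pendingUserText);
  if (lastPromptTokenCount > 0) {
    // Steady state: last API-reported prompt + last reply + the new message.
    return lastPromptTokenCount + lastCandidatesTokenCount + pending;
  }
  // First send: no API data yet, so fall back to a full-history estimate.
  const history = historyTexts.reduce((sum, t) => sum + estimateTextTokens(t), 0);
  return history + pending;
}

const steady = estimatePromptTokensSketch(1000, 200, 'abcdefgh', []);
const firstSend = estimatePromptTokensSketch(0, 0, 'abcd', ['x'.repeat(40)]);
```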

Declined (round-weighted bar at round 4):
- R3-5: debugLogger test for hard-rescue trigger — observability test
  coverage is overthinking at round 3+; the log is informational.
- R4-2b: expose breaker state in /context — new feature; out of scope.
- R4-4: render test for auto-tier marker — test coverage gap on
  working code, defer to follow-up PR per round-weighted bar.
- R4-5a: extract makeFakeChat/makeFakeConfig shared factory — pure
  test refactor at round 4, not a fix.
- R4-5b: direct unit test for precomputedEffectiveTokens — exercised
  indirectly via hard-rescue path tests in geminiChat.test.ts.
- R4-6: truncation-guard fallback test for missing candidatesTokenCount
  — code already has a TODO acknowledging the heuristic is imperfect
  (chatCompressionService.ts:549-553); defer.

* fix(core): address PR #4345 round-5 review feedback

R5-1: assert breaker-NOOP forwards originalTokenCount. R4-1 changed the
breaker-NOOP return from `{0, 0}` to `{originalTokenCount, originalTokenCount}`
so telemetry can distinguish "breaker tripped at N tokens" from
"empty session", but the existing test only checked compressionStatus
and newHistory. Now seeds a non-zero originalTokenCount (120K) and
asserts both fields forward it.

R5-2: forward originalTokenCount on the empty-history NOOP. This was
sibling drift on R4-1: I fixed the cited breaker-NOOP site but missed
the empty-history NOOP. Of 5 NOOP return sites in chatCompressionService,
4 already forwarded originalTokenCount (breaker, threshold-gate, post-split,
min-compression-fraction) and 1 (this one) still returned `{0, 0}`,
breaking the project-wide invariant. All 5 are now consistent.

R5-3: replace 10 stale line-number references with semantic anchors.
After the R3+R4 push, the line refs in my R2/R3 comments (`geminiChat.ts:614`,
`chatCompressionService.ts:339`, `line 992`, `L627`, `line 944`) no longer
pointed at their original targets — `geminiChat.ts:614` now points at
`setSystemInstruction`'s body, completely unrelated to compaction. The
pattern itself is fragile; semantic phrasing ("the post-call reset in
tryCompress's COMPRESSED handler") doesn't drift when lines shift.

347/347 affected core tests passing locally; typecheck clean.

* fix(core): address PR #4345 round-6 review feedback (R6 sweep)

R6-1: rewrite the stale JSDoc bullet on `consecutiveFailures` (the
"Hard-tier rescue failures" bullet). The old wording said "the counter
is reset to 0 BEFORE the rescue call"; that contradicted R2, which
explicitly removed the pre-call reset. Now the bullet matches the
actual behavior: counter is NOT pre-reset, force=true bypasses the
breaker, post-call COMPRESSED handler resets on success, reactive
overflow is the explicit-increment safety net.

My R5 stale-comment sweep only grep'd inline `//` comments; this JSDoc
on the field declaration slipped through. Re-audited "reset to 0
BEFORE" / "pre-reset" across both packages — single site remaining.

R6-7: assert `passedOpts.trigger === 'auto'` in the hard-rescue test.
This field is the orphan-strip safety wire added by the C1 fix (the
service's `compactTrigger === 'manual'` check would otherwise strip
the trailing active funcCall mid tool-loop). The test asserted force
and pendingUserMessage but not the trigger; a refactor dropping the
'auto' from `trigger: shouldForceFromHard ? 'auto' : undefined` would
silently break orphan-strip safety. Now regression-guarded with a
single-line expect.

164/164 affected core tests passing locally.

Declined per round-weighted bar (round 6 defaults Suggestion / Test
coverage / Style to overthinking):
- R6-2/3/6: test-coverage gaps on working code — defer to follow-up
- R6-4: redundant truthy guard on always-set fields — style nit
- R6-5: text-vs-UI inconsistency on /context — existing test enforces
  current behavior; treat as design decision (offer follow-up if
  reviewer escalates)
- R6-8 (tipRegistry small-window context-high): explicitly closed in
  design doc's Open Question 2 — small windows have empty context-high
  band by design; UI work is out-of-scope for this PR
- R6-9: wasted clone on rare fallback path — Suggestion-level perf
- R6-10 (CompressionMessage missing case): file not in this PR's diff;
  reviewer themselves proposed it as follow-up