feat(cli): add /compress-fast command for no-LLM rule-based context compression #4893
e4f700e to 4b2c6b3
// Lightweight: setHistory() already called in compressFast().
// Reuse microcompaction's surgical FileReadCache disarm pattern.
const m = microcompactMeta;
const fileReadCache = this.config.getFileReadCache();
[Suggestion] This FileReadCache disarm block is a structural copy of the auto-compression path (lines 1557-1597). Two copies of the same ~20-line branching logic will drift over time. The new copy also drops two debug log messages present in the original (the success log for surgical disarm and the "unresolvable path" explanation before blanket clear), reducing observability in the /compress-fast path.
Consider extracting into a private method on GeminiClient:
private async disarmFileReadCacheAfterEviction(
  meta: MicrocompactMeta,
  logTag: string,
): Promise<void> {
  // shared disarm logic with debug logs
}
Then both call sites reduce to await this.disarmFileReadCacheAfterEviction(m, 'compress-fast') / await this.disarmFileReadCacheAfterEviction(m, 'microcompaction').
— qwen3.7-max via Qwen Code /review
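For illustration, the shared helper suggested above might look roughly like the sketch below. It is not the PR's code: the MicrocompactMetaLike and FileReadCacheLike shapes, the path-based signature of markReadEvictedFromHistory(), the injected pathExists check (standing in for the fs stat call), and the log wording are all assumptions. Only the method name, the surgical-disarm step, and the fall-back to clear() when a path cannot be resolved come from this thread.

```typescript
// Hypothetical shapes for illustration; the real MicrocompactMeta and
// FileReadCache types in the codebase may differ.
interface MicrocompactMetaLike {
  // Paths of file reads whose tool results were cleared from history.
  clearedPaths: string[];
  // True when at least one cleared result had no resolvable path.
  hasUnresolvablePath: boolean;
}

interface FileReadCacheLike {
  // Assumed signature: returns false when the entry is not cached.
  markReadEvictedFromHistory(path: string): boolean;
  clear(): void;
}

async function disarmFileReadCacheAfterEviction(
  meta: MicrocompactMetaLike,
  cache: FileReadCacheLike,
  logTag: string,
  // Injected so the sketch runs without touching the file system.
  pathExists: (path: string) => Promise<boolean>,
  debug: (message: string) => void = () => {},
): Promise<'noop' | 'surgical' | 'cleared'> {
  if (meta.clearedPaths.length === 0 && !meta.hasUnresolvablePath) {
    return 'noop';
  }
  if (meta.hasUnresolvablePath) {
    // Cannot tell which entries are stale, so drop everything.
    debug(`[${logTag}] unresolvable path, clearing the whole cache`);
    cache.clear();
    return 'cleared';
  }
  let disarmed = 0;
  for (const path of meta.clearedPaths) {
    // A file that no longer exists has nothing to disarm and is skipped.
    if ((await pathExists(path)) && cache.markReadEvictedFromHistory(path)) {
      disarmed++;
    }
  }
  debug(`[${logTag}] surgically disarmed ${disarmed} cache entries`);
  return 'surgical';
}
```

Passing logTag keeps both debug messages available at every call site, which addresses the observability point raised in the comment.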
expect(firstModel?.parts).toEqual([{ text: 'response text' }]);
});

it('NOOP when no tool calls and no thinking', () => {
[Suggestion] Two compressFast tests use weak assertions that cannot fail:
1. This test accepts both NOOP and COMPRESSED, which is tautological since those are the only two possible CompressionStatus values. A bug that always returns either one would go undetected.
2. The "updates lastPromptTokenCount on COMPRESSED" test gates its core assertion behind if (result.info.compressionStatus === COMPRESSED). If NOOP fires (possible given the small history), the test silently passes.
For test (1), set lastPromptTokenCount to a value that guarantees NOOP, then assert strictly:
expect(result.info.compressionStatus).toBe(CompressionStatus.NOOP);
For test (2), construct a history with substantial thinking parts and set lastPromptTokenCount high enough to guarantee COMPRESSED, then assert unconditionally.
— qwen3.7-max via Qwen Code /review
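To make the point about assertions that cannot fail concrete, here is a small self-contained sketch. The two-member enum is a stand-in (the real CompressionStatus may have different members and values), and alwaysCompressed is a deliberately broken, hypothetical implementation used only to show which check catches it.

```typescript
// Stand-in enum; the real CompressionStatus lives in the core package
// and its members and values may differ.
enum CompressionStatus {
  NOOP = 'NOOP',
  COMPRESSED = 'COMPRESSED',
}

// A deliberately broken implementation that ignores its input.
function alwaysCompressed(_historyLength: number): CompressionStatus {
  return CompressionStatus.COMPRESSED;
}

// Weak: true for every value the function can return.
function weakAssertion(observed: CompressionStatus): boolean {
  return (
    observed === CompressionStatus.NOOP ||
    observed === CompressionStatus.COMPRESSED
  );
}

// Strict: pins the one status the scenario is built to produce.
function strictAssertion(
  observed: CompressionStatus,
  expected: CompressionStatus,
): boolean {
  return observed === expected;
}

// Scenario: an empty history, so NOOP is the only correct answer.
const observedStatus = alwaysCompressed(0);
const weakPasses = weakAssertion(observedStatus);
const strictPasses = strictAssertion(observedStatus, CompressionStatus.NOOP);
```

Here weakPasses is true even though the implementation is wrong, while strictPasses is false, which is the failure the test should report.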
4b2c6b3 to 32dda43
});
this.setHistory(newHistory);
clearDetailedSpanState();
this.lastPromptTokenCount = afterTokens;
[Suggestion] afterTokens comes from estimateContentTokens() (char/4 heuristic, ~30% underestimate). Overwriting lastPromptTokenCount here replaces the API-authoritative count with a heuristic that persists until the next API call. This degrades the hard-rescue threshold gate and auto-compaction trigger — exactly the safety nets that exist to prevent context-overflow 400 errors.
The existing /compress path avoids this because tryCompress derives newTokenCount from the compression API response (authoritative), not a heuristic.
this.lastPromptTokenCount = afterTokens;
// Don't overwrite API-authoritative count with char/4 heuristic.
// Next sendMessageStream API response will update it.
this.telemetryService?.setLastPromptTokenCount(afterTokens);
— qwen3.7-max via Qwen Code /review
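A later commit in this PR describes delta-adjusting the API-authoritative count instead of replacing it. A minimal sketch of that idea follows, under stated assumptions: estimateTokens is a bare char/4 stand-in for estimateContentTokens(), adjustPromptTokenCount is a hypothetical helper name, and falling back to the estimate when the stored count is 0 is a guess at what the zero-fallback does.

```typescript
// Rough char/4 estimate as described in the review; the real
// estimateContentTokens() may do more than this.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: returns the count to store after compression.
function adjustPromptTokenCount(
  lastPromptTokenCount: number,
  estimatedBefore: number,
  estimatedAfter: number,
): number {
  // No authoritative count yet (assumed to mean a fresh or resumed
  // session): the estimate is the only number available.
  if (lastPromptTokenCount === 0) {
    return estimatedAfter;
  }
  // Subtract only the estimated saving, so the API-reported baseline
  // is kept and the heuristic's error applies to the delta alone.
  const saved = Math.max(0, estimatedBefore - estimatedAfter);
  return Math.max(0, lastPromptTokenCount - saved);
}

// 16 characters before and 8 after: estimates of 4 and 2 tokens.
const beforeEstimate = estimateTokens('aaaaaaaaaaaaaaaa');
const afterEstimate = estimateTokens('aaaaaaaa');
const adjusted = adjustPromptTokenCount(1000, beforeEstimate, afterEstimate); // 998
const fallback = adjustPromptTokenCount(0, beforeEstimate, afterEstimate); // 2
```

Because the same estimator runs on both sides, its systematic bias largely cancels in the difference, and the stored value stays anchored to the last API response until the next one replaces it.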
32dda43 to f976e05
qwen-code-ci-bot left a comment
No high-confidence issues found. All 224 tests pass, no lint/typecheck errors. LGTM! ✅ — qwen3.7-max via Qwen Code /review
f976e05 to f51b3a4
qwen-code-ci-bot left a comment
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
qwen-code-ci-bot left a comment
All previously raised issues have been addressed in this revision. The implementation is well-structured, well-tested, and follows existing conventions. LGTM! ✅ — qwen3.7-max via Qwen Code /review
The main flow looks right to me — the earlier review rounds (FileReadCache disarm dedup, delta-based token adjustment, test assertions) are all properly addressed, and CI is green across the matrix. Two follow-ups worth tightening before merge:
Two non-blocking nits:
f51b3a4 to da6f728
const doCompress = async () => await geminiClient.tryCompressChatFast();

if (executionMode === 'acp') {
[Suggestion] compressFastCommand does not read context.abortSignal or pass it through to tryCompressChatFast. The sibling compressCommand extracts abortSignal (line 34), passes it to tryCompressChat (line 83), and guards both the post-compression path (if (abortSignal?.aborted) { return; } at line 141) and the error path.
While /compress-fast is fast (no LLM call), the post-compression disarmFileReadCacheAfterEviction performs async fsPromises.stat() calls. If the user presses ESC during this window, the UI pending item stays visible and history mutations proceed despite cancellation intent.
Add abortSignal extraction and guards matching the /compress pattern:
if (executionMode === 'acp') {
const { ui } = context;
const abortSignal = context.abortSignal;
const executionMode = context.executionMode ?? 'interactive';
Then add if (abortSignal?.aborted) { return; } after await doCompress() (line 95) and in the catch block.
— qwen3.7-max via Qwen Code /review
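The guard pattern described above can be sketched in isolation as follows. This is an illustration only: CommandContextLike, its showResult hook, and runCompressFast are hypothetical stand-ins rather than the real command API, and AbortSignalLike is a minimal interface that a standard AbortSignal would also satisfy.

```typescript
// Minimal stand-ins; the real command context carries far more than this.
interface AbortSignalLike {
  readonly aborted: boolean;
}

interface CommandContextLike {
  abortSignal?: AbortSignalLike;
  // Hypothetical UI hook used to show the outcome of the command.
  showResult(message: string): void;
}

async function runCompressFast(
  context: CommandContextLike,
  doCompress: () => Promise<string>,
): Promise<void> {
  const abortSignal = context.abortSignal;
  try {
    const summary = await doCompress();
    // The user may have cancelled while the async work was in flight.
    if (abortSignal?.aborted) {
      return;
    }
    context.showResult(summary);
  } catch (error) {
    // A cancelled command should not surface an error either.
    if (abortSignal?.aborted) {
      return;
    }
    context.showResult(`Compression failed: ${String(error)}`);
  }
}
```

Checking the signal after the await rather than before it is what covers the window in which the post-compression file-system calls are still running.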
qwen-code-ci-bot left a comment
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Ran this end-to-end locally on the PR head (
Setup: built
Unit tests (the PR's own suites): 397 passed
E2E in the real TUI:
JSONL checkpoint matches the
{"info":{"originalTokenCount":30481,"newTokenCount":25901,"compressionStatus":1,"triggerReason":"manual"},"compressedHistory":[…]}
One honest caveat:
Code-wise this LGTM.
Real terminal output (live tmux run, API key redacted; prompts were issued in Chinese)
The red CI here is unrelated to your changes; the failing tests are
…ompression
Adds /compress-fast, a new slash command that compresses context without
any LLM side-query. It combines two rule-based steps:
1. Force microcompaction — clears old tool results and media parts,
keeping the most recent N (default 5, configurable via
toolResultsNumToKeep). Uses a new { force: true } option on
microcompactHistory() to skip the time-based trigger.
2. Strip thinking blocks — removes thought parts from all model turns,
keeping text and tool_use parts intact.
Uses setHistory() for zero-latency history replacement (no session
rebuild, deferred tools survive). Writes a chat_compression checkpoint
to JSONL so --resume works identically to /compress.
Post-compression, tryCompressChatFast() surgically disarms affected
file paths from FileReadCache via markReadEvictedFromHistory(), falling
back to clear() only when paths can't be resolved.
Resolves QwenLM#4264.
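As a rough illustration of the two rule-based steps in this commit message, the sketch below works on simplified history types. HistoryPart, HistoryContent, the toolResult field, and the placeholder text are assumptions made for the example; the real microcompactHistory() and stripThoughtPartsFromContent operate on richer types and also handle media parts and the time-based trigger.

```typescript
// Simplified stand-ins for the history types; field names are assumptions.
interface HistoryPart {
  text?: string;
  thought?: boolean;
  toolResult?: string;
}

interface HistoryContent {
  role: 'user' | 'model';
  parts: HistoryPart[];
}

const CLEARED_PLACEHOLDER = '[old tool result cleared]';

// Step 1: keep the most recent `keep` tool results, blank out older ones.
function clearOldToolResults(
  history: HistoryContent[],
  keep: number,
): HistoryContent[] {
  let total = 0;
  for (const content of history) {
    for (const part of content.parts) {
      if (part.toolResult !== undefined) total++;
    }
  }
  // History is ordered oldest first, so the first `toClear` hits are the oldest.
  let toClear = Math.max(0, total - keep);
  return history.map((content) => ({
    ...content,
    parts: content.parts.map((part) => {
      if (part.toolResult === undefined || toClear === 0) return part;
      toClear--;
      return { ...part, toolResult: CLEARED_PLACEHOLDER };
    }),
  }));
}

// Step 2: drop thought parts from model turns, keep everything else.
function stripThoughtParts(history: HistoryContent[]): HistoryContent[] {
  return history.map((content) =>
    content.role === 'model'
      ? { ...content, parts: content.parts.filter((p) => !p.thought) }
      : content,
  );
}

// Both steps are pure functions that return a new array, which suits
// replacing the history in one call afterwards.
function compressFast(history: HistoryContent[], keep = 5): HistoryContent[] {
  return stripThoughtParts(clearOldToolResults(history, keep));
}
```

The sketch leaves a model turn with an empty parts array if it contained only thought parts; whether the real implementation drops such turns is not stated in this thread.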
- Add test coverage for tryCompressChatFast FileReadCache disarming (NOOP, clear, surgical disarm with inode miss, full success)
- Fix weak assertions in geminiChat compressFast tests:
  - NOOP test now strictly asserts CompressionStatus.NOOP
  - lastPromptTokenCount test guarantees COMPRESSED with larger history
- Register 'No compression needed.' i18n key in en/zh/zh-TW locales
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

- Fix token estimation: use same estimator (estimateContentTokens) on both sides of the NOOP gate, then delta-adjust API-authoritative lastPromptTokenCount instead of replacing it with char/4 heuristic
- Handle lastPromptTokenCount=0 fallback for fresh/continued sessions
- Extract duplicated FileReadCache disarm logic into shared disarmFileReadCacheAfterEviction() method with debug logs
- Remove redundant setLastPromptTokenCount call from tryCompressChatFast
- Update tests for delta-adjustment and zero-fallback behavior

- Add telemetry: emit logChatCompression event in compressFast() for usage tracking
- Add /compress-fast to docs/users/features/commands.md
- Use CompressionStatus.NOOP enum instead of token count comparison for NOOP detection
- Deduplicate disarm logic in microcompactIdleHistory to use shared disarmFileReadCacheAfterEviction method (resolves conflict with upstream QwenLM#4840)
da6f728 to 28c526b
qwen-code-ci-bot left a comment
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
yiliang114 left a comment
No further issues from my pass. The latest head matches the reviewed revision, the focused local test suites passed, and CI is green. Thanks for the careful follow-ups.
What this PR does
Adds /compress-fast, a new slash command that compresses conversation context without any LLM side-query. It combines two rule-based steps: (1) force microcompaction to clear old tool results and media parts while keeping the most recent N, and (2) stripping thought parts from all model turns. The result is a significantly smaller history, typically freeing thousands of tokens, at zero API latency.
A chat_compression checkpoint is written to JSONL so --resume works exactly as it does after /compress.
Why it's needed
/compress relies on an LLM side-query (~2-5s, ~30K tokens) to summarise history. For local model deployments and users who just want quick space reclamation, this is too slow. /compress-fast is entirely rule-based: no API call, no token cost, instant feedback. It complements /compress: use /compress-fast when you need space right now, and /compress when you want semantic summary quality.
Resolves #4264.
Reviewer Test Plan
How to verify
# Unit tests
npx vitest run \
  packages/core/src/services/microcompaction/microcompact.test.ts \
  packages/core/src/core/geminiChat.test.ts \
  packages/cli/src/ui/commands/compressFastCommand.test.ts \
  packages/cli/src/services/BuiltinCommandLoader.test.ts
Manual smoke test in interactive mode:
Evidence (Before & After)
TUI change: a new COMPRESSION history item appears after running /compress-fast, showing the token reduction (e.g. 15,432 → 8,210). This is identical UX to /compress.
Non-UI artifacts: the JSONL transcript gains a chat_compression record with compressionStatus: COMPRESSED and triggerReason: manual, matching the /compress checkpoint format.
Tested on
Environment (optional)
Local: npm run dev on macOS, Node 22.
Risk & Scope
- /compress is available if deeper summarization is needed.
- estimateContentTokens may have edge cases. The command intentionally does NOT rebuild the session via startChat(), so deferred tools survive, unlike a /clear. This is both a feature (fast, preserves state) and a limitation (does not reclaim system prompt tokens).
- microcompactHistory() gains an optional { force: true } parameter that existing callers don't pass. stripThoughtPartsFromContent remains module-private.
Linked Issues
Closes #4264
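Chinese description
What this PR does
Adds the /compress-fast slash command, which compresses conversation context without issuing any LLM side-query. It combines two rule-based steps: (1) force microcompaction to clear old tool results and media content while keeping the most recent N; (2) strip the thought parts from all model replies. The result is a significantly smaller history token count at zero API latency.
A chat_compression checkpoint is written to JSONL, so --resume behaves exactly as it does after /compress.
Why it's needed
/compress relies on an LLM side-query to generate a summary (about 2-5 seconds, consuming about 30K tokens). That is too slow for local model deployments or for users who only want to free up space quickly. /compress-fast is purely rule-driven: no API call, no token cost, immediate response. It complements /compress: use /compress-fast when space must be freed right away, and /compress when semantic summary quality matters.
Resolves #4264.
Reviewer Test Plan
(The test steps are the same as above and are omitted here for readability.)
Risk and Scope
- /compress.
- estimateContentTokens may have edge cases. The command intentionally does not rebuild the session via startChat(), so deferred tools are preserved, unlike with /clear. This is both an advantage (fast, preserves state) and a limitation (system prompt tokens cannot be reclaimed).
- microcompactHistory() gains an optional { force: true } parameter that existing callers do not pass. stripThoughtPartsFromContent remains module-private.
Linked Issues
Closes #4264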