fix(core): microcompact hook continuations#4840
Conversation
|
@qwen-code /review |
| `cleared ${m.toolsCleared} tool result(s) + ${m.mediaCleared} media (~${m.tokensSaved} tokens), ` + | ||
| `kept ${m.toolsKept} tool / ${m.mediaKept} media`, | ||
| ); | ||
| // Runs on user and cron messages, preserving the original |
There was a problem hiding this comment.
[Suggestion] The old inline comment explained why ToolResult/Retry turns are excluded: "not tool result submissions or retries/hooks so that model latency during a tool-call loop doesn't count as user idle time." The replacement — "preserving the original user-idle semantics" — drops the design-decision rationale. A future maintainer who doesn't know the history may "clean up" the guard to include ToolResult or Retry, reintroducing the tool-call-loop latency mis-count.
Consider restoring the why:
| // Runs on user and cron messages, preserving the original | |
| // Idle cleanup: clear old tool results when idle > threshold. | |
| // Runs on user and cron messages (not tool result submissions | |
| // or retries/hooks), preserving the original user-idle | |
| // semantics — model latency during a tool-call loop must not | |
| // count as user idle time. |
— qwen3.7-max via Qwen Code /review
| expect(markReadEvictedFromHistory).toHaveBeenCalledTimes(1); | ||
| }); | ||
|
|
||
| it('microcompacts old tool results on Hook continuations', async () => { |
There was a problem hiding this comment.
[Suggestion] This test covers the happy path (pre-set stale timestamp → trigger fires → cleanup runs), but several important paths are untested:
??=initialization: whenlastHookMicrocompactionTimestampisnullandlastApiCompletionTimestampis an old value, the Hook should inherit that timestamp and potentially trigger. This is the most logic-dense line in the change.- Negative test: set
lastHookMicrocompactionTimestamp = Date.now()(recent), send a Hook message, and assertsetHistoryis NOT called — guards against unconditional microcompaction on every Hook. - Both-null fallback: both timestamps
null→ should useDate.now()and NOT trigger. - Timestamp reset: after microcompaction fires, assert
lastHookMicrocompactionTimestampis updated to a recent value.
Adding at minimum the negative test and the ??= init test would significantly strengthen coverage for the subtle parts of this change.
— qwen3.7-max via Qwen Code /review
| this.lastHookMicrocompactionTimestamp = Date.now(); | ||
| } else if (messageType === SendMessageType.Hook) { | ||
| this.lastHookMicrocompactionTimestamp ??= | ||
| this.lastApiCompletionTimestamp ?? Date.now(); |
There was a problem hiding this comment.
[Suggestion] The ?? Date.now() fallback in this double-nullish-coalescing chain is not exercised by any of the three new tests — all of them set lastApiCompletionTimestamp to a non-null value, so the second ?? operator is never reached.
Consider adding a test where both fields are null to verify the fallback seeds the checkpoint with Date.now() and no microcompaction fires (gap ≈ 0):
it('seeds Hook checkpoint to now when no API call has completed', async () => {
// ...same setup...
client['lastApiCompletionTimestamp'] = null;
client['lastHookMicrocompactionTimestamp'] = null;
const before = Date.now();
// ...send Hook message, drain stream...
expect(client['lastHookMicrocompactionTimestamp']).toBeGreaterThanOrEqual(before);
expect(setHistory).not.toHaveBeenCalled();
});— qwen3.7-max via Qwen Code /review
✅ Independent Verification Report — PR #4840Built and ran real tests locally for this fix. Beyond confirming the suite is green, I ran a red/green regression proof showing the new tests genuinely fail without the implementation. Verdict: LGTM — recommend merge. Environment
1 — Focused core suite ✅ 147 passedThe PR body cites 144/144; it's 147 now — later commits (
2 — Red/green regression proof ✅ (the new tests actually guard the fix)The PR changes only The 3 failures are exactly the tests that assert compaction happens on a Hook turn:
The 4th Hook test ( This independently reproduces the PR's stated "Before" ( 3 — Typecheck + lint ✅ cleanObservations (non-blocking)
Recommendation: ✅ verified — suite green, the fix is provably exercised by genuine regression tests, and typecheck/lint are clean. Recommend merge. 中文小结本地真实环境(🐧 Linux,Node v22.22.2,
观察(非阻塞):重构是行为保持的(抽出 结论:✅ 验证通过,建议合并。 |
|
Qwen Code review did not complete successfully: Qwen review timed out after 55 minutes. See workflow logs. |
|
Hi @qqqys 👋 — this PR currently has merge conflicts with |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
Code Review — PR #4840
Verdict: Request Changes
The refactoring cleanly extracts inline microcompaction into microcompactIdleHistory() and the Hook-continuation checkpoint logic is well-designed. However, the extracted method drops the error boundary that exists on main, risking stream aborts during long-running goal sessions.
Critical
- Missing try/catch in
microcompactIdleHistory— Currentmainwraps the equivalent inline code intry/catch. IfsetHistory(),fileReadCache.clear(), ormarkReadEvictedFromHistory()throws, the exception propagates through thetry/finallyinsendMessageStream(nocatchblock) and kills the async generator. In the Hook path — the exact long-running goal loop this PR targets — a transient error kills the entire goal session.
Suggestions
- Redundant
evaluateTimeBasedTrigger— Hook branch calls it externally as a guard, thenmicrocompactIdleHistory->microcompactHistorycalls it again internally. Consider havingmicrocompactIdleHistoryreturnbooleanto unify both branches. - Test gaps — Missing tests for:
resetChatresetting the new field, UserQuery seedinglastHookMicrocompactionTimestamp, and a sequential Hook fire-reset-skip cycle. - Log differentiation —
[TIME-BASED MC]prefix does not tag the trigger source (Hook vs UserQuery/Cron), making incident investigation harder.
Reviewed by Qwen Code — 6 findings (1 Critical, 5 Suggestions), 8 parallel agents
|
|
||
| private async microcompactIdleHistory( | ||
| lastCompletionTimestamp: number | null, | ||
| ): Promise<void> { |
There was a problem hiding this comment.
[Critical] Missing try/catch — regression from main
This extracted method has no error boundary. On main, the equivalent inline code at lines 1543-1608 is wrapped in try { ... } catch (err) { debugLogger.error(...) }. If this.getChat().setHistory(), fileReadCache.clear(), or fileReadCache.markReadEvictedFromHistory() throws (e.g. malformed history, cache error), the exception propagates through the try/finally in sendMessageStream (which has no catch block) and aborts the async generator.
In the Hook path — the exact long-running goal loop this PR targets — a single transient error kills the entire goal session.
Suggested fix: wrap the body in try/catch matching the original pattern:
private async microcompactIdleHistory(
lastCompletionTimestamp: number | null,
): Promise<void> {
try {
const mcResult = microcompactHistory(
this.getHistoryShallow(),
lastCompletionTimestamp,
this.config.getClearContextOnIdle(),
);
if (!mcResult.meta) {
return;
}
// ... existing body unchanged ...
} catch (err) {
debugLogger.error(
`[TIME-BASED MC] microcompactIdleHistory failed: ${err instanceof Error ? err.message : String(err)}`,
);
}
}| this.lastApiCompletionTimestamp ?? Date.now(); | ||
| const checkpoint = this.lastHookMicrocompactionTimestamp; | ||
| if ( | ||
| evaluateTimeBasedTrigger( |
There was a problem hiding this comment.
[Suggestion] Redundant evaluateTimeBasedTrigger — consider returning boolean from microcompactIdleHistory
The Hook branch calls evaluateTimeBasedTrigger here as a pre-check, then microcompactIdleHistory calls microcompactHistory which calls it again internally. The UserQuery/Cron branch skips the external guard entirely. This creates an inconsistent pattern between the two branches.
Having microcompactIdleHistory return boolean (whether compaction occurred) eliminates the redundant call and unifies both branches:
} else if (messageType === SendMessageType.Hook) {
this.lastHookMicrocompactionTimestamp ??=
this.lastApiCompletionTimestamp ?? Date.now();
if (await this.microcompactIdleHistory(this.lastHookMicrocompactionTimestamp)) {
this.lastHookMicrocompactionTimestamp = Date.now();
}
}| this.surfacedRelevantAutoMemoryPaths.clear(); | ||
| this.cachedGitStatus = undefined; | ||
| this.lastApiCompletionTimestamp = null; | ||
| this.lastHookMicrocompactionTimestamp = null; |
There was a problem hiding this comment.
[Suggestion] No test for lastHookMicrocompactionTimestamp reset in resetChat
This line resets the Hook checkpoint on /clear, but no test asserts it. A stale checkpoint surviving across resets could cause unexpected compaction behavior (e.g. immediate compaction on the first Hook message after /clear).
| // or retries/hooks, so model latency during a tool-call loop | ||
| // does not count as user idle time. | ||
| await this.microcompactIdleHistory(this.lastApiCompletionTimestamp); | ||
| this.lastHookMicrocompactionTimestamp = Date.now(); |
There was a problem hiding this comment.
[Suggestion] No test for UserQuery/Cron seeding lastHookMicrocompactionTimestamp
This line is the integration bridge between the two branches: UserQuery/Cron sets the baseline so the Hook ??= picks up from the right timestamp. But no test verifies it. If accidentally removed, the Hook ??= fallback would silently use lastApiCompletionTimestamp instead, and the gap measurement could be wrong.
| } | ||
| if (fullyDisarmed) { | ||
| debugLogger.debug( | ||
| `[FILE_READ_CACHE] disarmed fast-path for ` + |
There was a problem hiding this comment.
[Suggestion] No log differentiation between Hook and UserQuery/Cron triggers
The [TIME-BASED MC] debug logs here contain no source tag to distinguish whether compaction was triggered by a user prompt or a Hook continuation. During incident investigation (e.g. 'why did compaction happen mid-goal-loop?'), operators cannot tell from the logs alone.
Consider accepting a source: string parameter and including it in the log prefix:
private async microcompactIdleHistory(
lastCompletionTimestamp: number | null,
source: 'userQuery' | 'hook' = 'userQuery',
): Promise<void> {Then log: `[TIME-BASED MC][${source}] gap ${m.gapMinutes}min > ...`
d46c1b3 to
f8a33cb
Compare
| ) | ||
| ) { | ||
| await this.microcompactIdleHistory(checkpoint); | ||
| this.lastHookMicrocompactionTimestamp = Date.now(); |
There was a problem hiding this comment.
[Suggestion] The Hook checkpoint lastHookMicrocompactionTimestamp = Date.now() advances unconditionally after microcompactIdleHistory() returns, even when that method caught and swallowed an error internally (its try/catch at line ~1401 logs but does not rethrow). If compaction fails during a Hook-only /goal loop, the checkpoint jumps to "now" and the next retry is gated by the full idle threshold (default 60 min), reopening the OOM window this feature was designed to close.
Consider having microcompactIdleHistory return a boolean indicating whether compaction actually ran, and only advance the checkpoint on success:
const compacted = await this.microcompactIdleHistory(checkpoint);
if (compacted) {
this.lastHookMicrocompactionTimestamp = Date.now();
}The UserQuery/Cron path's unconditional = Date.now() on line 1615 is safe because that path's trigger is gated by lastApiCompletionTimestamp, which is not mutated here.
— qwen3.7-max via Qwen Code /review
| } as unknown as GeminiChat; | ||
| client['lastApiCompletionTimestamp'] = Date.now(); | ||
| ( | ||
| client as unknown as { lastHookMicrocompactionTimestamp: number } |
There was a problem hiding this comment.
[Suggestion] Inconsistent private-field access pattern. This test uses a verbose double-cast (client as unknown as { lastHookMicrocompactionTimestamp: number }), while all 10 other accesses in the file use the bracket-notation idiom (client['lastHookMicrocompactionTimestamp']). The cast also narrows the declared type from number | null to number.
| client as unknown as { lastHookMicrocompactionTimestamp: number } | |
| client['lastHookMicrocompactionTimestamp'] = Date.now() - 90 * 60_000; |
— qwen3.7-max via Qwen Code /review
| ) { | ||
| // Idle cleanup: clear old tool results when idle > threshold. | ||
| // Runs on user and cron messages, not tool result submissions | ||
| // or retries/hooks, so model latency during a tool-call loop |
There was a problem hiding this comment.
[Suggestion] The comment says "not tool result submissions or retries/hooks" — implying hooks are excluded from microcompaction entirely — but the immediately following else if (messageType === SendMessageType.Hook) branch gives hooks their own compaction path. A reader skimming this comment will conclude hooks never trigger idle cleanup.
Consider rewriting to describe only what this branch handles:
// Idle cleanup: clear old tool results when the user has been
// idle longer than the configured threshold. Uses the last API
// completion time so model latency during a tool-call loop does
// not count as idle time.Then let the Hook branch's own code speak for itself.
— qwen3.7-max via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. The refactoring cleanly extracts microcompaction into microcompactIdleHistory() with correct error boundary, the Hook checkpoint seeding via ??= is safe, and FileReadCache disarm ordering is correct. 168/168 tests pass, tsc and eslint clean. — qwen3.7-max via Qwen Code /review
🔄 Updated Verification Report — PR #4840 (re-verified at
|
2026-06-08 (6f874f7) |
now (737ac01b3) |
|
|---|---|---|
client.test.ts total |
147 | 168 |
| New tests that fail without the fix | 3 | 8 |
| New: error-boundary / checkpoint-retention tests | — | ✅ added |
I'm also correcting an imprecise statement in my prior report ("old code never compacted on Hook anyway"). That was wrong, and the strengthened tests now prove it — see the mechanism note below.
Environment
| OS | 🐧 Linux 6.12.63 (Debian 13) — the row the PR marks |
| Node / npm | v22.22.2 / 10.9.7 |
| Checkout | isolated git worktree on current origin/main fb1432cf5 + the PR's 2 files, own npm ci (exit 0), driven in tmux |
| Post-merge fidelity | main has not touched client.ts/client.test.ts since merge-base ea4cbafd3, so worktree == true post-merge state (files byte-identical to PR head) |
1 — Focused core suite (with fix) ✅ 168 / 168
$ npx vitest run src/core/client.test.ts
Test Files 1 passed (1)
Tests 168 passed (168)
(PR body says 144 — stale; later commits added cases. All green.)
2 — Revert-proof: new tests genuinely guard the fix ✅
Swapped in origin/main's client.ts (0 occurrences of lastHookMicrocompactionTimestamp) and kept the PR's test file:
RED (origin/main client.ts + PR tests): 8 failed | 160 passed (exit 1)
GREEN (restore PR client.ts): 168 passed
8 of the 9 new tests fail pre-fix, with exact mechanism signatures. The two most diagnostic:
- ✗
microcompacts old tool results on Hook continuations—setHistory()never called pre-fix (fresh API timestamp → gap≈0 → no compaction). - ✗
does not microcompact … when the checkpoint is recent— pre-fix wrongly compacts (setHistorycalled once) because it keys off a 90-min-oldlastApiCompletionTimestampeven though the Hook checkpoint is fresh.
Together these pin both directions of the timestamp-reference change. The 1 new test that passes pre-fix (does not abort the turn when microcompaction cleanup fails) exercises the UserQuery error boundary, which origin/main already has — an honest, expected result.
3 — Built-artifact A/B on the shipped microcompactHistory ✅
Ran the compiled dist/.../microcompact.js against the same 6×read_file history with the two idle references (production default 60-min threshold):
BEFORE-ref (fresh lastApiCompletionTimestamp) -> meta undefined => NO compaction
AFTER-ref (aged lastHookMicrocompactionTimestamp, now-90min) -> COMPACTED:
gap 90min > 60min, toolsCleared=5 toolsKept=1, evictedReadPaths=5
Identical history + identical algorithm; only the idle reference differs — exactly the pivot this PR makes, confirmed on the built code. Structural check: built client.js contains the new gating (6 refs to lastHookMicrocompactionTimestamp); origin/main source has 0 → the builds genuinely differ.
4 — Full @qwen-code/qwen-code-core suite ✅ no regressions
Test Files 365 passed (365)
Tests 10395 passed | 8 skipped (10403)
5 — Typecheck + lint ✅
tsc --noEmit (core) -> exit 0
eslint client.ts client.test.ts -> exit 0
Mechanism — what actually changes (and a PR-body correction)
The PR body says "the previous cleanup path only ran for user and cron turns." That isn't accurate for current main. #4824 already gates compaction on UserQuery || Cron || Hook — Hook turns did enter the cleanup path. The real defect is the idle reference: all three used lastApiCompletionTimestamp, which is refreshed after every API call, so during an active /goal loop the measured gap is ~0 and Hook compaction never fires → unbounded tool-result retention (#4838).
This PR fixes that by giving Hook turns a dedicated lastHookMicrocompactionTimestamp that advances only when a compaction actually fires (or on a user/cron turn). Across a long loop the gap grows with wall-clock, so compaction fires periodically (~every threshold window) and retention is bounded. The fix is correct and well-scoped; only the body's wording needs a tweak so reviewers understand it's a timestamp-reference change, not "adding Hook to the gate."
Non-blocking observations
- User/Cron turns reset the Hook checkpoint unconditionally (
= Date.now()), even when no compaction fired. A user query mid-loop resets the Hook idle clock without compacting, so worst-case retention is measured from the last user turn rather than the last compaction — still bounded (~1 window), and a reasonable "user activity = fresh baseline" choice. Just noting. - On a persistent post-
setHistorycache-disarm throw,microcompactIdleHistoryreturnsfalse, so the checkpoint doesn't advance and each subsequent Hook turn re-attempts compaction. The history was already compacted, so it's near-idempotent (no memory growth, turn not aborted — verified bydoes not abort Hook continuations when microcompaction cleanup fails), but it logs on each turn. Minor; could be avoided by advancing the checkpoint once the history mutation has been applied.
Recommendation: ✅ verified — fix is correct, exhaustively tested (8 proven regression guards), no regressions across 10,395 core tests, typecheck/lint clean, behavior confirmed on the built artifact. Recommend merge after a one-line PR-body wording fix (Hook was already in the compaction gate; the change is the idle-timestamp reference).
中文小结
本次为 复验(更新 2026-06-08 在 6f874f7 的旧报告;PR 已 force-push +7 个提交,主要是错误边界加固与更多测试)。当前 head 737ac01b3,🐧 Linux + Node v22.22.2,于 tmux 中在当前 origin/main(fb1432cf5)隔离 worktree + PR 改动的 2 个文件(== 真实 merge 后状态,主干自 merge-base 起未碰这两个文件)上从零重跑:
- focused 套件全过:168/168(PR 正文写 144,已过时)。
- 回归红/绿证明:换成
origin/main的client.ts(0 处 hook 字段)保留 PR 测试 → 8 failed | 160 passed;还原后 168 全过。9 个新测试有 8 个在无修复时失败,其中两个关键用例同时证明了「时间戳基准」改动的两个方向。唯一在无修复下仍通过的是 UserQuery 错误边界(主干本就有),结果诚实自洽。 - 构建产物 A/B:对编译后的
microcompactHistory,同一份历史、同一算法,仅基准时间戳不同 → fresh 基准(旧)不 compact、aged 基准(新 checkpoint)compact(清 5 个、留 1 个)。 - core 全量套件 10395 通过、0 失败,无回归;tsc / eslint 均 exit 0。
纠正与提醒:PR 正文「之前只在 user/cron 触发清理」不准确——#4824 已经把 Hook 纳入 shouldCompact 门控;真正的问题是基准用了 lastApiCompletionTimestamp(每次 API 调用后刷新,活跃 /goal 循环里 gap≈0 → Hook 永不触发)。本 PR 用独立的 lastHookMicrocompactionTimestamp(仅在实际 compact 或 user/cron 时推进)修复,逻辑正确、范围合理。建议仅微调正文措辞(Hook 早已在门控内,本次是改基准时间戳)。我也借此更正了上一份报告里「旧代码在 Hook 上从不 compact」的不准确表述。
结论:✅ 验证通过,建议合并(正文措辞微调后)。
| messageType === SendMessageType.Cron | ||
| ) { | ||
| // Idle cleanup: clear old tool results when idle > threshold. | ||
| // Runs on user and cron messages, not tool result submissions |
There was a problem hiding this comment.
[Suggestion] The comment says "not tool result submissions or retries/hooks" but Hook messages DO trigger microcompaction via the immediately following else if branch (line 1615). A reader scanning this comment would conclude that Hook messages never trigger microcompaction — the opposite of the actual behavior.
| // Runs on user and cron messages, not tool result submissions | |
| // Idle cleanup: clear old tool results when idle > threshold. | |
| // Runs on user and cron messages. ToolResult and Retry are | |
| // excluded: ToolResult fires on every tool-call return | |
| // (O(history) overhead per call), and mid-loop compaction | |
| // could blank results the model still needs. Hook | |
| // continuations use a separate checkpoint (see below). |
— qwen3.7-max via Qwen Code /review
|
Qwen Code review did not complete successfully: Qwen review timed out after 55 minutes. See workflow logs. |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Test (windows-latest, Node 22.x). The refactoring cleanly extracts microcompaction into microcompactIdleHistory() with correct error boundary, the Hook checkpoint logic correctly fixes the OOM path for long /goal loops, and the test suite (168/168) thoroughly covers the new code paths. — qwen3.7-max via Qwen Code /review
| // Runs on user and cron messages. ToolResult and Retry are | ||
| // excluded; Hook continuations use a separate checkpoint below. | ||
| await this.microcompactIdleHistory(this.lastApiCompletionTimestamp); | ||
| this.lastHookMicrocompactionTimestamp = Date.now(); |
There was a problem hiding this comment.
[Critical] Cron unconditionally resets Hook checkpoint, defeating OOM protection
This line unconditionally sets lastHookMicrocompactionTimestamp = Date.now() on every Cron message. During a long /goal loop, Hook API calls keep lastApiCompletionTimestamp recent, so Cron's own microcompactIdleHistory call finds a small gap and returns false. Meanwhile, the Hook branch's trigger uses lastHookMicrocompactionTimestamp, which Cron keeps resetting to "now."
If Cron arrives at intervals shorter than toolResultsThresholdMinutes (default 60 min), the Hook checkpoint gap never exceeds the threshold, compaction never fires, and tool results accumulate without bound — the exact OOM scenario this PR targets.
| this.lastHookMicrocompactionTimestamp = Date.now(); | |
| const compacted = await this.microcompactIdleHistory(this.lastApiCompletionTimestamp); | |
| if (messageType === SendMessageType.UserQuery || compacted) { | |
| this.lastHookMicrocompactionTimestamp = Date.now(); | |
| } |
— qwen3.7-max via Qwen Code /review
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
No new issues found. LGTM! ✅ The implementation correctly splits idle-cleanup between UserQuery/Cron (using lastApiCompletionTimestamp) and Hook continuations (using a separate lastHookMicrocompactionTimestamp checkpoint), achieving the OOM-prevention goal for long /goal loops. Tests are comprehensive (169 client + 190 config tests all passing), and tsc/eslint are clean. — qwen3.7-max via Qwen Code /review
✅ Verification Report #3 — PR #4840 re-verified at
|
| Commit | Effect |
|---|---|
55f4a4387 clarify hook microcompaction comment |
comment-only |
0e104e179 (merge of origin/main) |
PR now contains current main (git rev-list --left-right --count origin/main...HEAD = 0 11) → conflicts resolved as @tanzhenxin requested; also brings the dev.test.js Windows-assertion fix, see scope notes |
92546b81d preserve hook checkpoint on cron no-op |
addresses the first non-blocking observation from my report #2: a Cron turn now refreshes the Hook checkpoint only when compaction actually fired; UserQuery still refreshes unconditionally (reasonable "user activity = fresh baseline"). Pinned by a new test |
Environment: 🐧 Linux 6.12.63 (Debian 13), Node v22.22.2 / npm 10.9.7 — the row the PR marks git worktree checkouts of pr-4840 (= true post-merge state, since it contains current main) and origin/main (1190ef316) for the A/B, each with its own npm ci.
1 — NEW: live runtime A/B (real TUI in tmux + scripted OpenAI-compatible provider)
Built both full bundles with the real chain (patch-package → generate → web-templates/channels/core → esbuild → assets):
- BEFORE =
origin/main1190ef316→lastHookMicrocompactionTimestamprefs in bundle: 0 - AFTER = PR head
92546b81d→ 6 refs (+microcompactIdleHistory×3) — bundles proven distinct
Harness: mock provider that answers every user/hook text turn with one read_file tool-call (rotating real ~900 B files, 900 ms simulated latency) and every tool result with a short unique text, so each turn ends and a configured Stop hook ({"decision":"block","reason":"GOAL_NOT_DONE_<nonce> …"}) forces a Hook continuation — i.e. a faithful /goal-style autonomous loop. Settings: context.clearContextOnIdle = { toolResultsThresholdMinutes: 0.5, toolResultsNumToKeep: 1 }, --yolo, QWEN_CODE_STOP_HOOK_BLOCK_CAP raised. Every outbound provider request was captured to JSONL and scanned for the real [Old tool result content cleared] marker.
| # | Scenario | Bundle | Observed |
|---|---|---|---|
| 1 | Hook loop, 94 s, 102 main requests | BEFORE | 51 read_file results accumulate, 0 compactions across three full 30 s threshold windows — the idle gap is measured from lastApiCompletionTimestamp, which every API call refreshes (~1 s in an active loop), so cleanup never fires. This is exactly the unbounded retention of #4838 |
| 2 | Same loop, same script, 87 s | AFTER | Compaction fires on Hook turns at t = 31.7 s (17 results → 16 cleared, 1 kept) and again at t = 63.3 s (33 cleared, 1 kept) — once per threshold window. The ~31.6 s spacing also proves the checkpoint advances after each firing (otherwise every post-30 s hook turn would re-compact) |
| 3 | Idle control (no hooks): 3 reads → 46 s idle → user turn | BEFORE | User-turn compaction fires (2 cleared, 1 kept) — proves the harness detects compaction on this bundle, so scenario 1's zero is a true negative, and shows the pre-existing user/cron idle path |
| 4 | Same idle control | AFTER | Byte-identical request trace to #3 — the existing user-turn idle cleanup is unregressed |
Excerpt of the AFTER-loop capture (per-request: tool messages in history / of which cleared / still full):
seq t(s) turn tools cleared full
33 29.8 HOOK 16 0 16
35 31.7 HOOK 17 16 1 <- first firing, first hook turn past the 30s window
37 33.6 HOOK 18 16 2
67 61.5 HOOK 33 16 17
69 63.3 HOOK 34 33 1 <- second firing, one window later (checkpoint advanced)
94 86.6 tool 47 33 14
The same trace on BEFORE stays at cleared = 0 for all 102 requests while tools climbs monotonically to 51.
2 — Focused suite ✅ 169/169, and revert-proof red/green ✅ 8/10 flip red
pr-4840 client.ts + PR tests : 169 passed (169) # PR body says 144 — stale, later commits added cases
origin/main client.ts + PR tests : 8 failed | 161 passed
restore PR client.ts : 169 passed (169)
The 8 pre-fix failures are precisely the hook-checkpoint behaviors: resets Hook microcompaction checkpoint, seeds … on user turns, microcompacts old tool results on Hook continuations, does not abort Hook continuations when cleanup fails, skips the next Hook microcompaction after one just ran, initializes … from the last API completion timestamp, does not microcompact … when the checkpoint is recent, seeds … to now when no API call completed. The 2 new tests that pass pre-fix are honest: the UserQuery error-boundary (main already has it) and does not reset the Hook checkpoint when Cron skips microcompaction (main satisfies it vacuously; the test pins the new Cron semantics against regression).
3 — Full suites, typecheck, lint ✅ zero PR-attributable failures
| Check | Result |
|---|---|
packages/core full suite |
365 files, 10396 passed, 8 skipped, 0 failed |
packages/cli full suite |
7542 passed, 9 skipped, 5 failed — all 5 fail identically on origin/main (root-user + headless-clipboard environment: workspaceAgents/cleanup tests expect unlink/rm to fail, root succeeds; wl-paste absent). 0 attributable |
scripts/tests (incl. the PR-touched dev.test.js) |
dev.test.js ✅; 1 failure in install-script.test.js reproduces identically on origin/main (root env, npm -g ENOTEMPTY) |
tsc --noEmit (core), eslint on changed TS files |
exit 0 / exit 0 |
Built-artifact A/B on compiled microcompactHistory (60 min default threshold, identical 6×read_file history) |
fresh ref (3 s) → meta: undefined, no compaction; aged ref (90 min) → toolsCleared: 5, toolsKept: 1 — same algorithm, only the idle reference differs, on shipped code |
4 — Scope notes (the changeset grew from 2 files to 8 — all accounted for)
client.ts/client.test.ts— the fix (now with the extracted best-effortmicrocompactIdleHistory()helper) + 10 new tests.consent.ts,extensions/utils.ts,useGeminiStream.ts,config.test.ts,qc-helper/SKILL.md— prettier normalization only: verifiedprettier --checkflags exactly these 5 on currentorigin/mainand they're clean on the PR;git diffshows no behavioral hunks.scripts/tests/dev.test.js— makes the win32 launcher assertions separator-agnostic (\vs/). Relevant context: the 3 most recentmain-branch CI runs are red withTest (windows-latest, Node 22.x)as the failing job, while this PR's Windows check is green (all of this PR's checks pass) — consistent with this normalization being the fix that job needs.
Non-blocking, carried over from report #2
The PR body still says cleanup "previously only ran for user and cron turns" and "144/144". Since #4824, Hook turns were already in the compaction gate — the actual (and correct) change is the idle reference: lastApiCompletionTimestamp (refreshed by every API call → never fires mid-loop, scenario 1 above shows it live) → a dedicated lastHookMicrocompactionTimestamp that advances only on real compaction or user activity. Suggest the one-line wording tweak + the test-count refresh so reviewers aren't misled; the code itself is right.
Recommendation: ✅ verified — the fix is correct and now demonstrated end-to-end on live hook-continuation traffic (BEFORE reproduces #4838's unbounded retention; AFTER bounds it with per-window hook-turn compaction and an advancing checkpoint), the existing user/cron path is unregressed, 8 regression guards genuinely flip red without the implementation, and 17.9k core+cli tests show zero PR-attributable failures. Recommend merge after the PR-body wording touch-up.
中文小结
第 3 次复验(更新 06-08 6f874f700、06-09 737ac01b3 两份报告)。当前 head 92546b81d 已并入最新 origin/main(领先 11、落后 0,可直接合并);新增 3 个提交中,92546b81d(Cron 空转不再重置 Hook checkpoint)正是对我上一份报告第一条观察的落实,并配了回归测试。
本次新增的核心证据 —— tmux 真实运行时 A/B:用完整链路分别构建 origin/main 与 PR 两个真实 bundle(产物中新字段引用 0 vs 6,确证不同),以脚本化 mock provider(每轮真实 read_file、900ms 延迟)+ 始终 block 的 Stop hook 构成真实的 /goal 式自治循环,阈值设 0.5 分钟,逐请求捕获实际发往 provider 的 payload:
- BEFORE + hook 循环:94 秒、51 个工具结果累积,跨 3 个阈值窗口 0 次 compaction(基准
lastApiCompletionTimestamp每次 API 调用都被刷新,活跃循环中 gap≈1s 永远到不了 30s)——即 BUG: Hook continuations skip tool-result microcompaction in long /goal loops #4838 的无界增长,现场复现。 - AFTER + 同样脚本:t=31.7s 的 Hook turn 首次 compaction(清 16 留 1),t=63.3s 第二次(清 33 留 1)——每个阈值窗口一次;两次间隔 ≈31.6s 同时证明 checkpoint 在触发后正确推进。
3/4. 空闲对照(无 hook,46s 空闲后 user turn):两个 bundle 行为逐字节一致、都正确 compact——既证明检测手段对旧 bundle 有效(循环中的 0 是真阴性),也证明既有 user/cron 路径无回归。
常规验证全部重做:focused 169/169(正文写 144 已过时);revert-proof 红/绿 = 换回 main 的 client.ts 后 10 个新测试中 8 个失败(恢复后全绿,2 个通过者各有合理原因);core 全量 10396/0;cli 全量 7542 通过、5 个失败全部在 origin/main 上原样复现(root/无剪贴板环境),PR 归因失败为 0;dev.test.js(PR 触碰)通过且使本 PR 的 Windows CI 转绿(路径分隔符断言归一化);tsc/eslint exit 0;编译产物纯函数 A/B 同样仅因基准时间戳不同而行为相反。新增的 5 个非核心文件改动均为 prettier 归一化(main 上这 5 个文件当前确实 prettier 不干净),无行为变化。
结论:✅ 验证通过,建议合并。仍建议顺手把正文的「之前只在 user/cron 运行」改为「基准时间戳从 lastApiCompletionTimestamp 换为独立的 Hook checkpoint」并更新 144 的旧数字(非阻塞)。
The two Windows-targeted dev.js launcher tests added in #4728 mock existsSync with forward-slash suffix matching and assert spawn args via forward-slash stringContaining. On a real windows-latest runner dev.js builds these paths with path.join, which yields backslashes, so the existsSync mock never matches, dev.js takes the bare tsx.cmd shell fallback, and both tests fail. On macOS/Linux the platform() mock plus real forward-slash joins keep them green, which is why main CI has been red only on the Windows job since 423cac1. Normalize separators in both the existsSync mocks and the received spawn arguments before asserting, so the tests pass on every host OS. Same content as the fix bundled into #4840's merge commit 0e104e1, extracted into a standalone test-only change so main recovers without waiting on a core-behavior PR; both merge cleanly in either order. Co-authored-by: qqqys <qys177@gmail.com>
DragonnZhang
left a comment
There was a problem hiding this comment.
Code Review — PR #4840
Verdict: Approve
The refactoring cleanly extracts inline microcompaction into microcompactIdleHistory() with a proper try/catch error boundary that matches the original inline code on main. The Hook continuation checkpoint logic is well-designed:
- Separate checkpoint:
lastHookMicrocompactionTimestampcorrectly decouples Hook compaction cadence from user/cron idle detection, preventing unbounded tool-result retention during long/goalruns without disturbing the existinglastApiCompletionTimestamppath. - Seeding via
??=: The lazy initialization fromlastApiCompletionTimestamp(withDate.now()fallback) ensures the first Hook turn after a fresh session doesn't spuriously compact, while subsequent Hook turns use their own cadence. - Conditional Cron seeding: Cron only advances the Hook checkpoint when compaction actually ran (
compacted === true), which correctly avoids resetting the checkpoint on no-op Cron turns. - Error resilience:
microcompactIdleHistorycatches all errors and returnsfalse, leaving the checkpoint unchanged so the next Hook turn retries. The test'does not abort Hook continuations when microcompaction cleanup fails'validates this. - Test coverage: 8 new tests cover reset, seeding, error boundary, hook fire/skip, initialization, and Cron isolation.
CI is green (23/23 checks). Previously raised Critical comments (missing try/catch, Cron unconditionally resetting checkpoint) are stale — both have been addressed in the current HEAD.
— qwen3-coder via Qwen Code /review
DragonnZhang
left a comment
There was a problem hiding this comment.
No new issues found. LGTM.
The implementation correctly decouples Hook microcompaction from the API completion timestamp, using a dedicated lastHookMicrocompactionTimestamp checkpoint that advances only on real compaction or user activity. The error boundary in microcompactIdleHistory() is properly structured with try/catch, preserving the checkpoint on failure so the next Hook turn retries. Conditional Cron seeding (only when compacted === true) correctly avoids resetting the checkpoint on no-op turns.
CI is green (all checks SUCCESS). Tests are comprehensive with 10 new regression guards. Previously raised critical issues (missing try/catch) are fully addressed in the current HEAD.
— automated review bot
- Add telemetry: emit logChatCompression event in compressFast() for usage tracking - Add /compress-fast to docs/users/features/commands.md - Use CompressionStatus.NOOP enum instead of token count comparison for NOOP detection - Deduplicate disarm logic in microcompactIdleHistory to use shared disarmFileReadCacheAfterEviction method (resolves conflict with upstream QwenLM#4840)
- Add telemetry: emit logChatCompression event in compressFast() for usage tracking - Add /compress-fast to docs/users/features/commands.md - Use CompressionStatus.NOOP enum instead of token count comparison for NOOP detection - Deduplicate disarm logic in microcompactIdleHistory to use shared disarmFileReadCacheAfterEviction method (resolves conflict with upstream QwenLM#4840)
…ompression (#4893) * feat(cli): add /compress-fast command for no-LLM rule-based context compression Adds /compress-fast, a new slash command that compresses context without any LLM side-query. It combines two rule-based steps: 1. Force microcompaction — clears old tool results and media parts, keeping the most recent N (default 5, configurable via toolResultsNumToKeep). Uses a new { force: true } option on microcompactHistory() to skip the time-based trigger. 2. Strip thinking blocks — removes thought parts from all model turns, keeping text and tool_use parts intact. Uses setHistory() for zero-latency history replacement (no session rebuild, deferred tools survive). Writes a chat_compression checkpoint to JSONL so --resume works identically to /compress. Post-compression, tryCompressChatFast() surgically disarms affected file paths from FileReadCache via markReadEvictedFromHistory(), falling back to clear() only when paths can't be resolved. Resolves #4264. * fix(cli): address review comments for /compress-fast PR - Add test coverage for tryCompressChatFast FileReadCache disarming (NOOP, clear, surgical disarm with inode miss, full success) - Fix weak assertions in geminiChat compressFast tests: - NOOP test now strictly asserts CompressionStatus.NOOP - lastPromptTokenCount test guarantees COMPRESSED with larger history - Register 'No compression needed.' i18n key in en/zh/zh-TW locales Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address remaining review comments for /compress-fast PR - Fix token estimation: use same estimator (estimateContentTokens) on both sides of the NOOP gate, then delta-adjust API-authoritative lastPromptTokenCount instead of replacing it with char/4 heuristic - Handle lastPromptTokenCount=0 fallback for fresh/continued sessions - Extract duplicated FileReadCache disarm logic into shared disarmFileReadCacheAfterEviction() method with debug logs - Remove redundant setLastPromptTokenCount call from tryCompressChatFast - Update tests for delta-adjustment and zero-fallback behavior * fix(cli,core): address second-round review feedback - Add telemetry: emit logChatCompression event in compressFast() for usage tracking - Add /compress-fast to docs/users/features/commands.md - Use CompressionStatus.NOOP enum instead of token count comparison for NOOP detection - Deduplicate disarm logic in microcompactIdleHistory to use shared disarmFileReadCacheAfterEviction method (resolves conflict with upstream #4840) --------- Co-authored-by: 俊良 <zzj542558@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
What this PR does
This PR lets Hook continuations periodically microcompact old tool results during long-running autonomous loops such as
/goal, while keeping the existing user/cron idle cleanup behavior intact.Why it's needed
Long
/goalruns re-enter the model through Hook continuations rather than fresh user prompts. The previous cleanup path only ran for user and cron turns, so old tool outputs could remain in chat history until another user or cron turn happened. This leaves long goal sessions exposed to unbounded tool-result retention, which is one of the memory-growth paths identified in #4838 and discussed from #4815.Reviewer Test Plan
How to verify
Run the focused core client suite and confirm the new Hook continuation regression test passes. The test keeps the normal API completion timestamp fresh while making the Hook cleanup checkpoint old, then verifies that a Hook turn compacts stale
read_filetool results and disarms the file-read fast path without blanket-clearing the cache.Evidence (Before & After)
Before: the new regression failed because Hook turns did not call history microcompaction, so
setHistory()was never invoked. After:cd packages/core && npx vitest run src/core/client.test.tspasses with 144/144 tests.Tested on
Environment (optional)
Node/npm workspace local tests in this checkout.
Risk & Scope
Linked Issues
Fixes #4838. Related to #4815.
中文说明
这个 PR 做了什么
这个 PR 让 Hook continuation 在长时间自动循环(例如
/goal)期间也能周期性 microcompact 旧 tool result,同时保持现有 user/cron idle cleanup 行为不变。为什么需要
长时间
/goal运行会通过 Hook continuation 重新进入模型,而不是新的用户输入。之前的 cleanup 路径只在 user 和 cron turn 上运行,所以旧工具输出可能一直留在 chat history 里,直到下一次 user 或 cron turn。这样会让长 goal session 暴露在 tool-result retention 不断增长的风险下,这是 #4838 记录并从 #4815 讨论中确认的内存增长路径之一。Reviewer Test Plan
如何验证
运行 core client focused suite,并确认新的 Hook continuation 回归测试通过。该测试保持普通 API completion timestamp 很新,同时让 Hook cleanup checkpoint 变旧,然后验证 Hook turn 会 compact 过期的
read_filetool result,并在不 blanket clear cache 的情况下 disarm file-read fast path。证据(Before & After)
Before:新增回归测试失败,因为 Hook turn 没有调用 history microcompaction,
setHistory()从未被调用。After:cd packages/core && npx vitest run src/core/client.test.ts通过,144/144 tests。测试平台
环境(可选)
当前 checkout 的 Node/npm workspace 本地测试。
风险与范围
关联 Issue
Fixes #4838. Related to #4815.