feat(core,cli): two-phase session listing for instant /resume first frame#3989
feat(core,cli): two-phase session listing for instant /resume first frame#3989qqqys wants to merge 12 commits into
Conversation
`listSessions` previously called `countSessionMessages` per file, which streamed the entire JSONL through `readline` to count unique user/assistant UUIDs. For a project with N sessions averaging M bytes each, every /resume open paid O(N · M) wall time before showing the picker — by far the dominant cost once a project accumulated many multi-MB sessions. This change: - Drops the per-file count from listSessions / findSessionsByTitle. `SessionListItem.messageCount` is now optional. Callers that need a count call the new public `SessionService.countSessionMessages (sessionId)` lazily — typically only when a SessionPreview panel is about to display the badge. - Pools a single 64KB tail-read buffer across the per-file metadata reads in listSessions / findSessionsByTitle, mirroring the pattern in claude-code's `enrichLogs`. The two helpers in `sessionStorageUtils` (`readLastJsonStringFieldSync` and `readLastJsonStringFieldsSync`) accept an optional caller-owned scratch buffer; one-off callers (rename, single-session lookup) pass nothing and keep the original alloc behaviour. - Updates SessionPicker's row metadata to omit the "N messages" segment when `messageCount` isn't available, keeping the visual layout intact. Tests: - Pin that listSessions does not populate messageCount (regression guard against silently re-introducing the per-file scan). - Smoke test the buffer-pool plumbing — same caller-owned buffer handed to two reads of different file sizes returns both correct values without state bleed. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Tightens the title-write / title-read invariant so the picker's
metadata read is bounded to a fixed 2 × 64KB per file, regardless
of session length, with no fallback that scales with file size.
Writer (`ChatRecordingService`):
- New `bytesSinceTitleAnchor` counter and `TITLE_REANCHOR_BYTES`
threshold (32 KB, half of LITE_READ_BUF_SIZE).
- `appendRecord` now updates the counter and, once it crosses the
threshold while a title is set, re-appends a fresh `custom_title`
record to EOF (`reanchorTitle`). The recursive append routes
back through the same tracking path, which sees the title
record and resets the counter to zero.
- Sessions that never set a title pay zero overhead — the early
return in `updateTitleAnchorTracking` short-circuits.
Reader (`sessionStorageUtils`):
- `readLastJsonStringFieldSync` / `readLastJsonStringFieldsSync`
now read the file's last 64 KB, fall back to the first 64 KB
on miss, and return `undefined` if neither contains the field.
The previous Phase-2 streaming full-file scan (capped at 64 MB)
is removed entirely.
- The pooled scratch buffer (already optional) now backs both the
tail and head reads — only one allocation per `listSessions`
page even with the head fallback firing.
- `MAX_FULL_SCAN_BYTES` constant deleted (unused).
The two changes are coupled: the reader's tighter bound only works
because the writer guarantees the title stays in tail. A title
buried mid-file is now intentionally `undefined` (the picker
falls back to `firstPrompt` for display) rather than triggering a
full-file scan that would freeze the UI on long agent transcripts.
Tests:
- `chatRecordingService.customTitle.test.ts` — three scenarios
pinning the re-anchor invariant: threshold trigger, no-spurious
on no-title sessions, no-trigger on small write bursts.
- `sessionStorageUtils.test.ts` — replaces Phase-2 tests with
head-window fallback tests; pins the new "buried beyond both
windows returns undefined" contract; updates buffer-pool reuse
test to cover both tail and head reads with the same scratch.
E2E coverage (35/35 scenarios on a separate harness against real
fs): R1–R10 reader contract, M1–M4 multi-field, W1–W7 writer
re-anchor, L1–L6 listing latency + regression pins, S1 stress
(200 × 3 MB), C1 concurrency, I1–I5 writer/reader integration.
Measured 2.6× speedup on listSessions(50) over 50 × 4 MB
sessions vs the legacy `countSessionMessages` baseline; speedup
scales linearly with average file size.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- countSessionMessages: valid id counts unique user/assistant uuids and ignores other types/malformed lines; invalid id short-circuits without filesystem access; ENOENT degrades to 0 instead of bubbling. - readLastJsonStringFieldsSync: mirror the single-field variant's scratch-buffer reuse test across a tail-hit then head-fallback to catch any decode that ignores bytesRead. - ChatRecordingService re-anchor: legacy resumed session (source undefined) must omit titleSource on threshold-triggered re-anchor, not silently reclassify as 'manual'. Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
`String.length` undercounts the on-disk size of multi-byte payloads (CJK, emoji are 1 UTF-16 unit but 3 UTF-8 bytes), so a session full of CJK content could push >96KB of writes past the last anchor before the 32KB threshold thinks it has — silently drifting the title past the 64KB tail window the picker scans. Switch the byte counter to `Buffer.byteLength(..., 'utf8')` for parity with `jsonl.writeLine`. Also reset `bytesSinceTitleAnchor` to 0 in the reanchor catch: without it, a failing reanchor pins the counter at the threshold and turns a single transient I/O fault into a per-record retry storm. One missed anchor is the right tradeoff — finalize() will re-emit on the next lifecycle event. Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
The 64KB head-window fallback in `readLastJsonStringFieldSync` / `readLastJsonStringFieldsSync` reads a fixed slice and hands it straight to the extractor — its trailing bytes can fall mid-record. A partial line whose `customTitle` value happens to close inside the buffer but whose body extends past 64KB would otherwise win the latest-match race and surface as the picker's title. Drop everything past the last `\n` before extracting (only when the buffer is shorter than the file — a small file is necessarily whole- line). Honors the original Phase-2 contract that only complete lines get a vote, without paying for the deleted full-file scan. Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
…ession-list-perf
…ession-list-perf # Conflicts: # packages/core/src/services/sessionService.ts
…, pin perf contract Address QwenLM#3897 follow-up review findings: - SessionPreview footer no longer drops the message count when listSessions omits it. The prop is the override path; default falls back to a unique user/assistant uuid count derived from the already-loaded conversation, matching countSessionMessages semantics with zero extra disk I/O. - countSessionMessages now scopes to the current project, mirroring the first-record cwd check in deleteSession/renameSession/loadSession. A valid sessionId from another project sharing the chats dir no longer bypasses project boundaries on lazy count. - New regression tests: - SessionPicker row renders cleanly with messageCount === undefined (no "messages"/"undefined"/dangling separator) - findSessionsByTitle perf contract: matches have messageCount undefined and fs.createReadStream is never called - countSessionMessages: cross-project sessions return 0 without streaming, empty file returns 0 - Update corruption-recovery tests to call private countSessionMessagesFromPath (the streaming entry point) instead of the public sessionId-shaped API the merge from main pointed them at. Co-Authored-By: Qwen-Coder <noreply@qwen.com>
The session metadata reader uses two bounded 64KB windows (tail + head) since the perf rework on this branch — never a full-file scan. Two comments still described the prior behavior. Reported in MR review. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
ACP renameSession constructed a fresh SessionService and wrote the custom_title record straight to disk. When the same session was live in this process, ChatRecordingService.currentCustomTitle stayed at the old value, so the next title re-anchor (every 32KB) or finalize() re-emitted the stale title at EOF and silently reverted the rename. Now we look up the live ChatRecordingService first and route through recordCustomTitle, which keeps the in-memory cache and the on-disk record in sync. The SessionService path remains for the non-live case (e.g., another client renaming a backgrounded session). Reported in MR review. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…rame `SessionService.listSessions` is split into a stat-only `listSessionsLite` (readdir + stat, no JSONL IO) and an `enrichSessions` step that does the head + tail reads and applies the project-hash filter. The picker hook now drives both: the lite phase paints the picker frame with `pending: true` placeholder rows immediately, then enrichment hydrates each row with its full prompt / branch / customTitle. Why this matters: - The legacy single-call path took ~30 ms median for a page of 20 on a warm cache (reading head/tail per file). Within tail-buffer pooling and lazy message counts, that's already fine on modern NVMe — but on slow disks, NFS, or a project with thousands of sessions it scales linearly with page size and dominates the time-to-first- visible-row. - Lite-only first frame measures ~1 ms in the same conditions. Enrichment runs in parallel with the user's eyes registering the picker frame, so even if the IO budget is identical, perceived latency stops being a function of N. `SessionListItem` is now `pending`-aware: every field except `sessionId / mtime / filePath` (and the new `fileSize`) is optional, because lite items omit them. Callers that already coped with the optional `messageCount` keep working; one ACP integration callsite needed a `?? cwd` fallback. The legacy `listSessions(opts)` is a thin wrapper around `listSessionsLite + enrichSessions` — same public shape, same project- filter semantics — so SDK / WebUI / VSCode-companion callers don't break. Tests: - listSessionsLite returns stat-only items (regression pin: no jsonl.readLines call during the lite phase). - enrichSessions applies the project-hash filter and drops siblings living in a shared chats dir. - Picker renders the first frame from lite items before enrichment resolves: a held-open enrich Promise lets the test inspect the intermediate placeholder frame, then verify the swap-in. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…ession-list-progressive
wenshao
left a comment
There was a problem hiding this comment.
Found 3 Critical concurrency/error-handling issues in the progressive loading code paths, plus 1 Suggestion. The core architecture (two-phase loading, title re-anchor invariant, buffer pooling) is solid — the issues are concentrated in useSessionPicker's loadMoreSessions path and enrichSessions' error isolation.
| [], | ||
| ); | ||
|
|
||
| const replaceEnriched = useCallback((enriched: SessionListItem[]) => { |
There was a problem hiding this comment.
[Critical] replaceEnriched drops ALL unmatched pending rows regardless of which enrichment batch produced them, causing cross-page data loss when initial-load enrichment and loadMoreSessions enrichment race.
After the initial lite phase sets setIsLoading(false), the auto-load effect can trigger loadMoreSessions while the initial enrichSessions is still in flight. loadMoreSessions appends new pending rows from page 2. When the initial enrichment completes and calls replaceEnriched, it iterates ALL pending rows (including page 2's) and drops any not in its enrichment map. Page 2 items are permanently lost because the cursor has already advanced past them.
| const replaceEnriched = useCallback((enriched: SessionListItem[]) => { | |
| const replaceEnriched = useCallback( | |
| (enriched: SessionListItem[], scopeIds: Set<string>) => { | |
| setSessionState((prev) => { | |
| const enrichedById = new Map(enriched.map((s) => [s.sessionId, s])); | |
| const next: SessionListItem[] = []; | |
| for (const row of prev.sessions) { | |
| if (!row.pending) { | |
| next.push(row); | |
| continue; | |
| } | |
| if (!scopeIds.has(row.sessionId)) { | |
| // Not our page — leave it for its own enrichment callback. | |
| next.push(row); | |
| continue; | |
| } | |
| const hit = enrichedById.get(row.sessionId); | |
| if (hit) { | |
| next.push(hit); | |
| } | |
| // else: our page's item didn't survive enrichment — drop it. | |
| } | |
| return { ...prev, sessions: next }; | |
| }); | |
| }, | |
| [], | |
| ); |
Then call as replaceEnriched(enriched, new Set(lite.items.map(s => s.sessionId))) from both callers.
— glm-5.1 via Qwen Code /review
| * which items vanished should diff the input vs output by | ||
| * `sessionId`. | ||
| */ | ||
| async enrichSessions(items: SessionListItem[]): Promise<SessionListItem[]> { |
There was a problem hiding this comment.
[Critical] enrichSessions lacks per-item error isolation. A single session file that throws during enrichOne (ENOENT from concurrent delete, EACCES, corrupt JSONL) aborts the entire batch — all remaining items in the page lose their enrichment.
This amplifies the impact of the loadMoreSessions error path: not only does the hook's async callback fail, but every item in the batch is affected, not just the bad one.
| async enrichSessions(items: SessionListItem[]): Promise<SessionListItem[]> { | |
| async enrichSessions(items: SessionListItem[]): Promise<SessionListItem[]> { | |
| if (items.length === 0) return items; | |
| const tailBuffer = Buffer.alloc(LITE_READ_BUF_SIZE); | |
| const out: SessionListItem[] = []; | |
| for (const item of items) { | |
| if (!item.pending) { | |
| out.push(item); | |
| continue; | |
| } | |
| try { | |
| const enriched = await this.enrichOne(item, tailBuffer); | |
| if (enriched) out.push(enriched); | |
| } catch { | |
| // Single item failed (deleted/corrupt file) — drop it | |
| // but continue enriching the rest of the batch. | |
| } | |
| } | |
| return out; | |
| } |
— glm-5.1 via Qwen Code /review
| isLoadingMoreRef.current = false; | ||
| } | ||
| }, [sessionService, sessionState.hasMore, sessionState.nextCursor]); | ||
| }, [ |
There was a problem hiding this comment.
[Critical] loadMoreSessions uses try/finally but has no catch clause. When enrichSessions throws (per the per-item isolation bug above), the async callback's promise rejects without being caught. Since this is invoked via void loadMoreSessions() (fire-and-forget), the rejection becomes unhandled — in Node.js with --unhandled-rejections=throw (default since Node 16), this crashes the CLI process.
[Suggestion] Additionally, loadMoreSessions does not check enrichAbortRef between async gaps, unlike the initial-load effect which checks ctrl.signal.aborted after both listSessionsLite and enrichSessions. If the component unmounts or sessionService changes during a load-more, setSessionState and replaceEnriched will modify stale state.
| }, [ | |
| isLoadingMoreRef.current = true; | |
| try { | |
| const lite = await sessionService.listSessionsLite({ | |
| size: SESSION_PAGE_SIZE, | |
| cursor: sessionState.nextCursor, | |
| }); | |
| if (enrichAbortRef.current?.signal.aborted) return; | |
| setSessionState((prev) => ({ | |
| sessions: [...prev.sessions, ...lite.items], | |
| hasMore: lite.hasMore && lite.nextCursor !== undefined, | |
| nextCursor: lite.nextCursor, | |
| })); | |
| const enriched = await sessionService.enrichSessions(lite.items); | |
| if (enrichAbortRef.current?.signal.aborted) return; | |
| replaceEnriched(enriched); | |
| } catch { | |
| // Prevent unhandled rejection from void fire-and-forget call. | |
| } finally { | |
| isLoadingMoreRef.current = false; | |
| } |
— glm-5.1 via Qwen Code /review
| // silently skipped by the extractor (no terminating `"` before EOS). | ||
| // For boundary-straddling pre-invariant records, that means the title | ||
| // is lost. Truncating at the last newline keeps us on whole lines. | ||
| const headText = |
There was a problem hiding this comment.
[Suggestion] The head+tail bounded windows (64KB each) can miss custom_title records that sit in the "middle gap" for legacy sessions created before the re-anchor invariant existed. The function returns undefined silently with no logging or telemetry, making this gap unobservable.
Consider adding a debug log when the function returns undefined for a file larger than 2 * LITE_READ_BUF_SIZE (indicating the middle gap exists):
| const headText = | |
| const headText = | |
| headBytes < fileSize | |
| ? rawHead.slice(0, rawHead.lastIndexOf('\n') + 1) | |
| : rawHead; | |
| const headHit = extractLastJsonStringField(headText, key, lineContains); | |
| if (headHit !== undefined) return headHit; | |
| if (fileSize > 2 * LITE_READ_BUF_SIZE) { | |
| debugLogger('sessionStorageUtils', `title field '${key}' not found in head or tail window for ${filePath} (${fileSize} bytes)`); | |
| } |
— glm-5.1 via Qwen Code /review
| hasMore: result.hasMore && result.nextCursor !== undefined, | ||
| nextCursor: result.nextCursor, | ||
| sessions: [...prev.sessions, ...lite.items], | ||
| hasMore: lite.hasMore && lite.nextCursor !== undefined, |
There was a problem hiding this comment.
[Critical] loadMoreSessions has two error-handling gaps in the same block:
-
Cursor advanced before enrichment —
setSessionStateupdateshasMore/nextCursorat line 254 beforeenrichSessionsat line 260. If enrichment throws, the cursor has already moved past the un-enriched items, making them permanent placeholders with no retry path. -
No abort signal check — The initial-load path checks
ctrl.signal.abortedafter eachawait, butloadMoreSessionsnever checksenrichAbortRef.current?.signal.aborted. If the component unmounts during pagination,setSessionStateis called on an unmounted component.
| hasMore: lite.hasMore && lite.nextCursor !== undefined, | |
| isLoadingMoreRef.current = true; | |
| try { | |
| const lite = await sessionService.listSessionsLite({ | |
| size: SESSION_PAGE_SIZE, | |
| cursor: sessionState.nextCursor, | |
| }); | |
| if (enrichAbortRef.current?.signal.aborted) return; | |
| const enriched = await sessionService.enrichSessions(lite.items); | |
| if (enrichAbortRef.current?.signal.aborted) return; | |
| // Only update state after enrichment succeeds — keeps cursor | |
| // consistent and avoids permanent placeholder rows. | |
| setSessionState((prev) => ({ | |
| sessions: [...prev.sessions, ...enriched], | |
| hasMore: lite.hasMore && lite.nextCursor !== undefined, | |
| nextCursor: lite.nextCursor, | |
| })); | |
| } catch (error) { | |
| debugLogger.error('Failed to load more sessions:', error); | |
| } finally { | |
| isLoadingMoreRef.current = false; | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| const enriched = await sessionService.enrichSessions(lite.items); | ||
| if (ctrl.signal.aborted) return; | ||
| replaceEnriched(enriched); | ||
| } catch { |
There was a problem hiding this comment.
[Suggestion] Initial load catch block silently swallows errors with no logging. If listSessionsLite throws (e.g., directory permission denied, disk error, NFS mount issue), the user sees an empty picker with no indication of what went wrong — and oncall has no logs to diagnose.
| } catch { | |
| } catch (error) { | |
| debugLogger.error('Failed to load session list:', error); | |
| setIsLoading(false); | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Summary
Splits
SessionService.listSessionsinto a stat-onlylistSessionsLiteand anenrichSessionsstep, then wires the picker to render the first frame from lite items immediately while enrichment hydrates each row in the background. The legacylistSessions(opts)is preserved as a thin wrapper around both, so SDK / WebUI / VSCode-companion callers don't break.Why
After #3897, the per-page cost of
listSessionsfor 20 rows is ~30 ms median on local NVMe — fine on modern hardware, but it scales linearly with page size and dominates time-to-first-visible-row on slow disks, NFS, and large-N projects.This PR decouples first-paint latency from N entirely:
listSessionsLitesize=20enrichSessions20 rowslistSessionssize=20First frame goes from ~30 ms → ~1 ms. Enrichment cost is unchanged but happens in parallel with the user's eyes registering the picker frame.
Visual proof (tmux capture, with 800 ms enrich delay injected for demo)
T+86ms — lite phase, picker frame already painted, every row a placeholder:
T+906ms — enrichment lands, rows swap to real metadata:
The injected delay was reverted before commit; on local NVMe the swap is imperceptible.
Type changes
SessionListItem's shape is now lite-aware:sessionId,mtime,filePath, plus the newfileSize?(cheap stat field)pending?: boolean—truewhile still in lite formcwd,startTime,prompt,gitBranch,customTitle,titleSource) are now optionalmessageCountwas already optional from #3897. Existing consumers that handle the optional case (e.g.vscode-ide-companion's?? messages.length, the picker'stypeof === 'number'check) keep working unchanged. One ACP integration callsite needed acwd ?? cwdfallback.Picker changes
useSessionPicker: initial-load + load-more both call lite first, then enrich in the background. AbortController on unmount cancels in-flight enrichment.SessionPicker.tsx: pending rows render…<sessionId-prefix>in the dim color slot; metadata row hides the time/branch suffix when fields are absent.Compatibility
listSessions(opts)returns the same shape it always did — same project-filter semantics, same pagination cursor (with the wrapper looping when sibling-project files drop out during enrichment, matching pre-split behaviour).cwdif a head record decode failed mid-page.Test plan
tsc --noEmitclean for bothpackages/coreandpackages/clivitest runforsessionService.test.ts(47 cases, +2 new),sessionStorageUtils.test.ts(44 cases),sessionService.rename.test.ts(17 cases) — 108/108 passvitest runforStandaloneSessionPicker.test.tsx(38 cases, +1 new),sessionPickerUtils.test.ts,useResumeCommand.test.ts— 60/60 passlistSessionsLitedoes not calljsonl.readLines— guards against silently re-introducing head IO into the fast pathenrichSessionsapplies the project-hash filter — sibling-project files in a shared chats dir get dropped during enrichment, matching pre-split semantics/resumeagainst 313-session / 127 MB chats dir confirms the lite frame appears at T+~86 ms and rows swap as enrichment completes