perf(filesearch): move @-picker crawl and fzf index to worker_threads#100
Open
BingqingLyu wants to merge 15 commits into
Open
perf(filesearch): move @-picker crawl and fzf index to worker_threads#100BingqingLyu wants to merge 15 commits into
BingqingLyu wants to merge 15 commits into
Conversation
Before P1 the recursive @-picker did the fdir crawl and synchronous `new AsyncFzf(...)` construction on the main thread. On 100k-file workspaces that blocked the Ink render loop for 500ms–2s, so pressing `@` felt unresponsive. This change introduces `FileIndexCore` (pure class), `FileIndexWorker` (worker_threads entry), and `FileIndexService` (main-thread singleton that multiplexes search requests and exposes onPartial for future streaming). `RecursiveFileSearch` becomes a thin proxy, preserving the public `FileSearch` contract (and vscode-ide-companion's usage) untouched. The esbuild bundle now emits a self-contained `dist/fileIndexWorker.js` alongside `dist/cli.js`.
…EFRESH trigger
Changes driven by an open-ended correctness audit:
- crawlError now tears the service down (dispose worker, delete
from INSTANCES, reject pending searches + whenReady waiters), so a
transient crawl failure no longer leaks a worker per attempt.
- dispose() now rejects whenReady() waiters with AbortError; previously
callers awaiting whenReady() through a dispose could hang forever.
Also reorders unsubscribe before posting 'dispose' so the in-process
transport's synchronous exit cleanup can't beat the AbortError.
- worker.on('error') is wired, converting uncaught worker errors into
a normal exit-cleanup path instead of re-throwing on the main process
and crashing the CLI.
- useAtCompletion gains a monotonic refreshToken in its reducer state
and effect deps, so REFRESH re-triggers the worker effect even when
the status was already SEARCHING (previously a partial chunk arriving
mid-search could not cause a re-run).
- Worker 'dispose' now closes parentPort instead of process.exit(0) so
in-flight searchResult/searchError messages drain before shutdown.
- FileIndexCore.startCrawl preserves the allFiles reference on the
cache-hit fallback path to keep concurrent search() iterations stable.
- dispose() races transport.terminate() against a 2s timeout so a
faulted worker cannot hang shutdown.
- handleExit is now idempotent: if dispose() already set disposed=true,
a subsequent worker exit won't double-reject pending/readyWaiters.
Added regression tests for whenReady-on-dispose rejection and for
post-dispose FileIndexService.for() creating a fresh instance.
…TANCES hygiene - `optionsKey()` now hashes the .gitignore / .qwenignore contents via loadIgnoreRules().getFingerprint(). Editing those files produces a new key so the next `FileIndexService.for()` spawns a fresh worker instead of serving a stale cached snapshot with outdated ignore rules. - `FileIndexService.for()` only memoises an instance that survived construction. If the Worker spawn errored synchronously and handleExit ran before INSTANCES ever had the key, a permanently-disposed instance would otherwise be cached for the lifetime of the process. - `FileSearch.dispose?()` is now a (optional) part of the public interface; the recursive proxy delegates to `FileIndexService.dispose()`. - `FileMessageHandler.clearFileSearchCache` calls `dispose()` when a workspace file create/delete fires so the worker's fzf index is rebuilt from disk instead of continuing to serve stale results. Added regression tests: editing .gitignore mid-session invalidates the singleton; disposed services don't pollute INSTANCES.
…ttern errors - Wrap FileIndexCore construction at worker module-init in try/catch. A bad projectRoot (e.g. containing NUL) used to crash the worker before its message handler attached, leaving the main thread waiting forever. We now surface the failure as a normal crawlError / searchError so the service sees it on the next message. - Validate reqId / pattern types on 'search' messages so a malformed IPC shape fails just the one request instead of the whole worker. - filter() now catches picomatch compile errors and returns [] — typing an interim `foo[` (common mid-keystroke state) no longer surfaces as a TypeError to the UI; it's simply "no matches" until the pattern is well-formed. - Worker error log trims to name+message instead of the full Error object, so a CLI transcript/wrapper doesn't capture absolute paths from the user's machine by default. Added regression test: malformed glob pattern in core.search returns [].
…y eviction - scripts/prepare-package.js now lists `fileIndexWorker.js` in the published tarball's `files` whitelist. Without this, `npm i -g` would produce an install missing the worker, and the first `@`-picker use would throw ERR_MODULE_NOT_FOUND at runtime. Release blocker. - FileIndexService.for() now disposes any stale instance under a different key but the same projectRoot before creating a new one. When `.gitignore` is edited mid-session, the fingerprint change would otherwise leave the old worker pinned in INSTANCES forever — the auditor flagged this as a real leak even though the commit message for round 2 implied it was fixed. - Dropped `__setIndexTransportFactory` from the public barrel; it's a test-only hook that shouldn't be part of the external API. Tests (and nothing else) can still import it from the module directly. - `search()` on a disposed service now throws AbortError instead of a plain Error, matching how in-flight searches are rejected inside dispose(). Added regression tests: stale-key service is disposed and its `search()` throws AbortError after .gitignore-driven eviction.
…nc throw Round 4 switched `FileIndexService.search()` on a disposed instance from `Error` to `AbortError` for notional consistency with how in-flight searches are rejected inside `dispose()`. The audit correctly called out that this silently breaks useAtCompletion: the hook's catch block swallows `AbortError` as "user pressed ESC" and never dispatches the ERROR state, so a crawlError-driven cascade (where dispose() is called and the next search hits the sync guard) would leave the UI stuck in SEARCHING forever. In-flight rejections inside dispose() remain AbortError — those are genuine cancellations. The post-dispose sync guard is caller misuse and must surface as a plain Error to drive the ERROR branch.
…cators Addresses post-audit UX feedback: - AppContainer kicks FileIndexService.for(...) as soon as Config.initialize resolves. The worker crawl starts in the background before the user types `@`, so the first picker open usually finds a ready (or nearly-ready) singleton and returns results without any visible wait. - useAtCompletion no longer flips isLoading on the INITIALIZE dispatch. A 200ms timer now arms during INITIALIZING too, so the "Loading suggestions..." placeholder only appears if the crawl/search is genuinely slow — the common pre-warmed path opens silently. - Remove the global ConfigInitDisplay render in Composer. The top-of-screen "Initializing..." banner is no longer shown while Config/MCP finish setting up; the prompt renders without any boot-time placeholder. Updated the two useAtCompletion tests that asserted the old flash-loading behaviour to assert the steady-state (isLoading=false) instead.
Initially planned as P2 to replace fdir with ripgrep for the @-picker crawl. Empirical benchmarking on two representative trees refuted the premise: qwen-code (~2700 files) fdir ~25ms vs rg (spawn+IPC) ~140ms node_modules (~48k files) fdir ~640ms vs rg (spawn+IPC) ~1800ms Even though `rg --files` is ~70ms in a shell, the child_process spawn and stdout-pipe overhead from Node dominates for every tree size we tested. Claude Code's speed almost certainly comes from being a native binary with an in-process walker, not from shelling out to ripgrep. The ripgrepCrawler implementation is kept for future re-evaluation on very large trees or different platforms, and is reachable via the `QWEN_FILESEARCH_USE_RG=1` env var. fdir remains the default. Also: useAtCompletion tests now sort before asserting on suggestion order. The exact ranking is an fzf tiebreak artifact that depends on crawler emission order — not a behavioural contract worth coupling tests to.
Earlier commit (21764b5) gated ripgrep off by default based on a benchmark that only covered small-to-medium trees (<48k files), where Node's spawn+IPC overhead does beat fdir. User-reported feedback prompted a re-test on a home-directory-scale target: qwen-code repo (~2700 files) fdir ~25ms rg ~140ms fdir wins project/node_modules (~48k) fdir ~640ms rg ~1800ms fdir wins ~/ home dir (100k-file cap) fdir ~9s rg ~2.5s rg 3-4× wins The slow case is the painful one — a user typing @ at $HOME shouldn't wait 9 seconds while Node's single-threaded walker catches up. On small repos both backends are well below the 200ms loading threshold, so the perceptual cost of rg's spawn overhead is zero. Flip the default; keep `QWEN_FILESEARCH_USE_RG=0` as the escape hatch to force fdir.
Windows CI was failing the full filesearch suite (useAtCompletion, crawler, fileSearch, fileIndexCore, fileIndexService) because ripgrep on Windows emits `.\file.txt`; the stdout pump stripped `./` before `toPosixPath`, so the leading `./` survived and the ancestor-directory walk in `buildRipgrepFileFilter` asked the ignore lib to test `"./"` — which throws RangeError and poisoned the data handler. Normalize to posix first, then strip; guard the filter for `.`/`./`/leading-`./` inputs as defence in depth. Also addresses review feedback: - fileIndexService: an early worker exit left `_state` at `'crawling'`, so a `whenReady()` call arriving after the exit parked in readyWaiters and never settled. `handleExit` now transitions to `'error'` so the state check rejects synchronously. - AppContainer prewarm: guarded on `enableRecursiveFileSearch` so opted-out users don't pay for a full crawl + worker at startup. - FileMessageHandler: added a per-rootPath generation token so an in-flight `initialize()` disposes the stale index instead of re-caching it when `clearFileSearchCache` fires mid-crawl. Regression tests added for all four. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ata payloads - Remove unused `buildCaseInsensitiveGlob` helper and its companion `globSpecialChars` Set. Both were carried over from an earlier case-insensitive-search attempt and have no callers anywhere in the tree. - Switch `Record<string, unknown>` reads in the message handler (`data?.query`, `data.path`, `data.content`, etc.) to bracket syntax so the file stays clean under stricter `noPropertyAccessFromIndexSignature` variants of the TS config. Pure cleanup; behaviour unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng tmp After the ripgrep path fix landed, the only remaining Windows CI failures were `EBUSY: resource busy or locked, rmdir` on test temp directories. Root cause: `ripgrep --files` is spawned with `cwd: crawlDirectory`, and on Windows the OS holds a handle on that working directory until the child fully exits. If `afterEach` calls `cleanupTmpDir` before the FileIndexService (and its rg subprocess) is disposed, rmdir races the handle release and fails. Three-pronged fix: - `cleanupTmpDir` now passes `maxRetries: 5, retryDelay: 100` to `fs.rm`. Node retries EBUSY/EPERM internally on Windows; half a second is plenty for tens-of-ms handle-release races. This is a blanket safety net for every caller. - `fileIndexService.test.ts` afterEach: `__resetForTests()` runs before `cleanupTmpDir` so the transport is torn down first. - `useAtCompletion.test.ts` afterEach: added the same `FileIndexService.__resetForTests()` call so the hook's in-process worker is disposed before the dir is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- fileIndexCore: drop the redundant second crawl() on cache hits; use the first call's return value when onProgress never fires. - fileIndexProtocol: new module holding WorkerRequest/WorkerResponse so fileIndexService and fileIndexWorker can't silently diverge. - fileIndexService: LRU cap (8) on the INSTANCES singleton Map; hits re-insert to refresh recency. Dispose timeout warns + retries terminate() as a force-kill fallback. - fileIndexService: drop `process.env['VITEST']` transport sniffing in favour of an explicit `installInProcessIndexTransport()` DI hook wired from each package's test-setup. Survives bundling. - crawler: `ripgrepDisabled` is now a timestamp with a 5-minute cooldown (was permanent-for-process-lifetime). A single spawn failure no longer downgrades long-lived hosts like the VSCode extension for the whole session. - crawler: collectRipgrepExcludeDirs now forwards plain directory patterns (.git/, build/, user ignoreDirs) to rg as `--glob '!dir'` args so rg prunes subtrees at the walker instead of streaming every path under them for the Node post-filter to reject. - useAtCompletion: fileSearchOptions is useMemo'd on [config, cwd] and both effects have it in their deps — the eslint-disables are gone. Extracted `buildFileSearchOptions(config, cwd)` helper that AppContainer uses too, so the prewarm and search paths can't drift from the same FileIndexService singleton key. Tests: - fileIndexService.test: new "evicts the oldest instance when the LRU cap is exceeded" regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Conflict in packages/cli/src/ui/components/Composer.tsx: upstream added `HistoryItemToolGroup` type import to support per-subagent token attribution, while this branch removed the `ConfigInitDisplay` banner as part of the pre-warm UX change. Resolved by keeping the upstream type import and dropping `ConfigInitDisplay` (import + JSX usage), which is the intended behaviour of this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…imers Addressing @wenshao's two findings on QwenLM#3455: 1. `--bare` mode tests failing in `packages/cli/src/config/config.test.ts` Root cause (correctly diagnosed): the top-level `installInProcessIndexTransport()` call in `test-setup.ts` eagerly evaluated `@qwen-code/qwen-code-core`'s module tree, pulling `workspaceContext.ts` in with a real `node:fs` binding. Later `vi.mock('fs', …)` declarations in individual test files no longer took effect, so `fs.existsSync()` fell through to the real FS and silently dropped mock directories. Fixed by removing the install from both `test-setup.ts` files and opt-ing in per-test-file (`beforeAll`/`afterAll`) in the three suites that actually exercise the worker-backed filesearch: - `packages/core/src/utils/filesearch/fileIndexService.test.ts` - `packages/core/src/utils/filesearch/fileSearch.test.ts` (also drops live singletons between tests so Windows rmdir doesn't race an open ripgrep child) - `packages/cli/src/ui/hooks/useAtCompletion.test.ts` The two `test-setup.ts` files now carry an explanatory comment so this doesn't regress. 2. CI matrix stuck for ~10h on "Run tests and generate reports" Likely cause: pending `setTimeout` handles keeping the event loop alive past test completion. - `FileIndexService.dispose()` raced `terminate()` against a 2 s `setTimeout` but never cleared the timer on the healthy-exit path — so after every disposal the process held a timer for up to 2 s. Now the terminate() arm clears the timer, and the timer is also `.unref()`'d as a belt-and-suspenders. - `crawlCache` TTL timers (default 30 s) are now `.unref()`'d too. They'd keep vitest workers waiting on exit even after all assertions passed. `clear()` still drops both synchronously between tests, so correctness is unchanged. Verified: - `npx vitest run src/config/config.test.ts -t "should ignore implicit startup"` now passes (was failing before). - Full test suites — core 5925 / 5927, cli 4275 / 4282 (skipped tests pre-existing), vscode-ide-companion 204 / 205 — all green. - Typecheck clean on core + cli. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 28, 2026
Owner
Author
Conflict Group 1This PR shares modified functions with 9 other PR(s): #107, #109, #113, #114, #117, #52, #88, #93, #96. These PRs should be reviewed as a batch — merging one may affect the others.
graph LR
PR100["PR #100"]
Fcrawl_3215["crawl<br>crawler.ts"]
PR100 -->|modifies| Fcrawl_3215
PR93["PR #93"]
PR93 -->|modifies| Fcrawl_3215
FisToolExecuting_7261["isToolExecuting<br>AppContainer.tsx"]
PR100 -->|modifies| FisToolExecuting_7261
PR107["PR #107"]
PR107 -->|modifies| FisToolExecuting_7261
PR109["PR #109"]
PR109 -->|modifies| FisToolExecuting_7261
PR113["PR #113"]
PR113 -->|modifies| FisToolExecuting_7261
PR114["PR #114"]
PR114 -->|modifies| FisToolExecuting_7261
PR117["PR #117"]
PR117 -->|modifies| FisToolExecuting_7261
PR52["PR #52"]
PR52 -->|modifies| FisToolExecuting_7261
PR88["PR #88"]
PR88 -->|modifies| FisToolExecuting_7261
PR96["PR #96"]
PR96 -->|modifies| FisToolExecuting_7261
Posted by codegraph-ai conflict detection. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TLDR
Moves the
@-picker's recursive file crawl and fzf index construction off the main thread into aworker_threadsworker, so pressing@no longer freezes the Ink render loop. On large workspaces (home directory, 100k-file monorepos) the CLI stays fully responsive where it previously stalled for 1–9 seconds.Also wires up a bundled ripgrep fast path for the crawl (3–4× faster than
fdirpast ~50k files) and pre-warms the index at CLI boot so the first@press usually finds a ready snapshot.Screenshots / Video Demo
N/A — no new UI surface; the visible change is "nothing freezes anymore" on large trees, which is hard to capture in a still. Happy to record a before/after clip against
~if reviewers want one.Dive Deeper
Root cause of the original freeze was two synchronous pieces of work on the main thread:
fdirrecursive crawl inpackages/core/src/utils/filesearch/crawler.ts.new AsyncFzf(allFiles, …)inpackages/core/src/utils/filesearch/fileSearch.ts— despite the name the constructor is synchronous and dominates for large file counts.This branch introduces:
FileIndexCore— pure class holding the crawl, fzf index, and result cache. Worker-safe and unit-testable without a real worker.fileIndexWorker.ts— thinparentPortpump that wrapsFileIndexCore.FileIndexService— main-thread singleton keyed by(projectRoot + ignore-fingerprint + options)that owns the worker, multiplexes reqId-basedsearchcalls, fans out streaming partial updates, and disposes stale-key instances when.gitignoreedits or cwd changes produce a new key.RecursiveFileSearchis now a thin proxy over the service, so the publicFileSearchinterface (and vscode-ide-companion's use of it) is unchanged.ripgrepCrawler.ts—rg --filesbackend using the bundled ripgrep binary. Empirically on macOS:At small sizes Node's spawn+IPC overhead beats rg's native parallel walker; past ~50k files rg's Rust walker wins decisively. Since both are well under the 200 ms loading threshold on small repos, rg is the default;
QWEN_FILESEARCH_USE_RG=0forces fdir.AppContainer.tsxpre-warmsFileIndexService.for(...)right afterconfig.initialize()resolves, so the worker crawl starts in parallel with Config/MCP boot instead of waiting for the first@keypress.useAtCompletion.tsno longer flipsisLoading=trueon the INITIALIZE dispatch; the 200 ms slow-load timer now also coversINITIALIZING, so the common pre-warmed path opens silently and only slow cold-starts surface a spinner.Composer.tsxno longer rendersConfigInitDisplay; the "Initializing…" banner at top of the prompt is gone.The branch also lands five rounds of self-audit fixes on top of the P1 baseline:
crawlErrornow disposes the service (no zombie worker on transient failures).dispose()rejectswhenReady()waiters.REFRESHuses a monotonic token so partial-driven re-searches fire even mid-SEARCHING.worker.on('error')is wired and converted to a synthetic exit.parentPort.close()(drains queued sends) instead ofprocess.exit(0).optionsKey()includes the.gitignore/.qwenignorefingerprint so edits invalidate the cached singleton;FileIndexService.for()also evicts stale-key instances sharing the same projectRoot.FileSearch.dispose?()added to the interface; vscode-ide-companion'sclearFileSearchCachenow disposes instead of only dropping the Map entry.searchpayloads are type-validated; picomatch compile errors degrade to empty results instead of throwing to the UI.scripts/prepare-package.jswhitelistsfileIndexWorker.jsso the published npm tarball actually ships it.Reviewer Test Plan
npm ci && npm run build && npm run bundlenpm startthen press@— should open instantly, no loading flash.QWEN_WORKING_DIR=~ npm start, press@within the first second — prompt should stay responsive (cursor keeps blinking, you can still type). Compare tomain: freeze.QWEN_FILESEARCH_USE_RG=0 QWEN_WORKING_DIR=~ npm start— slower crawl but still non-blocking..gitignoreinvalidation: open an @-picker, edit.gitignoreto add a dir, press@again — newly-ignored files disappear without a CLI restart.npm run test --workspace=packages/coreandnpm run test --workspace=packages/cli.Testing Matrix
Personally validated on macOS 15.1.1 (arm64) with Node 22.17. Worker spawn latency on Windows is known to be higher (30–80 ms) — pre-warm absorbs it, but independent validation there would be appreciated. Sandbox profiles may need to permit worker thread creation; I haven't repro'd a sandbox issue locally but flagged it as a risk.
Before:
qwencode@before.mp4
after:
qwencodeafter.mp4
Linked issues / bugs
Fixes QwenLM#3454