feat(core): replace fdir crawler with git ls-files + ripgrep fallback#3214
Conversation
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Great idea — git ls-files is a big perf win over fdir, and the three-level fallback with execFile (no shell injection risk) is well-designed.
Issues
-
maxDepthsilently ignored on git and rg paths. NeithercrawlWithGitLsFilesnorcrawlWithRipgrepapplies depth filtering. Callers usingcrawl({ maxDepth: 0 })for one-level autocomplete will get full recursive results — a behavioral regression. -
Subprocess ignore rules override caller options.
git ls-files --others --exclude-standarddrops .gitignore-ignored files, andrg --filesapplies its own filtering, regardless of the caller'suseGitignoresetting. The fdir path respects it; the new paths don't. -
Throttle broken for non-git paths.
crawlWithRipgrepcallsrecordRebuildbut neverupdateChangeState, so the 5s throttle has no effect outside git repos. Same issue for the fdir fallback. -
Tests don't isolate strategies. The test suite runs inside a git repo, so
findGitRoot(tmpDir)always resolves to qwen-code's.gitand the git path is always taken — including the "fdir fallback" test. Considergit initin temp dirs for git tests, and mockingfindGitRootto return null for non-git tests.
Verdict
COMMENT — The core approach is solid but these gaps need to be addressed before merge. Items 1-3 are behavioral regressions; item 4 means the new code paths have no real test coverage.
|
Everything seems to be fixed @tanzhenxin if you can point out any errors or shortcomings, I will try to correct them. |
|
Thanks for the thorough rework — maxDepth, useGitignore, and the test coverage all look good now. One scenario I'd like you to verify before merging: the non-git Trace:
The new Not blocking — most users are in git repos and the git path is correct — but worth closing the loop before merge. |
|
Thanks — I verified this and updated the non-git fallback so it re-crawls once the throttle window expires instead of caching forever. I also added a regression test that advances time past the 5s window, creates a new file, and confirms the fresh crawl picks it @tanzhenxin |
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Good progress — the non-git re-crawl, useGitignore, and test isolation concerns from the previous review are all addressed. One remaining issue:
maxDepth computed relative to cwd, not crawlDirectory
applyMaxDepthLimit filters on cwd-relative paths (already prefixed with relativeToCrawlDir). When crawlDirectory is a subdirectory of cwd, the prefix inflates the depth count.
Example: crawl({ crawlDirectory: '/project/src', cwd: '/project', maxDepth: 0 })
- Git path produces
src/file.ts(cwd-relative) getEntryDepth('src/file.ts')= 1 → filtered out- But
file.tsis depth 0 relative to the crawl root
The fdir fallback handles this correctly because it applies maxDepth natively before prefixing. fileSearch.ts calls exactly this pattern (maxDepth: 0 with a subdirectory), so this breaks file completion.
Suggestion: strip the relativeToCrawlDir prefix before depth filtering (and re-add after), or subtract the prefix depth from the limit.
|
I fixed the issue by making maxDepth relative to crawlDirectory instead of cwd when filtering git and ripgrep results. This now matches the fdir behavior for subdirectory crawls, so maxDepth: 0 works correctly for file completion. I also added a regression test for the nested crawlDirectory case, and crawler.test.ts plus fileSearch.test.ts are passing locally. @tanzhenxin |
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
The critical maxDepth bug from the previous review is fixed — stripCrawlDirectoryPrefix correctly computes depth relative to the crawl directory, and the new test validates subdirectory crawls with maxDepth: 0. The fallback chain is well-structured and the test infrastructure is solid.
One thing worth addressing: git ls-files --others (line 313) runs without --exclude-standard, so gitignored directories like node_modules/ get fully enumerated into the 20MB buffer before applyFilters removes them. The old fdir path skipped ignored directories during traversal, so this is a performance regression for repos with large gitignored trees. The fix is one line — add --exclude-standard to the --others args.
Verdict
COMMENT — No blockers remaining. The --exclude-standard omission is the most actionable suggestion; everything else is minor.
|
Fixed: Added --exclude-standard to git ls-files --others . Now gitignored and untracked files are not listed, eliminating a performance regression. @tanzhenxin |
tanzhenxin
left a comment
There was a problem hiding this comment.
Thanks for the --exclude-standard fix in 8156924 — clean addition with the test updated to document the new semantics on the git path. Approving.
tanzhenxin
left a comment
There was a problem hiding this comment.
Retracting my earlier approval — the CI is failing on macOS and Windows. Both failing tests are the ones added for the fixes I approved:
-
should treat maxDepth as relative to the crawl directory— fails withexpected [ 'level1', …(2) ] to deeply equal ArrayContaining{…}, meaning the maxDepth fix fromstripCrawlDirectoryPrefix/applyMaxDepthLimitdoesn't behave the same way outside Linux. -
should avoid enumerating gitignored untracked files on git path— fails withexpected [ '.', '.gitignore', 'keep.log', …(1) ] to not include 'keep.log', meaning--exclude-standardisn't actually filtering out the gitignored file in the macOS/Windows test fixtures.
Linux passes, macOS 20.x/22.x/24.x and Windows 20.x/22.x/24.x all fail. Likely candidates:
- Path separator handling (
/vs\on Windows) instripCrawlDirectoryPrefixor in how depths are computed from the relative path - The git fixture setup — maybe the repo init isn't complete on macOS/Windows before
git ls-files --others --exclude-standardis invoked, so git treats files as untracked-not-ignored
Please reproduce locally on macOS/Windows (or in CI logs) and address before merging.
|
Okay, I'll fix it right now. @tanzhenxin |
…proved file handling
Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
…e detection logic
|
Fixed the path separator issue on Windows/macOS — stripCrawlDirectoryPrefix now uses path.posix normalization throughout. @tanzhenxin @wenshao could you take another look? |
|
@scrollDynasty Run: https://github.com/QwenLM/qwen-code/actions/runs/25635817286 I just re-ran Look at the Two strong candidates that would behave differently on CI than on your dev box:
Concrete debug stepTemporarily add a console.error('[debug-3214] processTrackedFile', JSON.stringify({
rawLine: file,
parsed,
normalizedFile,
relativeToGitRoot,
relativeToCrawlDir,
fullPath,
}));…push, re-run CI, the failed-job log will tell you exactly what Alternatively: drop I'll re-review the moment CI flips green. |
- Enhanced the logic for handling git commands in the `crawlWithGitLsFiles` function by removing the NUL record delimiter for better compatibility with various Git versions. - Updated the `equalsRepoRelativeCaseAware` function to utilize a new helper for case-folding path comparisons, improving consistency across platforms. - Refactored the return logic in the `crawl` function to ensure proper handling of relative paths, enhancing the overall file crawling process.
|
@wenshao fixed |
|
@scrollDynasty Run: https://github.com/QwenLM/qwen-code/actions/runs/25636266712 All new failures share the same shape — duplicated directory entries: Root causeIn the new const cwdRelative = results // fdir already emits 'src/' (withDirs() + trailing slash)
.filter((p) => p.length > 0 && p !== '.')
.map((p) => path.posix.join(relativeToCrawlDir, p));
return buildResultsFromFileSet(new Set(cwdRelative));
FixEither:
Core
|
- Modified tests to expect two files instead of one in the results of the `crawl` function, ensuring accuracy in file counting. - Enhanced the validation of the returned file array to include specific expected file names, improving test reliability. - Adjusted the logic in the `crawlWithFdir` function to ensure proper handling of directory entries and file paths during crawling.
|
@wenshao fixed |
| options.ignore.getFingerprint(), | ||
| options.maxDepth, | ||
| options.maxFiles, | ||
| function normalizeGitRefPath(p: string): string { |
There was a problem hiding this comment.
[Suggestion] normalizeGitRefPath and normalizeGitLsOutputLine are nearly identical — both do toPosixPath + strip ./ prefix. The only difference is that normalizeGitRefPath additionally strips trailing slashes. Consider merging into a single function with an optional stripTrailingSlash parameter to reduce maintenance burden.
| function normalizeGitRefPath(p: string): string { | |
| function normalizeGitPath(p: string, stripTrailingSlash = false): string { | |
| let s = toPosixPath(p); | |
| if (s.startsWith('./')) s = s.slice(2); | |
| return stripTrailingSlash ? s.replace(/\/+$/, '') : s; | |
| } |
— deepseek-v4-pro via Qwen Code /review
| } | ||
|
|
||
| const before = fileSet.size; | ||
| fileSet.add(fullPath); |
There was a problem hiding this comment.
[Suggestion] processTrackedFile unconditionally calls fileSet.add(fullPath) for every tracked file, even when the file exceeds maxDepth. The depth check at shouldCountTowardBudget only controls whether the file counts against maxFiles — it does not prevent the file from being added to fileSet. When maxDepth is restrictive but maxFiles is unset (or generous), fileSet grows to O(all files) and most are later discarded by applyFilters. Consider early-skipping fileSet.add for out-of-depth files, or documenting that callers must set maxFiles when using maxDepth.
— deepseek-v4-pro via Qwen Code /review
| console.warn('[crawler] falling back to fdir (ripgrep unavailable)'); | ||
| } | ||
|
|
||
| let fdirResults = await crawlWithFdir(options); |
There was a problem hiding this comment.
[Suggestion] The fdir fallback path in crawl() does not call applyFilters() on fdirResults before applyMaxFilesLimit. The git/rg paths run applyFilters via buildResultsFromFileSet (which does depth + ignore filtering). The fdir path relies solely on its .exclude/.filter callbacks, with no secondary filtering pass. If a future change introduces a gap in those callbacks, there is no safety net. Consider running applyFilters on fdir results for consistency.
| let fdirResults = await crawlWithFdir(options); | |
| fdirResults = applyFilters(fdirResults, options, relativeToCrawlDir); | |
| const limitedResults = applyMaxFilesLimit(fdirResults, options.maxFiles); |
— deepseek-v4-pro via Qwen Code /review
| return results; | ||
| } | ||
|
|
||
| if (gitResult.gitRepoListingFailed) { |
There was a problem hiding this comment.
[Suggestion] The fdir fallback warning ([crawler] falling back to fdir) is gated on gitResult.gitRepoListingFailed. In non-git directories this flag is false, so the fdir degradation is silent — even though ripgrep failed and the user is now on the slowest path. The git path correctly shows both degradation warnings. Consider tracking whether ripgrep failed independently so the fdir warning also fires in non-git directories.
— deepseek-v4-pro via Qwen Code /review
…zation - Updated tests to expect specific warning messages when falling back to fdir due to ripgrep unavailability, ensuring accurate feedback during file crawling. - Refactored the `normalizeGitRefPath` function to `normalizeGitPath`, adding an option to strip trailing slashes for improved path handling. - Enhanced comments in the `applyFilters` function to clarify its purpose and behavior regarding depth limits and ignore rules.
|
@wenshao fixed |
CI failure analysis (macOS + Windows)Both failing jobs ( Failing assertions
In both cases the synthetic crawl-root directory entry is missing. Root causeThe two tests use
The
Windows has the analogous problem (short-name / drive-letter path normalization between Suggested fix directions
I'd recommend (1) + (2) together: (1) fixes the real bug (this would also affect production users on macOS who run the agent against any symlinked path); (2) keeps the new tests deterministic against tmpdir quirks. |
wenshao
left a comment
There was a problem hiding this comment.
Found 2 Critical and 14 Suggestion issues after multi-dimensional review (9 agents + verification + 2 reverse-audit rounds). Posting the two Critical findings as inline comments. Full finding list in the review report.
— glm-5.1 via Qwen Code /review
| return toPosixPath(p); | ||
| } | ||
|
|
||
| function getPosixRelative(from: string, to: string): string { |
There was a problem hiding this comment.
[Critical] getPosixRelative symlink regression — symlinked projects return empty file listings
When a user opens a project via a symlinked path (e.g. ~/projects/link → ~/projects/real), findGitRoot returns the real path via git rev-parse --show-toplevel, but crawlDirectory remains the symlinked path. getPosixRelative computes a ../-prefixed relativeToGitRoot (e.g. ../link), which causes all three git ls-files calls to return zero matching entries (the pathspec resolves outside the repo). crawlWithGitLsFiles returns { success: true, files: ['.'] } — no ripgrep/fdir fallback is triggered.
The PR removed fs.realpathSync.native (the old canonicalizePath helper) that previously resolved symlinks before computing relative paths.
| function getPosixRelative(from: string, to: string): string { | |
| function getPosixRelative(from: string, to: string): string { | |
| const fromAbs = fs.realpathSync(path.resolve(from)); | |
| const toAbs = fs.realpathSync(path.resolve(to)); | |
| const relative = path.relative(fromAbs, toAbs); | |
| return toPosixPath(relative) || '.'; | |
| } |
— glm-5.1 via Qwen Code /review
| trackedArgs.push(relativeToGitRoot); | ||
| } | ||
|
|
||
| const deletedSet = new Set(deletedFiles ?? []); |
There was a problem hiding this comment.
[Critical] Partial git listing failures return { success: true } — deleted files shown as present, new files silently omitted
When listDeletedTrackedFiles returns null (command failure), deletedSet is empty, so files deleted from the working tree are NOT filtered from results. When listUntrackedFiles returns null, newly created files are silently skipped. Both cases return { success: true }, preventing ripgrep fallback.
| const deletedSet = new Set(deletedFiles ?? []); | |
| if (untrackedFiles === null || deletedFiles === null) { | |
| return { success: false, files: [], gitRepoListingFailed: true }; | |
| } | |
| const deletedSet = new Set(deletedFiles); |
— glm-5.1 via Qwen Code /review
| clearTimeout(timeout); | ||
| } | ||
|
|
||
| if (timedOut) { |
There was a problem hiding this comment.
[Suggestion] runCommand close handler checks timedOut before killedByLimit — race loses valid partial data
When a budget-limited kill occurs close to the timeout deadline, both flags can be set. Since timedOut is checked first, valid partial data collected before the budget limit is discarded as a timeout failure. The budget-limited kill is a deliberate successful operation — it should take priority.
| if (timedOut) { | |
| if (killedByLimit) { | |
| finalize(true, lines); | |
| return; | |
| } | |
| if (timedOut) { |
— glm-5.1 via Qwen Code /review
|
|
||
| let workingTreePrefetch: GitWorkingTreePrefetch | undefined; | ||
|
|
||
| if (!options.cache) { |
There was a problem hiding this comment.
[Suggestion] Change detection code is dead code in production
The hasFileListChanged/scanWorkingTreeForChange logic (~60 lines of mtime + fingerprint change detection) is gated behind if (!options.cache), but all production callers (RecursiveFileSearch, DirectoryFileSearch) pass cache: true. This sophisticated change-detection subsystem is never exercised in production — only the TTL-based crawlCache is used. Either document this explicitly or unify the two caching paths.
— glm-5.1 via Qwen Code /review
| let stderrBuf = ''; | ||
| let child; | ||
| try { | ||
| child = spawn(command, args, { |
There was a problem hiding this comment.
[Suggestion] spawn() inherits the parent process environment without sanitization. If the parent process has GIT_DIR, GIT_WORK_TREE, or GIT_INDEX_FILE set (e.g., from a git hook, CI wrapper, or IDE integration), all git child processes silently operate on the wrong repository — autocomplete shows files from an unrelated project with no error indication.
| child = spawn(command, args, { | |
| const child = spawn(command, args, { | |
| cwd, | |
| windowsHide: true, | |
| stdio: ['ignore', 'pipe', 'pipe'], | |
| env: { | |
| ...process.env, | |
| GIT_DIR: undefined, | |
| GIT_WORK_TREE: undefined, | |
| GIT_INDEX_FILE: undefined, | |
| }, | |
| }); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| } | ||
|
|
||
| interface RunCommandOptions { | ||
| maxLines?: number; |
There was a problem hiding this comment.
[Suggestion] The maxLines parameter in RunCommandOptions and all associated logic (early-return on maxLines <= 0, lines.length >= maxLines check, killedByLimit flag) are dead code — every production caller uses collectLines: false and never sets maxLines.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| if (!state) return true; | ||
|
|
||
| if (currentMtime !== null && state.gitRootMtimeMs !== null) { | ||
| return currentMtime > state.gitRootMtimeMs || !isThrottled(stateKey); |
There was a problem hiding this comment.
[Suggestion] When currentMtime > state.gitRootMtimeMs, the throttle (5s) is bypassed and a full re-crawl always fires. But git index mtime changes on most git operations (git status, git add, branch switches), even when no tracked files changed. A git status in another terminal triggers an unnecessary triple-subprocess re-crawl on the next autocomplete keystroke.
| return currentMtime > state.gitRootMtimeMs || !isThrottled(stateKey); | |
| if (currentMtime > state.gitRootMtimeMs) { | |
| // Index touched — verify tracked file list actually changed | |
| const change = scanWorkingTreeForChange(stateKey, crawlDirectory); | |
| return change.changed || !isThrottled(stateKey); | |
| } |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| }; | ||
| } | ||
|
|
||
| function posixPathUnderGitRoot( |
There was a problem hiding this comment.
[Suggestion] posixPathUnderGitRoot() recomputes normalizeGitPath(relativeToGitRoot, true) and normalizeGitPath(relativeToCrawlDir, true) on every file in the streaming callback. These two values are loop-invariant per crawl call.
| function posixPathUnderGitRoot( | |
| // Hoist normalized paths in crawlWithGitLsFiles before the streaming callback: | |
| const normGitRoot = normalizeGitPath(relativeToGitRoot, true); | |
| const normCrawlDir = normalizeGitPath(relativeToCrawlDir, true); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
- Updated tests to ensure the `maxDepth` is treated correctly relative to the crawl directory by resolving the real path of the temporary directory. - Refactored the `hasFileListChanged` function to improve logic for determining file list changes based on modification times. - Introduced a new function, `refreshChangeStateGitMtime`, to update the state with the latest git root modification time, enhancing state management during crawling. - Improved path normalization in `getPosixRelative` and `posixPathUnderGitRoot` to handle cases where paths may not exist, ensuring better relativity in path calculations.
|
@wenshao fixed |
Local verification reportPulled Unit tests
Real-world scenarios (9/9)
Performance (qwen-code repo, ~2,900 files)
On a synthetic 30k-tracked-file repo, main is actually faster on cold (35ms vs 103ms) because git subprocess startup overhead dominates fdir's raw walk on already-clean trees. Speed is not the headline win here — the real value is correct Earlier Critical comments — confirmed addressed on
|
wenshao
left a comment
There was a problem hiding this comment.
Approved after local verification — see comment above.
| return { changed: true }; | ||
| } | ||
|
|
||
| const changed = |
There was a problem hiding this comment.
[Critical] scanWorkingTreeForChange() only compares the untracked/deleted fingerprints after the index mtime advances. If a file is staged or otherwise added to the tracked set, it disappears from --others and is not in --deleted, so those fingerprints can stay unchanged and this branch returns changed: false; crawl() then refreshes the stored mtime and returns the old state.fileList. That leaves file search missing newly tracked files (or retaining stale tracked entries) after index-changing operations.
Please treat a newer index mtime as requiring a tracked git ls-files --cached refresh, or store/compare a tracked/index fingerprint in ChangeState; the untracked/deleted fingerprint optimization should not skip refreshing tracked files when the index changed.
— gpt-5.5 via Qwen Code /review
| untrackedArgs.push('--exclude-standard'); | ||
| } | ||
| if (relativeToGitRoot && relativeToGitRoot !== '.') { | ||
| untrackedArgs.push(relativeToGitRoot); |
There was a problem hiding this comment.
[Critical] The pathspec derived from crawlDirectory is appended directly after git ls-files options without a -- separator. --literal-pathspecs only disables pathspec magic; it does not stop a path beginning with - from being parsed as another ls-files option. A repository subdirectory such as --stage/ can therefore make this command run with unintended options instead of limiting the listing to that path.
Please insert -- before every user/path-derived pathspec in the git commands, e.g. push -- before relativeToGitRoot here and in the deleted/tracked listings.
— gpt-5.5 via Qwen Code /review
| ): Promise<string[] | null> { | ||
| const deletedArgs = ['--literal-pathspecs', 'ls-files', '-z', '--deleted']; | ||
| if (relativeToGitRoot && relativeToGitRoot !== '.') { | ||
| deletedArgs.push(relativeToGitRoot); |
There was a problem hiding this comment.
[Critical] Same pathspec parsing issue as above: this deleted-files listing appends relativeToGitRoot without a -- separator, so a crawl directory whose repo-relative path starts with - can be parsed as a git ls-files option instead of a literal pathspec.
Please add -- before appending the pathspec.
— gpt-5.5 via Qwen Code /review
| const trackedArgs = ['--literal-pathspecs', 'ls-files', '--cached']; | ||
| trackedArgs.push('-t'); | ||
| if (relativeToGitRoot && relativeToGitRoot !== '.') { | ||
| trackedArgs.push(relativeToGitRoot); |
There was a problem hiding this comment.
[Critical] Same pathspec parsing issue as above: the tracked-files command appends relativeToGitRoot without a -- separator. If the requested crawl directory starts with -, Git can treat it as an option and list the wrong scope or emit option-specific output that this parser was not expecting.
Please add -- before appending the pathspec.
— gpt-5.5 via Qwen Code /review
…#3214) Replaces the fdir filesystem crawler with a three-tier strategy for `@` file mention autocomplete. Strategy: 1. `git ls-files --cached` + `git ls-files --others --exclude-standard` for git repos — reads from the git index, gitignore correctness delegated to git. 2. `ripgrep --files` for non-git directories when `rg` is available. 3. `fdir` as the original last-resort fallback. Additional improvements: - mtime-based change detection on `.git/index` skips re-crawl when nothing changed. - 5s refresh throttling avoids rebuilding the index on every keystroke. - Async chunked indexing yields to the event loop every 1000 entries to stay responsive on 200k+ file trees. - Background untracked-file merge with a 10s timeout. - `resolveGitDir()` correctly handles `.git` as a file (worktrees) including relative `gitdir:` pointers and walks up to parent repos. - POSIX-style path normalization on Windows. - Streaming `spawn()` with line-by-line `onLine` processing and an enforced `maxFiles` budget that terminates the child early, preventing the previous 20MB stdout buffer hazard on large repos. - `--exclude-standard` on the untracked listing so gitignored directories (e.g. `node_modules/`, `dist/`) are not enumerated. Closes #3137
`new AsyncFzf(allFiles, ...)` is misleadingly named — its constructor is synchronous and dominates the @-picker's main-thread cost on workspaces with >20k files (where the v2→v1 algorithm switch kicks in). It's the remaining freeze source after PR QwenLM#3214 moved the crawl onto async spawn- based git ls-files / ripgrep / fdir. This change hosts the AsyncFzf instance in a worker_threads worker so the construction happens off the main thread; only completed find() results cross the message channel. Below ~5k files the in-thread path is kept because worker spawn + IPC overhead exceeds the construction cost there. Notes: - `FzfWorkerHandle.create()` awaits the worker `ready` event before returning, so init errors surface at create() time rather than as a surprise on the first find(). - Worker is `unref()`-ed so a mid-find @-picker doesn't pin process exit. - `FileSearch` interface gains an optional `dispose?(): Promise<void>` to release worker resources. `FileMessageHandler.clearFileSearchCache` calls it so the long-running VS Code extension host doesn't accumulate workers across watcher-triggered evictions. - `installInProcessFzfTransport()` is provided for tests / sandboxed environments that need to skip worker spawn entirely. - Bundle gains a second esbuild build that emits `dist/fzfWorker.js` alongside `dist/cli.js`; `prepare-package.js` ships it in the tarball. - In `npm run dev` (tsx) the bundled worker file isn't on disk, so the handle silently falls back to in-thread — acceptable for a dev script. Closes the door on QwenLM#3455's two open `FileIndexService` review items by design: there is no `whenReady()` (handle awaits ready inside `create()`) and there is no in-worker crawl (crawl stays on main, only fzf moves). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`new AsyncFzf(allFiles, ...)` is misleadingly named — its constructor is synchronous and dominates the @-picker's main-thread cost on workspaces with >20k files (where the v2→v1 algorithm switch kicks in). It's the remaining freeze source after PR QwenLM#3214 moved the crawl onto async spawn- based git ls-files / ripgrep / fdir. This change hosts the AsyncFzf instance in a worker_threads worker so the construction happens off the main thread; only completed find() results cross the message channel. Below ~5k files the in-thread path is kept because worker spawn + IPC overhead exceeds the construction cost there. Notes: - `FzfWorkerHandle.create()` awaits the worker `ready` event before returning, so init errors surface at create() time rather than as a surprise on the first find(). - Worker is `unref()`-ed so a mid-find @-picker doesn't pin process exit. - `FileSearch` interface gains an optional `dispose?(): Promise<void>` to release worker resources. `FileMessageHandler.clearFileSearchCache` calls it so the long-running VS Code extension host doesn't accumulate workers across watcher-triggered evictions. - `installInProcessFzfTransport()` is provided for tests / sandboxed environments that need to skip worker spawn entirely. - Bundle gains a second esbuild build that emits `dist/fzfWorker.js` alongside `dist/cli.js`; `prepare-package.js` ships it in the tarball. - In `npm run dev` (tsx) the bundled worker file isn't on disk, so the handle silently falls back to in-thread — acceptable for a dev script. Closes the door on QwenLM#3455's two open `FileIndexService` review items by design: there is no `whenReady()` (handle awaits ready inside `create()`) and there is no in-worker crawl (crawl stays on main, only fzf moves). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#4621) * perf(filesearch): move AsyncFzf index construction to a worker thread `new AsyncFzf(allFiles, ...)` is misleadingly named — its constructor is synchronous and dominates the @-picker's main-thread cost on workspaces with >20k files (where the v2→v1 algorithm switch kicks in). It's the remaining freeze source after PR #3214 moved the crawl onto async spawn- based git ls-files / ripgrep / fdir. This change hosts the AsyncFzf instance in a worker_threads worker so the construction happens off the main thread; only completed find() results cross the message channel. Below ~5k files the in-thread path is kept because worker spawn + IPC overhead exceeds the construction cost there. Notes: - `FzfWorkerHandle.create()` awaits the worker `ready` event before returning, so init errors surface at create() time rather than as a surprise on the first find(). - Worker is `unref()`-ed so a mid-find @-picker doesn't pin process exit. - `FileSearch` interface gains an optional `dispose?(): Promise<void>` to release worker resources. `FileMessageHandler.clearFileSearchCache` calls it so the long-running VS Code extension host doesn't accumulate workers across watcher-triggered evictions. - `installInProcessFzfTransport()` is provided for tests / sandboxed environments that need to skip worker spawn entirely. - Bundle gains a second esbuild build that emits `dist/fzfWorker.js` alongside `dist/cli.js`; `prepare-package.js` ships it in the tarball. - In `npm run dev` (tsx) the bundled worker file isn't on disk, so the handle silently falls back to in-thread — acceptable for a dev script. Closes the door on #3455's two open `FileIndexService` review items by design: there is no `whenReady()` (handle awaits ready inside `create()`) and there is no in-worker crawl (crawl stays on main, only fzf moves). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(filesearch): address review feedback on fzf worker PR - Fix worker thread leak in useAtCompletion: dispose previous FileSearch instance on re-init and component cleanup - Fix worker leak on init failure: wrap inst.ready() in try/catch with timeout, dispose on failure - Fix find()-after-dispose hang: reject immediately when worker is disposed - Add limit field to FindMessage protocol to cap IPC payload before postMessage serialization - Add real worker_threads test coverage (spawn/find/dispose lifecycle, concurrent finds, init failure, post-dispose rejection) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(filesearch): address second round of review feedback - Fix useAtCompletion cleanup: move dispose() to [cwd,config] effect so it only fires on workspace change, not every keystroke/status transition - Clear setTimeout in Promise.race on happy path to prevent timer leak - Dispose FileSearch instances in FileMessageHandler on panel close - Fix incorrect comment in prepare-package.js (resolveBundleDir, not URL) - Add worker crash test: verify failAll rejects pending find() on exit - Add RecursiveFileSearch.dispose() test in fileSearch.test.ts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(core): close missing braces in formatDateForContext test The second test case in the formatDateForContext describe block was missing its closing `});`, causing a TS1005 parse error that broke CI on all platforms. This was introduced upstream in 6a99ed1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(filesearch): guard initialize() against races and stale completions - Add cancelled flag to initialize() effect so a superseded or unmounted init disposes the newly-created searcher instead of writing it into fileSearch.current with no owner - Add fzfWorker.js test artifact to .gitignore to prevent accidental commits if the test process crashes before afterAll cleanup Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(filesearch): mark worker handle as failed on post-init crash failAll() previously only set readyState to 'failed' when the worker crashed during init (state 'pending'). A crash after successful init left the handle in 'ready' state, causing subsequent find() calls to silently return empty results via the catch handler. Now failAll() unconditionally marks the handle as 'failed' (unless already disposed), and find() rejects immediately on a failed handle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces the fdir-based filesystem crawler with a two-tier strategy for the
@file mention autocomplete feature.Closes #3137
Problem
The current fdir crawler re-scans the entire directory tree on every keystroke, which is slow on large repos and doesn't respect
.gitignorenatively.Solution
Three-level fallback strategy:
git ls-files— for git repos. Fast,.gitignore-aware, reads directly from git indexripgrep --files— for non-git directories. 20s timeout, 20MB buffer cap, respects.gitignoreautomaticallyfdir— original behavior preserved as last resortSupporting improvements:
.git/indexmtime, skips re-crawl when nothing changedpath.posix.relativewhencrawlDirectoryequalsgitRootTesting
npm run preflight