fix(core): cap recursive file crawler at 100k entries to prevent OOM #3138
When the `@` autocomplete triggers `RecursiveFileSearch`, the crawler materialises the entire project tree into memory with no upper bound. For very large workspaces (missing `.gitignore`, huge `node_modules`, home directory as cwd) this pushes Node.js past its heap limit and crashes.
- Add `maxFiles` option to `CrawlOptions`; use fdir's `withMaxFiles()` to stop traversal early instead of post-hoc truncation
- Apply file-level ignore patterns during crawl via fdir `filter()` so ignored files don't consume the `maxFiles` budget
- Include `maxFiles` in the crawl cache key for correctness
- Set `MAX_CRAWL_FILES = 100_000` in `RecursiveFileSearch` (caps peak memory at ~50 MB for the file list)
Fixes #3130
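The difference between filtering during the crawl and filtering after the cap has been applied can be seen in a small sketch. This is not the project's crawler and does not use fdir: `TreeNode`, `boundedCrawl`, and `capThenFilter` are hypothetical names, and an in-memory tree stands in for the file system so the example stays self-contained.

```typescript
// Hypothetical in-memory tree standing in for a real directory (illustration only).
type TreeNode =
  | { kind: "file"; name: string }
  | { kind: "dir"; name: string; children: TreeNode[] };

interface BoundedCrawlOptions {
  maxFiles: number; // hard cap on collected entries
  isIgnored: (path: string) => boolean; // file-level ignore predicate
}

// Collects file paths depth-first and stops as soon as the cap is reached.
// Ignored files are skipped before they are counted, so they never use up the budget.
function boundedCrawl(root: TreeNode, opts: BoundedCrawlOptions): string[] {
  const results: string[] = [];
  // An explicit stack avoids deep recursion on large trees.
  const stack: Array<{ node: TreeNode; prefix: string }> = [
    { node: root, prefix: "" },
  ];

  while (stack.length > 0 && results.length < opts.maxFiles) {
    const { node, prefix } = stack.pop()!;
    const path = prefix === "" ? node.name : `${prefix}/${node.name}`;
    if (node.kind === "dir") {
      // Push children in reverse so they are visited in declared order.
      for (const child of [...node.children].reverse()) {
        stack.push({ node: child, prefix: path });
      }
    } else if (!opts.isIgnored(path)) {
      results.push(path);
    }
  }
  return results;
}

// Comparison variant: apply the cap first, then filter the capped list.
function capThenFilter(root: TreeNode, opts: BoundedCrawlOptions): string[] {
  const capped = boundedCrawl(root, {
    maxFiles: opts.maxFiles,
    isIgnored: () => false,
  });
  return capped.filter((p) => !opts.isIgnored(p));
}

const tree: TreeNode = {
  kind: "dir",
  name: "src",
  children: [
    { kind: "file", name: "a.log" },
    { kind: "file", name: "b.log" },
    { kind: "file", name: "c.ts" },
    { kind: "file", name: "d.ts" },
  ],
};

const options: BoundedCrawlOptions = {
  maxFiles: 2,
  isIgnored: (p) => p.endsWith(".log"),
};

const filteredDuringCrawl = boundedCrawl(tree, options); // ["src/c.ts", "src/d.ts"]
const cappedThenFiltered = capThenFilter(tree, options); // []
```

With a cap of two, the in-crawl filter still returns both `.ts` files, while capping first spends the whole budget on `.log` files and leaves nothing after filtering.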
📋 Review Summary
This PR addresses a critical OOM vulnerability where recursive file crawling with
🔍 General Feedback
🎯 Specific Feedback
🟢 Medium
🔵 Low
✅ Highlights
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
TLDR
When user input contains `@` (e.g. `@latest` in an npm context), the CLI's autocomplete triggers `RecursiveFileSearch`, which crawls the entire project tree with no upper bound. For very large workspaces (a missing `.gitignore`, huge `node_modules` trees, or the home directory as cwd) this pushes Node.js past its ~4 GB heap limit and crashes with an OOM. This PR caps the crawler at 100,000 entries and applies file-level ignore patterns during traversal so ignored files don't consume the budget.
Screenshots / Video Demo
N/A — no user-facing change beyond preventing the OOM crash. The autocomplete behavior is identical for projects under 100k files (virtually all real projects).
Dive Deeper
Root cause:
`RecursiveFileSearch.initialize()` → `crawl()` uses `fdir` to traverse the full tree, storing every path in a `string[]`. The results are then `.map()`'d to relative paths, stored in a `ResultCache`, and indexed by `AsyncFzf`, which puts at least 3–5× the raw file list in memory. With millions of entries this exceeds the V8 heap limit.

What changed (4 files):

| File | Change |
| --- | --- |
| `crawler.ts` | Added `maxFiles` to `CrawlOptions`. Uses `fdir.withMaxFiles()` to stop traversal natively. Added `fdir.filter()` to apply file-level ignore patterns during crawl so ignored files (e.g. `*.log`, `*.map`) don't eat the budget. |
| `crawlCache.ts` | Includes `maxFiles` in the SHA-256 cache key so different limits don't share cached results. |
| `fileSearch.ts` | Added `MAX_CRAWL_FILES = 100_000` constant. Passed to `crawl()` from `RecursiveFileSearch.initialize()`. |
| `crawler.test.ts` | Updated the test involving `bar.mk`, which should have been filtered by the `*.mk` ignore pattern. |

Why 100,000? At ~100 bytes/path average, 100k entries ≈ 10 MB for the path array, ~50 MB total with caches and FZF index, which is well under the heap limit while covering virtually all real projects.
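The sizing argument above can be checked with a few lines of arithmetic. This is a rough sketch, not code from the PR: `estimateCrawlMemoryMB` is a hypothetical helper, the 100 bytes/path average comes from the description, and the overhead factor of 5 is an assumption taken from the upper end of the 3–5× range quoted in the root cause. Megabytes are decimal (10^6 bytes).

```typescript
// Figures from the PR description; both are approximations, not measurements.
const AVG_BYTES_PER_PATH = 100;
// Assumed multiplier for relative paths, ResultCache, and the FZF index
// (upper end of the 3-5x range mentioned in the root cause).
const OVERHEAD_FACTOR = 5;

interface MemoryEstimate {
  rawMB: number; // path array alone
  totalMB: number; // path array plus caches and index
}

// Hypothetical helper: estimates memory for a crawl result of `fileCount` paths.
function estimateCrawlMemoryMB(fileCount: number): MemoryEstimate {
  const rawMB = (fileCount * AVG_BYTES_PER_PATH) / 1_000_000;
  return { rawMB, totalMB: rawMB * OVERHEAD_FACTOR };
}

const capped = estimateCrawlMemoryMB(100_000); // { rawMB: 10, totalMB: 50 }
const uncapped = estimateCrawlMemoryMB(10_000_000); // { rawMB: 1000, totalMB: 5000 }
```

Under these assumptions the capped crawl stays around 50 MB, while an uncapped crawl of ten million entries would need roughly 5 GB, more than the ~4 GB heap limit cited in the TLDR.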
Reviewer Test Plan
- `cd packages/core && npx vitest run src/utils/filesearch/` (71 tests, all pass)
- `node dist/cli.js "check if my configs use @latest" --approval-mode yolo --output-format json` should complete with exit code 0 and no OOM
- In a large workspace, type `@` and confirm the autocomplete still works (returns results from the first 100k entries) without crashing
Testing Matrix
Linked issues / bugs
Fixes #3130