
Replace fdir filesystem crawler with git ls-files + ripgrep for file search #3137

@tanzhenxin

Description

What would you like to be added?

Replace the current fdir-based recursive filesystem crawler (used by the @ file mention / file search feature) with a two-tier strategy:

  1. Primary: git ls-files — read tracked files directly from the git index
  2. Fallback: ripgrep --files — for non-git directories, with timeout and buffer caps
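A minimal sketch of the two-tier selection, assuming a Node.js host and `child_process.execFile`. The names `listFiles`, `Runner` and `parseNulSeparated` are hypothetical. The runner is injectable only so the fallback path can be exercised without git or ripgrep installed. Both tools are asked for NUL-separated output (`git ls-files -z`, `rg --files --null`) so paths containing spaces or newlines survive intact.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical abstraction: run a command in `cwd` and resolve with its stdout.
export type Runner = (cmd: string, args: string[], cwd: string) => Promise<string>;

const defaultRunner: Runner = async (cmd, args, cwd) => {
  const { stdout } = await execFileAsync(cmd, args, {
    cwd,
    encoding: "utf8",
    maxBuffer: 20 * 1024 * 1024, // assumption: reuse the 20 MB cap from the ripgrep tier
  });
  return stdout;
};

// Split NUL-terminated output into paths, dropping the trailing empty entry.
export function parseNulSeparated(stdout: string): string[] {
  return stdout.split("\0").filter((p) => p.length > 0);
}

// Tier 1: tracked files from the git index. Tier 2: ripgrep for non-git directories.
// Note: in this sketch ANY git failure (not a repo, git missing, maxBuffer exceeded)
// falls through to ripgrep.
export async function listFiles(cwd: string, run: Runner = defaultRunner): Promise<string[]> {
  try {
    return parseNulSeparated(await run("git", ["ls-files", "-z"], cwd));
  } catch {
    return parseNulSeparated(await run("rg", ["--files", "--null"], cwd));
  }
}
```

An empty repository makes `git ls-files` succeed with no output, so it yields an empty list rather than triggering the fallback. Only a non-zero exit (for example "not a git repository") switches tiers.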

Additionally, adopt the following supporting improvements:

  • Background untracked file merge — fetch untracked files asynchronously with a timeout (e.g. 10s) and merge them into the in-memory file list in the background, so the main search stays fast
  • Mtime-based change detection — check the mtime of .git/index on refresh and rebuild the file list only when it has changed, instead of re-crawling on every refresh
  • Refresh throttling — throttle file-list rebuilds (e.g. at most once every 5s) to prevent thrashing
  • Async chunked indexing — yield to the event loop periodically during indexing so the UI stays responsive even with 200k+ files
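The change detection, throttling and chunked indexing described above could fit together roughly as follows. This is a sketch assuming Node.js. `RefreshGate`, `buildIndexChunked` and the chunk size of 5,000 are hypothetical, and `check()` assumes a plain checkout where `.git` is a directory rather than a worktree or submodule `.git` file.

```typescript
import { stat } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical helper combining mtime-based change detection with throttling.
export class RefreshGate {
  private lastMtimeMs = -1;
  private lastRebuildAt = -Infinity;

  constructor(private readonly throttleMs = 5_000) {}

  // Rebuild only if the .git/index mtime changed since the last rebuild AND the
  // throttle window has elapsed. A throttled change is not lost: lastMtimeMs is
  // only updated when a rebuild happens, so a later check still sees the change.
  shouldRebuild(indexMtimeMs: number, now: number): boolean {
    if (indexMtimeMs === this.lastMtimeMs) return false;
    if (now - this.lastRebuildAt < this.throttleMs) return false;
    this.lastMtimeMs = indexMtimeMs;
    this.lastRebuildAt = now;
    return true;
  }

  async check(repoRoot: string, now = Date.now()): Promise<boolean> {
    const { mtimeMs } = await stat(join(repoRoot, ".git", "index"));
    return this.shouldRebuild(mtimeMs, now);
  }
}

// Resolve via setImmediate so pending I/O callbacks (e.g. keypresses) run between chunks.
const yieldToEventLoop = () => new Promise<void>((resolve) => setImmediate(resolve));

// Build index entries in fixed-size chunks, yielding to the event loop after each one.
export async function buildIndexChunked<T>(
  files: readonly string[],
  toEntry: (path: string) => T,
  chunkSize = 5_000,
): Promise<T[]> {
  const entries: T[] = [];
  for (let i = 0; i < files.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, files.length);
    for (let j = i; j < end; j++) entries.push(toEntry(files[j]));
    await yieldToEventLoop();
  }
  return entries;
}
```

Keeping `shouldRebuild` free of I/O makes the throttling rule easy to test with synthetic timestamps. `check` is the thin wrapper that supplies the real mtime.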

Why is this needed?

The current fdir-based crawler performs an unbounded recursive filesystem walk. This causes problems in several real-world scenarios:

Switching to git ls-files addresses the root cause:

  • Inherently bounded: only returns tracked files, so the result set is bounded by what the repository tracks rather than by whatever else is on disk
  • Fast: reads the path list from .git/index, a binary file git already maintains on disk, so no filesystem walk is needed
  • No ignore rule complexity: git already knows what's tracked; no need to parse and apply .gitignore rules ourselves
  • Battle-tested: git handles edge cases (submodules, sparse checkout, large repos) that a naive filesystem walk does not
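Here is one possible way to split the git tier into a fast tracked pass and the background untracked merge from the proposal. It is a sketch, not the project's code, and assumes Node.js. `git ls-files --others --exclude-standard` lists untracked files while git itself applies .gitignore, .git/info/exclude and the global excludes file, so no ignore parsing is needed on our side. `trackedFiles`, `untrackedFiles`, `mergeInto` and `loadIndex` are hypothetical names. The 10 s timeout mirrors the issue's example and relies on `execFile`'s `timeout` option, which kills the child (SIGTERM by default) and rejects the promise.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);
const MAX_BUFFER = 20 * 1024 * 1024; // assumption: same cap as the ripgrep tier

const splitNul = (out: string): string[] => out.split("\0").filter(Boolean);

// Fast path: tracked files straight from the index.
export async function trackedFiles(cwd: string): Promise<string[]> {
  const { stdout } = await execFileAsync("git", ["ls-files", "-z"], {
    cwd, encoding: "utf8", maxBuffer: MAX_BUFFER,
  });
  return splitNul(stdout);
}

// Slow path: untracked, non-ignored files, bounded by a timeout.
export async function untrackedFiles(cwd: string, timeoutMs = 10_000): Promise<string[]> {
  const { stdout } = await execFileAsync(
    "git", ["ls-files", "-z", "--others", "--exclude-standard"],
    { cwd, encoding: "utf8", timeout: timeoutMs, maxBuffer: MAX_BUFFER },
  );
  return splitNul(stdout);
}

// Append new paths without duplicates, preserving the existing order.
export function mergeInto(index: readonly string[], extra: readonly string[]): string[] {
  const seen = new Set(index);
  const merged = index.slice();
  for (const p of extra) {
    if (!seen.has(p)) {
      seen.add(p);
      merged.push(p);
    }
  }
  return merged;
}

// Return tracked files immediately; deliver the merged list later via onUpdate.
export async function loadIndex(cwd: string, onUpdate: (files: string[]) => void): Promise<string[]> {
  const tracked = await trackedFiles(cwd);
  untrackedFiles(cwd)
    .then((extra) => onUpdate(mergeInto(tracked, extra)))
    .catch(() => {
      /* timed out or failed: keep serving the tracked-only list */
    });
  return tracked;
}
```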

The ripgrep fallback for non-git directories brings its own protections:

  • Timeout (e.g. 20s) — kills runaway searches; escalates from SIGTERM to SIGKILL if the process doesn't exit gracefully
  • Buffer cap (e.g. 20 MB on stdout) — prevents OOM when ripgrep discovers 200k+ files
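  • Built-in ignore support — respects .ignore and .rgignore out of the box; .gitignore rules are only applied inside a git repository unless --no-require-git is passed, which matters because this tier only runs in non-git directories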
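The timeout, SIGTERM to SIGKILL escalation and stdout cap could be enforced around a spawned ripgrep roughly like this. It is a sketch assuming Node.js `child_process.spawn`. `rgFiles`, `RgLimits` and `killGraceMs` (the delay before escalating to SIGKILL) are hypothetical names. The `cmd`/`args` parameters exist only so the limit handling can be tested without ripgrep installed, and ripgrep's exit code is deliberately not inspected here.

```typescript
import { spawn } from "node:child_process";

export interface RgLimits {
  timeoutMs: number;   // e.g. 20_000
  maxBytes: number;    // e.g. 20 * 1024 * 1024
  killGraceMs: number; // hypothetical: wait after SIGTERM before sending SIGKILL
}

export function rgFiles(
  cwd: string,
  limits: RgLimits,
  cmd = "rg",
  args: string[] = ["--files", "--null"],
): Promise<string[]> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { cwd, stdio: ["ignore", "pipe", "ignore"] });
    const chunks: Buffer[] = [];
    let bytes = 0;
    let failure: Error | undefined;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    // Ask the process to exit; escalate to SIGKILL if it is still alive later.
    const stop = (reason: Error) => {
      if (failure) return;
      failure = reason;
      child.kill("SIGTERM");
      killTimer = setTimeout(() => child.kill("SIGKILL"), limits.killGraceMs);
    };

    const timeout = setTimeout(
      () => stop(new Error(`ripgrep timed out after ${limits.timeoutMs} ms`)),
      limits.timeoutMs,
    );

    child.stdout!.on("data", (chunk: Buffer) => {
      bytes += chunk.length;
      if (bytes > limits.maxBytes) {
        stop(new Error(`ripgrep output exceeded ${limits.maxBytes} bytes`));
        return; // drop everything past the cap
      }
      chunks.push(chunk);
    });

    const cleanup = () => {
      clearTimeout(timeout);
      if (killTimer) clearTimeout(killTimer);
    };

    child.on("error", (err) => {
      cleanup();
      reject(err); // e.g. ENOENT when rg is not installed
    });

    child.on("close", () => {
      cleanup();
      if (failure) reject(failure);
      else resolve(Buffer.concat(chunks).toString("utf8").split("\0").filter(Boolean));
    });
  });
}
```

Decoding only after `Buffer.concat` avoids splitting a multi-byte UTF-8 character across chunk boundaries. Clearing `killTimer` on `close` ensures SIGKILL is only sent to a process that ignored SIGTERM.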

Additional context

None.
