perf(grep): speed up the native engine and add a search timeout #4113
Merged
The native search engine (the fallback when ripgrep is absent) walked serially and paid two avoidable costs on every run, which a profile of a 6k-file no-match walk pinned down:

- it recompiled the full cumulative .gitignore pattern set into regexes at every directory, so inherited ancestor rules were compiled over and over (~36% of the walk);
- it allocated ~72 KiB per file (8 KiB peek + 64 KiB scanner buffer), churning the GC across the whole tree.

Memoizing the compiled matcher by its pattern set and reusing the per-file buffers across the serial walk roughly halves the warm full-walk time (6k-file tree: 1.31s -> 0.62s with ignores, 0.61s -> 0.30s without). The decoder path is handed a copy of the peek bytes so its goroutine can't alias the reused buffer after an early return.

Also add a model-supplied timeout_seconds (default 30s, max 300s): a pathological search now returns the matches found so far with a clear note pointing at timeout_seconds, instead of hanging for minutes. Both the native and ripgrep paths honor it through the request context.
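The memoization described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `matcher`, `matcherCache`, and `get` are hypothetical names, and plain regular expressions stand in for real .gitignore pattern compilation. The cache key is the ordered pattern set, so sibling directories that inherit identical ancestor rules share one compiled matcher.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matcher is a hypothetical stand-in for a compiled ignore matcher.
// Assumption: patterns are regular expressions, not .gitignore globs.
type matcher struct {
	res []*regexp.Regexp
}

// Match reports whether any compiled pattern matches path.
func (m *matcher) Match(path string) bool {
	for _, re := range m.res {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

// matcherCache memoizes compiled matchers by their ordered pattern set.
// It is not safe for concurrent use; the walk it models is serial.
type matcherCache struct {
	byKey    map[string]*matcher
	compiles int // counts real compilations, for illustration only
}

func newMatcherCache() *matcherCache {
	return &matcherCache{byKey: make(map[string]*matcher)}
}

// get returns the matcher for patterns, compiling only on first sight.
func (c *matcherCache) get(patterns []string) (*matcher, error) {
	// Join with NUL, which is assumed never to occur inside a pattern.
	key := strings.Join(patterns, "\x00")
	if m, ok := c.byKey[key]; ok {
		return m, nil
	}
	m := &matcher{}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("compile %q: %w", p, err)
		}
		m.res = append(m.res, re)
	}
	c.compiles++
	c.byKey[key] = m
	return m, nil
}

func main() {
	cache := newMatcherCache()
	inherited := []string{"^vendor/", "\\.log$"}
	// Three sibling directories inherit the same ancestor rules.
	for i := 0; i < 3; i++ {
		m, err := cache.get(inherited)
		if err != nil {
			panic(err)
		}
		fmt.Println(m.Match("vendor/x.go"), m.Match("main.go"))
	}
	fmt.Println("compilations:", cache.compiles)
}
```

Keying on the ordered list rather than a sorted set is deliberate in this sketch, since ignore rules are order-sensitive (a later negation can override an earlier rule).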
Why
Users on machines without ripgrep reported `grep` being extremely slow, with a single search occasionally running for minutes without returning. With `rg` present the tool delegates to it, but the native Go fallback was both slow and unbounded.

Profiling a native full walk (6k-file tree, no match, so no early cap exit) attributed the time to two avoidable costs:

- `.gitignore` compile (`enter` → `CompileIgnoreLines`): the cumulative pattern set was recompiled at every directory (~36% of the walk)
- `searchFile`: ~72 KiB allocated per file (8 KiB peek + 64 KiB scanner buffer)

What changed
`timeout_seconds` parameter (default 30s, max 300s). A pathological search now returns the matches found so far with a clear note pointing at `timeout_seconds`, instead of hanging. Both the native and ripgrep paths honor it through the request context.

Numbers (warm best, synthetic 300-dir / 6k-file tree)
End-to-end: a 20k-file cold tree with `timeout_seconds: 1` now aborts at ~1.0s with a partial-results note instead of running 8s+.

Tests
- `TestGrepTimeoutClamp`: default / negative / in-range / over-cap clamping.
- `TestGrepTimeoutPreservesPartialResults`: a timed-out search keeps the matches found so far and flags the cutoff; a zero-match timeout reports the timeout rather than `(no matches)`; a completed zero-match search still returns `(no matches)`.

Full `internal/tool/builtin` suite, `go vet`, and `gofmt` are clean. Directory-level pruning of ignored/vendor trees is unchanged.

Follow-ups (not in this PR)
`rg` so the fast path is always taken.