
Conversation

@glepnir
Member

@glepnir glepnir commented Jun 20, 2025

Introduces an optimized fuzzy matching algorithm using the Smith-Waterman algorithm with affine gap penalties, accelerated via SIMD instructions (SSE2 for x86 and NEON for ARM). The implementation features:

  1. Full affine gap penalty model using three state matrices:

    • M matrix: Tracks match/mismatch scores
    • X matrix: Tracks gaps in the haystack (insertions in needle)
    • Y matrix: Tracks gaps in the needle (deletions in needle)
  2. Position-based scoring bonuses:

    • Prefix bonus (+15) for start of haystack
    • Delimiter bonus (+30) after separator chars
    • Capitalization bonus (+30) for a capital after a lowercase letter
    • Case match bonus (+10) for exact case matches
    • Exact match bonus (+50) for perfect matches
  3. SIMD optimizations:

    • Diagonal strip processing with 8-element parallelization
    • Vectorized gap penalty calculations
    • Batched character comparison with bonus application
    • Fallback to scalar computation for edge cases

The algorithm improves on classic Smith-Waterman by:

  • Using affine gap penalties (open: -3, extend: -1) instead of linear
  • Adding position-aware bonuses for natural language patterns
  • Leveraging SIMD for 4-8x speedup on modern CPUs
  • Implementing traceback with a state machine for efficient path recovery
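
To make the three-matrix recurrence above concrete, here is a minimal scalar sketch of the scoring step. The constants and bonus rules are illustrative approximations of this description, not the PR's actual code; traceback and the exact-match bonus are omitted for brevity.

```c
/* Minimal scalar sketch of Smith-Waterman with affine gaps (three matrices).
 * Constants and bonus rules approximate the description above and are purely
 * illustrative; traceback and the exact-match bonus are omitted. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum {
    MATCH = 16, MISMATCH = -4, GAP_OPEN = -3, GAP_EXTEND = -1,
    BONUS_PREFIX = 15, BONUS_DELIMITER = 30, BONUS_CAMEL = 30, BONUS_CASE = 10,
    MAXLEN = 128
};

static int max2(int a, int b) { return a > b ? a : b; }
static int max3(int a, int b, int c) { return max2(max2(a, b), c); }

/* Positional bonus for matching the haystack character at 0-based index pos. */
static int position_bonus(const char *hay, int pos)
{
    if (pos == 0)
        return BONUS_PREFIX;                              /* start of haystack */
    char prev = hay[pos - 1];
    if (prev == '/' || prev == '\\' || prev == '_' || prev == '-' || prev == '.' || prev == ' ')
        return BONUS_DELIMITER;                           /* after a separator */
    if (islower((unsigned char)prev) && isupper((unsigned char)hay[pos]))
        return BONUS_CAMEL;                               /* camelCase boundary */
    return 0;
}

/* Best local alignment score of needle against hay (score only, no traceback). */
static int fuzzy_score(const char *needle, const char *hay)
{
    static int M[MAXLEN + 1][MAXLEN + 1];                 /* match/mismatch */
    static int X[MAXLEN + 1][MAXLEN + 1];                 /* gap in haystack */
    static int Y[MAXLEN + 1][MAXLEN + 1];                 /* gap in needle */
    int n = (int)strlen(needle), m = (int)strlen(hay), best = 0;

    if (n == 0 || m == 0 || n > MAXLEN || m > MAXLEN)
        return 0;
    for (int j = 0; j <= m; j++)
        M[0][j] = X[0][j] = Y[0][j] = 0;
    for (int i = 0; i <= n; i++)
        M[i][0] = X[i][0] = Y[i][0] = 0;

    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int same = tolower((unsigned char)needle[i - 1])
                       == tolower((unsigned char)hay[j - 1]);
            int sub = same ? MATCH + position_bonus(hay, j - 1) : MISMATCH;
            if (same && isalpha((unsigned char)hay[j - 1]) && needle[i - 1] == hay[j - 1])
                sub += BONUS_CASE;                        /* exact-case match */

            /* Local alignment: the match state never drops below zero. */
            M[i][j] = max2(0, max3(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub);
            /* Affine gaps: opening costs more than extending. */
            X[i][j] = max2(M[i - 1][j] + GAP_OPEN, X[i - 1][j] + GAP_EXTEND);
            Y[i][j] = max2(M[i][j - 1] + GAP_OPEN, Y[i][j - 1] + GAP_EXTEND);
            best = max2(best, M[i][j]);
        }
    }
    return best;
}

int main(void)
{
    printf("%d\n", fuzzy_score("normal", "tools/vim/src/normal.c"));
    printf("%d\n", fuzzy_score("normal", "op-markplaceholder.d/args.ctags"));
    return 0;
}
```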

References:

- relates to neovim/neovim#34101
- CMU affine gap lecture: https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/gaps.pdf
- Nucleo's SIMD approach: https://github.com/helix-editor/nucleo
- Frizbee's scoring model: https://github.com/Saghen/frizbee

@glepnir glepnir marked this pull request as draft June 20, 2025 07:26
@glepnir
Member Author

glepnir commented Jun 20, 2025

cc @ychin

@glepnir glepnir force-pushed the simd_sw branch 2 times, most recently from 79a7744 to 1459f84 on June 20, 2025 07:39
@habamax
Contributor

habamax commented Jun 20, 2025

Does it have enhanced camelcase support or "it just works"?

@glepnir
Member Author

glepnir commented Jun 20, 2025

Capitalization bonus for camelcase

@habamax
Contributor

habamax commented Jun 20, 2025

So the algorithm is different, and the results as well?

I need to try it out with my use cases.

@habamax
Contributor

habamax commented Jun 20, 2025

Oh, it is not a drop-in replacement for the existing fuzzymatch functions.

@glepnir
Member Author

glepnir commented Jun 20, 2025

It is based on Smith-Waterman with fully affine gaps. Unlike nucleo, three matrices are used here instead of two; fzf uses one matrix. This is just a preliminary implementation of the algorithm and needs further testing. I have not replaced the fuzzy-related functions yet; currently it is only used in fuzzy_match_str, which means it can be tested with omnifunc.

@glepnir glepnir force-pushed the simd_sw branch 6 times, most recently from 5e1e241 to ff3678f on June 20, 2025 11:15
@chrisbra
Member

Thanks, but I am still not sure what you want to achieve here. I don't really think it makes sense to use SIMD here.
Also, why have another fuzzy algorithm here? I don't think any user will be aware of the specifics of the fuzzy algorithm, so they wouldn't even know that they could decide to use a different one.

@glepnir
Member Author

glepnir commented Jun 21, 2025

This algorithm is much better and is used in many successful projects that I have listed in the references. The current fuzzy algorithm is not ideal and not efficient. At present, the overall implementation is complete. After some testing, it can replace the current algorithm; it uses SIMD acceleration on platforms that support it and falls back to a scalar implementation otherwise.
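
(For what it's worth, the usual pattern for "SIMD on platforms that support it, scalar otherwise" is a compile-time dispatch. Below is a hedged sketch of that pattern, not the PR's actual code: the function names and the tiny row operation are placeholders, and a NEON path guarded by __ARM_NEON would mirror the SSE2 branch.)

```c
/* Illustrative compile-time dispatch between a SIMD and a scalar code path.
 * Placeholder names, not the PR's symbols; a __ARM_NEON branch would mirror
 * the SSE2 one with vaddq_s32/vmaxq_s32. */
#include <stdio.h>

#if defined(__SSE2__)
# include <emmintrin.h>
#endif

/* Portable scalar fallback: always compiled, always correct. */
static int add_gap_scalar(const int *prev, int *cur, int len, int gap)
{
    int best = 0;
    for (int j = 0; j < len; j++) {
        cur[j] = prev[j] + gap;
        if (cur[j] > best)
            best = cur[j];
    }
    return best;
}

/* One vectorizable inner step: extend a gap penalty across a row, track the max. */
static int add_gap_row(const int *prev, int *cur, int len, int gap)
{
#if defined(__SSE2__)
    __m128i vgap = _mm_set1_epi32(gap);
    __m128i vbest = _mm_setzero_si128();
    int j = 0;
    for (; j + 4 <= len; j += 4) {
        __m128i v = _mm_add_epi32(_mm_loadu_si128((const __m128i *)(prev + j)), vgap);
        _mm_storeu_si128((__m128i *)(cur + j), v);
        /* SSE2 has no 32-bit max, so emulate it with compare + blend. */
        __m128i gt = _mm_cmpgt_epi32(v, vbest);
        vbest = _mm_or_si128(_mm_and_si128(gt, v), _mm_andnot_si128(gt, vbest));
    }
    int lanes[4], best;
    _mm_storeu_si128((__m128i *)lanes, vbest);
    best = lanes[0];
    for (int k = 1; k < 4; k++)
        if (lanes[k] > best)
            best = lanes[k];
    int tail = add_gap_scalar(prev + j, cur + j, len - j, gap);
    return tail > best ? tail : best;
#else
    return add_gap_scalar(prev, cur, len, gap);  /* no SIMD available: fall back */
#endif
}

int main(void)
{
    int prev[8] = { 3, 9, 1, 7, 2, 8, 4, 6 }, cur[8];
    printf("best = %d\n", add_gap_row(prev, cur, 8, -1));  /* prints 8 */
    return 0;
}
```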

@ychin
Contributor

ychin commented Jun 22, 2025

Is there concrete analysis, or at least examples, of 1) the matching quality of the new algorithm, and 2) the performance improvements enabled by SIMD, and how it performs compared to the old algorithm?

For (1), while it may sound better, I think it would be easier to illustrate the point with examples of common use cases where we can see how the old and new compare. I'm sure it performs better in some cases, but do we know it generally gives better matches on average?

For (2), I would imagine some basic performance benchmarks would be useful, to be able to compare both the scalar performance of the old versus the new algorithm, and the scalar versus SIMD performance of the new one. SIMD code adds maintenance cost, but if we can make concrete statements like "this <insert_string> costs 6 seconds to match on <insert_modern_cpu> but 0.5 seconds with SIMD", the impact is easier to understand. I think I mentioned a similar sentiment in #15837.


Also, do you know what the current algorithm was based on again?
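
(As an illustration of the kind of benchmark being asked for here, a minimal C harness could look like the sketch below. The two scorers are trivial stand-ins, not real implementations; an actual comparison would plug in the old and new fuzzy matchers, a SIMD vs. scalar build of the new one, and a realistic candidate list such as the output of `rg --files`.)

```c
/* Sketch of a micro-benchmark for comparing fuzzy scorers.  The scorers below
 * are stand-ins; replace them with the implementations under test. */
#include <stdio.h>
#include <string.h>
#include <time.h>

static int score_old(const char *needle, const char *hay)   /* stand-in */
{
    return strstr(hay, needle) != NULL;
}

static int score_new(const char *needle, const char *hay)   /* stand-in */
{
    return strstr(hay, needle) != NULL;
}

/* Time `reps` passes over `n` candidate lines; returns elapsed milliseconds. */
static double bench_ms(int (*score)(const char *, const char *),
                       const char *needle, const char **lines, int n, int reps)
{
    struct timespec t0, t1;
    volatile int sink = 0;                /* keep calls from being optimized away */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < n; i++)
            sink += score(needle, lines[i]);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}

int main(void)
{
    /* A realistic run would load tens of thousands of paths instead. */
    static const char *lines[] = { "src/normal.c", "src/quickfix.c", "runtime/doc/eval.txt" };
    int n = (int)(sizeof(lines) / sizeof(lines[0]));

    printf("old: %8.2f ms\n", bench_ms(score_old, "normal", lines, n, 200000));
    printf("new: %8.2f ms\n", bench_ms(score_new, "normal", lines, n, 200000));
    return 0;
}
```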

@glepnir
Member Author

glepnir commented Jun 22, 2025

I'll definitely provide it, just not at the moment. I've only completed an initial implementation of the algorithm and want to evaluate whether it meets the requirements and whether there are any aspects I might have overlooked. Once it's stable, I'll include benchmark data as well. There's also a fallback scalar implementation. The current algorithm is ported from https://github.com/forrestthewoods/lib_fts/tree/master/code and the accompanying blog post.

@glepnir glepnir force-pushed the simd_sw branch 2 times, most recently from d3035ec to 5dae211 on June 22, 2025 08:35
@ubaldot
Contributor

ubaldot commented Jul 19, 2025

I second @ychin's comment: we need to define a sound, measurable comparison metric.

And on top of that, we should keep its maintenance in mind: will maintaining the feature require a PhD in math, along with being a guru of the C language and knowing every corner of Vim? If so, in my experience, a sub-optimal solution would be way better.

EDIT: ... and also, out of curiosity, why did you choose this algorithm instead of Levenshtein and its variants, BK-trees, etc.? :)

@glepnir
Member Author

glepnir commented Jul 20, 2025

EDIT: ... and also, out of curiosity, why did you choose this algorithm instead of Levenshtein and its variants, BK-trees, etc.? :)

Because this algorithm is used in the referenced projects. It has been implemented in completion projects, and those projects are very successful. So why do we need another algorithm...

This PR fully implements the three-matrix affine gaps. I am considering removing SIMD to make it simpler, but I have other things to do at the moment. Maybe I will come back to these PRs after next month.

@ubaldot
Contributor

ubaldot commented Jul 20, 2025

Because this algorithm is used in the referenced projects. It has been implemented in completion projects, and those projects are very successful. So why do we need another algorithm...

Well, I am pretty sure that other algorithms are used in other successful projects as well :)
I am fairly confident that each algorithm has its pros and cons, and that the choice depends on what one wants to achieve and which features have greater priority: speed? accuracy? other metrics?

Generally, before choosing a method, it would be prudent to perform a literature review and evaluate the best approach for the given use case, rather than going with the first method found. ;-)

But I have other things to do at the moment. Maybe I will come back to these PRs after next month.

No worries. Keep in mind that we are all volunteers here, and we don't/can't expect anything from anyone. We rely on our common passion for software and reciprocal trust.

@glepnir
Member Author

glepnir commented Jul 21, 2025

Well, I am pretty sure that other algorithms are used in other successful projects as well :)
I am fairly confident that each algorithm has its pros and cons, and that the choice depends on what one wants to achieve and which features have greater priority: speed? accuracy? other metrics?

The main references are https://github.com/helix-editor/nucleo and fzf. They do a very good job here. Helix is an editor; fzf integrates with many tools, including Vim/Neovim.

@lifepillar
Contributor

I'm glad that someone is looking at Smith-Waterman for this.

@glepnir Your implementation takes quadratic space in the input size. Do you think that memory usage can become a bottleneck for the intended usage, also considering that it is using three matrices instead of one? Implementing SW in linear space is possible, although backtracking becomes a bit more difficult (and I'm not suggesting that it should be done in this PR, eh!).

Perhaps, this PR would be accepted more easily if the SIMD optimization were left for a separate PR. Would doing that be too much work?
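
(On the linear-space point above: when only the score is needed, the standard trick is to keep two rolling rows per matrix instead of the full n x m matrices. A rough sketch with illustrative constants follows; it deliberately omits traceback, which is exactly the part that becomes harder in linear space, and it is not a drop-in for the PR's code, which also needs match positions.)

```c
/* Sketch of the linear-space idea: each DP row depends only on the previous
 * one, so two rows per matrix replace the full n x m matrices.  Constants are
 * illustrative only; no traceback, score only. */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MATCH = 16, MISMATCH = -4, GAP_OPEN = -3, GAP_EXTEND = -1 };

static int imax(int a, int b) { return a > b ? a : b; }

static int fuzzy_score_linear_space(const char *needle, const char *hay)
{
    int n = (int)strlen(needle), m = (int)strlen(hay), best = 0;
    /* Two rows per matrix: *p = row i-1, *c = row i (zero-initialized). */
    int *Mp = calloc((size_t)m + 1, sizeof(int)), *Mc = calloc((size_t)m + 1, sizeof(int));
    int *Xp = calloc((size_t)m + 1, sizeof(int)), *Xc = calloc((size_t)m + 1, sizeof(int));
    int *Yp = calloc((size_t)m + 1, sizeof(int)), *Yc = calloc((size_t)m + 1, sizeof(int));

    if (Mp && Mc && Xp && Xc && Yp && Yc) {
        for (int i = 1; i <= n; i++) {
            Mc[0] = Xc[0] = Yc[0] = 0;
            for (int j = 1; j <= m; j++) {
                int same = tolower((unsigned char)needle[i - 1])
                           == tolower((unsigned char)hay[j - 1]);
                int sub = same ? MATCH : MISMATCH;
                Mc[j] = imax(0, imax(Mp[j - 1], imax(Xp[j - 1], Yp[j - 1])) + sub);
                Xc[j] = imax(Mp[j] + GAP_OPEN, Xp[j] + GAP_EXTEND);         /* gap in haystack */
                Yc[j] = imax(Mc[j - 1] + GAP_OPEN, Yc[j - 1] + GAP_EXTEND); /* gap in needle */
                best = imax(best, Mc[j]);
            }
            int *t;                                   /* swap prev and cur rows */
            t = Mp; Mp = Mc; Mc = t;
            t = Xp; Xp = Xc; Xc = t;
            t = Yp; Yp = Yc; Yc = t;
        }
    }
    free(Mp); free(Mc); free(Xp); free(Xc); free(Yp); free(Yc);
    return best;
}

int main(void)
{
    printf("%d\n", fuzzy_score_linear_space("normal", "tools/vim/src/normal.c"));
    return 0;
}
```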

@glepnir
Member Author

glepnir commented Jul 22, 2025

Perhaps, this PR would be accepted more easily if the SIMD optimization were left for a separate PR. Would doing that be too much work?

Yes, that's my plan: remove SIMD and do a benchmark. If the performance is OK, then SIMD is not necessary.

@glepnir glepnir force-pushed the simd_sw branch 2 times, most recently from dcb33a1 to 034f1e3 on July 29, 2025 12:41
@glepnir
Member Author

glepnir commented Jul 29, 2025

@lifepillar I've already replaced the current algorithm (though it still needs some cleanup; this is just for early testing). You can compile it directly and use this affine-gaps Smith-Waterman implementation. I've removed the SIMD part for now. You can try testing it with your completion plugin.

This implementation reflects my personal understanding after reading the affine-gaps paper and studying the Smith-Waterman algorithm. There isn't a full C version of the three-matrix approach that can be ported directly 😢. There are still some parts I'm uncertain about; it might take more time to look into them further.

I also noticed someone is trying to port fzy, which is another option. I haven't looked into the details of fzy's algorithm yet, but it probably works great too. I might check it out when I have time.

Give this a try first; we can revisit things later, since I might be a bit busy for a while. 😩

@gcanat
Contributor

gcanat commented Aug 1, 2025

Got a compilation error:

quickfix.c: In function ‘vgr_match_buflines’:
quickfix.c:6491:20: error: too many arguments to function ‘fuzzy_match’
 6491 |             while (fuzzy_match(str + col, spat, FALSE, &score,
      |                    ^~~~~~~~~~~
In file included from proto.h:188,
                 from vim.h:2464,
                 from quickfix.c:14:
proto/search.pro:40:5: note: declared here
   40 | int fuzzy_match(char_u *str, char_u *pat_arg, int matchseq, int *outScore, int_u *matches, int maxMatches);
      |     ^~~~~~~~~~~

It works with the following patch:

diff --git a/src/quickfix.c b/src/quickfix.c
index ab595d7bb..11a4a5d3f 100644
--- a/src/quickfix.c
+++ b/src/quickfix.c
@@ -6489,7 +6489,7 @@ vgr_match_buflines(
            // Fuzzy string match
            CLEAR_FIELD(matches);
            while (fuzzy_match(str + col, spat, FALSE, &score,
-                       matches, sz, TRUE) > 0)
+                       matches, sz) > 0)
            {
                // Pass the buffer number so that it gets used even for a
                // dummy buffer, unless duplicate_name is set, then the

@gcanat
Contributor

gcanat commented Aug 1, 2025

Not sure if it is the best way to test, but I tried it like this:
vim --clean, then sourced a file containing this:

func! Fuzzytest(pattern)
	let file_list = systemlist("rg --files")
	echom file_list->len()
	let starttime = reltime()
	let result = matchfuzzy(file_list, a:pattern)
	echom (starttime->reltime()->reltimefloat() * 1000)
endfunc

then :call Fuzzytest("diffusion")
The file_list length is approximately 52,000.
The time is 85 ms on the master branch, whereas it is 500 ms with this branch, so it appears to be much slower.

@glepnir
Member Author

glepnir commented Aug 2, 2025

Thanks. Are the results correct? There are indeed some performance issues in certain areas. I want to confirm whether the algorithm produces better results.

@gcanat
Contributor

gcanat commented Aug 2, 2025

Thanks. Are the results correct? There are indeed some performance issues in certain areas. I want to confirm whether the algorithm produces better results.

I don't know; it's probably a case-by-case thing. Maybe some results are better in some cases, and the other way around in other cases... 🤷‍♂️
But I can give you an example of "bad" results with this branch.
The top 3 results for Fuzzytest("normal.c"):

tools/ctags/Units/optscript.r/op-matchloc2line.d/args.ctags
tools/ctags/Units/optscript.r/op-markplaceholder.d/args.ctags
tools/vim/src/normal.c

whereas with the master branch it is:

tools/vim/src/normal.c
tools/wezterm/deps/harfbuzz/harfbuzz/src/hb-ot-shape-normalize.cc
tools/ctags/Tmain/common-prelude.d/normalize_spaces.expected

@benknoble
Contributor

Likely conflicts with #17900

@glepnir
Member Author

glepnir commented Aug 5, 2025

I don't know fzy's algorithm, so I'm not sure if it's better. I just don't have much time to optimize this PR, because my goal was to introduce SIMD, but it seems there is not much appetite for SIMD in core.

@glepnir glepnir closed this Aug 5, 2025
@przepompownia
Contributor

Or maybe give users the ability to choose the algorithm via some option? I do not strongly encourage this, due to the cost of maintaining each one individually.

@glepnir
Member Author

glepnir commented Aug 5, 2025

So, there's only one best algorithm. I used the SIMD Smith-Waterman + affine-gap algorithm from Helix and Blink.cmp as a reference. It already works well in completion plugins and editors. I haven't compared the specific algorithm with fzy, so I don't know which one is better, and I don't have time to evaluate it yet. A pull request for fzy has already been created, so let's give it a try.

@glepnir glepnir deleted the simd_sw branch August 5, 2025 09:47
@benknoble
Contributor

So, there's only one best algorithm.

That seems unlikely ;) best in which cases? Along which axes (time, memory, user experience, etc.)?

@glepnir
Member Author

glepnir commented Aug 6, 2025

So, there's only one best algorithm.

That seems unlikely ;) best in which cases? Along which axes (time, memory, user experience, etc.)?

Aha... I mean there can only be one optimal algorithm in core, not multiple. 😢

@ychin
Contributor

ychin commented Aug 7, 2025

But it seems there is not much appetite for SIMD in core

Personally, I don't think SIMD is an issue per se. I just think it's better to only do it if we feel there is a need in specific spots first, rather than introducing SIMD because we want to. It's fun to add SIMD, though, so I get the allure. The other thing is that SIMD mostly provides a constant-factor improvement, so it's not going to turn a fundamentally slow algorithm into a fast one.

