-
-
Notifications
You must be signed in to change notification settings - Fork 5.9k
feat(fuzzy): implement SIMD-accelerated Smith-Waterman with affine gaps #17581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @ychin |
79a7744 to
1459f84
Compare
|
Does it have enhanced camelcase support or "it just works"? |
a5fa44a to
242ffb0
Compare
|
|
|
So the algorithm is different and results as well? Need to try it out with my usecases. |
|
oh, it is not drop in replacement for existing |
|
based on the Smith-Waterman with fully affine gaps. Unlike nucleo, three matrices are used here instead of two. fzf uses one matrix. This is just a preliminary implementation of the algorithm. It needs further testing. I have not replaced the fuzzy related functions yet. Currently, it is used in fuzzy_match_str. That is to say, it can be tested with omnifunc |
5e1e241 to
ff3678f
Compare
|
Thanks, but I am still not sure what you want to achieve here. I don't really think it makes sense to use SIMD here. |
|
This algorithm is much better and is used in many successful projects that I have listed in the references. The current fuzzy algorithms are not ideal and are not efficient. At present, The overall implementation has been completed. After some testing, it can replace the current algorithm, as it uses SIMD acceleration on platforms that support it, and falls back to a scalar implementation otherwise. |
|
Are there concrete analysis or at least examples of 1) matching quality of the new algorithm, and 2) performance improvements enabled by SIMD, and how it perform compared to old algorithm? For (1), while it may sound better I think it would be easier to illustrate the point if there are examples of common use cases and we can see how the old / new compare. I'm sure it performs better in some cases but do we know it generally gives better matches on average? For (2), I would imagine having some basic performance benchmarks to be able to both compare the scalar performance of old versus new algo, and the scalar vs SIMD of the new algo would be useful. SIMD code adds more maintenance cost but if we can make concrete statements like "this <insert_string> costs 6 seconds to match in <insert_modern_cpu> but 0.5 seconds in SIMD" that seems easier to understand the impact. I think I mentioned a similar sentiment in #15837. Also, do you know what was the current algorithm based on again? |
|
I’ll definitely provide it — just not at the moment. I’ve only completed an initial implementation of the algorithm and want to evaluate whether it meets the requirements and if there are any aspects I might have overlooked. Once it's stable, I’ll include benchmark data as well. There’s also a fallback scalar implementation. current is ported from https://github.com/forrestthewoods/lib_fts/tree/master/code and blog post |
d3035ec to
5dae211
Compare
|
I second @ychin comment: we need to define a sound comparison metric which shall be measurable. And on top of that, we shall keep in mind the maintenance of it: will it be required a PhD in math along being a guru of the C language and knowing every corner of Vim to maintain the feature? If so, by experience, a sub-optimal solution would be way better. EDIT: ... and also, out of curiosity, why you choose this algo instead of Levenshtein & variants of it, BK-Tree, etc? :) |
Because this algorithm is used in the referenced projects. It has been implemented in the completion projects. These projects are very successful. So why do we need another algorithm... This PR fully implements the three-matrix gaps. I am considering removing simd to make it simpler. But I have something to do recently. Maybe I will come back to these PRs after next month. |
Well, I am pretty sure that other algorithms are used in other successful project as well :) Generally, before choosing a method, it should be cautionary to perform a literature review and evaluate the best approach for the given use-case rather than going with the first method found. ;-)
No worries. Keep in mind that here we are al volunteers and we don't/can't expect anything from anyone. We rely on our common passion for software and reciprocal trust. |
main reason for reference is https://github.com/helix-editor/nucleo and fzf. They do a very good job here. helix is the editor. fzf is built with many tools including vim/neovim |
|
I'm glad that someone is looking at Swith-Waterman for this. @glepnir Your implementation takes quadratic space in the input size. Do you think that memory usage can become a bottleneck for the intended usage, also considering that it is using three matrices instead of one? Implementing SW in linear space is possible, although backtracking becomes a bit more difficult (and I'm not suggesting that it should be done in this PR, eh!). Perhaps, this PR would be accepted more easily if the SIMD optimization were left for a separate PR. Would doing that be too much work? |
Yes. That's my plan. Remove SIMD and do a benchmark. If the performance is OK, SIMD is not necessary then |
dcb33a1 to
034f1e3
Compare
|
@lifepillar I’ve already replaced the current algorithm (though it still needs some cleanup — this is just for early testing). You can compile it directly and use this GAPS Smith-Waterman implementation. I’ve removed the SIMD part for now. You can try testing it with your completion plugin. This implementation reflects my personal understanding after reading the GAPS paper and the Smith-Waterman algorithm. There isn’t a full C version of the three-matrix approach that can be ported directly 😢. There are still some parts I’m uncertain about — it might take more time to look into them further. I also noticed someone is trying to port fzy, which is another option. But I haven’t looked into the details of fzy’s algorithm yet. It probably works great too. I might check it out when I have time. Give this a try first — we can revisit things later. since I might be a bit busy recently.😩 |
|
Got a compilation error: works with the following patch: diff --git a/src/quickfix.c b/src/quickfix.c
index ab595d7bb..11a4a5d3f 100644
--- a/src/quickfix.c
+++ b/src/quickfix.c
@@ -6489,7 +6489,7 @@ vgr_match_buflines(
// Fuzzy string match
CLEAR_FIELD(matches);
while (fuzzy_match(str + col, spat, FALSE, &score,
- matches, sz, TRUE) > 0)
+ matches, sz) > 0)
{
// Pass the buffer number so that it gets used even for a
// dummy buffer, unless duplicate_name is set, then the |
Introduces an optimized fuzzy matching algorithm using the Smith-Waterman algorithm with affine gap penalties, accelerated via SIMD instructions (SSE2 for x86 and NEON for ARM). The implementation features: 1. Full affine gap penalty model using three state matrices: - M matrix: Tracks match/mismatch scores - X matrix: Tracks gaps in the haystack (insertions in needle) - Y matrix: Tracks gaps in the needle (deletions in needle) 2. Position-based scoring bonuses: - Prefix bonus (+15) for start of haystack - Delimiter bonus (+30) after separator chars - Capitalization bonus (+30) Capital after lowercase - Case match bonus (+10) for exact case matches - Exact match bonus (+50) for perfect matches 3. SIMD optimizations: - Diagonal strip processing with 8-element parallelization - Vectorized gap penalty calculations - Batched character comparison with bonus application - Fallback to scalar computation for edge cases The algorithm improves on classic Smith-Waterman by: - Using affine gap penalties (open: -3, extend: -1) instead of linear - Adding position-aware bonuses for natural language patterns - Leveraging SIMD for 4-8x speedup on modern CPUs - Implementing traceback with state machine for efficient path recovery References: - CMU affine gap lecture: https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/gaps.pdf - Nucleo's SIMD approach: https://github.com/helix-editor/nucleo - Frizbee's scoring model: https://github.com/Saghen/frizbee
|
Not sure if it is the best way to test but I tried like this: func! Fuzzytest(pattern)
let file_list = systemlist("rg --files")
echom file_list->len()
let starttime = reltime()
let result = matchfuzzy(file_list, a:pattern)
echom (starttime->reltime()->reltimefloat() * 1000)
endfuncthen |
|
thanks does the results are correct ? There are indeed some performance issues in certain areas. I want to confirm whether the algorithm produces better results. |
I dont know, it's probably a case by case thing. Maybe some results are better in some cases, and other way around in other cases... 🤷♂️ where as with master branch it is: |
|
Likely conflicts with #17900 |
|
I don't know fzy's algorithm. I'm not sure if it's better. I just don't have much time to optimize this PR because my goal is to introduce simd. But seems like there does not like simd in core. |
|
Or maybe give users the option to choose the algorithm by some option? I do not strongly encourage this due to the costs of maintaining each one individually. |
|
So, there's only one best algorithm. I used the SIMD Smith-Waterman+GAP algorithm from Helix and Blink.cmp as reference. it already works well in completion plugins and editors. I haven't compared the specific algorithm with fzy. So I don't know which one is better, and I don't have time to evaluate it yet. A pull request for fzy has already been created, so let's give it a try. |
That seems unlikely ;) best in which cases? Along which axes (time, memory, user experience, etc.)? |
aha..I mean there can only be one optimal algorithm for the core. not multiple ..😢 |
I don't think SIMD is an issue per se personally. I just think it's better to only do it if we felt there was a need in specific spots first, rather than introducing SIMD because we want to. It's fun to add SIMD though so I get the allure. The other thing is that SIMD mostly introduces a constant time improvement, so it's not going to turn a fundamentally slow algorithm into a fast one. |
Introduces an optimized fuzzy matching algorithm using the Smith-Waterman algorithm with affine gap penalties, accelerated via SIMD instructions (SSE2 for x86 and NEON for ARM). The implementation features:
Full affine gap penalty model using three state matrices:
Position-based scoring bonuses:
SIMD optimizations:
The algorithm improves on classic Smith-Waterman by:
References:
relate neovim/neovim#34101