Skip to content

Conversation

@girishji
Copy link
Contributor

@girishji girishji commented Aug 3, 2025

Feat: Replace fuzzy matching algorithm with improved fzy-based implementation

The current fuzzy matching algorithm has several accuracy issues:

  • It struggles with CamelCase.
  • It fails to prioritize matches at the beginning of strings, often ranking middle matches higher.

You can observe these issues in action via the demo app — (try typing into the input box).

Why fzy?

After evaluating alternatives (see my comments here and here), I chose to adopt the fzy algorithm, which:

  • Resolves the aforementioned issues.
  • Performs better.

Implementation details

This version is based on the original fzy algorithm, with one key enhancement: multibyte character support.

  • The original implementation supports only ASCII.
  • This patch replaces ascii lookup tables with function calls, making it compatible with multibyte character sets.
  • Core logic (match_row() and match_positions()) remains faithful to the original, but now operates on codepoints rather than single-byte characters.

Performance

Tested against a dataset of 90,000 Linux kernel filenames. Results (in milliseconds) show a ~2x performance improvement over the current fuzzy matching algorithm.

Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75 
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b', 'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax', 'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor

Additional changes

  • Removed the camelcase option from both matchfuzzy() and matchfuzzypos(), as it's now obsolete with the improved algorithm.

Resolves #17531 and neovim/neovim#34101.

@habamax
Copy link
Contributor

habamax commented Aug 3, 2025

First impressions: it is really good for the usecases I have -- matches what I believe it should match, no surprises like current implementation has.

I didn't measure the performance -- it would be interesting to see the difference with 100k lines of regular text with long lines (80+ characters). I remember it was noticeably slow.

@habamax
Copy link
Contributor

habamax commented Aug 3, 2025

Same test with the book: https://www.gutenberg.org/ebooks/4300 (30k+ lines)

needle                        new,time (ms)  old, time (ms)
===========================================================
init                          24.607008      57.040414
main                          17.853692      40.735591
sig                           19.121312      52.534456
index                         7.884531       51.64927 
ab                            17.1621        46.484005
cd                            17.471492      30.892485
a                             25.44113       31.759264
b                             16.933126      20.254163
c                             19.665065      21.913082
k                             13.247087      18.20468 
z                             7.004227       15.308381
w                             18.711294      21.165332
cpa                           11.887729      29.903938
arz                           7.26018        51.601771
zzzz                          6.362547       15.71733 
dimag                         9.281163       41.561186
xa                            7.857765       16.133889
nha                           25.907618      55.965319
nedax                         7.805038       52.714896
dbue                          10.368032      34.732728
fp                            11.958991      25.241654
tr                            27.14616       54.724735
kw                            10.031663      20.380516
rp                            16.667419      39.946756
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 9.934473       22.13698 

War and Peace 66k+ lines, https://www.gutenberg.org/ebooks/2600

needle                        new,time (ms)  old, time (ms)
===========================================================
init                           59.328768     125.320524
main                           41.944414     85.487774
sig                            44.443681     115.257263
index                          17.870937     112.966163
ab                             37.299306     100.451458
cd                             44.872373     67.932252
a                              60.890125     70.903439
b                              36.866244     42.332422
c                              47.185453     48.005703
k                              28.464591     39.022517
z                              15.507651     32.582009
w                              47.652044     49.42231
cpa                            26.929167     62.380237
arz                            16.881851     112.217946
zzzz                           14.794418     33.190434
dimag                          21.456334     94.00777
xa                             18.980815     35.144998
nha                            67.410542     127.848764
nedax                          18.219076     117.826659
dbue                           23.179709     77.437491
fp                             27.814872     54.267577
tr                             66.866412     121.778725
kw                             22.846171     42.294782
rp                             39.519883     86.845848
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk  21.183752     45.051524

@girishji girishji changed the title Feat: Replace fuzzy algorithm with fzy-based implementation Feat: Replace fuzzy algorithm with improved fzy-based implementation Aug 4, 2025
@benknoble
Copy link
Contributor

This version is based on the original fzy algorithm, with one key enhancement: multibyte character support.

  • The original implementation supports only ASCII.
  • This patch replaces ascii lookup tables with function calls, making it compatible with multibyte character sets.
  • Core logic (match_row() and match_positions()) remains faithful to the original, but now operates on codepoints rather than single-byte characters.

You mention an original implementation (presumably of the fzy algorithm), but I don't see any mention of original source code (or a license)—if we are adapting existing code, understanding copyright requirements might be essential.


Likely conflicts with #17581

Comment on lines +1062 to +1086
if (positions)
for (int i = 0; i < n; i++)
positions[i] = i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (positions)
for (int i = 0; i < n; i++)
positions[i] = i;
if (positions)
{
for (int i = 0; i < n; i++)
positions[i] = i;
}

src/fuzzy.c Outdated
Comment on lines 1078 to 1080
{
match_row(&match, i, D[i], M[i], D[i - 1], M[i - 1]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{
match_row(&match, i, D[i], M[i], D[i - 1], M[i - 1]);
}
match_row(&match, i, D[i], M[i], D[i - 1], M[i - 1]);

@chrisbra chrisbra requested a review from Copilot August 7, 2025 20:22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the current fuzzy matching algorithm with an improved fzy-based implementation that addresses several issues with the previous algorithm, particularly with CamelCase matching and prioritization of matches at the beginning of strings. The new implementation is based on the fzy algorithm but enhanced with multibyte character support.

Key changes:

  • Complete replacement of the fuzzy matching algorithm with a fzy-based implementation
  • Removal of the camelcase option from matchfuzzy() and matchfuzzypos() functions
  • Performance improvements with ~2x faster matching speed

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/vim.h Updated fuzzy matching constants and removed old limits
src/fuzzy.c New file containing the fzy-based fuzzy matching implementation
src/search.c Removed old fuzzy matching algorithm implementation
src/testdir/test_matchfuzzy.vim Updated tests to reflect new algorithm behavior and removed camelcase option tests
src/quickfix.c Updated to use new fuzzy matching constants and function signature
src/proto/search.pro Removed fuzzy matching function prototypes
src/proto/fuzzy.pro Added new fuzzy matching function prototypes
Multiple files Updated function calls to use new fuzzy score constants and removed camelcase parameter

@habamax
Copy link
Contributor

habamax commented Aug 10, 2025

@chrisbra, should fzy MIT license be added as well?

@chrisbra
Copy link
Member

As I understood it, merely the fzy algorithm was picked but no code was shared, in which case we don't need to apply the MIT license. Correct @girishji ?

@girishji
Copy link
Contributor Author

@chrisbra, I intentionally copied 2 functions (~130 lines) from https://github.com/jhawthorn/fzy/blob/master/src/match.c, and made minor changes to support multibyte codepoints.
I had to reimplement the other parts (~100 lines) out of necessity.

Do you think we should also reimplement the copied functions? (Not a big deal).
If not, my understanding is that we must include the following license notice in fuzzy.c.

/*
 * Portions of this file are adapted from fzy (https://github.com/jhawthorn/fzy)
 * Original code:
 *   Copyright (c) 2014 John Hawthorn
 *   Licensed under the MIT License.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

@girishji
Copy link
Contributor Author

girishji commented Aug 10, 2025

The license does look ugly when you open fuzzy.c. Maybe I'll just reimplement the algorithm from ground up? Let me know either way.

@chrisbra
Copy link
Member

Okay, it should be fine to just mention the license like in iscygpty.c.

I don't care if it looks ugly ;)

@yegappan
Copy link
Member

The license does look ugly when you open fuzzy.c. Maybe I'll just reimplement the algorithm
from ground up. Let me know either way.

When I implemented the current fuzzy feature, I reached out to Forrest Smith and received his
permission to include his algorithm in Vim. Note that the search algorithm was completely
re-implemented in Vim. In this case, we should get permission from the fzy team before
including this in Vim.

@girishji
Copy link
Contributor Author

The license does look ugly when you open fuzzy.c. Maybe I'll just reimplement the algorithm
from ground up. Let me know either way.

When I implemented the current fuzzy feature, I reached out to Forrest Smith and received his permission to include his algorithm in Vim. Note that the search algorithm was completely re-implemented in Vim. In this case, we should get permission from the fzy team before including this in Vim.

Ok, I'll reach out to the author.

@girishji
Copy link
Contributor Author

Screenshot 2025-08-10 at 4 06 59 PM

@girishji
Copy link
Contributor Author

We have the answer:

Screenshot 2025-08-11 at 2 04 27 PM

@chrisbra
Copy link
Member

Great, can you please squash this in?

diff --git a/src/fuzzy.c b/src/fuzzy.c
index eb37effad..6ebfa5b21 100644
--- a/src/fuzzy.c
+++ b/src/fuzzy.c
@@ -171,7 +171,7 @@ fuzzy_match_item_compare(const void *s1, const void *s2)
     {
        int exact_match1 = FALSE, exact_match2 = FALSE;
        char_u *pat = ((fuzzyItem_T *)s1)->pat;
-       int patlen = STRLEN(pat);
+       int patlen = (int)STRLEN(pat);
        int startpos = ((fuzzyItem_T *)s1)->startpos;
        exact_match1 = (startpos >= 0) && STRNCMP(pat,
                ((fuzzyItem_T *)s1)->itemstr + startpos, patlen) == 0;

Otherwise, this causes failure on Appveyor build

This version is based on the original fzy algorithm, with one key enhancement:
multibyte character support.

The original implementation supports only ASCII via table lookups for
uppercase/lowercase scoring.

This patch replaces those lookups with function calls, making it compatible
with multibyte character sets.

Core logic (match_row() and match_positions()) remains faithful to the
original, but now operates on codepoints rather than single-byte characters.
@girishji
Copy link
Contributor Author

Great, can you please squash this in?

Done.

@chrisbra chrisbra closed this in 7e0df5e Aug 12, 2025
@chrisbra
Copy link
Member

Thanks, I include it. Can you please verify, if this description is still correct?
https://github.com/vim/vim/blob/master/runtime/doc/pattern.txt#L1502-L1538

@habamax
Copy link
Contributor

habamax commented Aug 12, 2025

Thank you!

@girishji
Copy link
Contributor Author

Thanks, I include it. Can you please verify, if this description is still correct? https://github.com/vim/vim/blob/master/runtime/doc/pattern.txt#L1502-L1538

Not quite but almost. I'll edit it later.

@girishji girishji deleted the fzy branch August 13, 2025 02:37
zeertzjq added a commit to zeertzjq/vim that referenced this pull request Aug 13, 2025
Problem:  matchfuzzy() leaks allocated lists (after 9.1.1627).
Solution: Restore the "retmatchpos" condition.

The memory leak cannot be caught by ASAN as the garbage collector still
tracks these lists, but it can be reproduced by the following script,
which increases Vim's memory usage to over 1 GiB without this fix:

```vim
for i in range(100000)
  call matchfuzzy([repeat('a', 300)], repeat('a', 257))
endfor
```

related: vim#17900
zeertzjq added a commit to zeertzjq/vim that referenced this pull request Aug 13, 2025
Problem:  matchfuzzy() leaks allocated lists (after 9.1.1627).
Solution: Restore the "retmatchpos" condition.

The memory leak cannot be caught by ASAN as the garbage collector still
tracks these lists, but it can be reproduced by the following script,
which increases Vim's memory usage to over 1 GiB without this fix:

```vim
for i in range(100000)
  call matchfuzzy([repeat('a', 300)], repeat('a', 257))
endfor
```

related: vim#17900
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Aug 13, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Aug 13, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Aug 13, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Aug 13, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Aug 13, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
zeertzjq added a commit to zeertzjq/neovim that referenced this pull request Aug 13, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
yochem pushed a commit to yochem/neovim that referenced this pull request Aug 23, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
sahinf pushed a commit to sahinf/vim that referenced this pull request Sep 7, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim#17531 (comment))
and
[here](vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim/neovim#34101
fixes vim#17531
closes: vim#17900

Signed-off-by: Girish Palya <girishji@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
dundargoc pushed a commit to dundargoc/neovim that referenced this pull request Sep 27, 2025
Problem:  fuzzy-matching can be improved
Solution: Implement a better fuzzy matching algorithm
          (Girish Palya)

Replace fuzzy matching algorithm with improved fzy-based implementation

The
[current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/)
fuzzy matching algorithm has several accuracy issues:

* It struggles with CamelCase
* It fails to prioritize matches at the beginning of strings, often
  ranking middle matches higher.

After evaluating alternatives (see my comments
[here](vim/vim#17531 (comment))
and
[here](vim/vim#17531 (comment))),
I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm,
which:

* Resolves the aforementioned issues.
* Performs better.

Implementation details

This version is based on the original fzy
[algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c),
with one key enhancement: **multibyte character support**.

* The original implementation supports only ASCII.
* This patch replaces ascii lookup tables with function calls, making it
  compatible with multibyte character sets.
* Core logic (`match_row()` and `match_positions()`) remains faithful to
  the original, but now operates on codepoints rather than single-byte
  characters.

Performance

Tested against a dataset of **90,000 Linux kernel filenames**. Results
(in milliseconds) show a **\~2x performance improvement** over the
current fuzzy matching algorithm.

```
Search String            Current Algo    FZY Algo
-------------------------------------------------
init                          131.759    66.916
main                          83.688     40.861
sig                           98.348     39.699
index                         109.222    30.738
ab                            72.222     44.357
cd                            83.036     54.739
a                             58.94      62.242
b                             43.612     43.442
c                             64.39      67.442
k                             40.585     36.371
z                             34.708     22.781
w                             38.033     30.109
cpa                           82.596     38.116
arz                           84.251     23.964
zzzz                          35.823     22.75
dimag                         110.686    29.646
xa                            43.188     29.199
nha                           73.953     31.001
nedax                         94.775     29.568
dbue                          79.846     25.902
fp                            46.826     31.641
tr                            90.951     55.883
kw                            38.875     23.194
rp                            101.575    55.775
kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519     30.921
```

```vim
vim9script

var haystack = readfile('/Users/gp/linux.files')

var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b',
'c', 'k',
    'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax',
'dbue',
    'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk']
for needle in needles
    var start = reltime()
    var tmp = matchfuzzy(haystack, needle)
    echom $'{needle}' (start->reltime()->reltimefloat() * 1000)
endfor
```

Additional changes

* Removed the "camelcase" option from both matchfuzzy() and
  matchfuzzypos(), as it's now obsolete with the improved algorithm.

related: neovim#34101
fixes vim/vim#17531
closes: vim/vim#17900

vim/vim@7e0df5e

Co-authored-by: Girish Palya <girishji@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fuzzy matching prioritizes internal match to prefix match

6 participants