Skip to content

Conversation

@glepnir
Copy link
Member

@glepnir glepnir commented Oct 9, 2024

some experimental attempts. I might try some new algorithms. will make a benchmark when I am satisfied with it. It's pretty loose at the moment. but no relevant fuzzy completion tests are broken. I'm guessing there are still edge cases that aren't caught. more actual behavioral testing is needed. It will still fall back to the original fuzzy_match for unsupported platforms to avoid incompatibilities.

(don't expect too much and pay too much attention. It may not be accepted 😊)

@glepnir glepnir marked this pull request as draft October 9, 2024 13:58
@glepnir glepnir force-pushed the simd_fuzzy branch 2 times, most recently from f22ae29 to c6b82a5 Compare October 11, 2024 11:51
Copy link
Member

@yegappan yegappan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any performance data running fuzzy match over a large number of strings using this new function?

@glepnir
Copy link
Member Author

glepnir commented Oct 12, 2024

Do you have any performance data running fuzzy match over a large number of strings using this new function?

In preparation. I will update here once the results are available. :) idea is that it will only try single bytes and if it contains multiple bytes it will still fall back to the original fuzzy match

if (match_mask)
{
// find position
int shift = __builtin_ctz(match_mask);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the compatibility of this function __builtin_ctz?

#endif

#ifdef PLATFORM_X86
#include <immintrin.h> // For x86/x64 SIMD intrinsics
Copy link
Contributor

@ychin ychin Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using immintrin.h is a bad idea, because it pulls in all intrinsics including those from SSE3 / SSE4 / AVX instruction sets. I'm assuming you are targeting SSE2 (that's the one guaranteed by x64), but the CPU you are targeting isn't guaranteed to support say SSE4 or AVX. The MSVC compiler allows you to use all kinds of instructions even if the compiler flag doesn't specify it (e.g. with -mavx2). On your machine it will work and then on some random person's old machine this code will just crash, which is not great.

I think it's better to just include the headers you need to be explicit what instruction sets you are pulling in. So in this case you probably want emmintrin.h (which only contains SSE2 instructions)

See https://stackoverflow.com/questions/11228855/header-files-for-x86-simd-intrinsics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I've been on arm for a while. So a bit sloppy. I'll test on an x64

Copy link
Contributor

@ychin ychin Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I looked at all the intrinsics you were using and they did seem like they are SSE2 only (from referencing Intel's docs), so I think any x64 would work.

My comment is just that ideally you should have a target hardware spec (just saying baseline SSE2 is fine too since it works on all x64 devices so your __x86_64__ ifdef check suffices) and make sure you stick to it. Including the all-inclusive immintrin.h makes it harder to stick to it since it allows newer intrinsics to be called whereas you want to the smaller headers that only include the specific instructions you want since it's easier to tell at a glance what instruction sets you are using.

I haven't done enough ARM NEON programming though. Are all instructions available to all __ARM_NEON platforms?

Edit: Nevermind the ARM NEON questions. I forgot it's all one instruction set so this isn't really an issue. I guess I'm traumatized having to write x86 SIMD code and needing to be very careful which instruction set version to pull from because some CPUs support some instruction sets but not others 😅. It's kind of a MSVC/x86 specific issue where you need to be quite careful with this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate your advice. I will check carefully when i back here and can i ping you for any improve review?

@ychin
Copy link
Contributor

ychin commented Oct 18, 2024

Haven't taken a close look at the actual SIMD implementations yet, but just some initial comments looking at this:

  1. Can you specify what hardware targets you are targeting here? Seems like for x86 you are targeting any x64 chips (which support SSE2 at a minimum) but what's the ARM target? Is this supposed to work on all ARM chips (I'm a little less familiar with ARM NEON than x86 SSE/AVX)? Are we going to have situations where the program could crash because it's running on some older hardware that don't support certain instructions? Also see this comment from mine: Experiment: fuzzy completion based on simd #15837 (comment). If we use newer instructions than what older chips support we may need to switch dynamically instead of just using compiler flags.

  2. Just to be clear, this works with ASCII only right? From initial glance I don't think this code works with anything other than that. To be fair, ASCII is the most common things we use but I just want to be explicit about this. The utilities etc are named in such a way (e.g. simd_tolower) that could imply it works across the board. I would imagine you need some kind of switch to first detect if the string is ASCII only and then dynamically pick which version to use. Or is the plan to extend this to arbitrary text in Unicode or other encodings (seems hard)?

    • Speaking of which I am surprised this didn't break any fuzzy tests. I'm wondering if we should add more edge cases to fuzzy matching for Unicode texts.
  3. I wonder how this would be tested. IMO it's probably a good idea to set up a C unit test that compile both versions and feed a list of test data to make sure they perform identically on the same hardware (and do the same for utility functions like simd_tolower too). I guess Vim script tests could work but it's a somewhat indirect way to test this since you are just testing end-to-end and could miss the smaller stuff.

  4. As the other comment mentioned, having a consistent way to evaluate the performance of SIMD code would be useful here (I know you plan to add it later), since adding SIMD specializations add a lot more testing and maintenance burdens and it would be nice to have concrete benchmarking from realistic scenarios. What's the motivation for this change btw? Is fuzzy slow in real usage? Genuine question (since I don't use the feature much). It would be nice to get a feel of (1) how slow it currently is, and (2) how much performance advantage this actually brings (both in just calling the functions itself and also the end-to-end performance). Given that we are doing 16 chars at a time, theoretically we could get up to 16x the speed but usually it's smaller than that.

@glepnir
Copy link
Member Author

glepnir commented Oct 19, 2024

Just to be clear, this works with ASCII only right?

This is how the initial work is planned, because other encodings are somewhat troublesome.

about 4..I am also looking forward to whether the benchmark here will be improved. wait for a while

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants