-
-
Notifications
You must be signed in to change notification settings - Fork 5.9k
Experiment: fuzzy completion based on simd #15837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f22ae29 to
c6b82a5
Compare
yegappan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any performance data running fuzzy match over a large number of strings using this new function?
In preparation. I will update here once the results are available. :) idea is that it will only try single bytes and if it contains multiple bytes it will still fall back to the original fuzzy match |
| if (match_mask) | ||
| { | ||
| // find position | ||
| int shift = __builtin_ctz(match_mask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the compatibility of this function __builtin_ctz?
| #endif | ||
|
|
||
| #ifdef PLATFORM_X86 | ||
| #include <immintrin.h> // For x86/x64 SIMD intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using immintrin.h is a bad idea, because it pulls in all intrinsics including those from SSE3 / SSE4 / AVX instruction sets. I'm assuming you are targeting SSE2 (that's the one guaranteed by x64), but the CPU you are targeting isn't guaranteed to support say SSE4 or AVX. The MSVC compiler allows you to use all kinds of instructions even if the compiler flag doesn't specify it (e.g. with -mavx2). On your machine it will work and then on some random person's old machine this code will just crash, which is not great.
I think it's better to just include the headers you need to be explicit what instruction sets you are pulling in. So in this case you probably want emmintrin.h (which only contains SSE2 instructions)
See https://stackoverflow.com/questions/11228855/header-files-for-x86-simd-intrinsics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I've been on arm for a while. So a bit sloppy. I'll test on an x64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I looked at all the intrinsics you were using and they did seem like they are SSE2 only (from referencing Intel's docs), so I think any x64 would work.
My comment is just that ideally you should have a target hardware spec (just saying baseline SSE2 is fine too since it works on all x64 devices so your __x86_64__ ifdef check suffices) and make sure you stick to it. Including the all-inclusive immintrin.h makes it harder to stick to it since it allows newer intrinsics to be called whereas you want to the smaller headers that only include the specific instructions you want since it's easier to tell at a glance what instruction sets you are using.
I haven't done enough ARM NEON programming though. Are all instructions available to all __ARM_NEON platforms?
Edit: Nevermind the ARM NEON questions. I forgot it's all one instruction set so this isn't really an issue. I guess I'm traumatized having to write x86 SIMD code and needing to be very careful which instruction set version to pull from because some CPUs support some instruction sets but not others 😅. It's kind of a MSVC/x86 specific issue where you need to be quite careful with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really appreciate your advice. I will check carefully when i back here and can i ping you for any improve review?
|
Haven't taken a close look at the actual SIMD implementations yet, but just some initial comments looking at this:
|
This is how the initial work is planned, because other encodings are somewhat troublesome. about 4..I am also looking forward to whether the benchmark here will be improved. wait for a while |
some experimental attempts. I might try some new algorithms. will make a benchmark when I am satisfied with it. It's pretty loose at the moment. but no relevant fuzzy completion tests are broken. I'm guessing there are still edge cases that aren't caught. more actual behavioral testing is needed. It will still fall back to the original fuzzy_match for unsupported platforms to avoid incompatibilities.
(don't expect too much and pay too much attention. It may not be accepted 😊)