DRAFT ENH Introduce `AVX` SIMD instructions for `ManhattanDistances.dist` by Micky774 · Pull Request #27145 · scikit-learn/scikit-learn

Micky774 · 2023-08-23T21:40:50Z

Reference Issues/PRs

A follow-up to #26010

What does this implement/fix? Explain your changes.

This PR demonstrates the minimal infrastructure required for proper building and runtime dispatch of AVX SIMD instructions. This means that scikit-learn can optionally be built with AVX instructions. If, at runtime, AVX support is found on the runtime machine and scikit-learn was built with AVX support, then the optimized AVX instructions are leveraged when computing distances. Any other case results in the usual scalar loops currently found on main.

Concrete changes

Introduction of _simd/ directory housing new files
Introduction of simd.hpp which provides a definition and extern declarations of templated Manhattan distance functions (xsimd_manhattan_dist)
Introduction of simd.cpp which provides externed declarations of xsimd_manhattan_dist. This file serves as a translation unit to target with -mavx feature flags.
Compiles the above into a composite avx_dist_metrics library which is linked against to isolate the rest of _dist_metrics from requiring additional feature flags (preventing auto-vectorization of unintended code, e.g. backup scalar loops)
Provides templated scalar loops in C++ which match our current Cython implementaiton
Adds WITH_SIMD macro to correctly compile SIMD optimized code, or default to a trivial implementation to satisfy compilation
Adds DIST_METRICS macro to correctly allow inclusion of _dist_optim.cpp in _dist_metrics.pyx but not in other Cython files (necessary since any cimport of _dist_metrics.pyx forces the inclusion of _dist_optim.cpp)
~~Adds build and runtime dependency on archspec to provide out-of-the-box support~~ Uses xsimd directly for runtime microarchitecture feature detection (to determine whether building with AVX is possible, and whether running with AVX is possible) via _runtime_check.pyx and by compiling runtime_check_program in setup.py for a quick build-time check.
Includes xsimd as a submodule (~2MB but could be reduced by half if we only include the include directory)
Changes self.{r}dist(...) calls to <Class>.rdist_csr(self, ...) to work around Cython's idiosyncrasies when compiling to C++

TODO:

~~Resolve whether to add a dependency on archspec or vendor it~~ Uses xsimd directly
Resolve whether to add xsimd as a submodule, vendor an instance, or only include as build-time dependency for wheels and conda-forge

Any other comments?

AVX instructions were deployed in late 2011 and 2012 and thus are likely found in a significant portion of our users' machines. Honestly, we may want to consider going to AVX2 directly since it was released only a couple years later...

I don't know of any comprehensive surveys in the ML field which indicate the presence of these features on work machines and laptops.

Initial benchmarks generated with this script (exported from a Jupyter notebook).

Note

PR_noruntime is the case where the library is built with AVX support, but the runtime check fails to detect AVX capabilities. PR_nobuild is built without AVX capabilities.
PR is built with AVX capabilities, and uses them at runtime.

There are no slowdowns compared to main, and when built with AVX support, and it is detected at runtime, we see minor speedups even in cases of small n_features, to drastic speedups (4-7x) when n_features is large.

Plots

github-actions · 2023-08-23T21:42:34Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: fd481b3. Link to the linter CI: here}

This reverts commit 0ebe57c.

adrinjalali · 2023-08-24T17:22:12Z

Thinks that came to my mind seeing this:

are we introducing C++? We don't really have any, except the ones we've vendored, so this has no precedence and I would rather intorduce rust than C++ in the project as a new-ish language.
definitely don't want to introduce a submodule, they're hard to work with, and confuse [almost] everyone. It would make it significantly harder to contribute to the project compared to what we have now.

Otherwise I think this is pretty cool.

Micky774 · 2023-08-24T18:05:21Z

are we introducing C++? We don't really have any, except the ones we've vendored, so this has no precedence and I would rather intorduce rust than C++ in the project as a new-ish language.

That's something I'm hoping we can discuss a bit more. I'm unfamiliar with Rust, and although I know it is picking up popularity pretty quickly, afaik it does make it harder to work on relatively complex data structures (the antagonistic example is doubly-linked-lists) but I don't honestly know to what degree. My (comparative) familiarity with C++biases me. I'd love for someone more familiar with Rust to educate me in this regard :)

I do think that there is potentially a lot of room to improve code readability by leveraging C++ templating and binding/importing via Cython. This is actually a line-item for discussion in the new native performance discussion group (see notes) although has not been broadly discussed yet. IMO it is easier to slowly-but-safely transition from pure Cython/C to some C++ in key areas than to introduce Rust, but I've not looked at any example implementations so I can still be swayed.

I do also believe that the ability to include a relatively simple static dispatch to a widely supported SIMD target, propagating performance gains across a potentially wide swathe of the library is itself a strong merit for C++ inclusion (even if not widespread adoption).

definitely don't want to introduce a submodule, they're hard to work with, and confuse [almost] everyone. It would make it significantly harder to contribute to the project compared to what we have now.

Right, that seems to be a common opinion. I have it added as a submodule right now for (my) convenience but it is simple to vendor it as a header-only library (they're under BSD-3 so licensing is no issue either). I don't have a strong preference one way or the other so I think we can navigate around that.

Edit: It's also not necessary at runtime at all, so we could try to specify it as a build-time requirement for users if they want SIMD acceleration and setup CI solutions for wheel building. I'm not sure what the best way to do that is, since (as usual) windows makes things tricky

adrinjalali · 2023-08-25T14:59:07Z

IMO it is easier to slowly-but-safely transition from pure Cython/C to some C++ in key areas than to introduce Rust, but I've not looked at any example implementations so I can still be swayed.

I would be very skeptical of that. C++ has become a monster. I would agree with you maybe 20 years ago though. But I've done enough C++ that I really wouldn't want us to maintain C++ code for multiple platforms.

I do also believe that the ability to include a relatively simple static dispatch to a widely supported SIMD target, propagating performance gains across a potentially wide swathe of the library is itself a strong merit for C++ inclusion (even if not widespread adoption).

I don't mind vendoring some C++ library at all, as long as we don't have to maintain it, and it doesn't give us much headache to support multiple platforms. And as long as the only thing we have to do is to add a very light binding layer which could also be auto-generated, then we should be fine.

jjerphan · 2023-09-10T13:59:33Z

As discussed on the last monthly meeting (though it is not reported in its notes), the introduction of C++ code (be it done via this PR or another one) necessitates considering several technical aspects (packaging consideration, linkage and build system, depending on third party library or vendoring it, etc.) and several social aspects (when do we want to use C++ over existing and used solutions like Cython, who gets to maintain C++ implementations for those part of the project to have an healthy bus-factor, what are the conventions around C++, etc.) of scikit-learn.

We decided that such choices needs to be discussed and voted in favor or in disfavor via a SLEP (I invoke @lorentzenchr for confirmation).

IIRC, @Micky774 (let me know if I am wrong) proposed to author one; I also propose helping with writing it.

lorentzenchr · 2023-09-10T19:49:06Z

Yes, let’s discuss in a slep which has as its goal the usage of simd instructions. Rust, for instance, would then be one alternative.
@Micky774 It does not need to be perfect to start a discussion.

Let me add here that we hit the boundaries of Cython several times, e.g. usage of tempita templates.

jjerphan · 2024-09-08T08:55:47Z

FYI, conda-forge now supports x86_64-microarch-level builds. 🙂

Micky774 added 15 commits August 21, 2023 15:04

Initial changes to include xsimd acceleration

6f8b473

Merge branch 'main' into simd

01bd9d4

Added exception to simd directory

c944542

Added initial simd

5b4f9bb

Reverted cpp changes

43b8b9f

Clean diff

fe126cb

Revert changes to binary tree

3174447

Revert changes to other trees

5312296

Reverted changes to partition nodes

5c3c2a4

Corrected include

33c1f09

Removed unnecessary header and set up for dynamic library creation

494404a

Implemented full build/runtime dispatch support

7de13c7

Updated comments

fc286b8

Added build-time dependnacy on archspec

52a3040

Minimize diff at expense of formatting

2f5e124

github-actions bot added cython module:metrics labels Aug 23, 2023

Micky774 added 2 commits August 23, 2023 18:14

Updated lockfiles to account for archspec

0ebe57c

Revert "Updated lockfiles to account for archspec"

87f6267

This reverts commit 0ebe57c.

Micky774 added 4 commits August 24, 2023 15:15

Moved to xsimd-only build/run-time feature checks

9bb7143

Removed old archspec dep

b50d966

Moved partial import

b98642b

Merge branch 'main' into simd

fd481b3

jjerphan mentioned this pull request Sep 10, 2023

PERF Implement PairwiseDistancesReduction backend for RadiusNeighbors.predict_proba #26828

Merged

jjerphan mentioned this pull request Oct 3, 2024

RFC: spatial: distance computation kernels scipy/scipy#21645

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

DRAFT ENH Introduce `AVX` SIMD instructions for `ManhattanDistances.dist`#27145

DRAFT ENH Introduce `AVX` SIMD instructions for `ManhattanDistances.dist`#27145
Micky774 wants to merge 21 commits intoscikit-learn:mainfrom
Micky774:simd

Micky774 commented Aug 23, 2023 •

edited

Loading

Uh oh!

github-actions bot commented Aug 23, 2023 •

edited

Loading

Uh oh!

adrinjalali commented Aug 24, 2023

Uh oh!

Micky774 commented Aug 24, 2023 •

edited

Loading

Uh oh!

adrinjalali commented Aug 25, 2023

Uh oh!

jjerphan commented Sep 10, 2023

Uh oh!

lorentzenchr commented Sep 10, 2023

Uh oh!

jjerphan commented Sep 8, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

Micky774 commented Aug 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

github-actions bot commented Aug 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✔️ Linting Passed

Uh oh!

adrinjalali commented Aug 24, 2023

Uh oh!

Micky774 commented Aug 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adrinjalali commented Aug 25, 2023

Uh oh!

jjerphan commented Sep 10, 2023

Uh oh!

lorentzenchr commented Sep 10, 2023

Uh oh!

jjerphan commented Sep 8, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Micky774 commented Aug 23, 2023 •

edited

Loading

github-actions bot commented Aug 23, 2023 •

edited

Loading

Micky774 commented Aug 24, 2023 •

edited

Loading