PERF Pass buffers via pointers in PairwiseDistancesReductions routines for sparse data#26765
Conversation
|
@jjerphan @thomasjpfan In case either of you have interest in the PR. The scope/content is fairly limited. |
|
Nice one! Those implementations get up to 1.5× speedups! Two questions:
|
thomasjpfan
left a comment
There was a problem hiding this comment.
Given the performance improvement, I think this deserves a changelog entry in v1.4. Otherwise, LGTM.
|
Also, should we adjust the title of this PR to be more specific? For instance, how about this one? |
PairwiseDistancesReductions routines for sparse data
Almost certainly. Anywhere we don't use the associated attributes of a memory-view and are able to leverage contiguous memory, we could technically use pointers instead. With that being said, this is really only a benefit for function calls in tight loops where the overhead is a dominant factor. I'm not sure where else that pattern is present (maybe somewhere in linear models or trees?) |
jjerphan
left a comment
There was a problem hiding this comment.
Thank you @Micky774!
LGTM modulo a comment.
As you indicated, other Cython implementations might be adapted similarly. I think we can keep them for first time Cython-contributions. What do you think?
@da-woods: are the optimisations that you have mentioned in #25608 (comment) similar to those ones?
doc/whats_new/v1.4.rst
Outdated
| - |Performance| Computing pairwise distances for (CSR x CSR) and (CSR x Dense) | ||
| datasets is now 1.5x faster by improving the argument passing strategy used | ||
| in the computation routines in :class:`metrics.DistanceMetric`. |
There was a problem hiding this comment.
Nitpick: I think that we generally keep changelog entry short without giving too much technical details.
| - |Performance| Computing pairwise distances for (CSR x CSR) and (CSR x Dense) | |
| datasets is now 1.5x faster by improving the argument passing strategy used | |
| in the computation routines in :class:`metrics.DistanceMetric`. | |
| - |Performance| Computing pairwise distances via :class:`metrics.DistanceMetric` for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster. |
Should we also add another one for estimators of module.neighbors?
There was a problem hiding this comment.
Should we also add another one for estimators of
module.neighbors?
I'm not sure -- there are a lot of estimators which at least partly use the DistanceMetric backend
(the following list is puled from #26329):
- NearestNeighbors
- KNeighborsRegressor
- KNeighborsClassifier
- RadiusNeighborsRegressor
- RadiusNeighborsClassifier
- DBSCAN
- OPTICS
- Isomap
- TSNE (self.method != "exact")
- KernelDensity
- AffinityPropagation
- Birch
- MeanShift
- NearestCentroid
I don't quite know where to draw the line for this if we do include it for other estimators specifically. Most .predict methods, for example, benefit from this.
I think that's a good idea, though I'm not sure what the best way to discover where in the code these changes ought to be propagated. Perhaps an open ended meta-issue of "If you spot this pattern, feel free to open a PR fixing it" would be appropriate? |
|
Yes, I guess we can spend some time to read implementations and list in a meta-issue candidate places for those optimisations. |
|
The one thing I'm slightly nervous of is that you look to be taking these pointers from non-contiguous memoryviews. I suspect practically they are contiguous in practice, but it might be nice to enforce that. Passing a pointer is about as light-weight as you can get, so nothing Cython can do is ever likely to beat it. The changes I mentioned in the linked issue should bring memoryview slicing a lot closer, but pointers are still likely to win. (Essentially all I've done is noticed that taking multiple slices of the same array is a reference counting no-op, so you pay for one reference count at the start of the loop and that's it. From the point of view of this PR, it's a distraction though) |
jjerphan
left a comment
There was a problem hiding this comment.
I think we need to operate on continuous buffers as mentioned by #26765 (comment).
Good point! I've now done so in the attribute declarations and some method signatures. I wanted to as well check if enforcing it in the Let me know if there are any other spots you think we could tighten up our guarantees. Thanks to all 😄 |
|
@jjerphan Thoughts? |
jjerphan
left a comment
There was a problem hiding this comment.
LGTM modulo a suggestion.
This must remove some of the dispatch cost that @Vincent-Maladiere and I typically have observed in the past with #25170.
…nes for sparse data (scikit-learn#26765)
…nes for sparse data (scikit-learn#26765)
Reference Issues/PRs
Discussed a little here
What does this implement/fix? Explain your changes.
The
{r}dist_csrmethods now accept pointers rather than memory views since none of the peripheral data of memory views are used. This significantly decreases call overhead, which is especially beneficial for the tight loops in which these functions are often used.This also includes a formatting fix for
HaversineDistanceThis also enforces contiguous layouts wherever possible.
Any other comments?
Benchmarks generated via: https://gist.github.com/Micky774/9daede3d638ebbdbb34bc26f884f2748
Benchmarks