Motivation
SIMD intrinsics can accelerate pairwise distance computation by a factor of ~2.5-3.5x for float64 data, and ~5-6x for float32 data (benchmarked by this gist: https://gist.github.com/Micky774/bd1b8394fdaa82b25dcdfc111835c19b).
Plots
These benefits translate effectively into computation-bound estimators, such as KNeighborsRegressor (based on #26267):
Plots
Alternatives Considered
As discussed in #26010 and Micky774#11, there is a significant preference towards avoiding SIMD-based implementations within scikit-learn at this time. I do believe that there is a reasonable way to maintain such work (at least up to SSE3 instructions); however, a better-accepted solution is to create a plug-in for DistanceMetric and offer the SIMD-accelerated implementations as an engine. While this is indeed a good solution in the long run, much work remains to be done on the plug-in API (#22438). Working on a separate engine/plug-in for DistanceMetric while the API is still being solidified and #25535 is still unmerged is probably going to do more harm than good by adding one more moving part to the mix and slowing down the review process.
Suggested Solution
Allow users to pass instances of DistanceMetric directly to metric keyword arguments. This is backwards compatible and doesn't require any significant new infrastructure (mainly small changes to validation and updated docs/tests). This enables third-party libraries to provide their own accelerated solutions immediately.
In practice, this involves changes mainly in the following:
- ArgKmin
- RadiusNeighbors
- ArgKminClassMode
- pairwise_distances
- pairwise_distances_argmin
This will allow us to enable the functionality in parts of the following estimators (non-exhaustive):
Notes:
- pairwise_distances can't actually use the DistanceMetric in its current state; however, once #25561 (FEA Introduce PairwiseDistances, a generic back-end for pairwise_distances) is completed, it can benefit from accelerated DistanceMetric options as well.
- {KD, Ball}Tree support passing DistanceMetric through the metric argument, but do not support DistanceMetric32 (see #25914: ENH Add float32 implementations for BallTree and KDTree).
Implementation
I have a sample implementation of this for KNeighborsRegressor, which is achieved by enabling this functionality for ArgKmin along with updating parameter validation in NeighborsBase; please see #26267.
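To illustrate the kind of validation change this implies, here is a minimal pure-Python sketch of a metric argument that accepts either a string name or a metric instance. It is not the code in #26267 and does not use scikit-learn: ToyMetric, ToyEuclidean, FastEuclidean, resolve_metric and nearest_index are all hypothetical names invented for this example, and FastEuclidean only stands in for whatever accelerated metric a third-party library might ship.

```python
import math
from typing import Sequence, Union

Point = Sequence[float]


class ToyMetric:
    """Hypothetical stand-in for a DistanceMetric-like base class."""

    def dist(self, a: Point, b: Point) -> float:
        raise NotImplementedError


class ToyEuclidean(ToyMetric):
    """Reference metric written as a plain Python loop."""

    def dist(self, a: Point, b: Point) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FastEuclidean(ToyMetric):
    """Plays the role of a third-party 'accelerated' metric.

    It simply delegates to math.dist (Python 3.8+); a real plug-in
    would supply its own optimized kernel instead.
    """

    def dist(self, a: Point, b: Point) -> float:
        return math.dist(a, b)


# Existing string names keep working, so the change is backwards compatible.
_NAMED_METRICS = {"euclidean": ToyEuclidean}


def resolve_metric(metric: Union[str, ToyMetric]) -> ToyMetric:
    """Return a metric object for either a name or a ready-made instance."""
    if isinstance(metric, ToyMetric):
        return metric  # new path: use the caller's instance as-is
    if isinstance(metric, str):
        if metric not in _NAMED_METRICS:
            raise ValueError(f"Unknown metric name: {metric!r}")
        return _NAMED_METRICS[metric]()
    raise TypeError("metric must be a str or a ToyMetric instance")


def nearest_index(X: Sequence[Point], query: Point,
                  metric: Union[str, ToyMetric] = "euclidean") -> int:
    """Index of the row of X closest to query under the given metric."""
    m = resolve_metric(metric)
    return min(range(len(X)), key=lambda i: m.dist(X[i], query))


X = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
by_name = nearest_index(X, (0.9, 1.2), metric="euclidean")
by_instance = nearest_index(X, (0.9, 1.2), metric=FastEuclidean())
```

Both calls select the same neighbor. The only difference is which object computes the distances, which is why the proposal needs little beyond validation, docs and tests.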