ENH Release GIL in DistanceMetric when validating data#17038
ENH Release GIL in DistanceMetric when validating data#17038thomasjpfan merged 12 commits intoscikit-learn:masterfrom
Conversation
Merge from upstream
Merge master
|
Thanks @webber26232 ! Do you actually see a difference in performance with multithreading after this change? The |
Yes I saw a reasonable parallel performance improvement after moving And it seems like the validation has to be performed here. May need another solution. |
|
Hmm, could you please provide an example script where you see the impact? If Cython doesn't indeed handle this well, it would have impact elsewhere as well as we frequently use conditional |
You can try the scripts below. I got the output on my machine: |
|
Thanks you are right, it's pretty bad. Using your scripts with I get Opened cython/cython#3554 . It would be nice to also fix other affected metrics: SEuclideanDistance, WMinkowskiDistance, MahalanobisDistance. |
|
For CI I think you need to also validate input in |
Yes, I think that's a good idea. |
|
Given that this PR is unlikely to make it in v0.23 (Release Candidate released in the following days) and that this is fixed in cython 3.0a3, maybe we can just wait for the stable cython release and re-build scikit-learn wheels with it.. |
|
Although doing input validation in a |
|
Is n_features=200 realistic for haversine
|
No, my mistake. But anyway the initial benchmarks in #17038 (comment) are still relevant (and correct). I was just double checking. |
Sounds good, I'll start to work on it. |
rth
left a comment
There was a problem hiding this comment.
Thanks @webber26232 and @scoder! LGTM.
thomasjpfan
left a comment
There was a problem hiding this comment.
Thank you @webber26232
This is a really nice improvement!
|
Please add an entry to the change log at |
merge new master
Merge branch 'master' of github.com:webber26232/scikit-learn into haversineGIL
|
@rth @thomasjpfan Thanks for your reviewing. I updated the What's New. It should be good to merge. |
|
Thanks! So WMinkowskiDistance etc and |
I also updated it to referring |
|
Thank you @webber26232! |
Reference Issues/PRs
Fixes #16970
What does this implement/fix? Explain your changes.
HaversineDistance: [sklearn.neighbors._dist_metrics.HaversineDistance] to enable parallel computing.BinaryTree.Any other comments?
Not sure if hard coding input dimension in
BinaryTreeis the best solution. Maybe allDistanceMetricneed avalidate_input()function.