[MRG+1] Parallel radius neighbors#10887
Conversation
|
We found in benchmarks that this did not improve runtime. Does your mileage
vary?
|
|
I haven't done a proper benchmark yet but I saw much better runtime on my laptop. I'll do a benchmark today and post the results here. |
|
I ran 10 times the following benchmark on a Google Cloud instance with 64 vCPUs: from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.externals.joblib import cpu_count
import time
d = make_blobs(100000, 100)[0]
nn = NearestNeighbors().fit(d)
for n_jobs in range(1, cpu_count() + 1):
nn.n_jobs = n_jobs
start = time.time()
nn.radius_neighbors()
end = time.time()
print('{},{}'.format(n_jobs, end - start))Although the scaling is not linear it definitely runs faster: |
|
maybe I'm not recalling correctly, and we never got as far as completely
removing the gil. The results look great! The implementation looks good at
a glance but will need a proper review. please add a test that results are
unchanged.
|
|
One thing that was not obvious to me but made a huge difference was that in - raise ValueError("Fatal: count out of range. "
- "This should never happen.")
+ with gil:
+ raise ValueError("Fatal: count out of range. "
+ "This should never happen.")
instead of this: - raise ValueError("Fatal: count out of range. "
- "This should never happen.")
+ return -1have a completely different scaling behavior. In both case the function is |
|
@jnothman Thank you for your comments. I think this is ready for review. |
|
And I wish I were available to review it, but will not be for a few weeks
at least...
…On 2 April 2018 at 17:44, Joël Billaud ***@***.***> wrote:
@jnothman <https://github.com/jnothman> Thank you for your comments. I
think this is ready for review.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10887 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6yEYiXdA6Uiz7cn1_xVbU6Y1IzEcks5tkdbigaJpZM4S_fFL>
.
|
|
Nice work ! The code seems reasonable, though I am a novice in C memory allocation. As a general comment, your code could benefit from more comments, especially near non-standard function like For instance, you could add comments with the equivalent python expressions: |
|
@TomDLT Thank you for your comment. I added some comments as you suggested. |
|
@TomDLT thank you for the review. I updated the release notes. |
|
Merge when green |

Reference Issues/PRs
Fixes #8003
What does this implement/fix? Explain your changes.
This makes
RadiusNeighborsMixin.radius_neighborshonor then_jobsargument and split the queries among processors. This also makesquery_radiusGIL-free so that it is actually faster than single thread.TODO:
Any other comments?