[WIP] Number of threads in KMeans should not be bigger than number of chunks#17210
Conversation
|
curious, how much of a gain is this? and should it go in 0.23.1? |
It is a regression in #17208. It seems x10 on very small dataset. |
adrinjalali
left a comment
There was a problem hiding this comment.
Then if you've tested and this is fixing the issue, I'm happy. I don't think we can easily write a test for it.
|
It's an attempt to fix 17208, but it's still wip. I can reproduce the slowdown on my laptop and this pr fixes it but it seems to not work for the person who opened the issue. I need to investigate further with him |
|
The fix sounds like a good improvement in any case. Though he has 4 cores, so I would have imagined spawning 8 threads shouldn't be too costly performance wise? |
I agree
In the reproducible snippet, there are only 150 samples, which means there will only be one chunk. On my laptop with 4 cores, it spawns 4 threads and it makes a huge diff. The thing is that it only concerns very small datasets for which the whole fit time is ~ 0.005 sec. So I guess that the overhead of thread creation become non negligible. |
|
Let's merge this as is? As if necessary you could create a new PR with more improvements? Please add a what's new entry. BTW, can this affect other parts of the code that do parallel chunking where we could also apply this fix ? |
Not that I can think about. |
It can't hurt and It's still an improvement |
|
Some profiling showed that it's threadpoolctl that takes 90% of the time on these very small problems. It's called at each iteration. I'll make a pr to move it outside of the loop. |
|
I guess @rth is also happy to have this merged. Merging, hopefully the other PR improving the threadpoolctl overhead would get in quickly too. |
…hunks (scikit-learn#17210) * num threads not bigger than num chunks * what's new
…hunks (scikit-learn#17210) * num threads not bigger than num chunks * what's new
…hunks (#17210) * num threads not bigger than num chunks * what's new
…hunks (scikit-learn#17210) * num threads not bigger than num chunks * what's new
related to #17208
When the number of chunks is smaller than the number of cores (i.e. very small datasets), KMeans launches as many threads as there are cores anyway. It should use n_chunks threads instead.