[MRG] Automatically group short tasks in batches#157
Conversation
One of the applications that brought this to attention is the parallel OvR (one-vs-rest) meta-estimator, where some benchmarks revealed the estimation being much faster without parallelization when the number of classes (and therefore parallel jobs) is very high (4000). The following is with default settings except where specified. With old joblib: With this PR: This seems great!
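For context, a minimal sketch of how the feature introduced by this PR is used: after this change, `joblib.Parallel` accepts `batch_size='auto'`, which groups many short tasks into a single dispatch to each worker (the function `square` and the task count here are illustrative, not from the benchmarks above):

```python
from joblib import Parallel, delayed

def square(x):
    return x * x

# With many cheap tasks, batch_size='auto' lets joblib tune how many
# tasks are sent to a worker at once, keeping dispatch overhead low.
results = Parallel(n_jobs=2, batch_size='auto')(
    delayed(square)(i) for i in range(100))
print(results[:5])  # [0, 1, 4, 9, 16]
```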
All tests pass under Windows: https://ci.appveyor.com/project/ogrisel/joblib/build/1.0.29 I had to push
@GaelVaroquaux this is ready for final review. @larsmans you might also be interested in having a look at this.
@jnothman I think you might also be interested in reviewing this. There is a benchmark script you can use to explore the runtime behavior.
joblib/parallel.py
Outdated
Maybe the class name should be changed to 'ImmediateComputeBatch': the 'ImmediateApply' name comes from the 'apply' terminology used in multiprocessing.
The docstring should clearly be changed.
I'll have to see if I have time to review, but @vene's summary above is very enticing!
Note that the predict step is not parallelized. :-)
I think the first batch of comments is addressed in the new commits. I plan to squash those prior to merging once the review is over.
joblib/parallel.py
Outdated
"a specific worker" isn't quite right here. Perhaps just "to each worker"?
@jnothman thanks for the review. I addressed your comments in the last commit.
I haven't reviewed the benchmark or the tests, but the feature itself looks great!
Looks like the memory_profiler profile decorators had significant overhead in the benchmark scripts. So here are the more accurate timings:
Thanks for the benchmarks!
BTW, you asked earlier how to test the dev branch of joblib in scikit-learn without modifying the scikit-learn source code: I put a monkeypatch in a try/except block:

try:
    import joblib
    from sklearn.externals import joblib as skl_joblib
    print('Monkeypatching scikit-learn embedded joblib')
    for k, v in vars(joblib).items():
        setattr(skl_joblib, k, v)
except ImportError as e:
    print("Could not apply joblib monkeypatch: %s" % e)
joblib/test/test_parallel.py
Outdated
I guess this comment is a copy and paste of the one in test_batching_auto_threading and should be removed.
joblib/parallel.py
Outdated
You didn't change "jobs" to "tasks" here, but you did somewhere below.
@lesteve I pushed a fix for the
benchmarks/bench_auto_batching.py
Outdated
by pair I assume you mean even? 🇫🇷
No, I mean a couple, e.g. as in the expression "a pair of gloves": https://www.google.fr/search?q=define:pair
Isn't that idiomatic English?
I mean "a pair power of cos". The first use of "pair" is perfectly readable to me, but the second struck me as odd.
Ah OK, will fix it.
Thanks a lot @ogrisel for the work on this; is there anything I can do to help?
Merging, thanks a lot, great stuff.
Yay! On 29 June 2015 at 16:51, Loïc Estève notifications@github.com wrote:
Great!!!
🍻 |
This is based on an online evaluation of the batch completion time, used to dynamically tune the size of the batches of tasks dispatched at once to each worker.
TODO:
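The adaptive batching idea described above can be sketched as follows. This is a simplified illustration, not joblib's actual implementation: measure how long the last batch took, then grow or shrink the next batch size so each batch lands near a target duration (the function name, target duration, and bounds are all assumptions for the example):

```python
def next_batch_size(current_size, batch_duration,
                    target_duration=0.2, min_size=1, max_size=1000):
    """Scale the batch size so that batches take ~target_duration seconds."""
    if batch_duration <= 0:
        # Tasks are effectively free: use the largest allowed batch.
        return max_size
    scaled = int(current_size * target_duration / batch_duration)
    # Clamp to sane bounds so one noisy measurement cannot blow things up.
    return max(min_size, min(max_size, scaled))

# A batch of 10 tasks finished in 0.01s: far too short, so grow the batch.
print(next_batch_size(10, 0.01))   # 200
# A batch of 200 tasks took 0.5s: too long, so shrink it.
print(next_batch_size(200, 0.5))   # 80
```

The online part is simply that this adjustment runs after every completed batch, so the batch size keeps tracking the observed per-task cost as the workload evolves.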