[MRG+1] Backend hints and shared memory constraints #602
ogrisel merged 5 commits into joblib:master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #602 +/- ##
==========================================
+ Coverage 94.83% 94.93% +0.09%
==========================================
Files 39 39
Lines 5287 5389 +102
==========================================
+ Hits 5014 5116 +102
Misses 273 273
Continue to review full report at Codecov.
(force-pushed from b63c1d4 to 79d9558)
Quick results from training a … Classification performance: …
https://github.com/tomaugspurger/joblib-distributed-benchmark. I need to re-run the multiply-nested CV example with the correct patches applied.
Thanks. This confirms that threading is optimal for RF-dominated workloads when run on a single machine. The individual tasks in this benchmark are probably too short to be run efficiently on a distributed cluster relative to the computing resources used (~2x speedup for 8x more machines). The data is probably transferred repeatedly and unnecessarily in the current state of the joblib distributed connector. Running the same benchmark with the dask-ml implementation of cross-validation and random search should make it possible to confirm this.
Indeed, I forgot to scatter the data in the outer …
jcrist
left a comment
Apologies for the delay in review. Overall this looks fine to me. I left a few points that intersect with design issues I mentioned here: #537 (comment).
One rough edge in the backend selection code as currently implemented (and in this PR) is that n_jobs is coupled to the backend and can be specified in a number of different ways:
- as a keyword to parallel_backend
- as a keyword to Parallel
- as a global default value

The defaults for these differ: parallel_backend defaults to -1 while Parallel defaults to 1. This makes it tricky in either case to determine whether n_jobs was explicitly set or left at its default. Both of these default values should maybe be changed to None to make detection of explicit settings easier. The default semantics (either -1 or 1) should also probably be unified.
I believe the behavior should be:
- If n_jobs is explicitly set in Parallel, it should be respected even if also set in parallel_backend.
- If n_jobs is not explicitly set in Parallel, then n_jobs from parallel_backend should be used. However, if the backend from parallel_backend is ignored due to requirements, then the n_jobs from parallel_backend should also be ignored.
More discussion of this was given in #52 (the first take at allowing overrides).
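The precedence rules proposed above can be sketched with joblib's public API. This is a minimal illustration assuming the unified None default discussed here (which is what joblib eventually adopted): n_jobs left unset on Parallel is inherited from the parallel_backend context manager, while an explicit value on Parallel wins.

```python
from joblib import Parallel, delayed, parallel_backend

def double(x):
    return 2 * x

with parallel_backend('threading', n_jobs=4):
    # n_jobs left unset on Parallel -> inherited from the context manager
    inherited = Parallel()(delayed(double)(i) for i in range(4))
    # n_jobs set explicitly on Parallel -> respected despite the context
    explicit = Parallel(n_jobs=2)(delayed(double)(i) for i in range(4))

# Both calls compute the same results; only the worker count differs.
```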
joblib/parallel.py
Outdated
"as the latter does not provide shared memory semantics."
% (sharedmem_backend.__class__.__name__,
   backend.__class__.__name__))
return sharedmem_backend, n_jobs
When falling back to the default backend because the requested one doesn't satisfy the constraints, I think you should also fall back to the n_jobs default instead of the one from the context manager. An all-or-nothing approach seems the easiest to reason about to me.
# fallback to the default thread-based backend.
sharedmem_backend = BACKENDS[DEFAULT_THREAD_BACKEND]()
if verbose >= 10:
    print("Using %s as joblib.Parallel backend instead of %s "
This might be a project idiom, but I'd prefer a warning in this case rather than print if verbose >= 10. Warnings can be silenced as needed, but if I wrote code to use a certain backend and that backend is being ignored I'd like to know even if I have verbose set to 0.
Actually, I don't want to issue a warning in this case because the user cannot do anything to "fix" the cause of the problem. Take the example of random forests in scikit-learn: the current implementation relies on shared memory semantics in the prediction loop but not in the main fit loop. Suppose the user calls RF in a nested cross-validation loop (which can itself benefit from parallelism) and uses a context manager to select the dask-distributed backend to parallelize as much as possible at all levels. In that case the prediction part should stay sequential (the default behavior) while all other parallel calls will benefit from dask (which would be more than enough to saturate all the cores in typical use cases). There is no point in issuing a noisy warning that the user cannot understand without knowing the details of the inner code of scikit-learn.
I have changed the verbosity of the parallel call itself to make it explicit which backend is used for each individual call however.
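To make the sharedmem constraint concrete, here is a minimal sketch using the require='sharedmem' keyword this PR introduces (available in joblib from the 0.12 release). The inner function mutates a shared list, which only works if joblib keeps a thread-based backend, even when the caller has installed a process- or cluster-based backend via parallel_backend:

```python
from joblib import Parallel, delayed

shared = []

def append(i):
    # Mutating a plain Python list relies on shared memory between
    # workers, hence the hard 'sharedmem' requirement below.
    shared.append(i)

# require='sharedmem' forces a threading backend regardless of any
# backend selected by an enclosing parallel_backend context manager.
Parallel(n_jobs=2, require='sharedmem')(
    delayed(append)(i) for i in range(5)
)
```

Completion order is not guaranteed across threads, so only the set of appended values is deterministic, not their order.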
joblib/_parallel_backends.py
Outdated
"""
supports_timeout = True
use_threads = True
Slight preference for uses_threads instead. use_threads feels a bit off grammatically, especially given other notations like supports_timeout.
Those are very good points.
(force-pushed from c0d1342 to 037aee4)
Uses the new prefer / require keywords from joblib/joblib#602. This allows users to control how jobs are parallelized in more situations. For example, training a RandomForest on a cluster of machines with the dask backend. Closes scikit-learn#8804
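For reference, the soft hint counterpart looks like the sketch below (assuming joblib ≥ 0.12, where the prefer keyword landed). Unlike require='sharedmem', prefer='threads' is only a hint: joblib uses the threading backend by default, but an enclosing parallel_backend context manager can still override it, e.g. with the dask backend:

```python
from joblib import Parallel, delayed

def square(x):
    return x * x

# prefer='threads' hints that threads are adequate (e.g. for
# GIL-releasing workloads); a surrounding parallel_backend context
# manager may still substitute another backend.
results = Parallel(n_jobs=2, prefer='threads')(
    delayed(square)(i) for i in range(5)
)
```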
Sorry, I deleted my remote branch by accident and it closed #595. Here is the same PR again.
This is an alternative implementation of #537. I reimplemented it from scratch because joblib had diverged a bit and I did not agree with the semantics of constraints violations in #537.
TODO:
- RandomForestClassifier on a dask-distributed cluster: check that fit runs trees in parallel (done with this branch of sklearn: scikit-learn/scikit-learn@master...ogrisel:joblib-backend-hints);
- decide between the prefer='threads' and require='sharedmem' keywords or keep the currently implemented boolean flags for hinting and hard constraints;