[WIP] Implement backend requirements and preferences#537
[WIP] Implement backend requirements and preferences#537jcrist wants to merge 2 commits intojoblib:masterfrom
Conversation
Adds an initial set of backend preferences and requirements that a call to Parallel can specify. If the active backend meets the requirements and preferences, it will be used over the specified backend in the parallel call. Currently only two requirements/preferences are implemented: - `prefer_processes` Whether to choose a backend that uses multiple processes over one that uses a single process. Note that this is only a preference and not a strict requirement. - `require_shared_memory` Whether shared memory (a single process) is required. If True, backends that use multiple processes are forbidden.
|
I'm not set on the keyword names or interface, but I think the defaults make sense (and are compatible with the current implementation). A few assorted thoughts on the design/implementation:
|
Codecov Report
@@ Coverage Diff @@
## master #537 +/- ##
==========================================
+ Coverage 93.72% 93.72% +<.01%
==========================================
Files 37 37
Lines 4779 4829 +50
==========================================
+ Hits 4479 4526 +47
- Misses 300 303 +3
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #537 +/- ##
==========================================
+ Coverage 93.72% 93.76% +0.04%
==========================================
Files 37 37
Lines 4779 4829 +50
==========================================
+ Hits 4479 4528 +49
- Misses 300 301 +1
Continue to review full report at Codecov.
|
|
Ping @GaelVaroquaux for review. |
|
I have started looking at the code, but before I comment on the specific details, here is a though that we've been having with @ogrisel:
What do people think? @ogrisel ? |
| @@ -300,6 +371,14 @@ class Parallel(Logger): | |||
| - finally, you can register backends by calling | |||
| register_parallel_backend. This will allow you to implement | |||
| a backend of your liking. | |||
There was a problem hiding this comment.
I think that we need to revisit the docstring and the arguments to deprecate "backend" and replace it by the mechanism of backend specification that you implemented and the default backend.
What do people think?
Not that I'm an expert in Joblib usability, but this change makes sense to me. I think it's useful to stress that this is "prefers" and not "requires" to algorithm authors. |
ogrisel
left a comment
There was a problem hiding this comment.
I think I like the prefer={None, 'threads', 'processes'} API as well. It makes it more future-proof (will not change if we change the default backend of joblib).
| process backends (e.g. threading). This is useful if the function | ||
| being applied doesn't release the global interpretter lock (GIL). | ||
| require_shared_memory : bool, optional | ||
| If true, a backend supporting shared memory (e.g. a single process) is |
There was a problem hiding this comment.
i.e. a single process with thread-based workers
TomAugspurger
left a comment
There was a problem hiding this comment.
Is there anything else design or code-wise that needs to be done to move this forward? (aside from a rebase to fix the merge conflict)
Also, cc @stephen-hoover you might find this interesting. This short summary is that this PR would allow user-level code to (safely) override which parallel backend is used inside scikit-learn. I don't think this would directly affect how civis is using joblib, but I may be wrong.
| for backend, n_jobs in backend_and_n_jobs: | ||
| if getattr(backend, 'uses_processes', None) is False: | ||
| return backend, n_jobs | ||
| raise ValueError("Unable to satisfy shared memory requirement " |
There was a problem hiding this comment.
Looks like this case isn't hit by the test suite.
This is a silly example, but does hit the code:
choose_backend_and_n_jobs('multiprocessing', 5,
require_shared_memory=True, prefer_processes=False)|
Thanks for tagging me, @TomAugspurger ! I had to read through all of the code changes to figure out exactly what was going on -- it looks to me like this change will require that custom backends add a Any chance you could add a |
Hmm, is https://github.com/joblib/joblib/pull/537/files#diff-12c1bd53fbe4c0df54e8edee36f2d2e9R335 the line in question? If so, changing that to |
|
Yes, that's the line I was looking at. I really don't know what most custom backends look like, so it's hard for me to say what the assumption should be. Defaulting to |
|
I gave another shot at this (#595) with different semantics in case of constraints violation. See the assertions in the tests for details. |
|
Closing in favor of #602. |
Adds an initial set of backend preferences and requirements that a call to Parallel can specify. If the active backend meets the requirements and preferences, it will be used over the specified backend in the parallel call.
Currently only two requirements/preferences are implemented:
prefer_processesWhether to choose a backend that uses multiple processes over one that uses a single process. Note that this is only a preference and not a strict requirement.
require_shared_memoryWhether shared memory (a single process) is required. If True, backends that use multiple processes are forbidden.
Supersedes #524.