BUG Passes global configuration when spawning joblib jobs#17634
BUG Passes global configuration when spawning joblib jobs#17634adrinjalali merged 12 commits intoscikit-learn:masterfrom
Conversation
|
Thanks for detecting this and the issue at joblib! Let's see if there is a way to fix it on the joblib side (or maybe our config mechanism needs to be revisited). Using a parallel wrapper means that external users of joblib + scikit-learn would still run into this issue.. |
|
@rth do you have a proposal for a revised config handling? |
|
I have a "fun" proposal here: joblib/joblib#1071 (comment) (Global config + doing things in parallel = fun) |
|
We could just use os.environ to store the config? |
|
Storing them in the environment may not work: joblib/joblib#1064 |
|
Doesn't that depend on the backend? It should work as long as all those processes are on the same machine, doesn't it? |
|
Maybe we could pass it as environment and make joblib / loky detect that environment has changed and re-spawn workers accordingly. That might be a bit expensive though. |
The threading backend should have no problem. multiprocessing + fork should work out of the box but it's broken in other ways and not available on windows. loky and dask will need explicit config init. |
|
I think I'm okay with this, but I'd probably move the code to |
jnothman
left a comment
There was a problem hiding this comment.
Apart from not being comfortable with the use of utils.fixes for something other than a backport, this looks great
build_tools/circle/linting.sh
Outdated
| if [[ "$MODIFIED_FILES" == "no_match" ]]; then | ||
| echo "No file outside sklearn/externals and doc/sphinxext has been modified" | ||
| else | ||
| # else |
Should we use |
|
I think I can come to accept fixes :)
|
adrinjalali
left a comment
There was a problem hiding this comment.
Apart from not being comfortable with the use of utils.fixes for something other than a backport, this looks great
This needs to be fixed in joblib upstream, and then it'll be removed, from that perspective, fixes.py is the right place for it I think.
Otherwise this LGTM. I'm happy to have it as a temporary workaround.
sklearn/utils/fixes.py
Outdated
| return arr[fancy_index] | ||
|
|
||
|
|
||
| def delayed(function, *args, **kwargs): |
There was a problem hiding this comment.
maybe add a comment like
remove when https://github.com/joblib/joblib/issues/1071 is fixed
?
|
This looks reasonable but would one among @lesteve @ogrisel or @tomMoral be also be able to review this, following discussion in joblib/joblib#1071? |
|
Could you please also check that the vendored version of |
|
I looked at: from tags 0.11 to 0.16 and the function does the same thing. After reading its implementation, we can use a simplified version in our def delayed(function):
"""Decorator used to capture the arguments of a function."""
@functools.wraps(function)
def delayed_function(*args, **kwargs):
return _FuncWrapper(function), args, kwargs
return delayed_functionThis way we do not need to depend on a vendored version. |
|
Hi @scikit-learn/core-devs, two approvals here: is something left? Or is it good to merge? |
|
Since the last commit and hence the CI runs were from almost 3 weeks ago, I'm happy to merge once we do another merge with the current master. Would you mind updating @thomasjpfan ? I feel like we agreed to having this in the meantime. |
|
Merged with master and it still passes. |
1 similar comment
|
Merged with master and it still passes. |
|
woohoo 🥳 let's do this then 👍 |
…rn#17634) * WIP * BUG Passes context to joblib jobs * BUG Fix * BUG Fix * CLN Moves delayed into fixes * CLN Adds linter * REV Revert diff * DOC Adds comment for removal * Use a simple implementation
On master the global configuration flag would not be passed joblib jobs with a multiprocess backend:
XREF: joblib/joblib#1071