Skip to content

[MRG] Nested backend context manager#538

Merged
ogrisel merged 5 commits intojoblib:masterfrom
ogrisel:nested-backend-context-manager
Jan 23, 2018
Merged

[MRG] Nested backend context manager#538
ogrisel merged 5 commits intojoblib:masterfrom
ogrisel:nested-backend-context-manager

Conversation

@ogrisel
Copy link
Copy Markdown
Contributor

@ogrisel ogrisel commented Jul 31, 2017

While reviewing #516 (loky integration) and #537 (backend negotiation based on code hints / semantic constraints), I realized that the with parallel_backend(backend_name): idiom should support nesting when possible.

Here is a (currently failing) test to demonstrate what I mean. I will work on an implementation. If we ensure that backend instances should be picklable, that should be quite easy to implement although I am worried about the performance overhead it could induce.

@ogrisel ogrisel force-pushed the nested-backend-context-manager branch from 7b48b85 to 5fa776e Compare September 8, 2017 08:15
@joblib joblib deleted a comment from codecov bot Sep 8, 2017
@ogrisel ogrisel force-pushed the nested-backend-context-manager branch from 5fa776e to f4a0965 Compare September 14, 2017 14:12
@codecov
Copy link
Copy Markdown

codecov bot commented Sep 14, 2017

Codecov Report

Merging #538 into master will increase coverage by 0.25%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
+ Coverage   94.78%   95.04%   +0.25%     
==========================================
  Files          38       38              
  Lines        4969     5005      +36     
==========================================
+ Hits         4710     4757      +47     
+ Misses        259      248      -11
Impacted Files Coverage Δ
joblib/parallel.py 98.58% <100%> (+0.37%) ⬆️
joblib/test/test_parallel.py 96.31% <100%> (+0.36%) ⬆️
joblib/_parallel_backends.py 96.92% <100%> (+3.47%) ⬆️
joblib/__init__.py 100% <0%> (ø) ⬆️
joblib/test/test_memory.py 99.36% <0%> (+0.42%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65d0cb2...8602549. Read the comment docs.

@ogrisel ogrisel force-pushed the nested-backend-context-manager branch from e934b17 to 05bae7b Compare September 15, 2017 13:45
@joblib joblib deleted a comment from codecov bot Sep 18, 2017
@ogrisel ogrisel force-pushed the nested-backend-context-manager branch 5 times, most recently from 7d45b07 to ec61f0a Compare September 18, 2017 12:27
@ogrisel ogrisel force-pushed the nested-backend-context-manager branch from ec61f0a to 66b74b6 Compare September 18, 2017 13:42
@ogrisel ogrisel force-pushed the nested-backend-context-manager branch from 66b74b6 to fca1e83 Compare January 22, 2018 10:38
delayed(nested_call, check_pickle=False)(i)
for i in range(5)
)
assert ba.i == 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this look like the right number of calls to retrieve, given my Parallel calls?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is one only call to retrieve per call to Parallel.__call__.

TomAugspurger added a commit to TomAugspurger/distributed that referenced this pull request Jan 23, 2018
This implements the new API introduced in
joblib/joblib#538 for to ensure nested parallelism works
correctly.

This breaks API as previously `DaskDistributedBackend` had a `client` attribute
with the `Client` that was used.

Nested parallelism will be workers creating a `DaskDistributedBackend`, with no
'scheduler_host'. To avoid deadlocks, worker threads will secede and rejoin.
TomAugspurger added a commit to TomAugspurger/distributed that referenced this pull request Jan 23, 2018
This implements the new API introduced in
joblib/joblib#538 for to ensure nested parallelism works
correctly.

This breaks API as previously `DaskDistributedBackend` had a `client` attribute
with the `Client` that was used.

Nested parallelism will be workers creating a `DaskDistributedBackend`, with no
'scheduler_host'. To avoid deadlocks, worker threads will secede and rejoin.
TomAugspurger added a commit to TomAugspurger/distributed that referenced this pull request Jan 23, 2018
This implements the new API introduced in
joblib/joblib#538 for to ensure nested parallelism works
correctly.

This breaks API as previously `DaskDistributedBackend` had a `client` attribute
with the `Client` that was used.

Nested parallelism will be workers creating a `DaskDistributedBackend`, with no
'scheduler_host'. To avoid deadlocks, worker threads will secede and rejoin.
TomAugspurger added a commit to TomAugspurger/distributed that referenced this pull request Jan 23, 2018
This implements the new API introduced in
joblib/joblib#538 for to ensure nested parallelism works
correctly.

This breaks API as previously `DaskDistributedBackend` had a `client` attribute
with the `Client` that was used.

Nested parallelism will be workers creating a `DaskDistributedBackend`, with no
'scheduler_host'. To avoid deadlocks, worker threads will secede and rejoin.
TomAugspurger added a commit to TomAugspurger/distributed that referenced this pull request Jan 23, 2018
This implements the new API introduced in
joblib/joblib#538 for to ensure nested parallelism works
correctly.

This breaks API as previously `DaskDistributedBackend` had a `client` attribute
with the `Client` that was used.

Nested parallelism will be workers creating a `DaskDistributedBackend`, with no
'scheduler_host'. To avoid deadlocks, worker threads will secede and rejoin.
Copy link
Copy Markdown
Contributor Author

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

'retrieval_context' this could lead to deadlock, as all the workers
managed by the backend may be "busy" waiting for the nested parallel
calls to finish, but the backend has no free workers to execute those
tasks.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice explanation :)

delayed(nested_call, check_pickle=False)(i)
for i in range(5)
)
assert ba.i == 1
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is one only call to retrieve per call to Parallel.__call__.


with parallel_backend("retrieval") as (ba, _):
Parallel(n_jobs=2)(
delayed(nested_call, check_pickle=False)(i)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably remove that check_pickle=False now that loky is the default backend and that cloudpickle is used to pickle function definitions.

@ogrisel ogrisel changed the title [WIP] Nested backend context manager [MRG] Nested backend context manager Jan 23, 2018
@ogrisel ogrisel merged commit c019f26 into joblib:master Jan 23, 2018
@ogrisel ogrisel deleted the nested-backend-context-manager branch January 23, 2018 17:21
@ogrisel
Copy link
Copy Markdown
Contributor Author

ogrisel commented Jan 23, 2018

Merged! Thanks @TomAugspurger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants