Skip reconciliate_processes if used within a cluster environment that creates processes externally#9389
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9389 +/- ##
=======================================
- Coverage 93% 89% -4%
=======================================
Files 180 180
Lines 14965 15065 +100
=======================================
- Hits 13904 13370 -534
- Misses 1061 1695 +634 |
|
@tchaton @awaelchli I'm not sure why this test is failing:
are special tests run with |
|
Hey @ananthsub. Interesting. Mind taking |
|
ah @tchaton @awaelchli - I believe it's due to the check in the LightningEnvironment here: https://github.com/PyTorchLightning/pytorch-lightning/blob/b294c5760eee30a995fcd400127be209b12c4d7c/pytorch_lightning/plugins/environments/lightning_environment.py#L43-L49
So while This was added in #7480 |
You are right, I should have captured the value like you do in this PR |
|
Hey @ananthsub, Do you see blockers to introduce an env variable to enable Lightning to kill the processes even with a cluster env. Something like I believe this would be useful for Grid.ai. Best, |
@tchaton @awaelchli could you take another look to confirm the opt-in deadlock detection looks reasonable? |
… creates processes externally (#9389) * [RFC] Skip reconciliate_processes if used within a cluster environment that creates processes externally
… creates processes externally (#9389) * [RFC] Skip reconciliate_processes if used within a cluster environment that creates processes externally
… creates processes externally (#9389) * [RFC] Skip reconciliate_processes if used within a cluster environment that creates processes externally
What does this PR do?
Fixes #9388
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃