Avoid wrapping LightningModule in DDP plugins when not fitting#9096
ananthsub merged 2 commits into Lightning-AI:master
Conversation
ananthsub
left a comment
Thanks for carrying this forward!
Force-pushed from 870790e to 1a2245a
Codecov Report
@@           Coverage Diff            @@
##           master    #9096    +/-   ##
=========================================
- Coverage      92%      88%       -4%
=========================================
  Files         176      176
  Lines       14810    14817        +7
=========================================
- Hits        13663    13050      -613
- Misses       1147     1767      +620
@four4fish isn't this a subset of the changes in #8632?
@awaelchli no, I have commandeered #8632, as Ning is focused on the data loader changes.
@four4fish are we closing #8632 then? The contents are virtually identical; at the least we should mark one of them as a draft.
@awaelchli Yeah! Will close #8632. Sorry, I should have marked it as a commandeered PR, thanks :)
ananthsub
left a comment
I don't think we need to pass wrap_model into configure_ddp purely for the debug message. I'd rather configure_ddp only wrap the module. Duplicating the debug line while keeping the configure_ddp interface simple doesn't seem like a bad option to me here.
Head branch was pushed to by a user without write access
@four4fish - the tests are failing because the DeepSpeed plugin extends the DDP plugin. However, for DeepSpeed we should always wrap the model, and not override the training/validation/test/predict steps (i.e. DeepSpeed should continue calling ...). This is another motivation for not relying on inheritance between these plugins, as both the wrapping and checkpoint behavior differ.
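The behavior discussed above can be summarized in a minimal sketch. The class and method names here are hypothetical stand-ins, not the actual Lightning API: a DDP-like plugin wraps the module only when fitting, while a DeepSpeed-like subclass must always wrap because its engine manages the module itself.

```python
class DDPLikePlugin:
    """Hypothetical sketch of a DDP-style plugin (not the real Lightning class)."""

    def __init__(self, module):
        self.module = module
        self.wrapped = False

    def wrap(self):
        # stand-in for wrapping the module in a data-parallel wrapper
        self.wrapped = True

    def setup(self, fitting: bool):
        # only wrap when fitting; validate/test/predict use the bare module
        if fitting:
            self.wrap()


class DeepSpeedLikePlugin(DDPLikePlugin):
    """Hypothetical DeepSpeed-style subclass."""

    def setup(self, fitting: bool):
        # a DeepSpeed-style engine manages the module itself,
        # so it wraps regardless of the trainer state
        self.wrap()
```

This is why the inherited conditional wrapping broke the DeepSpeed tests: the subclass needs to opt out of the "skip wrapping when not fitting" behavior.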
Head branch was pushed to by a user without write access
- Fixed `DDP` "CUDA error: initialization error" due to a `copy` instead of `deepcopy` on `ResultCollection` ([#9239](https://github.com/PyTorchLightning/pytorch-lightning/pull/9239))
- Fixed wrapping issue: avoid wrapping LightningModule with data-parallel modules when not fitting in `DDPPlugin`, `DDPSpawnPlugin`, `DDPShardedPlugin`, `DDPSpawnShardedPlugin` ([#9096]https://github.com/PyTorchLightning/pytorch-lightning/pull/9096)
The link is broken.
Can be fixed in another PR; no need to open one just for this.
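The `copy` vs `deepcopy` distinction behind the `ResultCollection` fix mentioned in the changelog above can be illustrated with a small sketch. A plain dict stands in for the collection here; this is not the actual Lightning class.

```python
import copy

# hypothetical stand-in for a results container, not Lightning's ResultCollection
results = {"loss": [1.0, 2.0]}

shallow = copy.copy(results)    # the inner list is shared with the original
deep = copy.deepcopy(results)   # the inner list is an independent clone

results["loss"].append(3.0)

# after the mutation, the shallow copy observes it; the deep copy does not
```

A shallow copy shares nested mutable state, which is how a `copy` instead of `deepcopy` can leak objects (such as CUDA-initialized state) between processes.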
What does this PR do?
Pulls changes from #8632
Fixes #6977
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃