Conversation
Bumps [pytorch-lightning](https://github.com/PyTorchLightning/pytorch-lightning) from 1.4.7 to 1.5.2. - [Release notes](https://github.com/PyTorchLightning/pytorch-lightning/releases) - [Changelog](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md) - [Commits](Lightning-AI/pytorch-lightning@1.4.7...1.5.2) --- updated-dependencies: - dependency-name: pytorch-lightning dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Blocked on #71

Can you update the description with some more context about this? Major changes, things to pay attention to while reviewing, etc.
…ors into ptl-1.5-support
@matthewdeng updated the description. Main thing to review is just a quick check to see if everything looks roughly correct and the comments make sense. For a more thorough review, you probably want to first read through the source code of
amogkam left a comment
Thanks for the review! Can you take another look please?
@pytest.fixture
def ray_start_cluster_2_node_2_cpu():
Ah that's right I removed the test that was using this. But I'll just leave the fixture in here for future use.
def set_world_ranks(self, process_idx: int = 0):
    """Set the appropriate rank attributes for the trainer."""
    if self._is_remote:
What happens if this is False? Should an error be raised?
It should be a no-op. I'll add a comment about this. This is needed because DDPSpawnPlugin calls set_world_ranks during __init__, on the assumption that in a multi-node setup the correct address and port environment variables have already been set by the user before running the script. In our case, however, we only want to call this method after the workers have been initialized.
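The guard described above can be sketched as follows. This is a hedged, self-contained illustration with stand-in names (RayPluginSketch, _is_remote, global_rank are illustrative, not the actual plugin code): the driver-side call from the base class's __init__ must be a harmless no-op, because ranks are only known once a worker is running remotely.

```python
# Hypothetical sketch of the no-op guard: set_world_ranks does nothing until
# we are inside a launched worker, since the base class calls it during
# __init__, before any workers exist. Names here are illustrative stand-ins.

class RayPluginSketch:
    def __init__(self):
        self._is_remote = False  # True only inside a launched Ray worker
        self.global_rank = None
        # Mimics DDPSpawnPlugin.__init__ calling set_world_ranks() up front,
        # before any workers are started -- this must be a harmless no-op.
        self.set_world_ranks()

    def set_world_ranks(self, process_idx: int = 0):
        """Set rank attributes, but only once inside a remote worker."""
        if not self._is_remote:
            return  # no-op on the driver; ranks unknown until workers start
        self.global_rank = process_idx

plugin = RayPluginSketch()
assert plugin.global_rank is None  # the driver-side call did nothing

plugin._is_remote = True  # as if we are now inside a remote worker
plugin.set_world_ranks(3)
assert plugin.global_rank == 3
```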
# All methods below are only executed in remote Ray workers.
def set_world_ranks(self, process_idx: int = 0):
nit: What's the reason for the default value?
This is to match the signature of the superclass. Without a default value, the superclass's call to self.set_world_ranks() (with no arguments) would fail.
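A minimal sketch of why the default is required. The class names are illustrative stand-ins for DDPSpawnPlugin and its subclass, not the real PTL classes:

```python
# The base class invokes set_world_ranks() with no arguments, so an override
# that drops the default would raise a TypeError. Illustrative names only.

class Base:
    def setup(self):
        self.set_world_ranks()  # called with no arguments

    def set_world_ranks(self, process_idx: int = 0):
        self.rank = process_idx

class Derived(Base):
    # The default is kept to stay call-compatible with Base.setup().
    def set_world_ranks(self, process_idx: int = 0):
        self.rank = process_idx * 10

d = Derived()
d.setup()            # succeeds only because the override keeps the default
assert d.rank == 0
d.set_world_ranks(2)
assert d.rank == 20
```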
Various changes to support PyTorch Lightning 1.5.

The major change is the removal of the RayClusterEnvironment. In PTL 1.5, the plugins lean on the cluster environment more heavily for specifying the node rank, local rank, global rank, and world size. With the RayPlugin, however, the ranks are not known until after the actors are launched and training begins. The cluster environment therefore ended up adding more complexity than needed, so this PR removes it and handles all of the rank bookkeeping internally in the plugin.

Other changes include supporting various name changes, refactorings, and signature changes in the DDPSpawnPlugin and DDPSpawnShardedPlugin.
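The rank bookkeeping described above (assign ranks only after the actors exist, rather than reading them from a ClusterEnvironment up front) can be sketched with pure-Python stand-ins. WorkerStub and assign_ranks are hypothetical names for illustration; no Ray or PTL APIs are used:

```python
# Hedged sketch: after workers are launched, the plugin walks the node/worker
# layout and pushes global rank, local rank, node rank, and world size to each
# worker. Pure-Python stand-ins; not the actual RayPlugin implementation.

class WorkerStub:
    """Stands in for a launched Ray actor."""
    def set_ranks(self, global_rank, local_rank, node_rank, world_size):
        self.global_rank = global_rank
        self.local_rank = local_rank
        self.node_rank = node_rank
        self.world_size = world_size

def assign_ranks(workers_by_node):
    """Assign ranks to already-launched workers, grouped by node."""
    world_size = sum(len(ws) for ws in workers_by_node)
    global_rank = 0
    for node_rank, workers in enumerate(workers_by_node):
        for local_rank, worker in enumerate(workers):
            worker.set_ranks(global_rank, local_rank, node_rank, world_size)
            global_rank += 1

# Two workers on node 0, one worker on node 1.
nodes = [[WorkerStub(), WorkerStub()], [WorkerStub()]]
assign_ranks(nodes)
assert nodes[1][0].global_rank == 2
assert nodes[1][0].node_rank == 1 and nodes[1][0].local_rank == 0
assert nodes[0][1].world_size == 3
```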
Closes #108, #101