[DCP] Fixes the stateless optimizer issue of distributed state_dict#135535
fegin wants to merge 5 commits into gh/fegin/288/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135535
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 5 Unrelated Failures
As of commit 87dc47f with merge base 9b76449:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
{
    "use_composable": [True, False],
    "optimizer_class": [torch.optim.Adam, torch.optim.AdamW],
    "optimizer_class": [torch.optim.SGD],
```
Do we want to add torch.optim.SGD to the original list instead?
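To see what the reviewer's suggestion buys, assume the subtest configuration is expanded into the cartesian product of its value lists. That expansion is an assumption here, and `expand_subtests` below is a hypothetical helper, not PyTorch's test utility. Under that assumption, adding SGD to the original list exercises it under every `use_composable` setting without a separate entry:

```python
from itertools import product
from typing import Any, Dict, List


def expand_subtests(config: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one kwargs dict per combination of the config's value lists.

    Hypothetical helper: assumes every key maps to a list of candidate values
    and that the test runs the full cartesian product of those lists.
    """
    keys = list(config)
    return [dict(zip(keys, combo)) for combo in product(*(config[k] for k in keys))]


# Optimizer classes are stood in by their names so the sketch runs without
# PyTorch; in the real test they would be torch.optim.Adam, AdamW and SGD.
merged = expand_subtests(
    {
        "use_composable": [True, False],
        "optimizer_class": ["Adam", "AdamW", "SGD"],
    }
)

for kwargs in merged:
    print(kwargs)
# 6 combinations: each optimizer, including SGD, with and without use_composable
```

Merging into one list keeps the coverage matrix in a single place, so any key added to the config later automatically applies to SGD as well.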
@pytorchbot mege
❌ 🤖 pytorchbot command failed: Try
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 5, 5, linux.g5.4xlarge.nvidia.gpu)
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge -f "The failing test is not related."
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use
…ytorch#135535)
Some optimizers have no state, which can cause get_state_dict/set_state_dict to behave incorrectly. This PR fixes these issues.
fixes: pytorch#133415
Pull Request resolved: pytorch#135535
Approved by: https://github.com/wz337
#136000) [DCP] Fixes the stateless optimizer issue of distributed state_dict (#135535)
Some optimizers have no state, which can cause get_state_dict/set_state_dict to behave incorrectly. This PR fixes these issues.
fixes: #133415
Pull Request resolved: #135535
Approved by: https://github.com/wz337
Co-authored-by: Chien-Chin Huang <chienchin@fb.com>
Stack from ghstack (oldest at bottom):
Some optimizers have no state, which can cause get_state_dict/set_state_dict to behave incorrectly. This PR fixes these issues.
fixes: #133415
cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn
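The description does not say how a missing state broke get_state_dict/set_state_dict. As a purely hypothetical illustration, not the DCP implementation, the sketch below models an optimizer state_dict as plain dicts in the shape `torch.optim.Optimizer.state_dict()` produces: `"state"` keyed by integer parameter ids and `"param_groups"` listing those ids. It shows one way name-keying code can fail for an optimizer with no per-parameter state (for instance, SGD without momentum has no momentum buffer to store). Enumerating parameters from `"state"` finds nothing, while enumerating them from `"param_groups"` still finds every parameter. The helper names and the id-to-FQN mapping are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical, simplified model of torch.optim.Optimizer.state_dict():
# "state" maps integer param ids to per-parameter state, and
# "param_groups" holds hyperparameters plus the ids of their params.
OptimStateDict = Dict[str, Any]


def param_names_from_state(osd: OptimStateDict, id_to_fqn: Dict[int, str]) -> List[str]:
    # Fragile: parameters are discovered only through "state", so an
    # optimizer that keeps no per-parameter state yields no names at all.
    return [id_to_fqn[pid] for pid in osd["state"]]


def param_names_from_groups(osd: OptimStateDict, id_to_fqn: Dict[int, str]) -> List[str]:
    # Robust: param_groups lists every parameter, stateful or not.
    return [id_to_fqn[pid] for group in osd["param_groups"] for pid in group["params"]]


def to_fqn_keyed(osd: OptimStateDict, id_to_fqn: Dict[int, str]) -> OptimStateDict:
    """Re-key a state_dict by parameter name; tolerates an empty "state"."""
    state = {id_to_fqn[pid]: dict(s) for pid, s in osd["state"].items()}
    groups = []
    for group in osd["param_groups"]:
        new_group = {k: v for k, v in group.items() if k != "params"}
        new_group["params"] = [id_to_fqn[pid] for pid in group["params"]]
        groups.append(new_group)
    return {"state": state, "param_groups": groups}


id_to_fqn = {0: "layer.weight", 1: "layer.bias"}
# Shaped like a state_dict from an optimizer with no per-parameter state.
stateless = {"state": {}, "param_groups": [{"lr": 0.1, "momentum": 0.0, "params": [0, 1]}]}

print(param_names_from_state(stateless, id_to_fqn))   # []
print(param_names_from_groups(stateless, id_to_fqn))  # ['layer.weight', 'layer.bias']
print(to_fqn_keyed(stateless, id_to_fqn)["param_groups"][0]["params"])  # ['layer.weight', 'layer.bias']
```

The design point the sketch encodes is that `"param_groups"` is the authoritative list of parameters, while `"state"` may legitimately be empty or cover only some of them.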