
[DSD] Add unittest to verify HSDP1 + broadcast_from_rank0 (#128755) #129255

Merged: atalman merged 1 commit into release/2.4 from chienchin/cherry-pr-128755 on Jun 26, 2024


@fegin (Contributor) commented Jun 21, 2024

HSDP1 + broadcast_from_rank0 behaves differently from FSDP1 + broadcast_from_rank0, so we need a unit test to cover this use case.

This test relies on the fix from #128446.
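
The description does not spell out how the two configurations differ. As a rough illustration only, the pure-Python sketch below assumes the usual 2D (replicate, shard) rank layout for hybrid sharding and a world size of 4. It does not use torch.distributed, and `mesh_groups` and `ranks_reached_from_rank0` are hypothetical helpers, not PyTorch APIs. It shows that rank 0's shard group spans every rank under full sharding but only a subset under hybrid sharding, which is one plausible reason a broadcast from rank 0 needs separate test coverage. Whether this is the exact behavior addressed by #128446 is an assumption.

```python
from typing import List


def mesh_groups(mesh: List[List[int]], dim: int) -> List[List[int]]:
    """Return the rank groups formed along `dim` of a 2D rank mesh.

    dim=1 groups ranks that share a row (the shard dimension here);
    dim=0 groups ranks that share a column (the replicate dimension here).
    """
    if dim == 1:
        return [list(row) for row in mesh]
    return [list(col) for col in zip(*mesh)]


def ranks_reached_from_rank0(groups: List[List[int]]) -> List[int]:
    """Ranks that a broadcast from rank 0 reaches if it stays inside one group."""
    for group in groups:
        if 0 in group:
            return sorted(group)
    return []


# Assumed layouts for a world size of 4 (illustrative, not taken from the PR).
fsdp_mesh = [[0, 1, 2, 3]]      # 1 x 4: a single shard group
hsdp_mesh = [[0, 1], [2, 3]]    # 2 x 2: rows are shard groups, columns are replicas

fsdp_shard_groups = mesh_groups(fsdp_mesh, dim=1)
hsdp_shard_groups = mesh_groups(hsdp_mesh, dim=1)
hsdp_replicate_groups = mesh_groups(hsdp_mesh, dim=0)

# Full sharding: rank 0's shard group covers the whole world.
fsdp_reached = ranks_reached_from_rank0(fsdp_shard_groups)
# Hybrid sharding: rank 0's shard group covers only part of the world.
hsdp_reached = ranks_reached_from_rank0(hsdp_shard_groups)

print("FSDP ranks reached via rank 0's shard group:", fsdp_reached)
print("HSDP ranks reached via rank 0's shard group:", hsdp_reached)
print("HSDP replicate groups:", hsdp_replicate_groups)
```

Under these assumptions, a broadcast confined to rank 0's shard group would leave ranks 2 and 3 without data in the hybrid case, so the hybrid path has to reach the other replicas some other way.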

Differential Revision: D58621436

Pull Request resolved: #128755
Approved by: https://github.com/Skylion007, https://github.com/wz337
ghstack dependencies: #128685

(cherry picked from commit fe8558b)


cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @LucasLLC @MeetVadakkanchery @mhorowitz


pytorch-bot bot commented Jun 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129255

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 77266b6 with merge base b66e3f0:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the labels "module: distributed_checkpoint", "oncall: distributed", and "topic: not user facing" on Jun 21, 2024
@pytorch-bot added the "ciflow/trunk" label on Jun 26, 2024
@atalman merged commit 491e9e2 into release/2.4 on Jun 26, 2024
@github-actions deleted the chienchin/cherry-pr-128755 branch on July 27, 2024 at 01:55