Skip to content

[DSD] Add unittest to verify HSDP1 + broadcast_from_rank0#128755

Closed
fegin wants to merge 3 commits intogh/fegin/255/basefrom
gh/fegin/255/head
Closed

[DSD] Add unittest to verify HSDP1 + broadcast_from_rank0#128755
fegin wants to merge 3 commits intogh/fegin/255/basefrom
gh/fegin/255/head

Conversation

@fegin
Copy link
Contributor

@fegin fegin commented Jun 14, 2024

Stack from ghstack (oldest at bottom):

HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.

This test relies on the fix from #128446.

Differential Revision: D58621436

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @LucasLLC @MeetVadakkanchery @mhorowitz

HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.

Differential Revision: [D58621436](https://our.internmc.facebook.com/intern/diff/D58621436/)

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Jun 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128755

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 5 Unrelated Failures

As of commit ab39621 with merge base 73ba432 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D58621436

fegin added a commit that referenced this pull request Jun 14, 2024
HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.

Differential Revision: [D58621436](https://our.internmc.facebook.com/intern/diff/D58621436/)

ghstack-source-id: 230387124
Pull Request resolved: #128755
@fegin fegin added ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Jun 14, 2024
@fegin fegin requested review from LucasLLC, awgu and wz337 June 14, 2024 23:28
HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.

This test relies on the fix from #128446.

Differential Revision: [D58621436](https://our.internmc.facebook.com/intern/diff/D58621436/)

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k LucasLLC MeetVadakkanchery mhorowitz

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D58621436

Copy link
Contributor

@wz337 wz337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.

This test relies on the fix from #128446.

Differential Revision: [D58621436](https://our.internmc.facebook.com/intern/diff/D58621436/)

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k LucasLLC MeetVadakkanchery mhorowitz

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D58621436

fegin added a commit that referenced this pull request Jun 17, 2024
Pull Request resolved: #128755

HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.
ghstack-source-id: 230543540
@exported-using-ghexport

Differential Revision: [D58621436](https://our.internmc.facebook.com/intern/diff/D58621436/)
@facebook-github-bot
Copy link
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

fegin added a commit that referenced this pull request Jun 21, 2024
HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.

This test relies on the fix from #128446.

Differential Revision: [D58621436](https://our.internmc.facebook.com/intern/diff/D58621436/)

Pull Request resolved: #128755
Approved by: https://github.com/Skylion007, https://github.com/wz337
ghstack dependencies: #128685

(cherry picked from commit fe8558b)
atalman pushed a commit that referenced this pull request Jun 26, 2024
…129255)

HSDP1 + broadcast_from_rank0 actually behaves differently from FSDP1 + broadcast_from_rank0. So we need an unittest to cover this use case.

This test relies on the fix from #128446.

Differential Revision: [D58621436](https://our.internmc.facebook.com/intern/diff/D58621436/)

Pull Request resolved: #128755
Approved by: https://github.com/Skylion007, https://github.com/wz337
ghstack dependencies: #128685

(cherry picked from commit fe8558b)
@github-actions github-actions bot deleted the gh/fegin/255/head branch July 19, 2024 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged oncall: distributed Add this issue/PR to distributed oncall triage queue topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants