
[DeviceMesh] Clean up the call into mesh_resouces to get root mesh #165787

Closed
fduwjj wants to merge 4 commits into gh/fduwjj/227/base from gh/fduwjj/227/head


fduwjj (Contributor) commented Oct 17, 2025

Stack from ghstack (oldest at bottom):

We moved the method that gets the root mesh into the class in #164510. This PR cleans the code up further.
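The kind of cleanup described here can be illustrated with a toy sketch. It assumes a mesh object that keeps a link to its parent and a separate global registry object that used to answer the "which mesh is the root?" question. `ToyMesh`, `ToyMeshResources`, and `_toy_resources` are hypothetical names made up for this illustration; they are not the PyTorch classes or the code changed in this PR.

```python
from __future__ import annotations


class ToyMesh:
    """Hypothetical stand-in for a device mesh; not the PyTorch class."""

    def __init__(self, name: str, parent: ToyMesh | None = None) -> None:
        self.name = name
        self._parent = parent

    def get_root_mesh(self) -> ToyMesh:
        # Walk the parent links until we reach a mesh that has no parent.
        mesh = self
        while mesh._parent is not None:
            mesh = mesh._parent
        return mesh


class ToyMeshResources:
    """Hypothetical global registry: the indirection being cleaned up."""

    def get_root_mesh(self, mesh: ToyMesh) -> ToyMesh:
        # Old-style entry point that only forwards to the method on the class.
        return mesh.get_root_mesh()


_toy_resources = ToyMeshResources()

world = ToyMesh("world")
tp = ToyMesh("tp", parent=world)

# Before the cleanup: callers go through the global object.
root_via_global = _toy_resources.get_root_mesh(tp)

# After the cleanup: callers ask the mesh directly.
root_via_method = tp.get_root_mesh()
```

In this sketch both call paths return the same object, so call sites can be switched one at a time and the forwarding method removed once nothing uses it.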

cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci

Differential Revision: D85090191

pytorch-bot bot added the labels ciflow/inductor, oncall: distributed (Add this issue/PR to distributed oncall triage queue), and release notes: distributed (fsdp) (release notes category) on Oct 17, 2025
pytorch-bot bot commented Oct 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165787

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 8 New Failures, 1 Cancelled Job, 7 Unrelated Failures

As of commit 47a12eb with merge base 61d9a51:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fduwjj added a commit that referenced this pull request Oct 17, 2025
fduwjj added the labels ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR), topic: not user facing (topic category), and ciflow/trunk (Trigger trunk jobs on your pull request) on Oct 17, 2025
albanD removed their request for review on October 17, 2025 20:26
mikaylagawarecki removed their request for review on October 17, 2025 20:28
fegin (Contributor) commented Oct 17, 2025

We should definitely import this PR into the internal code base first. I vaguely remember that some internal code also uses the global variable.

…oot mesh"


We moved the method to get root mesh into class in #164510. This is to further clean code up.


cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
fduwjj added a commit that referenced this pull request Oct 20, 2025
fduwjj added a commit that referenced this pull request Oct 20, 2025
fduwjj added a commit that referenced this pull request Oct 20, 2025
fduwjj (Contributor, Author) commented Oct 20, 2025

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

fduwjj (Contributor, Author) commented Oct 21, 2025

pytorchbot merge -i "failed test is not related, also no internal test failure. RCOM test can not be reproced"

fduwjj (Contributor, Author) commented Oct 21, 2025

@pytorchbot merge -i "failed test is not related, also no internal test failure. RCOM test can not be reproced"

pytorch-bot bot commented Oct 21, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: failed test is not related, also no internal test failure. RCOM test can not be reproced

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.
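The error has the shape of a command-line parser rejecting free text placed after a switch that takes no value. A minimal sketch of that behavior, assuming an argparse-style parser in which `-i` is a plain boolean flag; the `@toybot` program name and the `--ignore-current` long option are assumptions for illustration, and this is not pytorchbot's actual implementation.

```python
import argparse

# Toy parser: one "merge" subcommand whose -i flag takes no value.
parser = argparse.ArgumentParser(prog="@toybot")
subparsers = parser.add_subparsers(dest="command")
merge = subparsers.add_parser("merge")
merge.add_argument("-i", "--ignore-current", action="store_true")

# The bare flag parses cleanly.
ok_args, ok_extra = parser.parse_known_args(["merge", "-i"])

# Trailing text is not consumed by -i, so it is left over.
# parse_args() would report such leftovers as "unrecognized arguments"
# and exit; parse_known_args() returns them instead so we can inspect them.
bad_args, bad_extra = parser.parse_known_args(
    ["merge", "-i", "failed test is not related"]
)
```

Under these assumptions the flag is still set in the second call, but the quoted message ends up in the leftover list, which matches the comment that follows where the command is retried with the bare `-i`.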

fduwjj (Contributor, Author) commented Oct 21, 2025

@pytorchbot merge -i

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged while ignoring the following 16 checks:
- periodic / linux-jammy-cuda12.8-py3.10-gcc9-debug / test (default, 1, 7, linux.4xlarge.nvidia.gpu, oncall:debug-build)
- periodic / linux-jammy-cuda12.8-py3.10-gcc9-debug / test (default, 5, 7, linux.4xlarge.nvidia.gpu, oncall:debug-build)
- periodic / linux-jammy-cuda12.8-py3.10-gcc9-debug / test (default, 4, 7, linux.4xlarge.nvidia.gpu, oncall:debug-build)
- periodic / linux-jammy-cuda12.8-py3.10-gcc9-debug / test (default, 2, 7, linux.4xlarge.nvidia.gpu, oncall:debug-build)
- periodic / linux-jammy-cuda12.8-py3.10-gcc9-debug / test (default, 3, 7, linux.4xlarge.nvidia.gpu, oncall:debug-build)
- periodic / linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, unstable)
- periodic / linux-jammy-cuda12.8-py3.10-gcc9-debug / test (default, 6, 7, linux.4xlarge.nvidia.gpu, oncall:debug-build)
- periodic / linux-jammy-cuda12.8-py3.10-gcc9-debug / test (default, 7, 7, linux.4xlarge.nvidia.gpu, oncall:debug-build)
- periodic / linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 3, 5, linux.g4dn.4xlarge.nvidia.gpu, unstable)
- periodic / linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 2, 5, linux.g4dn.4xlarge.nvidia.gpu, unstable)
- periodic / linux-jammy-rocm-py3.10 / test (distributed, 3, 3, linux.rocm.gpu.mi250.4, module:rocm, oncall:distributed)
- periodic / linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, unstable)
- periodic / linux-jammy-rocm-py3.10 / test (distributed, 1, 3, linux.rocm.gpu.mi250.4, module:rocm, oncall:distributed)
- periodic / linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 5, 5, linux.g4dn.4xlarge.nvidia.gpu, unstable)
- periodic / linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.mi250.4, module:rocm, oncall:distributed)
- periodic / linux-jammy-cuda13.0-py3.10-gcc11 / test (nogpu_AVX512, 1, 3, linux.4xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
…ytorch#165787)

We moved the method to get root mesh into class in pytorch#164510. This is to further clean code up.

Differential Revision: [D85090191](https://our.internmc.facebook.com/intern/diff/D85090191)
Pull Request resolved: pytorch#165787
Approved by: https://github.com/fegin
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 22, 2025
@github-actions github-actions bot deleted the gh/fduwjj/227/head branch November 21, 2025 02:15

Labels

ciflow/inductor, ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, oncall: distributed (Add this issue/PR to distributed oncall triage queue), release notes: distributed (fsdp) (release notes category), topic: not user facing (topic category)
