[DeviceMesh] Enable mesh universe concept in mesh comparison #165680
fduwjj wants to merge 3 commits into gh/fduwjj/226/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165680
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures as of commit cdacfcb with merge base 61d9a51. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
lw left a comment:
Yes please!!! So happy we can finally do this! Go!!
…replace `_flatten_mesh_list` so that we don't need to compare root mesh (#166003)

Since we already share a flattened tensor `_rank_map` across all meshes from the same root mesh, we can use a flattened list of it to replace the comparison of `root_mesh` and `flattened_mesh_list` (with the same `_rank_map` and layout, the mesh tensor is guaranteed to be the same). This way we also give back the CPU overhead added in #164510 and further simplify the code. We do have a more ambitious universe-based change in #165680, but it needs more discussion and would be BC-breaking. We might eventually merge that PR, though probably not now; this change is not BC-breaking and will help `concatenate` and its 2D integration.

Pull Request resolved: #166003
Approved by: https://github.com/Skylion007, https://github.com/fegin
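The equality idea in the commit message above can be sketched in a few lines. This is an illustrative toy, not the actual DeviceMesh code: the class `ToyMesh` and its fields are assumptions that echo the names `_rank_map` and layout from the discussion, under the premise that an identical rank map plus an identical layout guarantees an identical materialized mesh tensor.

```python
# Illustrative sketch only, not PyTorch's DeviceMesh implementation.
class ToyMesh:
    def __init__(self, rank_map, sizes, strides):
        # Flattened global ranks shared by every mesh carved from one root.
        self._rank_map = tuple(rank_map)
        # How this particular mesh views the shared rank map.
        self._layout = (tuple(sizes), tuple(strides))

    def __eq__(self, other):
        if not isinstance(other, ToyMesh):
            return NotImplemented
        # Same rank map + same layout => the materialized mesh tensor is
        # guaranteed identical, so no root-mesh or flattened-mesh-list
        # comparison is needed.
        return (self._rank_map == other._rank_map
                and self._layout == other._layout)


tp = ToyMesh(range(8), sizes=(2, 4), strides=(4, 1))
tp_again = ToyMesh(range(8), sizes=(2, 4), strides=(4, 1))
dp = ToyMesh(range(8), sizes=(4, 2), strides=(2, 1))
assert tp == tp_again   # same ranks, same layout
assert tp != dp         # same ranks, different layout
```

Comparing two small tuples is also cheaper than walking back to a root mesh, which is the CPU-overhead point the message makes about #164510.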
…replace `_flatten_mesh_list` so that we don't need to compare root mesh (#166003) (#166264)

Summary: Since we already share a flattened tensor `_rank_map` across all meshes from the same root mesh, we can use a flattened list of it to replace the comparison of `root_mesh` and `flattened_mesh_list` (with the same `_rank_map` and layout, the mesh tensor is guaranteed to be the same). This way we also give back the CPU overhead added in #164510 and further simplify the code. We do have a more ambitious universe-based change in #165680, but it needs more discussion and would be BC-breaking. We might eventually merge that PR, though probably not now; this change is not BC-breaking and will help `concatenate` and its 2D integration.

imported-using-ghimport

Test Plan: Imported from OSS

Differential Revision: D85526705
Pulled By: fduwjj
Pull Request resolved: #166264
Approved by: https://github.com/XilunWu
Instead of having an ID-based way, we first merged
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Stack from ghstack (oldest at bottom):
Since we now use the same `_rank_map` everywhere, we can use it to differentiate between different mesh universes and get rid of the root-mesh comparison.
cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci
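The universe idea in the description can be sketched as follows, under the assumption that every mesh sliced from one root holds a reference to the same shared rank-map object, so object identity alone distinguishes universes. The helper `same_universe` is invented for illustration and does not exist in PyTorch.

```python
# Hypothetical illustration: two "universes" can hold equal rank values
# yet be distinct, because each root mesh allocates its own shared rank map.
universe_a = list(range(8))   # shared by every mesh under root A
universe_b = list(range(8))   # equal values, but a different root

def same_universe(rank_map_1, rank_map_2):
    # Identity, not value equality: meshes from the same root share the
    # exact same object, so `is` is enough to tell universes apart.
    return rank_map_1 is rank_map_2

tp_view = universe_a          # a mesh slice keeps a reference, not a copy
assert same_universe(universe_a, tp_view)
assert not same_universe(universe_a, universe_b)
```

This is what lets the shared `_rank_map` replace the root-mesh comparison: the "which universe am I in" question reduces to "do we share the same rank map".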