[2/N][dtensor] Strided Sharding shard_to_replicate #130239
Closed
XilunWu wants to merge 10 commits into gh/XilunWu/87/base
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130239
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit bbd3f0a with merge base da32021.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
**Test** `pytest test/distributed/_tensor/test_utils.py -s -k strided_sharding` `pytest test/distributed/_tensor/test_utils.py -s -k test_fsdp2_tp_2d_dtensor_local_shards_and_offsets` cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu fegin wanchaol fduwjj wz337 tianyu-l wconstab chauhang d4l3k [ghstack-poisoned]
francograndegmailcom pushed a commit to francograndegmailcom/pytorch-pytorch that referenced this pull request Jul 23, 2024
ghstack-source-id: 3a19915
Pull Request resolved: pytorch/pytorch#130239
**Summary** This PR adds the necessary util function to `_StridedShard` for correct shard-to-replicate resharding. **Test** `pytest test/distributed/_tensor/test_utils.py -s -k strided_sharding` `pytest test/distributed/_tensor/test_utils.py -s -k test_fsdp2_tp_2d_dtensor_local_shards_and_offsets` cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu fegin wanchaol fduwjj wz337 tianyu-l wconstab chauhang d4l3k [ghstack-poisoned]
Contributor, Author
@XilunWu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
**Summary** This PR adds the necessary util function to `_StridedShard` for correct shard-to-replicate resharding. **Test** `pytest test/distributed/_tensor/test_utils.py -s -k strided_sharding` `pytest test/distributed/_tensor/test_utils.py -s -k test_fsdp2_tp_2d_dtensor_local_shards_and_offsets` cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse tianyu-l chauhang Differential Revision: [D60606117](https://our.internmc.facebook.com/intern/diff/D60606117) [ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Aug 7, 2024
**Summary**
1. Change `compute_local_shape_and_global_offset` to correctly compute shape and offset for strided sharding placement (currently it only handles 2D and some 3D+ sharding).
2. Add a new property `num_shards_map` to `DTensorSpec` denoting how many shards each tensor dimension has. This is necessary for constructing `_StridedShard` placement when we call `distribute_tensor(dtensor_tp, dp_device_mesh, [Shard(0)])` and the `split_factor` argument will just be the number of shards on that sharding tensor dim.
**Test** `test/distributed/_tensor/test_utils.py`
Pull Request resolved: #132391
Approved by: https://github.com/wanchaol
ghstack dependencies: #126697, #130239
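The two ideas in that commit message can be illustrated with a small pure-Python sketch. It is not the PyTorch implementation: `num_shards_per_dim` and `local_size_and_offset` are hypothetical helper names, the tensor is modeled as 1-D where needed, and even divisibility is assumed. The first helper counts how many shards each tensor dimension has (the role the message gives to `num_shards_map`); the second computes a rank's local size and global offset when a dimension is sharded by TP first and then by DP.

```python
def num_shards_per_dim(ndim, mesh_shape, shard_dims):
    """Hypothetical helper: count the shards on each tensor dimension.

    shard_dims[m] is the tensor dim sharded on mesh dim m, or None if the
    tensor is replicated on that mesh dim.
    """
    counts = [1] * ndim
    for mesh_size, dim in zip(mesh_shape, shard_dims):
        if dim is not None:
            counts[dim] *= mesh_size
    return counts


def local_size_and_offset(n, dp, tp, dp_rank, tp_rank):
    """Hypothetical helper for a 1-D tensor of length n sharded TP-then-DP.

    Assumes n is divisible by tp * dp. The TP chunk picks the coarse
    position, and the DP rank picks the slice inside that chunk.
    """
    tp_chunk = n // tp
    local = tp_chunk // dp
    offset = tp_rank * tp_chunk + dp_rank * local
    return local, offset


# A 2-D tensor already sharded on dim 0 across a TP mesh of size 4:
# dim 0 has 4 shards, so a later Shard(0) over DP would use split factor 4.
tp_counts = num_shards_per_dim(2, (4,), [0])
split_factor = tp_counts[0]

# Length-8 tensor, dp=2, tp=2: rank (dp_rank=1, tp_rank=0) owns elements 2..3.
size_10, offset_10 = local_size_and_offset(8, 2, 2, dp_rank=1, tp_rank=0)
size_01, offset_01 = local_size_and_offset(8, 2, 2, dp_rank=0, tp_rank=1)
```

Note that rank `(dp_rank=1, tp_rank=0)` starts at offset 2, not 4: treating the DP shard as an ordinary contiguous shard of the whole dimension would give the wrong offset.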
pytorchmergebot pushed a commit that referenced this pull request Aug 7, 2024
**Test**
`pytest test/distributed/_composable/fsdp/test_fully_shard_training.py`
`pytest test/distributed/_composable/fsdp/test_fully_shard_state_dict.py`
`pytest test/distributed/checkpoint/fsdp/test_fsdp_dsd.py`
`pytest test/distributed/_composable/fsdp/test_fully_shard_init.py`
Pull Request resolved: #131408
Approved by: https://github.com/fegin
ghstack dependencies: #126697, #130239, #132391
pytorchmergebot pushed a commit that referenced this pull request Aug 8, 2024
…rrect full_tensor() result (#130760)
Fixes issue #129229 #129206
**Summary**
1. Have `FSDP` choose `_StridedShard` placement for FSDP+TP sharding.
2. Added a parity test to FSDP to ensure that FSDP+TP sharding (i.e. strided) and simply TP sharding (i.e. non-strided) have the same `full_tensor()` result.
3. Re-enabled the tests that were disabled in #129519.
**Test**
`pytest test/distributed/_composable/fsdp/`
`pytest test/distributed/_composable/test_composability/test_2d_composability.py`
`pytest test/distributed/checkpoint/fsdp/test_fsdp_dsd.py`
Differential Revision: [D60606114](https://our.internmc.facebook.com/intern/diff/D60606114)
Pull Request resolved: #130760
Approved by: https://github.com/wanchaol, https://github.com/fegin, https://github.com/wz337
ghstack dependencies: #126697, #130239, #132391, #131408
Stack from ghstack (oldest at bottom):
**Summary**
This PR adds the necessary util function to `_StridedShard` for correct shard-to-replicate resharding.

**Test**
`pytest test/distributed/_tensor/test_utils.py -s -k strided_sharding`
`pytest test/distributed/_tensor/test_utils.py -s -k test_fsdp2_tp_2d_dtensor_local_shards_and_offsets`

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @tianyu-l @chauhang
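To see why shard-to-replicate needs special handling for strided sharding, here is a toy pure-Python model. It does not use DTensor or any collective; all function names are hypothetical, the tensor is a 1-D list, and even divisibility is assumed. It shards a list by TP first and then by DP (the FSDP+TP layout), then compares gathering as if both mesh dims were plain contiguous shards against gathering that undoes the shards in reverse order of application.

```python
def split_even(xs, n):
    """Split a list into n equal contiguous chunks (assumes len(xs) % n == 0)."""
    size = len(xs) // n
    return [xs[k * size:(k + 1) * size] for k in range(n)]


def shard_tp_then_dp(xs, dp, tp):
    """Toy FSDP+TP layout: shard by TP first, then shard each TP chunk by DP.

    Returns {(dp_rank, tp_rank): local_shard} for a (dp, tp) mesh.
    """
    tp_chunks = split_even(xs, tp)
    return {
        (i, j): split_even(tp_chunks[j], dp)[i]
        for i in range(dp)
        for j in range(tp)
    }


def gather_as_plain_shards(local, dp, tp):
    """Gather as if both mesh dims were ordinary contiguous shards:
    concatenate over tp (inner mesh dim) first, then over dp."""
    out = []
    for i in range(dp):
        for j in range(tp):
            out.extend(local[(i, j)])
    return out


def gather_strided(local, dp, tp):
    """Undo the shards in reverse order of application:
    concatenate over dp inside each TP chunk, then over tp."""
    out = []
    for j in range(tp):
        for i in range(dp):
            out.extend(local[(i, j)])
    return out


full = list(range(8))
local = shard_tp_then_dp(full, dp=2, tp=2)
naive = gather_as_plain_shards(local, dp=2, tp=2)
fixed = gather_strided(local, dp=2, tp=2)
print(naive)  # [0, 1, 4, 5, 2, 3, 6, 7]
print(fixed)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The naive gather interleaves the TP chunks, which is the kind of incorrect `full_tensor()` result the later commits in this stack describe fixing; the strided gather restores the original order.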