[DataPipes] Add group support to the sharding_filter #88424
VitalyFedyunin wants to merge 7 commits into gh/VitalyFedyunin/116/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88424
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 4a1181d. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ejguan left a comment:

LGTM with a few nit comments.
    def __init__(self, source_datapipe: IterDataPipe, sharding_group_filter=None):
        self.source_datapipe = source_datapipe
        self.sharding_group_filter = sharding_group_filter
Do we need an extra API to set sharding_group_filter?
Based on the implementation, it seems sharding_group_filter is an integer. Could we change it to a set or list to support multiple filters?

I have no use-cases for it, but it will be trivial to change later if we need to.
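To illustrate what the suggested change would look like, here is a minimal, hypothetical sketch (the class and attribute names are stand-ins, not the PR's actual code) of a filter that accepts a single group id, a list, or a set, by normalizing the input into a set:

```python
class GroupFilterSketch:
    """Illustrative sketch: sharding_group_filter accepting int, list, or set."""

    def __init__(self, groups, sharding_group_filter=None):
        # `groups` maps group id -> payload; names here are illustrative.
        self.groups = groups
        if sharding_group_filter is None:
            self.filter = None
        elif isinstance(sharding_group_filter, (set, list, tuple)):
            # Normalize a collection of group ids into a set.
            self.filter = set(sharding_group_filter)
        else:
            # Normalize a single scalar id into a one-element set.
            self.filter = {sharding_group_filter}

    def selected(self):
        # Keep every group when no filter is set, otherwise only the matches.
        return [self.groups[k] for k in sorted(self.groups)
                if self.filter is None or k in self.filter]
```

With this shape, the existing single-integer call sites keep working while multi-group filtering becomes possible without a new API.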
    if self.sharding_group_filter is None:
        sorted_sharding_groups.append(self.groups[key])
    else:
        if key == self.sharding_group_filter:
            sorted_sharding_groups.append(self.groups[key])
nit:

    if self.sharding_group_filter is None or key == self.sharding_group_filter:
        sorted_sharding_groups.append(self.groups[key])

    def apply_sharding(self, num_of_instances, instance_id):
        self.num_of_instances = num_of_instances
        self.instance_id = instance_id

    def apply_sharding(self, num_of_instances, instance_id, sharding_group=SHARDING_PRIORITIES.DEFAULT):
Super nit: Could we add a validation that instance_id < num_of_instances?
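A minimal sketch of the suggested validation (the function body and error message are illustrative, not the PR's exact code): reject an `instance_id` that falls outside the valid range before recording the sharding settings.

```python
def apply_sharding(num_of_instances, instance_id):
    # Validate that instance_id addresses one of the declared instances.
    if not (0 <= instance_id < num_of_instances):
        raise ValueError(
            f"instance_id {instance_id} must be in [0, {num_of_instances})")
    return num_of_instances, instance_id
```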
    with self.assertRaises(Exception):
        dp.apply_sharding(2, 1, sharding_group=SHARDING_PRIORITIES.DEFAULT)
        dp.apply_sharding(5, 3, sharding_group=SHARDING_PRIORITIES.MULTIPROCESSING)
Could we add a separate self.assertRaises context for the second error? As written, the second call never runs if the first one raises.
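The suggestion can be sketched with a minimal unittest example (the DummyPipe and its behavior are illustrative stand-ins, not the PR's test fixture): each call that is expected to raise gets its own assertRaises context, so both calls are actually executed and checked.

```python
import unittest


class DummyPipe:
    """Illustrative pipe whose apply_sharding always raises, standing in
    for a pipe that has already been sharded."""

    def apply_sharding(self, num_of_instances, instance_id, sharding_group=0):
        raise RuntimeError("sharding already applied")


class ShardingTest(unittest.TestCase):
    def test_apply_sharding_raises(self):
        dp = DummyPipe()
        with self.assertRaises(RuntimeError):
            dp.apply_sharding(2, 1, sharding_group=0)
        # A second, separate context: without it, this call would be
        # skipped whenever the first call raised as expected.
        with self.assertRaises(RuntimeError):
            dp.apply_sharding(5, 3, sharding_group=1)
```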
@VitalyFedyunin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Differential Revision: [D41006747](https://our.internmc.facebook.com/intern/diff/D41006747)
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Summary: After pytorch/pytorch#88424 landed, we are able to invoke `apply_sharding` by sharding level (distributed or multiprocessing). This gives `ReadingService` fine-grained control over sharding:

- For `DistributedReadingService`, we only set sharding at the distributed level.
- For `PrototypeMPReadingService`, we set distributed sharding in the main process and mp sharding in the worker processes. Previously, we set sharding in each worker process based on both distributed and mp information.
- `worker_init_fn` doesn't need `DistInfo` anymore, as the `DataPipe` has already been distributed-sharded in the main process.
- Combine `DistInfo` and `ExtraInfo` for `worker_reset_fn` to synchronize the distributed seeds across distributed workers and set worker-local seeds based on both distributed and mp information.

Pull Request resolved: #916
Reviewed By: mingyuzh
Differential Revision: D41776719
Pulled By: ejguan
fbshipit-source-id: 6042da09f5e83019d536696237028ea20e67d110
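The level-based scheme described in the summary can be sketched with a self-contained toy (the enum values and `MiniShardingFilter` are illustrative stand-ins, not torchdata's implementation): sharding is recorded once per group, and the groups are applied successively in priority order, so the main process can apply the distributed level and each worker can later apply the mp level.

```python
from enum import IntEnum


class SHARDING_PRIORITIES(IntEnum):
    # Illustrative values; only the relative ordering matters here.
    DEFAULT = 1
    DISTRIBUTED = 2
    MULTIPROCESSING = 3


class MiniShardingFilter:
    """Toy sharding filter: keeps elements that land on this instance
    at every registered sharding level."""

    def __init__(self, source):
        self.source = source
        self.groups = {}  # sharding_group -> (num_of_instances, instance_id)

    def apply_sharding(self, num_of_instances, instance_id,
                       sharding_group=SHARDING_PRIORITIES.DEFAULT):
        if not (0 <= instance_id < num_of_instances):
            raise ValueError("instance_id must be in [0, num_of_instances)")
        self.groups[sharding_group] = (num_of_instances, instance_id)

    def __iter__(self):
        for i, item in enumerate(self.source):
            keep, idx = True, i
            # Apply each level in priority order; the index is divided
            # down so the next level shards the surviving stream.
            for group in sorted(self.groups):
                n, rank = self.groups[group]
                if idx % n != rank:
                    keep = False
                    break
                idx //= n
            if keep:
                yield item
```

For example, with 12 elements, 2 distributed ranks, and 3 workers per rank, each (rank, worker) pair sees a disjoint 2-element slice, and the union of all six slices covers the full input.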
Differential Revision: [D41006747](https://our.internmc.facebook.com/intern/diff/D41006747)
Pull Request resolved: pytorch#88424
Approved by: https://github.com/ejguan
    raise RuntimeError('This implementation of sharding can be only applied once per instance of DataPipeline.',
                       'Already applied to', already_applied_to, 'while trying to apply to', pipe)
    pipe.apply_sharding(num_of_instances, instance_id)
    pipe.apply_sharding(num_of_instances, instance_id, sharding_group=sharding_group)
noob question: do is_shardable and apply_sharding only exist in ShardingFilterIterDataPipe?
Also, if there is no ShardingFilterIterDataPipe, it looks like no sharding will happen. Shall we error in that case? :)
cc @ejguan
Stack from ghstack (oldest at bottom):
Differential Revision: D41006747