Make DistributedSampler stateful #1315
Conversation
AI Store test can be safely ignored for now
andrewkho left a comment
Looks pretty good, but would like to simplify the code a bit and move the tests around as well
```python
ls[i].append(next(its[i]))
self.assertEqual(ls[0], ls[1])
```

```python
def test_initialization_StatefulDistributedSampler(self):
```
Let's move all of these tests out to a new file called test_sampler.py. You can update https://github.com/pytorch/data/blob/main/.github/workflows/stateful_dataloader_ci.yml to call it in an additional step
```python
from torchdata.stateful_dataloader.sampler import StatefulDistributedSampler

dataset = self.dataset
sampler = StatefulDistributedSampler(dataset, num_replicas=10, rank=0, shuffle=False, seed=42, drop_last=False)
```
For testing state_dict, let's have most of the tests set up with passing sampler + dataset to StatefulDataLoader so we can test that it works end-to-end
You might need to use a dummy collate function to easily inspect elements; check the test_state_dict.py file for examples
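A minimal sketch of the dummy collate the reviewer suggests (the name `identity` and the `StatefulDataLoader` call shown in the comment are assumptions for illustration, not code from this PR):

```python
def identity(batch):
    # Return the samples untouched instead of stacking them into tensors,
    # so test assertions can compare raw dataset elements directly.
    return batch

# Assumed usage, mirroring the reviewer's suggestion:
# dl = StatefulDataLoader(dataset, batch_size=2, sampler=sampler, collate_fn=identity)
# Each batch would then be a plain list of dataset elements.
```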
```python
self.next_yielded = None

def __iter__(self):
```
Is it possible to fork the DistributedSampler.__iter__ code here and just update it, instead of having a separate Iterator class?
```python
if self.sampler.shuffle:
    # deterministically shuffle based on epoch and seed
    g = torch.Generator()
    g.manual_seed(self.sampler.seed + self.sampler.epoch)
    indices = torch.randperm(len(self.sampler.dataset), generator=g).tolist()  # type: ignore[arg-type]
else:
    indices = list(range(len(self.sampler.dataset)))  # type: ignore[arg-type]

if not self.sampler.drop_last:
    # add extra samples to make it evenly divisible
    padding_size = self.sampler.total_size - len(indices)
    if padding_size <= len(indices):
        indices += indices[:padding_size]
    else:
        indices += (indices * math.ceil(padding_size / len(indices)))[:padding_size]
else:
    # remove tail of data to make it evenly divisible.
    indices = indices[: self.sampler.total_size]
assert len(indices) == self.sampler.total_size

# subsample
indices = indices[self.sampler.rank : self.sampler.total_size : self.sampler.num_replicas]
assert len(indices) == self.sampler.num_samples

self.parent_iterator = iter(indices)
self.indices = list(self.parent_iterator)
self.current_index = 0
```
Is there a way to call the original code instead of forking it here?
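One way to follow this suggestion is to delegate to the parent's `__iter__` and only fast-forward on resume. The sketch below is self-contained, so it uses a minimal stand-in for `torch.utils.data.DistributedSampler` (the stand-in's behavior is simplified; the real class also shuffles and pads):

```python
class DistributedSampler:
    """Simplified stand-in for torch.utils.data.DistributedSampler."""

    def __init__(self, dataset, num_replicas, rank):
        self.dataset, self.num_replicas, self.rank = dataset, num_replicas, rank

    def __iter__(self):
        # Real version shuffles and pads before subsampling by rank.
        return iter(range(len(self.dataset))[self.rank :: self.num_replicas])


class StatefulDistributedSampler(DistributedSampler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.next_yielded = None  # set by load_state_dict on resume

    def __iter__(self):
        # Reuse the parent's index computation instead of forking it,
        # then skip anything that was already yielded before a checkpoint.
        indices = list(super().__iter__())
        if self.next_yielded is not None:
            indices = indices[self.next_yielded :]
            self.next_yielded = None
        return iter(indices)
```

This avoids a separate iterator class entirely: any fix to the upstream index logic is picked up automatically, and the subclass only owns the resume offset.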
```python
def state_dict(self) -> Dict[str, Any]:
    return self.sampler.state_dict()

def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
    self.sampler.load_state_dict(state_dict)
```
I don't think we need this both here and in the main sampler class; can we consolidate it in just one place?
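One shape the consolidation could take: the sampler alone owns `state_dict()`/`load_state_dict()`, and iteration just reads and writes the sampler's counters. This is a hypothetical sketch (the class and field names are illustrative, not the PR's actual code):

```python
from typing import Any, Dict


class StatefulSampler:
    """Sketch: only the sampler carries (de)serialization logic."""

    def __init__(self, n: int):
        self.n = n
        self.yielded = 0         # advanced as iteration proceeds
        self.next_yielded = None # resume offset set by load_state_dict

    def __iter__(self):
        start = self.next_yielded or 0
        self.next_yielded = None
        for i in range(start, self.n):
            self.yielded = i + 1  # record progress for checkpointing
            yield i

    def state_dict(self) -> Dict[str, Any]:
        return {"yielded": self.yielded}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.next_yielded = state_dict["yielded"]
```

With this layout a wrapping iterator (or dataloader) can simply forward to `self.sampler.state_dict()` without defining its own copy, so there is a single source of truth for the serialized state.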
andrewkho left a comment
Couple of suggestions, but looks great! Very nice test suite.
When you're done making changes, please run the fbcode CI for media_dataloader
Co-authored-by: Andrew Ho <andrewkh@meta.com>
@ramanishsingh has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request was exported from Phabricator. Differential Revision: D61772177
Fixes #1269
Changes
- torchdata/stateful_dataloader/sampler.py: Added new classes StatefulDistributedSampler and _StatefulDistributedSamplerIterator
- test/stateful_dataloader/test_dataloader.py: new tests for StatefulDistributedSampler