
[DataPipe] Snapshotting with simple fast-forwarding#80250

Closed
NivekT wants to merge 2 commits into gh/nivekt/54/base from gh/nivekt/54/head

Conversation

@NivekT
Contributor

@NivekT NivekT commented Jun 24, 2022

Stack from ghstack:

This mostly completes the poor man's snapshotting implementation (named simple fast-forward). It is the most basic version of snapshotting, but it should work for all DataPipes. I will be adding more efficient implementations for different types of DataPipes in future PRs.

As of this implementation, the usage will be something like:

import pickle

import torch
from torch.utils.data import IterDataPipe

rng = torch.Generator()
initial_rng_state = rng.get_state()
datapipe: IterDataPipe = ...
# Some usage of the DataPipe, here maybe yielding the first 5 values
n_iter = 5
it = iter(datapipe)
for _ in range(n_iter):
    next(it)

serialized_graph = pickle.dumps(datapipe)

# The serialized object has most of the information needed for simple fast-forward
# (except for the initial RNG state)
# It can be deserialized at a later point in time or by a different process
deserialized_graph = pickle.loads(serialized_graph)
# I think `DataLoader` should store `initial_rng_state` so that it can be saved by the API that we later use
rng_for_deserialized = torch.Generator()
rng_for_deserialized.set_state(initial_rng_state)
n_fastforward = deserialized_graph._number_of_samples_yielded
simple_fast_forward_graph(deserialized_graph, n_fastforward, rng=rng_for_deserialized)
# The whole DataPipe graph should have the same state as before serialization, such that:
self.assertEqual(list(it), list(deserialized_graph))  # True

If this looks acceptable, I can modify DataLoader2 to remember things like initial_rng_state and to have a save_snapshot method that returns the (serialized graph, initial RNG state) pair, along with a matching restore_snapshot method. This should work for single-worker data loading.
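To make the proposal above concrete, here is a hypothetical sketch (not the actual DataLoader2 API) of the shape that save_snapshot / restore_snapshot pair could take. Every name here (SnapshotMixin, its constructor arguments) is an illustrative assumption, not code from this PR:

```python
import pickle


class SnapshotMixin:
    """Hypothetical sketch: pairs a DataPipe graph with its initial RNG state."""

    def __init__(self, datapipe, initial_rng_state):
        self.datapipe = datapipe
        # Remembered up front, as the PR description proposes.
        self.initial_rng_state = initial_rng_state

    def save_snapshot(self):
        # Returns everything needed to fast-forward later: the pickled graph
        # (which records the number of samples yielded) plus the RNG state.
        return pickle.dumps(self.datapipe), self.initial_rng_state

    @classmethod
    def restore_snapshot(cls, snapshot):
        serialized_graph, initial_rng_state = snapshot
        graph = pickle.loads(serialized_graph)
        # A real implementation would call simple_fast_forward_graph here,
        # seeding a fresh RNG from initial_rng_state before replaying.
        return cls(graph, initial_rng_state)
```

The point of the design is that the snapshot is just two pieces of data, so it can be written to disk or sent to another process without any live iterator state.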

In the long term, initial_rng_state may not be necessary if we are able to directly save/restore the buffer and RNG state of Shuffler (that work is in progress). However, initial_rng_state with simple fast-forward remains a good fall-back for edge cases where the buffer can't be stored.
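The fall-back mechanism described above can be demonstrated end to end without PyTorch. This is a minimal sketch using only the standard library, with a toy shuffling generator standing in for a real Shuffler DataPipe; the function names are assumptions for illustration, not the PR's actual helpers:

```python
import pickle
import random


def shuffled_stream(data, rng):
    # Stand-in for a Shuffler DataPipe: deterministic given the RNG state.
    order = list(data)
    rng.shuffle(order)
    yield from order


def simple_fast_forward(it, n):
    # Replay the pipeline from scratch, discarding the first n samples.
    for _ in range(n):
        next(it)
    return it


data = list(range(10))

rng = random.Random()
initial_rng_state = rng.getstate()  # captured before any iteration

it = shuffled_stream(data, rng)
consumed = [next(it) for _ in range(5)]  # yield the first 5 values

# A snapshot is just (initial RNG state, number of samples yielded).
snapshot = pickle.dumps((initial_rng_state, len(consumed)))

# Later, possibly in a different process: rebuild and fast-forward.
saved_state, n_yielded = pickle.loads(snapshot)
rng_for_restored = random.Random()
rng_for_restored.setstate(saved_state)
restored = simple_fast_forward(shuffled_stream(data, rng_for_restored), n_yielded)

remaining_original = list(it)
remaining_restored = list(restored)
assert remaining_original == remaining_restored  # same state as before snapshot
```

Because the restored RNG replays the identical shuffle, discarding the first n_yielded samples lands the new iterator exactly where the old one left off, which is why this works even when the Shuffler's buffer itself can't be serialized.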

@facebook-github-bot
Contributor

facebook-github-bot commented Jun 24, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit 7cc63df (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.



@NivekT added the module: data (torch.utils.data), release notes: dataloader, and topic: new features labels Jun 24, 2022
@NivekT NivekT requested review from VitalyFedyunin and ejguan June 24, 2022 21:57
@NivekT
Contributor Author

NivekT commented Jun 27, 2022

Squashed with the other PR.

@NivekT NivekT closed this Jun 27, 2022
pytorchmergebot pushed a commit that referenced this pull request Jul 22, 2022
ghstack-source-id: 6d3120b
Pull Request resolved: #79479

[DataPipe] Snapshotting with simple fast-forwarding

ghstack-source-id: 6d3120b
Pull Request resolved: #80250
@facebook-github-bot facebook-github-bot deleted the gh/nivekt/54/head branch July 28, 2022 14:18