Set shuffle to DataPipes with set_shuffle API #83741

Closed
ejguan wants to merge 3 commits into pytorch:master from ejguan:apply_randomness

@ejguan ejguan commented Aug 19, 2022

This PR requires #83202 to land first.

changes

  • For apply_shuffle_setting and apply_shuffle_seed, make sure the shuffle setting is applied to every DataPipe that has a method called set_shuffle or set_seed.
  • Rename the API from apply_shuffle_seed to apply_random_seed.
  • Fix a bug where apply_shuffle_seed only accepted hashable DataPipes. After this PR, the function uses id to avoid seeding the same DataPipe multiple times per epoch.
  • Fix another bug in shuffler where reset with _enable=False would also reset _seed.
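
To illustrate the third bullet, here is a minimal sketch of tracking visited objects by `id` rather than by hash. It is not the code from this PR: `ToyPipe` and `apply_seed_once` are hypothetical names, and the graph traversal is a plain-Python stand-in for walking a DataPipe graph.

```python
import random


class ToyPipe:
    """Hypothetical stand-in for a DataPipe; not a torch class."""

    def __init__(self, *sources):
        self.sources = list(sources)
        self.seed = None
        self.seed_calls = 0

    # Defining __eq__ without __hash__ makes instances unhashable,
    # so they cannot be stored in a set or used as dict keys.
    def __eq__(self, other):
        return self is other

    def set_seed(self, seed):
        self.seed = seed
        self.seed_calls += 1


def apply_seed_once(root, rng):
    """Seed every reachable pipe exactly once, tracking visits by id()."""
    seen_ids = set()
    stack = [root]
    while stack:
        pipe = stack.pop()
        if id(pipe) in seen_ids:
            continue  # already seeded through another path
        seen_ids.add(id(pipe))
        if hasattr(pipe, "set_seed"):
            pipe.set_seed(rng.randrange(2**63))
        stack.extend(pipe.sources)
    return len(seen_ids)


# A diamond-shaped graph: `shared` is reachable through both branches.
shared = ToyPipe()
left = ToyPipe(shared)
right = ToyPipe(shared)
root = ToyPipe(left, right)

visited = apply_seed_once(root, random.Random(0))
```

Because the set holds integers returned by `id`, the pipes themselves never need to be hashable. The ids stay valid here only because every pipe remains alive for the whole traversal.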

@ejguan (Contributor Author):

Does anyone have a better name, such as set_random_seed or reset_random_seed?

@ejguan ejguan added release notes: dataloader release notes category topic: improvements topic category labels Aug 19, 2022
@ejguan ejguan marked this pull request as draft August 22, 2022 17:08
@ejguan ejguan force-pushed the apply_randomness branch 2 times, most recently from 5d7fc82 to d24d4ae Compare August 22, 2022 19:21
@ejguan ejguan changed the title Set randomness to all DataPipes with set_seed API Set shuffle to DataPipes with set_shuffle API Aug 22, 2022
@ejguan ejguan force-pushed the apply_randomness branch 4 times, most recently from 7c939c1 to b3f237d Compare August 24, 2022 19:44
@facebook-github-bot

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan marked this pull request as ready for review September 9, 2022 16:17
@NivekT NivekT left a comment

LGTM! I think we should find a place (probably in the docs) to tell users about these randomness methods (if we expect users to interact with them) and what the requirements are for custom DataPipes (e.g., having a method named set_seed).

If I'm reading _is_shuffle_datapipe correctly, it requires both set_shuffle and set_seed to exist, right?

Suggested change
- to each `DataPipe` that has an API of ``set_shuffle``.
+ to each `DataPipe` that has an API of ``set_shuffle`` and ``set_seed``.
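
A duck-typed check along the lines the reviewer describes might look like the sketch below. This is an illustration of the reading above, not the actual `_is_shuffle_datapipe` implementation; `looks_like_shuffle_datapipe`, `CustomShuffler`, and `SeedOnly` are hypothetical names.

```python
def looks_like_shuffle_datapipe(obj):
    """Return True only if obj exposes both set_shuffle and set_seed as callables."""
    required = ("set_shuffle", "set_seed")
    return all(callable(getattr(obj, name, None)) for name in required)


class CustomShuffler:
    """Hypothetical custom pipe that opts in to shuffle control."""

    def __init__(self):
        self.enabled = True
        self.seed = None

    def set_shuffle(self, shuffle=True):
        self.enabled = shuffle
        return self

    def set_seed(self, seed):
        self.seed = seed
        return self


class SeedOnly:
    """Has set_seed but no set_shuffle, so the check rejects it."""

    def set_seed(self, seed):
        self.seed = seed


checks = {
    "CustomShuffler": looks_like_shuffle_datapipe(CustomShuffler()),
    "SeedOnly": looks_like_shuffle_datapipe(SeedOnly()),
    "list": looks_like_shuffle_datapipe([1, 2, 3]),
}
```

Under this reading, a custom DataPipe would need to define both methods to be picked up by the shuffle-setting helpers.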

ejguan commented Sep 12, 2022

> I think we should find a place (probably in the docs) to tell users about these randomness methods (if we expect users to interact with them) and what the requirements are for custom DataPipes (e.g., having a method named set_seed).

I will probably do it when we have DataLoader2 with the random control enabled.

ejguan commented Sep 12, 2022

@pytorchbot merge -g

@pytorchmergebot

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot

Merge failed

Reason: 1 additional jobs have failed, first few of them are: build

ejguan commented Sep 13, 2022

@pytorchbot merge

@pytorchmergebot

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
Pull Request resolved: #83741
Approved by: https://github.com/NivekT