Set shuffle to DataPipes with set_shuffle API#83741
Set shuffle to DataPipes with set_shuffle API#83741ejguan wants to merge 3 commits intopytorch:masterfrom
Conversation
🔗 Helpful links
✅ No Failures (0 Pending)As of commit b3f237dc6b (more details on the Dr. CI page): Expand to see more💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
torch/utils/data/graph_settings.py
Outdated
There was a problem hiding this comment.
Anyone has a better name? like set_random_seed or reset_random_seed?
5d7fc82 to
d24d4ae
Compare
7c939c1 to
b3f237d
Compare
b3f237d to
d676a0b
Compare
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/83741
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit aa54fc2: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
NivekT
left a comment
There was a problem hiding this comment.
LGTM! I think we should find a place (probably in the doc) to communicate with users about these randomness methods (if we expect users to interact with them) and what the requirements are for custom DataPipes (e.g. having a method named set_seed) and etc.
torch/utils/data/graph_settings.py
Outdated
There was a problem hiding this comment.
If I'm reading _is_shuffle_datapipe correctly, it requires both set_shuffle and set_seed to exist right?
| to each `DataPipe` that has an API of ``set_shuffle``. | |
| to each `DataPipe` that has an API of ``set_shuffle`` and ``set_seed``. |
I will probably do it when we have |
07b15a2 to
aa54fc2
Compare
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@pytorchbot merge -g |
|
@pytorchbot successfully started a merge job. Check the current status here. |
Merge failedReason: 1 additional jobs have failed, first few of them are: build Details for Dev Infra teamRaised by workflow job |
|
@pytorchbot merge |
|
@pytorchbot successfully started a merge job. Check the current status here. |
This PR requires PR is landed: #83202 ## changes - For `apply_shuffle_setting` and `apply_shuffle_seed`, it makes sure it will apply shuffle setting to each of DataPipe that contains a method called `set_shuffle` or `set_seed`. - Change the API from `apply_shuffle_seed` to `apply_random_seed`. - Fix a bug that `apply_shuffle_seed` only accepts DataPipe that is hashable. After the PR, this function uses `id` to prevent seeding the same DataPipe multiple times per epoch. - Fix another bug from `shuffler` that `reset` with `_enable=False` would also reset `_seed`. Pull Request resolved: #83741 Approved by: https://github.com/NivekT
This PR requires PR is landed: #83202
changes
apply_shuffle_settingandapply_shuffle_seed, it makes sure it will apply shuffle setting to each of DataPipe that contains a method calledset_shuffleorset_seed.apply_shuffle_seedtoapply_random_seed.apply_shuffle_seedonly accepts DataPipe that is hashable. After the PR, this function usesidto prevent seeding the same DataPipe multiple times per epoch.shufflerthatresetwith_enable=Falsewould also reset_seed.