[DataLoader2] Deep copy DataPipe during initialization#786
[DataLoader2] Deep copy DataPipe during initialization#786NivekT wants to merge 3 commits intogh/NivekT/93/basefrom
Conversation
[ghstack-poisoned]
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after #746 (the other PR will have to rebase). Differential Revision: [D39741743](https://our.internmc.facebook.com/intern/diff/D39741743) [ghstack-poisoned]
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after #746 (the other PR will have to rebase). Differential Revision: [D39741743](https://our.internmc.facebook.com/intern/diff/D39741743) [ghstack-poisoned]
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Pull Request resolved: meta-pytorch#786 Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after meta-pytorch#746 (the other PR will have to rebase). Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D39741743 Pulled By: ejguan fbshipit-source-id: f1cdaca054e10bcb3672857b933732178b370535
Summary: Pull Request resolved: #786 Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after #746 (the other PR will have to rebase). Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D39741743 Pulled By: ejguan fbshipit-source-id: f1cdaca054e10bcb3672857b933732178b370535
Summary: Pull Request resolved: meta-pytorch#786 Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages. This can be merged before or after meta-pytorch#746 (the other PR will have to rebase). Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D39741743 Pulled By: ejguan fbshipit-source-id: f1cdaca054e10bcb3672857b933732178b370535
|
@NivekT Do you think we need to add BC breaking note to this PR? |
Actually yes, I think we should, even though I don't think users would intentionally rely on the state of DataPipes being shared among DataLoaders. |
Can you add a brief example in the bc breaking note that the same DataPipe used by multiple DLv2? |
|
Will do by the afternoon. |
Thank you so much! |
|
@ejguan Updated this PR's description. |
Stack from ghstack:
Deep copy DataPipe during initialization to avoid sharing states if the same DataPipe is being passed to multiple usages.
This can be merged before or after #746 (the other PR will have to rebase).
Differential Revision: D39741743
BC Breaking Note:
Previously, if a DataPipe is being passed to multiple DataLoaders, the DataPipe's state can be altered by any of those DataLoaders. In some cases, that may raise exception due to the single iterator constraint; in other cases, some behaviors can be changed due to the adapters (e.g. shuffling) of another DataLoader.
Consider the code snippet below:
Prior to this change, since the DataPipe objects are identical for each DataLoader2, it would've violated the single iterator constraint and raised an exception.
After this change, since the DataPipe objects are deep copied within the
DataLoader2s, they are no longer the same object within each. You can iterate through them simultaneously without issue: