[DataLoader] Replacing traverse function with traverse_datapipes#85667
[DataLoader] Replacing traverse function with traverse_datapipes#85667NivekT wants to merge 4 commits intogh/nivekt/54/basefrom
traverse function with traverse_datapipes#85667Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85667
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures, 1 PendingAs of commit 5f5449b: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
traverse function with traverse_datapipes
…datapipes`"
This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead. While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes `only_datapipe` the warning to be raise.
What we would like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Not raise a warning when users are using the function correctly
This creates a paradox where the users want to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`), but:
- they cannot do so because the default behavior hasn't changed yet, so they must use `only_datapipe=True`
- if they use `only_datapipe=True`, eventually the kwarg will go away and it will become a runtime error and they also get a warning in the present
IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.
[ghstack-poisoned]
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
Thank you for adding this PR! I was debating with myself between your approach or doing a BC-breaking change on |
…datapipes`" This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead. While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised. ``` /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13. warnings.warn(msg, FutureWarning) ``` A few things we'd like to do: 1. Deprecate the key word arg `only_datapipe` 2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future 3. Do not raise a warning when users are using the function correctly This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`): - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True` - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative. If this looks right, I will send a follow up PR in `TorchData`. Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183) [ghstack-poisoned]
|
I updated the name to |
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ejguan
left a comment
There was a problem hiding this comment.
Thank you, LGTM. Do you want to open corresponding PRs for both TorchData and TorchVision?
…datapipes`" This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead. While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised. ``` /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13. warnings.warn(msg, FutureWarning) ``` A few things we'd like to do: 1. Deprecate the key word arg `only_datapipe` 2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future 3. Do not raise a warning when users are using the function correctly This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`): - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True` - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative. If this looks right, I will send a follow up PR in `TorchData`. Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183) [ghstack-poisoned]
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Will do shortly. |
…ps`" CI is expected to fail for now. This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal. [ghstack-poisoned]
* Replace torch.utils.data.graph.traverse with traverse_dps [ghstack-poisoned] * Update on "Replace `torch.utils.data.graph.traverse` with `traverse_dps`" CI is expected to fail for now. This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal. [ghstack-poisoned] Co-authored-by: Philip Meier <github.pmeier@posteo.de>
…ytorch#85667) This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead. While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised. ``` /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13. warnings.warn(msg, FutureWarning) ``` A few things we'd like to do: 1. Deprecate the key word arg `only_datapipe` 2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future 3. Do not raise a warning when users are using the function correctly This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`): - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True` - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative. If this looks right, I will send a follow up PR in `TorchData`. Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183) Pull Request resolved: pytorch#85667 Approved by: https://github.com/ejguan
This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal. Differential Revision: [D39860692](https://our.internmc.facebook.com/intern/diff/D39860692) [ghstack-poisoned]
#6657) Summary: * Replace torch.utils.data.graph.traverse with traverse_dps [ghstack-poisoned] * Update on "Replace `torch.utils.data.graph.traverse` with `traverse_dps`" CI is expected to fail for now. This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal. [ghstack-poisoned] Reviewed By: YosuaMichael Differential Revision: D39885426 fbshipit-source-id: 3079b5ad4343bb3a4b2e514461472aced8fb6998 Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Summary: Pull Request resolved: #793 This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D39860692 Pulled By: NivekT fbshipit-source-id: 4fd992acfa39d0877575d739c50949fe9d52056f
…85667) This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead. While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised. ``` /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13. warnings.warn(msg, FutureWarning) ``` A few things we'd like to do: 1. Deprecate the key word arg `only_datapipe` 2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future 3. Do not raise a warning when users are using the function correctly This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`): - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True` - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative. If this looks right, I will send a follow up PR in `TorchData`. Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183) Pull Request resolved: #85667 Approved by: https://github.com/ejguan
Stack from ghstack:
traversefunction withtraverse_datapipes#85667This PR deprecates
traversefunction and replaces it withtraverse_datapipesinstead.While use
DataLoader, I realized that it is raisingFutureWarningeven though I am not explicitly usingtraverse. What is happening is thatDataLoaderinvokestraverse(dp, only_datapipe=True), and the usage of the keyword causes theonly_datapipewarning to be raised.A few things we'd like to do:
only_datapipeonly_datapipe=Falsetoonly_datapipe=Truein the futureThis creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call
traverse(dp)withoutonly_datapipe):traversehasn't changed yet, so they must useonly_datapipe=Trueonly_datapipe=True, eventually the kwarg will go away and cause a runtime error; they also get aFutureWarningin the presentIIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.
If this looks right, I will send a follow up PR in
TorchData.Differential Revision: D39832183