Skip to content

[DataLoader] Replacing traverse function with traverse_datapipes#85667

Closed
NivekT wants to merge 4 commits intogh/nivekt/54/basefrom
gh/nivekt/54/head
Closed

[DataLoader] Replacing traverse function with traverse_datapipes#85667
NivekT wants to merge 4 commits intogh/nivekt/54/basefrom
gh/nivekt/54/head

Conversation

@NivekT
Copy link
Contributor

@NivekT NivekT commented Sep 26, 2022

Stack from ghstack:

This PR deprecates traverse function and replaces it with traverse_datapipes instead.

While use DataLoader, I realized that it is raising FutureWarning even though I am not explicitly using traverse. What is happening is that DataLoader invokes traverse(dp, only_datapipe=True), and the usage of the keyword causes the only_datapipe warning to be raised.

/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13.
  warnings.warn(msg, FutureWarning)

A few things we'd like to do:

  1. Deprecate the key word arg only_datapipe
  2. Change the default behavior from only_datapipe=False to only_datapipe=True in the future
  3. Do not raise a warning when users are using the function correctly

This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call traverse(dp) without only_datapipe):

  • they cannot do so because the default behavior of traverse hasn't changed yet, so they must use only_datapipe=True
  • if they use only_datapipe=True, eventually the kwarg will go away and cause a runtime error; they also get a FutureWarning in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.

If this looks right, I will send a follow up PR in TorchData.

Differential Revision: D39832183

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 26, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85667

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 1 Pending

As of commit 5f5449b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Sep 26, 2022
@NivekT NivekT requested a review from ejguan September 26, 2022 21:53
@NivekT NivekT added the topic: deprecation topic category label Sep 26, 2022
@NivekT NivekT changed the title [DataLoader] Replacing traverse function with traverse_datapipes [DataLoader] Replacing traverse function with traverse_datapipes Sep 26, 2022
…datapipes`"


This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead. While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes `only_datapipe` the warning to be raise.

What we would like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Not raise a warning when users are using the function correctly

This creates a paradox where the users want to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`), but:
    - they cannot do so because the default behavior hasn't changed yet, so they must use `only_datapipe=True`
    - if they use `only_datapipe=True`, eventually the kwarg will go away and it will become a runtime error and they also get a warning in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.


[ghstack-poisoned]
@NivekT
Copy link
Contributor Author

NivekT commented Sep 26, 2022

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan
Copy link
Contributor

ejguan commented Sep 27, 2022

Thank you for adding this PR! I was debating with myself between your approach or doing a BC-breaking change on traverse function in #85654.
Since your approach won't cause any BC-breaking change, it might be better. And, in order to align the name with other graph functions in https://github.com/pytorch/data/blob/main/torchdata/dataloader2/graph.py, do you think traverse_dp would be a better name?

…datapipes`"


This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead.

While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised.

```
/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13.
  warnings.warn(msg, FutureWarning)
```

A few things we'd like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Do not raise a warning when users are using the function correctly

This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`):
  - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True`
  - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.

If this looks right, I will send a follow up PR in `TorchData`.

Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183)

[ghstack-poisoned]
@NivekT
Copy link
Contributor Author

NivekT commented Sep 27, 2022

I updated the name to traverse_dps to match find_dps in the file that you linked. Thanks for pointing that out.

@NivekT
Copy link
Contributor Author

NivekT commented Sep 27, 2022

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, LGTM. Do you want to open corresponding PRs for both TorchData and TorchVision?

…datapipes`"


This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead.

While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised.

```
/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13.
  warnings.warn(msg, FutureWarning)
```

A few things we'd like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Do not raise a warning when users are using the function correctly

This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`):
  - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True`
  - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.

If this looks right, I will send a follow up PR in `TorchData`.

Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183)

[ghstack-poisoned]
@NivekT
Copy link
Contributor Author

NivekT commented Sep 27, 2022

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@NivekT
Copy link
Contributor Author

NivekT commented Sep 27, 2022

Thank you, LGTM. Do you want to open corresponding PRs for both TorchData and TorchVision?

Will do shortly.

NivekT added a commit to pytorch/vision that referenced this pull request Sep 27, 2022
…ps`"


CI is expected to fail for now. This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal.



[ghstack-poisoned]
pmeier added a commit to pytorch/vision that referenced this pull request Sep 28, 2022
* Replace torch.utils.data.graph.traverse with traverse_dps

[ghstack-poisoned]

* Update on "Replace `torch.utils.data.graph.traverse` with `traverse_dps`"


CI is expected to fail for now. This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal.



[ghstack-poisoned]

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
drisspg pushed a commit to drisspg/pytorch that referenced this pull request Sep 29, 2022
…ytorch#85667)

This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead.

While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised.

```
/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13.
  warnings.warn(msg, FutureWarning)
```

A few things we'd like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Do not raise a warning when users are using the function correctly

This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`):
  - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True`
  - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.

If this looks right, I will send a follow up PR in `TorchData`.

Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183)
Pull Request resolved: pytorch#85667
Approved by: https://github.com/ejguan
NivekT added a commit to meta-pytorch/data that referenced this pull request Sep 29, 2022
This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal.

Differential Revision: [D39860692](https://our.internmc.facebook.com/intern/diff/D39860692)

[ghstack-poisoned]
facebook-github-bot pushed a commit to pytorch/vision that referenced this pull request Sep 29, 2022
#6657)

Summary:
* Replace torch.utils.data.graph.traverse with traverse_dps

[ghstack-poisoned]

* Update on "Replace `torch.utils.data.graph.traverse` with `traverse_dps`"

CI is expected to fail for now. This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal.

[ghstack-poisoned]

Reviewed By: YosuaMichael

Differential Revision: D39885426

fbshipit-source-id: 3079b5ad4343bb3a4b2e514461472aced8fb6998

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
@facebook-github-bot facebook-github-bot deleted the gh/nivekt/54/head branch October 1, 2022 14:19
facebook-github-bot pushed a commit to meta-pytorch/data that referenced this pull request Oct 3, 2022
Summary:
Pull Request resolved: #793

This should be merged only after pytorch/pytorch#85667 has been merged into nightly and internal.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D39860692

Pulled By: NivekT

fbshipit-source-id: 4fd992acfa39d0877575d739c50949fe9d52056f
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
…85667)

This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead.

While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised.

```
/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13.
  warnings.warn(msg, FutureWarning)
```

A few things we'd like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Do not raise a warning when users are using the function correctly

This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`):
  - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True`
  - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.

If this looks right, I will send a follow up PR in `TorchData`.

Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183)
Pull Request resolved: #85667
Approved by: https://github.com/ejguan
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants