DataPipe naming convention update #51262
Closed
glaringlee wants to merge 3 commits into gh/glaringlee/40/base
ejguan (Contributor) approved these changes on Jan 28, 2021 and left a comment:
LGTM and one comment below.
@@ -0,0 +1 @@
from torch.utils.data.datapipes.iter import *
Contributor
Considering that a future map-style DataPipe may share the same name, we should keep the API within `iter` only.
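The concern in this comment can be illustrated with a small, self-contained sketch. It does not touch `torch`; the two classes and the `star_import` helper below are hypothetical stand-ins (no map-style DataPipe exists in this PR), used only to show how flattening two sub-namespaces into one package could let one class silently shadow another of the same name.

```python
from types import SimpleNamespace


# Hypothetical stand-ins: an iterable-style class and an assumed future
# map-style class that happens to share the same public name.
class IterListDirFiles:
    style = "iter"


class MapListDirFiles:
    style = "map"


iter_ns = SimpleNamespace(ListDirFiles=IterListDirFiles)
map_ns = SimpleNamespace(ListDirFiles=MapListDirFiles)


def star_import(target, source):
    """Mimic `from source import *` by copying public names into target."""
    for name, value in vars(source).items():
        if not name.startswith("_"):
            setattr(target, name, value)


# Flattening both namespaces into one package: the later import wins and
# the iterable-style class is no longer reachable under its name.
flat = SimpleNamespace()
star_import(flat, iter_ns)
star_import(flat, map_ns)

# Keeping the sub-namespaces separate leaves both classes reachable.
dp = SimpleNamespace(iter=iter_ns, map=map_ns)
```

With the separate layout, `dp.iter.ListDirFiles` and `dp.map.ListDirFiles` stay distinct, which is the property the reviewer appears to be asking for.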
This PR changes the naming convention for what was previously called 'dataset'; these classes are now named DataPipe. This PR specifically covers ListDirFilesIterableDataset and LoadFilesFromDiskIterableDataset, and provides the following ways to import them.

1. Partial import as a `datapipes` module
```
import torch.utils.data.datapipes as dp
dp.iter.ListDirFiles
dp.iter.LoadFilesFromDisk
```
2. Direct import of the DataPipe class
```
from torch.utils.data.datapipes import ListDirFiles, LoadFilesFromDisk
```

This PR also adds support for recursively scanning folders.

Next step will be Tar/Zip/Gz dataset -> datapipe

Differential Revision: [D26120628](https://our.internmc.facebook.com/intern/diff/D26120628)
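To make the recursive-scanning behavior concrete, here is a minimal sketch of an iterable that lists files under a root directory, optionally descending into subfolders. It is not the class added by this PR: the name `ListDirFilesSketch` and the `mask` and `recursive` parameters are assumptions chosen for illustration, and only the standard library is used.

```python
import fnmatch
import os
from typing import Iterator


class ListDirFilesSketch:
    """Illustrative only; not the ListDirFiles class from this PR."""

    def __init__(self, root: str, mask: str = "*", recursive: bool = False) -> None:
        self.root = root            # directory to scan
        self.mask = mask            # glob-style file name filter (assumed parameter)
        self.recursive = recursive  # whether to descend into subfolders

    def __iter__(self) -> Iterator[str]:
        for dirpath, dirnames, filenames in os.walk(self.root):
            dirnames.sort()  # in-place sort keeps the traversal order deterministic
            for name in sorted(filenames):
                if fnmatch.fnmatch(name, self.mask):
                    yield os.path.join(dirpath, name)
            if not self.recursive:
                break  # stop after the top-level directory
```

Because the class only defines `__iter__`, each `for path in ListDirFilesSketch(root, "*.txt", recursive=True)` loop starts a fresh directory walk, which matches the lazy, re-iterable style the DataPipe naming suggests.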
@glaringlee (Contributor) merged this pull request in 5ed0ad4.
facebook-github-bot pushed a commit that referenced this pull request on Feb 24, 2021

Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- #50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- #51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- #51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- #51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- #51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- #51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: #52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in #52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
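The regex problem described in this commit message can be reproduced in a few lines. The sketch below is an assumption-laden illustration: the actual regex used by the CI job is not quoted in this thread, so both patterns, the `find_codes` helper, and the sample Flake8 output lines (including the path `torch/foo.py` and the message texts) are hypothetical. It only shows how a pattern that expects a single leading letter would skip a code such as `EXE002`.

```python
import re

# Both patterns are assumptions for illustration; the real workflow regex
# is not shown in this thread.
SINGLE_LETTER_CODE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]\d+) "
)
MULTI_LETTER_CODE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) "
)

# Hypothetical Flake8 output lines in the default "file:line:col: CODE text" shape.
SAMPLE_OUTPUT = [
    "torch/foo.py:10:1: E302 expected 2 blank lines, found 1",
    "torch/utils/data/datapipes/__init__.py:1:1: EXE002 The file is executable but no shebang is present.",
]


def find_codes(pattern, lines):
    """Return the error codes that the given pattern recognizes."""
    codes = []
    for line in lines:
        match = pattern.match(line)
        if match:
            codes.append(match.group("code"))
    return codes


single = find_codes(SINGLE_LETTER_CODE, SAMPLE_OUTPUT)
multi = find_codes(MULTI_LETTER_CODE, SAMPLE_OUTPUT)
```

Under these assumptions the single-letter pattern recognizes only `E302`, while the multi-letter pattern also picks up `EXE002`. A final check that the linter produced no output at all, as the commit message describes, guards against any warning format the regex still fails to match.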
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 24, 2026

Summary: Pull Request resolved: pytorch#51262

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26120628

Pulled By: glaringlee

fbshipit-source-id: 6855a0dd6d4a93ff93adce1039960ffd7057a827
Stack from ghstack:
This PR changes the naming convention for what was previously called 'dataset'; these classes are now named DataPipe.
This PR specifically covers ListDirFilesIterableDataset and LoadFilesFromDiskIterableDataset.
They can be imported in two ways: through a partial import of the `datapipes` module, or by importing the DataPipe class directly.
This PR also adds support for recursively scanning folders.
Next step will be Tar/Zip/Gz dataset -> datapipe
Differential Revision: D26120628