
DataPipe naming convention update #51262

Closed
glaringlee wants to merge 3 commits into gh/glaringlee/40/base from gh/glaringlee/40/head

glaringlee commented Jan 28, 2021

Stack from ghstack:

This PR changes the naming convention for what was previously called a 'dataset': these classes are now named DataPipe.
This PR is specifically for ListDirFilesIterableDataset and LoadFilesFromDiskIterableDataset.

And we provide the following ways to import them.

  1. Partial import as a `datapipes` module:

         import torch.utils.data.datapipes as dp
         dp.iter.ListDirFiles
         dp.iter.LoadFilesFromDisk

  2. Direct import of the DataPipe class:

         from torch.utils.data.datapipes import ListDirFiles, LoadFilesFromDisk

This PR also adds support for recursively scanning folders.
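
As a rough illustration of what recursive scanning means here, the sketch below walks a directory tree with the standard library and yields matching file paths lazily. It is not the code from this PR: the function name `list_files` and its parameters `root`, `mask`, and `recursive` are hypothetical names chosen for the example.

```python
import fnmatch
import os
from typing import Iterator


def list_files(root: str, mask: str = "*", recursive: bool = False) -> Iterator[str]:
    """Lazily yield paths of files under `root` whose names match `mask`.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if fnmatch.fnmatch(name, mask):
                yield os.path.join(dirpath, name)
        if not recursive:
            break  # stop after the top-level directory
```

Because the function is a generator, paths are produced as the caller iterates, which fits the iterable style these DataPipes follow.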

The next step will be converting the Tar/Zip/Gz datasets to DataPipes.

Differential Revision: D26120628

facebook-github-bot commented Jan 28, 2021

💊 CI failures summary and remediations

As of commit be10ffb (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI.

glaringlee pushed a commit that referenced this pull request Jan 28, 2021
ghstack-source-id: 1b8f619
Pull Request resolved: #51262

ejguan left a comment

LGTM and one comment below.

Comment thread torch/utils/data/datapipes/__init__.py Outdated
@@ -0,0 +1 @@
from torch.utils.data.datapipes.iter import *

Considering that a future map-style DataPipe may share the same name, we should keep the API only within `iter`.
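
The concern can be illustrated with a small sketch. It assumes a hypothetical future `map` namespace that defines a class with the same public name as one in `iter`. The namespaces are simulated with `types.SimpleNamespace` rather than real modules, and no such map-style class is confirmed anywhere in this thread.

```python
from types import SimpleNamespace


class IterListDirFiles:
    """Stand-in for an iterable-style DataPipe (hypothetical)."""


class MapListDirFiles:
    """Stand-in for a map-style DataPipe with the same public name (hypothetical)."""


# Simulated sub-namespaces: both expose the public name "ListDirFiles".
iter_ns = SimpleNamespace(ListDirFiles=IterListDirFiles)
map_ns = SimpleNamespace(ListDirFiles=MapListDirFiles)

# Flattening both into one parent namespace, as two wildcard imports would,
# lets the later import silently overwrite the earlier one.
flat = {}
flat.update(vars(iter_ns))
flat.update(vars(map_ns))

# Keeping the sub-namespaces separate preserves both classes.
dp = SimpleNamespace(iter=iter_ns, map=map_ns)
```

With the flattened layout only one `ListDirFiles` survives, whereas `dp.iter.ListDirFiles` and `dp.map.ListDirFiles` stay distinct.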

glaringlee pushed a commit that referenced this pull request Jan 28, 2021
ghstack-source-id: 3900ce9
Pull Request resolved: #51262
glaringlee pushed a commit that referenced this pull request Jan 28, 2021
ghstack-source-id: 9282fed
Pull Request resolved: #51262

@glaringlee merged this pull request in 5ed0ad4.

@facebook-github-bot facebook-github-bot deleted the gh/glaringlee/40/head branch February 1, 2021 15:19
facebook-github-bot pushed a commit that referenced this pull request Feb 24, 2021
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.
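
To make the failure mode concrete, here is a minimal sketch of how a pattern that allows only one leading letter misses a code such as `EXE002`. Both patterns and the sample output lines are assumptions written for this example; they are not the regexes or logs from the actual lint job.

```python
import re

# Made-up Flake8-style output lines in the form path:line:col: CODE message
lines = [
    "torch/foo.py:1:1: E501 line too long",
    "torch/bar.py:1:1: EXE002 executable file without shebang",
]

# Assumed "before" pattern: exactly one letter followed by digits.
narrow = re.compile(r"^[^:]+:\d+:\d+: [A-Z]\d+ ")
# Assumed "after" pattern: one or more letters followed by digits.
broad = re.compile(r"^[^:]+:\d+:\d+: [A-Z]+\d+ ")

narrow_hits = [ln for ln in lines if narrow.match(ln)]
broad_hits = [ln for ln in lines if broad.match(ln)]
```

Under these assumptions the narrow pattern reports only the `E501` line, so a job that keys off it could pass while `EXE002` warnings go unnoticed.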

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- #50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- #51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- #51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- #51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- #51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- #51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?
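
For context, the condition behind these warnings can be approximated in a few lines: a file has an execute bit set but does not start with `#!`. The helper name `executable_without_shebang` is hypothetical, the check is a simplification rather than flake8-executable's own logic, and POSIX-style permission bits are assumed.

```python
import os
import stat


def executable_without_shebang(path: str) -> bool:
    """Return True if `path` has any execute bit set but no leading '#!'.

    Hypothetical helper; assumes POSIX-style permission bits.
    """
    mode = os.stat(path).st_mode
    is_executable = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    with open(path, "rb") as f:
        has_shebang = f.read(2) == b"#!"
    return is_executable and not has_shebang
```

Either option raised in the question would make this return `False`: clearing the execute bits (for example with `chmod -x`), or adding a shebang line.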

Pull Request resolved: #52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in #52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
walterddr pushed a commit to walterddr/pytorch that referenced this pull request Feb 25, 2021
malfet pushed a commit that referenced this pull request Feb 26, 2021
Co-authored-by: Sam Estep <sestep@fb.com>
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#51262

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26120628

Pulled By: glaringlee

fbshipit-source-id: 6855a0dd6d4a93ff93adce1039960ffd7057a827
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026