Skip to content

move RoutedDecoder Dataset to DataPipe#51704

Closed
glaringlee wants to merge 4 commits intogh/glaringlee/43/basefrom
gh/glaringlee/43/head
Closed

move RoutedDecoder Dataset to DataPipe#51704
glaringlee wants to merge 4 commits intogh/glaringlee/43/basefrom
gh/glaringlee/43/head

Conversation

@glaringlee
Copy link
Copy Markdown
Contributor

@glaringlee glaringlee commented Feb 4, 2021

Stack from ghstack:

This is to make RoutedDecoder a DataPipe, currently using Nvidia's decode, we will refactor this later. We need this for now to run basic benchmark.

Differential Revision: D26245910

@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Feb 4, 2021

💊 CI failures summary and remediations

As of commit ede75c1 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

glaringlee pushed a commit that referenced this pull request Feb 4, 2021
ghstack-source-id: e71cd1b
Pull Request resolved: #51704
Copy link
Copy Markdown
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nit comments below.

Comment on lines +252 to +270
def decode(self, datapipe):
result = {}
# single data tuple(pathname, data stream)
if isinstance(datapipe, tuple):
datapipe = [datapipe]

if datapipe is not None:
for k, v in datapipe:
# TODO: xinyu, figure out why Nvidia do this?
if k[0] == "_":
if isinstance(v, bytes):
v = v.decode("utf-8")
result[k] = v
continue
result[k] = self.decode1(k, v)
return result

def __call__(self, datapipe):
return self.decode(datapipe)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change datapipe to data, this name makes the logic confusing, as the input is actually one element for each iteration rather than the datapipe itself.
Besides, is there any benefit using dict as result?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change datapipe to data.
It is not clear whether dict has benefit at the moment, I am not sure whether we have any case that we need to decode a batch of image at once and need to locate a specific record within a dict. Let's keep this, we can change the code later to meet our needs.

This is to make RoutedDecoder a DataPipe, currently using Nvidia's decode, we will refactor this later. We need this for now to run basic benchmark.

Differential Revision: [D26245910](https://our.internmc.facebook.com/intern/diff/D26245910)

[ghstack-poisoned]
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 5, 2021

Codecov Report

Merging #51704 (446bb2e) into gh/glaringlee/43/base (1aaddd8) will increase coverage by 42.10%.
The diff coverage is 56.20%.

@@                    Coverage Diff                     @@
##           gh/glaringlee/43/base   #51704       +/-   ##
==========================================================
+ Coverage                  38.38%   80.48%   +42.10%     
==========================================================
  Files                        508     1950     +1442     
  Lines                      64094   213208   +149114     
==========================================================
+ Hits                       24600   171598   +146998     
- Misses                     39494    41610     +2116     

lixinyu added 2 commits February 8, 2021 10:19
This is to make RoutedDecoder a DataPipe, currently using Nvidia's decode, we will refactor this later. We need this for now to run basic benchmark.

Differential Revision: [D26245910](https://our.internmc.facebook.com/intern/diff/D26245910)

[ghstack-poisoned]
This is to make RoutedDecoder a DataPipe, currently using Nvidia's decode, we will refactor this later. We need this for now to run basic benchmark.

Differential Revision: [D26245910](https://our.internmc.facebook.com/intern/diff/D26245910)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Copy Markdown
Contributor

@glaringlee merged this pull request in 482b94a.

@facebook-github-bot facebook-github-bot deleted the gh/glaringlee/43/head branch February 12, 2021 15:18
facebook-github-bot pushed a commit that referenced this pull request Feb 24, 2021
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- #50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- #51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- #51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- #51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- #51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- #51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: #52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in #52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
walterddr pushed a commit to walterddr/pytorch that referenced this pull request Feb 25, 2021
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- pytorch#50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- pytorch#51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- pytorch#51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- pytorch#51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- pytorch#51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- pytorch#51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: pytorch#52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in pytorch#52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
malfet pushed a commit that referenced this pull request Feb 26, 2021
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- #50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- #51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- #51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- #51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- #51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- #51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: #52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in #52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804

Co-authored-by: Sam Estep <sestep@fb.com>
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- pytorch#50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- pytorch#51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- pytorch#51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- pytorch#51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- pytorch#51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- pytorch#51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: pytorch#52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in pytorch#52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary: Pull Request resolved: pytorch#51704

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26245910

Pulled By: glaringlee

fbshipit-source-id: 91e3c9f8a6c1209c1a1a752ba29a80dbd9bf4119
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- pytorch#50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- pytorch#51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- pytorch#51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- pytorch#51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- pytorch#51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- pytorch#51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: pytorch#52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in pytorch#52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#51704

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26245910

Pulled By: glaringlee

fbshipit-source-id: 91e3c9f8a6c1209c1a1a752ba29a80dbd9bf4119
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.

Tagging the following people to ask what to do to fix these `EXE002` warnings:

- pytorch#50629 authored by jaglinux, approved by rohan-varma
  - `test/distributed/test_c10d.py`
- pytorch#51262 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/__init__.py`
  - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
  - `torch/utils/data/datapipes/iter/listdirfiles.py`
  - `torch/utils/data/datapipes/iter/__init__.py`
  - `torch/utils/data/datapipes/utils/__init__.py`
  - `torch/utils/data/datapipes/utils/common.py`
- pytorch#51398 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- pytorch#51599 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- pytorch#51704 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/routeddecoder.py`
  - `torch/utils/data/datapipes/utils/decoder.py`
- pytorch#51709 authored by glaringlee, approved by ejguan
  - `torch/utils/data/datapipes/iter/groupbykey.py`

Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?

Pull Request resolved: pytorch#52750

Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:

- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in pytorch#52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix

Reviewed By: walterddr, janeyx99

Differential Revision: D26637307

Pulled By: samestep

fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants