using foreach to reduce kernel by 000Justin000 · Pull Request #4904 · facebookresearch/fairseq

000Justin000 · 2022-12-12T20:08:53Z

Before submitting

Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
No
Did you read the contributor guideline?
Yes
Did you make sure to update the docs?
Yes
Did you write any new necessary tests?
No

What does this PR do?

The existing Adam optimizer used by speech models is not efficient as it has many small operators (as shown in the below picture). As the speech team does not use latest PyTorch version Adam optimizer with the multi_tensor (foreach fusion) support, this diff is to support multi_tensor Adam implementation (by fusing many smaller operators into fewer number of operators via PyTorch foreach APIs) for the customized adam_sam optimizer used by the speech team.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

cbalioglu · 2022-12-14T16:57:57Z

-                p.grad.data.mul_(c)
+                params_with_grad.append(p.grad.data)
+        # foreach reduces gpu kernel launch
+        torch._foreach_mul_(params_with_grad, c)


Are we safe with _foreach_mul_ here? As far as I know it is not part of PyTorch's public API.

cbalioglu

Per our offline discussion with Junteng, approving it.

* fix imports referencing moved metrics.py file * Make representation computation branchless in TransformerEncoderBase (#4818) Summary: We want to make the computation branchless here because fairseq code may be exported and traced for deployment purposes, and tracing mechanisms can break the correctness for a captured program if it's dependent on input data. In this diff we try to rewrite the code to remove one branch so that tracer can proceed here and preserve the correct semantics of the model. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: * Fix Torchscript typing in transformer_encoder.py (#4847) * Add Generative Spoken Dialogue Language Modeling (#4879) * Update deprecated torch.qr in glow.py example (#4685) torch.qr is deprecated for a long time and is being removed by pytorch/pytorch#70989. This PR makes the example compatible with new and old PyTorch versions. * Emotion Conversion Paper Open Source (#4895) * data2vec v2.0 (#4903) data2v2c 2.0 Co-authored-by: Arun Babu <arbabu@fb.com> Co-authored-by: Wei-Ning Hsu <wnhsu@csail.mit.edu> * remove missing config entries when loading task from checkpoint (#4905) * make apex optional (#4906) * Add file to generate manifests for stop dataset. (#4891) * Update STOP dataset README to include proper link. (#4892) * Update README.md (#4893) * using foreach to reduce kernel (#4904) * using foreach to reduce kernel * set reproducibility to looser threshold * revert optimzer * update * update * update * update * update * update * update Co-authored-by: juntengjia <juntengjia@fb.com> * Update README.md to add data2vec blog post (#4913) * Update README.md * Update config to fix circleci failure (#4949) https://app.circleci.com/pipelines/github/fairinternal/fairseq-py/12635/workflows/3befbae2-79c4-458d-9fc4-aad4484183b4/jobs/26767 * Generative Spoken Dialogue Language Modeling Paper Open Source (#4957) * wav2vec2_laser (#4968) * ASR BLEU tool copied from ust branch into main (#4914) * Add transcript option for asr-bleu (#4981) --------- Co-authored-by: zhxchen17 <zhxchen17@outlook.com> Co-authored-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Nguyen Tu Anh <nguyentuanh208@gmail.com> Co-authored-by: Sergii Dymchenko <kit1980@gmail.com> Co-authored-by: Felix Kreuk <felixkreuk@gmail.com> Co-authored-by: Alexei Baevski <alexei.b@gmail.com> Co-authored-by: padentomasello <pdtomasello@gmail.com> Co-authored-by: Junteng Jia <juntengjia@hotmail.com> Co-authored-by: juntengjia <juntengjia@fb.com> Co-authored-by: arbabu123 <arbabu@fb.com> Co-authored-by: dianaml0 <82468439+dianaml0@users.noreply.github.com> Co-authored-by: Pierre Andrews <mortimer@fb.com> Co-authored-by: Ilia Kulikov <kulikov@cs.nyu.edu> Co-authored-by: Xutai Ma <xutaima@gmail.com>

* using foreach to reduce kernel * set reproducibility to looser threshold * revert optimzer * update * update * update * update * update * update * update Co-authored-by: juntengjia <juntengjia@fb.com>

* fix imports referencing moved metrics.py file * Make representation computation branchless in TransformerEncoderBase (facebookresearch#4818) Summary: We want to make the computation branchless here because fairseq code may be exported and traced for deployment purposes, and tracing mechanisms can break the correctness for a captured program if it's dependent on input data. In this diff we try to rewrite the code to remove one branch so that tracer can proceed here and preserve the correct semantics of the model. Test Plan: CI Reviewers: Subscribers: Tasks: Tags: * Fix Torchscript typing in transformer_encoder.py (facebookresearch#4847) * Add Generative Spoken Dialogue Language Modeling (facebookresearch#4879) * Update deprecated torch.qr in glow.py example (facebookresearch#4685) torch.qr is deprecated for a long time and is being removed by pytorch/pytorch#70989. This PR makes the example compatible with new and old PyTorch versions. * Emotion Conversion Paper Open Source (facebookresearch#4895) * data2vec v2.0 (facebookresearch#4903) data2v2c 2.0 Co-authored-by: Arun Babu <arbabu@fb.com> Co-authored-by: Wei-Ning Hsu <wnhsu@csail.mit.edu> * remove missing config entries when loading task from checkpoint (facebookresearch#4905) * make apex optional (facebookresearch#4906) * Add file to generate manifests for stop dataset. (facebookresearch#4891) * Update STOP dataset README to include proper link. (facebookresearch#4892) * Update README.md (facebookresearch#4893) * using foreach to reduce kernel (facebookresearch#4904) * using foreach to reduce kernel * set reproducibility to looser threshold * revert optimzer * update * update * update * update * update * update * update Co-authored-by: juntengjia <juntengjia@fb.com> * Update README.md to add data2vec blog post (facebookresearch#4913) * Update README.md * Update config to fix circleci failure (facebookresearch#4949) https://app.circleci.com/pipelines/github/fairinternal/fairseq-py/12635/workflows/3befbae2-79c4-458d-9fc4-aad4484183b4/jobs/26767 * Generative Spoken Dialogue Language Modeling Paper Open Source (facebookresearch#4957) * wav2vec2_laser (facebookresearch#4968) * ASR BLEU tool copied from ust branch into main (facebookresearch#4914) * Add transcript option for asr-bleu (facebookresearch#4981) --------- Co-authored-by: zhxchen17 <zhxchen17@outlook.com> Co-authored-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Nguyen Tu Anh <nguyentuanh208@gmail.com> Co-authored-by: Sergii Dymchenko <kit1980@gmail.com> Co-authored-by: Felix Kreuk <felixkreuk@gmail.com> Co-authored-by: Alexei Baevski <alexei.b@gmail.com> Co-authored-by: padentomasello <pdtomasello@gmail.com> Co-authored-by: Junteng Jia <juntengjia@hotmail.com> Co-authored-by: juntengjia <juntengjia@fb.com> Co-authored-by: arbabu123 <arbabu@fb.com> Co-authored-by: dianaml0 <82468439+dianaml0@users.noreply.github.com> Co-authored-by: Pierre Andrews <mortimer@fb.com> Co-authored-by: Ilia Kulikov <kulikov@cs.nyu.edu> Co-authored-by: Xutai Ma <xutaima@gmail.com>

using foreach to reduce kernel

03f2fb4

000Justin000 requested a review from alexeib December 12, 2022 20:08

facebook-github-bot added the CLA Signed label Dec 12, 2022

000Justin000 requested a review from dianaml0 December 12, 2022 20:15

dianaml0 requested a review from cbalioglu December 12, 2022 20:30

juntengjia added 2 commits December 12, 2022 18:46

set reproducibility to looser threshold

46a137d

revert optimzer

0577780

cbalioglu reviewed Dec 14, 2022

View reviewed changes

000Justin000 and others added 8 commits December 14, 2022 14:20

Merge branch 'main' into foreach

5816620

update

cab679a

update

120fc26

update

fbfc077

update

7241ba3

update

d5717d3

update

77ed208

update

534ff85

cbalioglu approved these changes Dec 15, 2022

View reviewed changes

000Justin000 merged commit 3c1abb5 into main Dec 17, 2022

cbalioglu deleted the foreach branch December 20, 2022 16:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

using foreach to reduce kernel#4904

using foreach to reduce kernel#4904
000Justin000 merged 11 commits intomainfrom
foreach

000Justin000 commented Dec 12, 2022 •

edited

Loading

Uh oh!

cbalioglu Dec 14, 2022

Uh oh!

cbalioglu left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

000Justin000 commented Dec 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Before submitting

What does this PR do?

PR review

Did you have fun?

Uh oh!

cbalioglu Dec 14, 2022

Choose a reason for hiding this comment

Uh oh!

cbalioglu left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

000Justin000 commented Dec 12, 2022 •

edited

Loading