This repository was archived by the owner on Mar 20, 2026. It is now read-only.

Make representation computation branchless in TransformerEncoderBase#4818

Merged
alexeib merged 2 commits into facebookresearch:main from zhxchen17:main
Nov 2, 2022
Conversation

@zhxchen17
Contributor

@zhxchen17 zhxchen17 commented Oct 20, 2022

Summary:
We want to make the computation branchless here because fairseq code may be exported and traced for deployment, and tracing can break the correctness of a captured program when its control flow depends on input data. This diff rewrites the code to remove one such branch so that the tracer can proceed and the captured program preserves the model's semantics.
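As a rough illustration of the idea (a minimal sketch, not the exact fairseq code): `torch.jit.trace` records only the branch taken for the example input, so replacing a data-dependent `if` with an always-executed masked multiply keeps the captured graph correct for all inputs.

```python
import torch

def forward_branchy(x, pad_mask):
    # Data-dependent branch: torch.jit.trace records only the side
    # taken for the example input, so the traced graph can be wrong
    # for inputs with a different padding pattern.
    if pad_mask.any():
        x = x * (1 - pad_mask.unsqueeze(-1).type_as(x))
    return x

def forward_branchless(x, pad_mask):
    # Branchless equivalent: the multiply is a no-op when there is no
    # padding, so no control flow depends on the input data.
    return x * (1 - pad_mask.unsqueeze(-1).type_as(x))
```

Both functions compute the same values; only the branchless one traces safely.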

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

Before submitting

  • Was this discussed/approved via a Github issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes # (issue).

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@zhxchen17
Contributor Author

cc @suo

@suo
Contributor

suo commented Oct 20, 2022

@dianaml0 do you mind taking a look at this?

@tugsbayasgalan

cc @alexeib

Contributor


isn't has_pads a boolean? 'bool' object doesn't have 'type_as' attribute!

Contributor Author


has_pads is a Tensor holding a bool scalar, which can be converted with the type_as method on Tensor. That's my understanding.
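For illustration, this minimal sketch shows the distinction the two comments are circling: a 0-dim bool Tensor has `type_as`, while a plain Python bool does not.

```python
import torch

x = torch.ones(3, 2)                      # float32 reference tensor

flag_tensor = torch.ones(3).eq(0).any()   # 0-dim bool Tensor
print(type(flag_tensor))                  # <class 'torch.Tensor'>
print(flag_tensor.type_as(x))             # works: a float32 scalar tensor

flag_bool = True                          # plain Python bool
# flag_bool.type_as(x) would raise:
# AttributeError: 'bool' object has no attribute 'type_as'
```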

Contributor Author


actually, if src_tokens is on an "xla" device, it would be a plain bool, which is rare but possible.

Let me update the PR to handle that.
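To illustrate the edge case above: depending on the backend, the flag may arrive as a 0-dim Tensor or as a plain Python bool. Here is a hedged sketch of a hypothetical normalizing helper (`as_flag_tensor` is an illustrative name, not code from this PR):

```python
import torch

def as_flag_tensor(flag, ref):
    # `flag` may already be a 0-dim Tensor (mask.any() on most devices)
    # or a plain Python bool (as noted above for "xla"); normalize it
    # to a tensor with the same dtype as `ref` either way.
    if isinstance(flag, torch.Tensor):
        return flag.type_as(ref)
    return torch.tensor(flag).type_as(ref)
```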

Contributor Author


@moslehpour updated. mind looking again?

Contributor

@moslehpour moslehpour Nov 1, 2022


Great. This should work!
Just one last comment: can you confirm whether TorchScript supports type casting with type_as?

Contributor Author


@moslehpour sure, I just tried that:

import torch

@torch.jit.script
def foo(x, y):
    return x.type_as(y)

a = foo(torch.tensor(True), torch.ones(3, 2))
print(a)

and it seems TorchScript gives the correct result:

tensor(1.)

Contributor


Perfect. Thanks for checking it.


@alexeib alexeib merged commit 59d966a into facebookresearch:main Nov 2, 2022
@alexeib
Contributor

alexeib commented Nov 2, 2022

thanks!

@alexeib
Contributor

alexeib commented Nov 2, 2022

the tests are failing now because of this change:

Variable 'has_pads' is annotated with type Tensor but is being assigned to a value of type bool:
File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/site-packages/fairseq/models/transformer/transformer_encoder.py", line 205
# compute padding mask
encoder_padding_mask = src_tokens.eq(self.padding_idx)
has_pads: Tensor = (
~~~~~~~~ <--- HERE
torch.tensor(src_tokens.device.type == "xla") or encoder_padding_mask.any()
)
'TransformerEncoderBase.forward_scriptable' is being compiled since it was called from 'TransformerEncoderBase.forward'
File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/site-packages/fairseq/models/transformer/transformer_encoder.py", line 165
Only populated if return_all_hiddens is True.
"""
return self.forward_scriptable(
~~~~~~~~~~~~~~~~~~~~~~~~
src_tokens, src_lengths, return_all_hiddens, token_embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
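For context on the failure: Python's `or` short-circuits via `__bool__` and yields a plain `bool`, which TorchScript rejects against the `Tensor` annotation. A hedged sketch of one possible tensor-valued rewrite using bitwise `|` (not necessarily what the follow-up fix did):

```python
import torch
from torch import Tensor

def compute_has_pads(src_tokens: Tensor, encoder_padding_mask: Tensor) -> Tensor:
    # `or` would call __bool__ and return a Python bool, breaking the
    # `Tensor` annotation under TorchScript. Bitwise `|` on bool
    # tensors keeps the whole expression a tensor op.
    has_pads: Tensor = (
        torch.tensor(src_tokens.device.type == "xla") | encoder_padding_mask.any()
    )
    return has_pads
```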

@zhxchen17
Contributor Author


@alexeib I'm looking into this issue, and I'll try to send in a fix real quick.

cbalioglu pushed a commit that referenced this pull request Feb 23, 2023
* fix imports referencing moved metrics.py file

* Make representation computation branchless in TransformerEncoderBase (#4818)

Summary:
We want to make the computation branchless here because fairseq code may be
exported and traced for deployment purposes, and tracing mechanisms can
break the correctness for a captured program if it's dependent on input data.
In this diff we try to rewrite the code to remove one branch so that tracer
can proceed here and preserve the correct semantics of the model.

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

* Fix Torchscript typing in transformer_encoder.py (#4847)

* Add Generative Spoken Dialogue Language Modeling (#4879)

* Update deprecated torch.qr in glow.py example (#4685)

torch.qr has been deprecated for a long time and is being removed by pytorch/pytorch#70989.

This PR makes the example compatible with new and old PyTorch versions.

* Emotion Conversion Paper Open Source (#4895)

* data2vec v2.0 (#4903)

data2vec 2.0
Co-authored-by: Arun Babu <arbabu@fb.com>
Co-authored-by: Wei-Ning Hsu <wnhsu@csail.mit.edu>

* remove missing config entries when loading task from checkpoint (#4905)

* make apex optional (#4906)

* Add file to generate manifests for stop dataset. (#4891)

* Update STOP dataset README to include proper link. (#4892)

* Update README.md (#4893)

* using foreach to reduce kernel (#4904)

* using foreach to reduce kernel

* set reproducibility to looser threshold

* revert optimizer

* update

* update

* update

* update

* update

* update

* update

Co-authored-by: juntengjia <juntengjia@fb.com>

* Update README.md to add data2vec blog post (#4913)

* Update README.md

* Update config to fix circleci failure (#4949)

https://app.circleci.com/pipelines/github/fairinternal/fairseq-py/12635/workflows/3befbae2-79c4-458d-9fc4-aad4484183b4/jobs/26767

* Generative Spoken Dialogue Language Modeling Paper Open Source (#4957)

* wav2vec2_laser (#4968)

* ASR BLEU tool copied from ust branch into main (#4914)

* Add transcript option for asr-bleu (#4981)

---------

Co-authored-by: zhxchen17 <zhxchen17@outlook.com>
Co-authored-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Nguyen Tu Anh <nguyentuanh208@gmail.com>
Co-authored-by: Sergii Dymchenko <kit1980@gmail.com>
Co-authored-by: Felix Kreuk <felixkreuk@gmail.com>
Co-authored-by: Alexei Baevski <alexei.b@gmail.com>
Co-authored-by: padentomasello <pdtomasello@gmail.com>
Co-authored-by: Junteng Jia <juntengjia@hotmail.com>
Co-authored-by: juntengjia <juntengjia@fb.com>
Co-authored-by: arbabu123 <arbabu@fb.com>
Co-authored-by: dianaml0 <82468439+dianaml0@users.noreply.github.com>
Co-authored-by: Pierre Andrews <mortimer@fb.com>
Co-authored-by: Ilia Kulikov <kulikov@cs.nyu.edu>
Co-authored-by: Xutai Ma <xutaima@gmail.com>
lwb2099 pushed a commit to lwb2099/fairseq that referenced this pull request Apr 26, 2023
lwb2099 pushed a commit to lwb2099/fairseq that referenced this pull request Apr 26, 2023
