[pytorch] Disable fast path in MultiheadAttention in Export #106824
guangy10 wants to merge 1 commit into pytorch:main
Conversation
This pull request was exported from Phabricator. Differential Revision: D48169806
Force-pushed from 2a841cc to f446781
|
Fixed failures in test_transformers and test_jit_legacy
…106824)

Summary: Pull Request resolved: pytorch#106824

We are seeing the `aten._native_multi_head_attention` op (not in the core ATen op set) left in the exported graph, which causes problems downstream at runtime.

Two proposed solutions:
1. Disable the fast path while tracing, leveraging the non-optimized path to get the decomposition; that way, the offending op won't show up in the exported graph.
2. Add a decomp rule for `aten._native_multi_head_attention`.

After discussing with kimishpatel and bdhirsh, option 1 is preferred, and it was verified to immediately unblock the critical model-enablement work for PP.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D48169806

fbshipit-source-id: e82be1ab24659a976554775c7c362a00c827416e
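The gating logic described above can be sketched in a few lines. This is a hedged illustration only: `TracingState` and `multi_head_attention` are stand-ins invented for this sketch, not PyTorch API; the real change lives in `torch.nn.MultiheadAttention.forward` and detects tracing by inspecting the torch dispatch-mode stack.

```python
# Hedged sketch of the fast-path gating described in the commit message.
# TracingState and multi_head_attention are illustrative stand-ins, not
# actual PyTorch code.

class TracingState:
    """Stand-in for PyTorch's proxy/dispatch-mode bookkeeping."""
    active = False

def _is_make_fx_tracing():
    # In the actual PR this checks the torch dispatch-mode stack for a
    # ProxyTorchDispatchMode; here we just read a flag.
    return TracingState.active

def multi_head_attention(query, use_fast_path=True):
    # The fast path calls the fused aten._native_multi_head_attention kernel,
    # which has no decomposition and therefore leaks into exported graphs.
    # The slow path is built from core ATen ops, so export can decompose it.
    if use_fast_path and not _is_make_fx_tracing():
        return f"fast({query})"
    return f"slow({query})"

print(multi_head_attention("q"))  # eager: fast, fused path
TracingState.active = True        # pretend make_fx tracing has started
print(multi_head_attention("q"))  # tracing: slow, decomposed path
```

With this gate in place, eager execution keeps the fused kernel while any trace made under make_fx only ever records the decomposed ops.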
|
Force-pushed from f446781 to 098b67e
|
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
|
Verified that these tests are broken in the base/trunk.
|
Verified those tests are broken on base/trunk. They are actually flaky tests: if you run them one by one, you may see them pass sometimes. Either way, the failures are not relevant to this PR.
|
@pytorchbot merge -ic
|
|
Merge failed. Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
|
@pytorchbot merge -i
Merge failed. Reason: 2 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral), trunk / linux-bionic-cuda12.1-py3.10-gcc9 / test (nogpu_AVX512, 1, 1, linux.2xlarge)
Details for Dev Infra team: Raised by workflow job
|
@pytorchbot merge -i
|
The inference fast-path context manager in #107014 might offer a more general solution for controlling which kernels are used (similar to the sdp_kernel context manager).
In torch/nn/modules/activation.py:

@@ -895,6 +895,14 @@ def _arg_requires_grad(x: Optional[torch.Tensor]) -> bool:
     return False

+def _is_make_fx_tracing():
Why is this code in torch.nn?
That doesn't look like the right place to put it, and there is a 100% chance this will break if proxy mode is modified, since no one will expect to update this file.
Could you please move this utility into fx?
That other PR is not removing this utility though?
I will defer to @bdhirsh for the final answer, but next to the make_fx API would be my first suggestion.
|
Maybe the right answer for this fine-grained control of which algorithms to use is a context manager similar to what we already do for sdpa?
See #107163 for a possible implementation
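The suggestion above can be sketched as a small context manager in the spirit of PyTorch's `sdp_kernel`. This is a hypothetical, self-contained illustration: `fastpath_kernel` and `attention_kernel` are invented names for this sketch and are not the API proposed in #107163 or any real PyTorch API.

```python
from contextlib import contextmanager

# Hypothetical sketch of a kernel-selection context manager, loosely modeled
# on torch.backends.cuda.sdp_kernel. All names here are illustrative.

_fastpath_enabled = True  # module-level default: fused fast path on

@contextmanager
def fastpath_kernel(enabled: bool):
    """Temporarily enable or disable the fused attention fast path."""
    global _fastpath_enabled
    prev = _fastpath_enabled
    _fastpath_enabled = enabled
    try:
        yield
    finally:
        _fastpath_enabled = prev  # always restore the previous setting

def attention_kernel():
    # Selection point that a real MultiheadAttention.forward would consult.
    return "fused" if _fastpath_enabled else "decomposed"

with fastpath_kernel(enabled=False):
    print(attention_kernel())  # decomposed (inside the context)
print(attention_kernel())      # fused (restored afterwards)
```

The try/finally restore is what makes this safe to nest, which is the main ergonomic advantage of a context manager over a global toggle.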
|
|
@mikekgfb the PR you pointed to requires a context manager to trigger a certain path, and in that case it may be a user-controlled context manager. The case here is different: when exporting the model, the user is not making the choice. In fact, it is not the user's decision at all. Which path to take, fast or slow (decomposed), is determined by the tracing mode.
…106824)

Summary: We are seeing the `aten._native_multi_head_attention` op (not in the core ATen op set) left in the exported graph, which causes problems downstream at runtime.

Two proposed solutions:
1. Disable the fast path while tracing, leveraging the non-optimized path to get the decomposition; that way, the offending op won't show up in the exported graph.
2. Add a decomp rule for `aten._native_multi_head_attention`.

After discussing with kimishpatel and bdhirsh, option 1 is preferred, and it was verified to immediately unblock the critical model-enablement work for PP.

Test Plan: CI

Differential Revision: D48169806

Pull Request resolved: pytorch#106824
Approved by: https://github.com/kimishpatel
…ct export (#164721)

In #106824, export decided to take the slow path for the MultiHeadAttention module (see that PR's description as to why). But that PR eventually caused a divergence between Dynamo and export. Today, strict export does not inline into builtin modules (like MultiHeadAttention), and therefore make_fx sees the original nn.Module and takes the slow path. But compile inlines into the nn module, and at that point the condition `_is_make_fx_tracing` is False. As a result, Dynamo takes the fast path, and a different op is called. This divergence is undesirable.

There are two ways to fix it:
1. Make export take the fast path. As explained in #106824, this might be difficult, so we go with (2).
2. Make compile also take the slow path. This is easy to implement. The con is that PyTorch eager and compile will use different operators, which can cause numerics issues, etc.

Since (2) is easy to do, we follow this path. We are tracking the issue in #164062.

Pull Request resolved: #164721
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
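Fix (2) above amounts to widening the gate so that any graph capture, whether export's make_fx tracing or Dynamo compilation, takes the same decomposed path. A hedged sketch, where the `Mode` flags are illustrative stand-ins for `torch.compiler.is_compiling()` and the make_fx proxy-mode check rather than real PyTorch state:

```python
# Hypothetical sketch of fix (2): gate the fused fast path on *any*
# tracing/compile mode so torch.compile and export agree on which ops
# they record. The Mode class is a stand-in, not PyTorch API.

class Mode:
    compiling = False        # stand-in for torch.compiler.is_compiling()
    make_fx_tracing = False  # stand-in for _is_make_fx_tracing()

def mha_path():
    # Eager keeps the fused kernel; any graph capture gets the decomposed
    # path, so Dynamo and strict export emit the same core ATen ops.
    if Mode.compiling or Mode.make_fx_tracing:
        return "slow"
    return "fast"

print(mha_path())     # fast: plain eager execution
Mode.compiling = True
print(mha_path())     # slow: under torch.compile, matching export
```

The trade-off noted in the commit message is visible here: eager (`fast`) and compile (`slow`) now run different operators, which is the accepted cost of keeping compile and export consistent.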