
[pytorch] Disable fast path in MultiheadAttention in Export#106824

Closed
guangy10 wants to merge 1 commit into pytorch:main from guangy10:export-D48169806

Conversation

@guangy10
Contributor

@guangy10 guangy10 commented Aug 8, 2023

Summary:
We are seeing the aten._native_multi_head_attention op (not in the core ATen op set) left in the exported graph, which causes problems downstream at runtime.

Two proposed solutions:

  1. Disable the fast path while tracing, leveraging the non-optimized path to get the decomposition; that way, the offending op won't show up in the exported graph
  2. Add a decomp rule for aten._native_multi_head_attention

After discussing with kimishpatel and bdhirsh, #1 is preferred; it was verified to immediately unblock the critical model enablement work for PP.

Test Plan: CI

Differential Revision: D48169806
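A minimal sketch of how solution 1 behaves, assuming a hypothetical tracing flag and path names (this is illustrative control flow, not the actual MultiheadAttention implementation):

```python
# Illustrative control flow for solution 1: gate the fused fast path on
# whether we are tracing. All names here are hypothetical stand-ins.

def multi_head_attention_forward(query, is_tracing):
    if is_tracing:
        # Slow path: composed of core ATen ops, so the exported graph
        # contains only decomposable operators.
        return ("decomposed", query)
    # Fast path: a single fused op (aten._native_multi_head_attention)
    # that is not in the core ATen op set.
    return ("fused", query)
```

Under gating like this, export (which always traces) never sees the fused op, while eager execution keeps the optimized path.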

@pytorch-bot

pytorch-bot bot commented Aug 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106824

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 23 New Failures, 3 Unrelated Failures

As of commit 098b67e:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs failed, but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D48169806

@guangy10 guangy10 added the keep-going (Don't stop on first failure, keep running tests until the end), module: export, and release notes: export labels Aug 8, 2023
@guangy10 guangy10 changed the title [pytorch] Disable fast path for export [pytorch] Disable fast path in MultiheadAttention in Export Aug 8, 2023
@guangy10 guangy10 requested review from bdhirsh and kimishpatel August 8, 2023 23:10
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D48169806

@guangy10
Contributor Author

guangy10 commented Aug 9, 2023

Fixed failures in test_transformers and test_jit_legacy

…106824)

Summary:
Pull Request resolved: pytorch#106824

We are seeing the `aten._native_multi_head_attention` op (not in the core ATen op set) left in the exported graph, which causes problems downstream at runtime.

Two proposed solutions:
 1. Disable the fast path while tracing, leveraging the non-optimized path to get the decomposition; that way, the offending op won't show up in the exported graph
 2. Add a decomp rule for `aten._native_multi_head_attention`

After discussing with kimishpatel and bdhirsh, #1 is preferred; it was verified to immediately unblock the critical model enablement work for PP.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D48169806

fbshipit-source-id: e82be1ab24659a976554775c7c362a00c827416e
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D48169806

@guangy10
Contributor Author

guangy10 commented Aug 9, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Aug 9, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@guangy10
Contributor Author

guangy10 commented Aug 9, 2023

Verified that these tests are broken on base/trunk.

FAILED [0.0076s] test/test_autograd.py::TestAutogradFallback::test_base_does_not_require_grad_mode_nothing - RuntimeError: new_fn INTERNAL ASSERT FAILED at "/home/guangyang/pytorch/torch/csrc/autograd/variable.cpp":176, please report a bug to PyTorch.
FAILED [0.0010s] test/test_autograd.py::TestAutogradFallback::test_base_does_not_require_grad_mode_warn - RuntimeError: This is not allowed since there's already a kernel registered from python overriding foo's behavior for CPU dispatch key and _test_autograd_fallback namespace.
FAILED [0.0009s] test/test_autograd.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_nothing - RuntimeError: This is not allowed since there's already a kernel registered from python overriding foo's behavior for CPU dispatch key and _test_autograd_fallback namespace.
FAILED [0.0030s] test/test_autograd.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_warn - RuntimeError: Tried to register an operator (_test_autograd_fallback::foo(Tensor self) -> Tensor) with the same name and overload name multiple times. Each overload's schema should on...

@guangy10
Contributor Author

guangy10 commented Aug 9, 2023

Verified those tests are broken on base/trunk.

FAILED [0.0586s] test/dynamo/test_dynamic_shapes.py::DynamicShapesReproTests::test_dynamic_shapes_float_guard_dynamic_shapes - Failed: Unexpected success
FAILED [0.6717s] test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_sequence_nr - AssertionError: 'SeqN[542 chars]aten.ones_like.default|\n12|aten.expand.defaul[368 chars]t|\n' != 'SeqN[542 chars]aten.expand.default|\n12|aten.div.Scalar|\n11|[340 chars]t|\n'
FAILED [0.4966s] test/dynamo/test_dynamic_shapes.py::DynamicShapesAotAutogradFallbackTests::test_aot_sequence_nr_dynamic_shapes - AssertionError: 'SeqN[542 chars]aten.ones_like.default|\n12|aten.expand.defaul[368 chars]t|\n' != 'SeqN[542 chars]aten.expand.default|\n12|aten.div.Scalar|\n11|[340 chars]t|\n'

Those tests are actually flaky. If you run them one by one, you may see them pass sometimes. In any case, the failures are not relevant to this PR.

@guangy10
Contributor Author

guangy10 commented Aug 9, 2023

@pytorchbot merge -ic

@pytorch-bot

pytorch-bot bot commented Aug 9, 2023

The -ic flag is deprecated; please use -i instead for the same effect.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 19 checks: pull / linux-jammy-py3.9-clang12-asan / test (default, 2, 6, linux.4xlarge), pull / linux-jammy-py3.9-clang12-asan / test (default, 3, 6, linux.4xlarge), pull / linux-jammy-py3.9-clang12-asan / test (default, 5, 6, linux.4xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 2, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 3, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (crossref, 2, 2, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (default, 2, 3, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (crossref, 1, 2, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (crossref, 2, 2, linux.2xlarge), pull / linux-focal-py3.8-gcc7 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-cuda11.8-py3.10-gcc9 / test (distributed, 2, 3, linux.8xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9 / test (default, 4, 5, linux.4xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, linux.g5.4xlarge.nvidia.gpu), trunk / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 3, 3, macos-m1-12)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@guangy10
Contributor Author

guangy10 commented Aug 9, 2023

The distributed/test_distributed_spawn.py failures don't seem to be relevant.

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 22 checks: pull / linux-jammy-py3.9-clang12-asan / test (default, 2, 6, linux.4xlarge), pull / linux-jammy-py3.9-clang12-asan / test (default, 3, 6, linux.4xlarge), pull / linux-jammy-py3.9-clang12-asan / test (default, 5, 6, linux.4xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 2, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 3, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (crossref, 2, 2, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (default, 2, 3, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (crossref, 1, 2, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (crossref, 2, 2, linux.2xlarge), pull / linux-focal-py3.8-gcc7 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-cuda11.8-py3.10-gcc9 / test (distributed, 1, 3, linux.8xlarge.nvidia.gpu), pull / linux-bionic-cuda11.8-py3.10-gcc9 / test (distributed, 2, 3, linux.8xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9 / test (default, 4, 5, linux.4xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, linux.g5.4xlarge.nvidia.gpu), trunk / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 3, 3, macos-m1-12), trunk / linux-focal-rocm5.6-py3.8 / test (default, 2, 3, linux.rocm.gpu, unstable), trunk / linux-focal-rocm5.6-py3.8 / test (default, 3, 3, linux.rocm.gpu, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral), trunk / linux-bionic-cuda12.1-py3.10-gcc9 / test (nogpu_AVX512, 1, 1, linux.2xlarge)

Details for Dev Infra team Raised by workflow job

@guangy10
Contributor Author

guangy10 commented Aug 9, 2023

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 25 checks: pull / linux-jammy-py3.9-clang12-asan / test (default, 2, 6, linux.4xlarge), pull / linux-jammy-py3.9-clang12-asan / test (default, 3, 6, linux.4xlarge), pull / linux-jammy-py3.9-clang12-asan / test (default, 5, 6, linux.4xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 2, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (default, 3, 3, linux.2xlarge), pull / linux-bionic-py3.11-clang9 / test (crossref, 2, 2, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (default, 2, 3, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (crossref, 1, 2, linux.2xlarge), pull / linux-bionic-py3.8-clang9 / test (crossref, 2, 2, linux.2xlarge), pull / linux-focal-py3.8-gcc7 / test (default, 1, 3, linux.2xlarge), pull / linux-bionic-cuda11.8-py3.10-gcc9 / test (distributed, 1, 3, linux.8xlarge.nvidia.gpu), pull / linux-bionic-cuda11.8-py3.10-gcc9 / test (distributed, 2, 3, linux.8xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9 / test (default, 4, 5, linux.4xlarge.nvidia.gpu), pull / linux-bionic-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, linux.g5.4xlarge.nvidia.gpu), trunk / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12), trunk / macos-12-py3-arm64 / test (default, 3, 3, macos-m1-12), trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral), trunk / linux-focal-rocm5.6-py3.8 / test (default, 2, 3, linux.rocm.gpu, unstable), trunk / linux-focal-rocm5.6-py3.8 / test (default, 3, 3, linux.rocm.gpu, unstable), trunk / linux-bionic-cuda12.1-py3.10-gcc9 / test (nogpu_AVX512, 1, 1, linux.2xlarge), trunk / linux-bionic-cuda12.1-py3.10-gcc9 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@guangy10 guangy10 deleted the export-D48169806 branch August 10, 2023 16:34
@mikekgfb
Contributor

The Inference Fastpath context manager in #107014 might offer a more general solution for controlling which kernels are used (similar to the sdp_kernel context manager).

return False


def _is_make_fx_tracing():
Collaborator


Why is this code in torch.nn? That doesn't look like the right place for it, and there is a 100% chance this will get broken if proxy mode is modified, since no one will expect to update this file.
Could you please move this utility into fx?

Collaborator


FYI @bdhirsh

Contributor Author


Yeah, it is a hack suggested by @bdhirsh to temporarily unblock ExecuTorch during model tracing. #107014 seems to plan to handle this in a nicer way.

Collaborator


That other PR is not removing this utility though?

Contributor


Agree with @albanD that we should move this into fx. @albanD, any suggestions on where it can go?

Collaborator


I will defer to @bdhirsh for the final answer, but next to the make_fx API would be my first suggestion.

@mikekgfb
Contributor

mikekgfb commented Aug 16, 2023 via email

@kimishpatel
Contributor

@mikekgfb the PR you pointed to requires a context manager to trigger a certain path; in that sense it may be a user-controlled context manager. The case here is different: when exporting the model, the user is not making the choice. In fact, it is not the user's decision at all. Which path to take, fast or slow decomposed, is based on the tracing mode.

Cyril-Anto pushed a commit to Cyril-Anto/pytorch that referenced this pull request Aug 17, 2023
…106824)

Summary:
We are seeing the `aten._native_multi_head_attention` op (not in the core ATen op set) left in the exported graph, which causes problems downstream at runtime.

Two proposed solutions:
 1. Disable the fast path while tracing, leveraging the non-optimized path to get the decomposition; that way, the offending op won't show up in the exported graph
 2. Add a decomp rule for `aten._native_multi_head_attention`

After discussing with kimishpatel and bdhirsh, #1 is preferred; it was verified to immediately unblock the critical model enablement work for PP.

Test Plan: CI

Differential Revision: D48169806

Pull Request resolved: pytorch#106824
Approved by: https://github.com/kimishpatel
anijain2305 added a commit that referenced this pull request Oct 9, 2025
…MultiHeadAttention for strict export"


In #106824, export decided to take the slow path for the MultiHeadAttention module (see the PR description for why). But that PR eventually caused a divergence between Dynamo and export.

Today, strict export does not inline into builtin modules (like MultiHeadAttention), and therefore make_fx sees the original nn.Module and takes the slow path. But compile inlines into the nn module, and at that point the condition `_is_make_fx_tracing` is False. As a result, Dynamo takes the fast path, resulting in a different op being called.

This divergence is undesirable. There are two ways to fix it:

1) Make export take the fast path - As explained in #106824, this might be difficult, so we go with (2).
2) Make compile take the slow path as well - This is easy to implement. The con is that PyTorch eager and compile will use different operators, which can cause numerics issues, etc.

Since (2) is easy to do, we will follow this path. We are tracking the issue in #164062.


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela

[ghstack-poisoned]
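The divergence described in the commit message above, and fix (2), can be sketched with hypothetical flags (all names illustrative):

```python
# Hypothetical model of the path selection. Strict export traces the
# un-inlined module, so the tracing check fires; compile inlines through
# the module, the check is False, and the fused fast path fires instead.
# Fix (2) forces both to the slow path.

def attention_op(inlined_by_compile, is_make_fx_tracing, force_slow=False):
    if force_slow or (is_make_fx_tracing and not inlined_by_compile):
        return "slow_decomposed_path"
    return "fast_fused_path"

# Before the fix, export and compile diverge:
export_path = attention_op(inlined_by_compile=False, is_make_fx_tracing=True)
compile_path = attention_op(inlined_by_compile=True, is_make_fx_tracing=False)
```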
anijain2305 added a commit that referenced this pull request Oct 9, 2025
…on for strict export"


In #106824, export decided to take the slow path for the MultiHeadAttention module (see the PR description for why). But that PR eventually caused a divergence between Dynamo and export.

Today, strict export does not inline into builtin modules (like MultiHeadAttention), and therefore make_fx sees the original nn.Module and takes the slow path. But compile inlines into the nn module, and at that point the condition `_is_make_fx_tracing` is False. As a result, Dynamo takes the fast path, resulting in a different op being called.

This divergence is undesirable. There are two ways to fix it:

1) Make export take the fast path - As explained in #106824, this might be difficult, so we go with (2).
2) Make compile take the slow path as well - This is easy to implement. The con is that PyTorch eager and compile will use different operators, which can cause numerics issues, etc.

Since (2) is easy to do, we will follow this path. We are tracking the issue in #164062.


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Oct 9, 2025
…ct export (#164721)

In #106824, export decided to take the slow path for the MultiHeadAttention module (see the PR description for why). But that PR eventually caused a divergence between Dynamo and export.

Today, strict export does not inline into builtin modules (like MultiHeadAttention), and therefore make_fx sees the original nn.Module and takes the slow path. But compile inlines into the nn module, and at that point the condition `_is_make_fx_tracing` is False. As a result, Dynamo takes the fast path, resulting in a different op being called.

This divergence is undesirable. There are two ways to fix it:

1) Make export take the fast path - As explained in #106824, this might be difficult, so we go with (2).
2) Make compile take the slow path as well - This is easy to implement. The con is that PyTorch eager and compile will use different operators, which can cause numerics issues, etc.

Since (2) is easy to do, we will follow this path. We are tracking the issue in #164062.

Pull Request resolved: #164721
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
…ct export (pytorch#164721)

In pytorch#106824, export decided to take the slow path for the MultiHeadAttention module (see the PR description for why). But that PR eventually caused a divergence between Dynamo and export.

Today, strict export does not inline into builtin modules (like MultiHeadAttention), and therefore make_fx sees the original nn.Module and takes the slow path. But compile inlines into the nn module, and at that point the condition `_is_make_fx_tracing` is False. As a result, Dynamo takes the fast path, resulting in a different op being called.

This divergence is undesirable. There are two ways to fix it:

1) Make export take the fast path - As explained in pytorch#106824, this might be difficult, so we go with (2).
2) Make compile take the slow path as well - This is easy to implement. The con is that PyTorch eager and compile will use different operators, which can cause numerics issues, etc.

Since (2) is easy to do, we will follow this path. We are tracking the issue in pytorch#164062.

Pull Request resolved: pytorch#164721
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), fb-exported, keep-going (Don't stop on first failure, keep running tests until the end), Merged, release notes: export

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants