Create fastpath backend context manager, similar to SDPA kernel backend manager #107163

mikekgfb wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107163
Note: Links to docs will display an error until the doc builds have completed. ✅ No failures as of commit 47d7333 with merge base 1d95644. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
This pull request was exported from Phabricator. Differential Revision: D48325593
Hey @mikekgfb, I haven't done an in-depth review of this, but I have two broad questions.

I take it that the intent is for this context manager to be able to disable the fastpaths for nn.MHA and nn.Transformer. However, while the flags in the context manager for SDPA toggle between the three backends that implement SDPA (math, mem-efficient, and flash), the flags in this context manager appear to be a bit more nuanced.

1. Is the term ATFP widely used to refer to the combination of nn.MHA/nn.Transformer? I am curious about the rationale for combining these into one single context manager, as well as whether the naming will make this discoverable.
2. I wanted to understand the meanings of the kwargs to this context manager. Granted, this is probably intended as a tool for power users who have good context on this, but I think it is important that we clearly establish the rationale/use cases for each of the arguments to understand the design here:
   - `math`: mirrors the context manager for SDPA
   - `enable_nested_tensor`: gives the ability to override at runtime the `TransformerEncoder(enable_nested_tensor)` flag given at construction time
   - `enable_mha`: gives the ability to disable the MHA sparsity fast path at runtime (doesn't have a corresponding flag in the MHA constructor)
   - `enable_encoder`: disables the sparsity fast path for `TransformerEncoderLayer` at runtime (the naming seems like it might be confusing here)
What is the rationale for including both flags that override constructor arguments of certain modules and flags that directly disable the fast path for certain modules?
All flags disable the mode dynamically, starting and ending at the scope of the context manager, regardless of any other settings. In some ways this is similar to how the SDPA context manager works: it enables or disables specific kernels, but those kernels can also be unavailable on some hardware, etc. So the context manager is one of many decision criteria, but it's a surefire way to disable a path. (The context-manager check is always done in forward, so its operation is completely dynamic based on scoping, similar to the SDPA context manager. You can, for example, build a model with enable_nested_tensor=True that can run with nested tensors, but also turn that behavior off without rebuilding the model by using the context manager.)

Reasons why users might want to disable the fastpath:

1. It's broken (hopefully not).
2. Performance is not good.
3. Numerical equivalence is important (this was the original starting point for building this; see #106668).
4. Users don't want the fastpath for other reasons, e.g., the recent #106824.

We can keep adding additional conditions for each case, or just have a framework for user control.

The naming is an interesting question. "fp" stands for the (inference) "fastpath". We have captured all these features as Accelerated Transformer (previously known as Better Transformer), so that's where the "at" comes from, to make allowance for the fact that "fastpath" alone isn't super descriptive. I think the flags fit reasonably well together and give users control over what, end to end, consists of interlocking features. I can see other ways of dividing them, or I could also see the case for creating it as …

PS: it's very deliberately modeled after https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html to give users cognitive familiarity rather than introducing a new way of handling this.
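The dynamic-scoping behavior described above can be sketched in plain Python. This is a hypothetical mock, not the PR's actual implementation: the names (`fastpath`, `_flags`, `Encoder`) are illustrative, but it shows the key property that a construction-time flag (`enable_nested_tensor=True`) can be vetoed at runtime because the check happens in `forward`, scoped to the `with` block.

```python
# Hypothetical sketch of a dynamically scoped fastpath context manager.
# Names are illustrative; this is not the actual PyTorch implementation.
import threading
from contextlib import contextmanager

_flags = threading.local()

def _nested_tensor_enabled():
    # Default to enabled when no context manager is active.
    return getattr(_flags, "enable_nested_tensor", True)

@contextmanager
def fastpath(enable_nested_tensor=True):
    # Save, set, and restore the flag so the override is scoped to the
    # `with` block, even if an exception is raised inside it.
    prev = _nested_tensor_enabled()
    _flags.enable_nested_tensor = enable_nested_tensor
    try:
        yield
    finally:
        _flags.enable_nested_tensor = prev

class Encoder:
    def __init__(self, enable_nested_tensor=True):
        # Construction-time preference, as in TransformerEncoder.
        self.enable_nested_tensor = enable_nested_tensor

    def forward(self):
        # The check happens at call time, so the context manager can
        # veto the construction-time setting without rebuilding the model.
        if self.enable_nested_tensor and _nested_tensor_enabled():
            return "nested-tensor fastpath"
        return "regular path"

enc = Encoder(enable_nested_tensor=True)
print(enc.forward())                    # fastpath taken
with fastpath(enable_nested_tensor=False):
    print(enc.forward())                # fastpath vetoed inside the scope
print(enc.forward())                    # setting restored after the scope exits
```

The context manager acts as one more conjunct in the dispatch decision, which mirrors how the SDPA kernel context manager composes with hardware availability checks.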
@mikaylagawarecki I'd appreciate your suggestions for naming. I agree that ATFP isn't particularly obvious; I was concerned that "fastpath" alone was too generic (there are many operators that might have a fastpath, and it's not at all clear it refers to transformers). We could also give it the longer name transformer_fastpath_…
You asked why some flags override construction-time settings. When we first introduced those settings, we wanted to give users control. Since then it has turned out that this may not be enough: a model might be constructed once, but users may later want to run it without the fastpath.
Two examples:
A - A recent change disabled the fastpath for scripted models, because edge devices don't include the operator. Takeaway: users build the model with the fastpath (or receive one that was so built), and the current "fix" disables the fastpath for every TorchScript user, if I understand things correctly!
B - For the FSDP unit tests, the model is built with enable_nested_tensor, but accuracy checks are performed to verify bitwise accuracy. That is bound to fail, because the fastpath triggers different rounding even when the execution itself is mathematically equivalent. See the failure for #106668, which is resolved by the present PR.
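The bitwise-accuracy concern in example B can be illustrated without PyTorch at all: two mathematically equivalent computations that round differently will not compare bitwise-equal. A minimal standalone illustration (not taken from the PR or its tests):

```python
# Two mathematically equivalent sums of the same ten values: naive
# left-to-right summation vs. math.fsum, which tracks intermediate
# rounding error. The real-number results are identical, but the
# floating-point results differ in the last bit, so a bitwise-equality
# check fails even though neither result is "wrong".
import math

values = [0.1] * 10

naive = sum(values)        # accumulates a rounding error at each step
exact = math.fsum(values)  # correctly rounded sum

print(naive == exact)      # False: same math, different rounding
print(abs(naive - exact))  # tiny, but nonzero
```

This is the same failure mode as running one side of an accuracy check through a fastpath kernel and the other side through the math path: the divergence comes from rounding order, not from an actual bug.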
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Create fastpath backend context manager, similar to SDPA kernel backend manager (pytorch#107163)

Summary: Create fastpath backend context manager, similar to SDPA kernel backend manager

ghstack-source-id: 208858046
exported-using-ghexport

Test Plan: sandcastle, github

Reviewed By: osalpekar

Differential Revision: D48325593
Summary:

- (… desired startup/default polarity of a flag)
- (similar to the SDPA kernel backend manager, to give users instant familiarity with the mechanism)
- (… would otherwise break this FSDP test when the executions are performed using different kernels; the divergence shows up in the test not through an error, but through the use of different kernels with different FP rounding characteristics, etc.)

Test Plan: sandcastle, github

Differential Revision: D48325593
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @d4l3k @LucasLLC