
Create fastpath backend context manager, similar to SDPA kernel backend manager#107163

Closed
mikekgfb wants to merge 1 commit into pytorch:main from mikekgfb:export-D48325593

Conversation

@mikekgfb
Contributor

@mikekgfb mikekgfb commented Aug 14, 2023

Summary:

  1. Create context manager infrastructure with an ATen-managed context that has room for 64 settings:
    • as a Python builtin, for Python eager mode and torch.compile
    • as TorchScript builtins guarded by torch.jit.is_scripting(), for compatibility with legacy TorchScript code (since "normal" Python builtins that are part of the PyTorch distribution are not accessible in TorchScript)
  2. Ability to start with True or False initialization (we always initialize the state to 0, but invert some bits based on
    the desired startup/default polarity of a flag)
  3. Create the fastpath backend context manager using 4 context bits from the infrastructure in (1)
    (similar to the SDPA kernel backend manager, to give users instant familiarity with the mechanism)
  4. Use the context manager in FSDP test code (this is prep for a future update that broadens use of the fastpath, and would
    otherwise break this FSDP test when the executions are performed using different kernels -- showing divergence in
    the test not through an error, but through the use of different kernels with different FP rounding characteristics, etc.)
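The mechanism in points (1) and (2) can be sketched in plain Python. This is a hypothetical illustration of the idea (the names, bit layout, and polarity mask here are made up for the sketch, not the actual ATen implementation): the raw state word always starts at 0, and a flag's effective value is its stored bit XOR-ed with a default-polarity mask, so flags that should default to True still cost nothing at startup.

```python
import threading
from contextlib import contextmanager

# Hypothetical sketch: one 64-bit state word holds all flags. The raw state
# always starts at 0; a flag's *effective* value is its stored bit XOR-ed
# with a per-flag default-polarity mask.
_state = threading.local()

DEFAULT_POLARITY = 0b1111  # illustrative: bits 0-3 default to True

def _raw():
    return getattr(_state, "bits", 0)

def flag_enabled(bit):
    """Effective value of flag `bit`: stored bit XOR default polarity."""
    return bool(((_raw() ^ DEFAULT_POLARITY) >> bit) & 1)

@contextmanager
def set_flags(**overrides):
    """Temporarily override flags within a scope, e.g. set_flags(enable_mha=False)."""
    names = {"math": 0, "enable_nested_tensor": 1, "enable_mha": 2, "enable_encoder": 3}
    saved = _raw()
    bits = saved
    for name, value in overrides.items():
        bit = names[name]
        default = (DEFAULT_POLARITY >> bit) & 1
        # store (desired XOR default) so that the effective value equals `desired`
        bits = (bits & ~(1 << bit)) | ((int(value) ^ default) << bit)
    _state.bits = bits
    try:
        yield
    finally:
        _state.bits = saved  # restore on scope exit, like the SDPA kernel manager
```

With this layout, a scope that sets `enable_mha=False` flips only bit 2 of the stored word, and the prior word is restored when the scope exits.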

Test Plan: sandcastle, github

Differential Revision: D48325593

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @d4l3k @LucasLLC

@pytorch-bot

pytorch-bot bot commented Aug 14, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107163

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 47d7333 with merge base 1d95644:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D48325593


@mikaylagawarecki
Contributor

Hey @mikekgfb, I haven't done an in-depth review of this but I have two broad questions

I take it that the intent is for this context manager to be able to disable the fastpaths for nn.MHA and nn.Transformer. However, while the flags in the context manager for SDPA toggle between the 3 backends that implement SDPA (math, mem-eff and flash), it appears to me like the flags in this context manager might be a bit more nuanced.

  1. Is the term ATFP widely used to refer to the combination of nn.MHA/nn.Transformer? I am curious about the rationale for combining these into one single context manager, as well as whether the naming will make this discoverable.

  2. I wanted to understand the meanings of the kwargs to this context manager. Granted, this is probably intended as a tool for power users who have good context on this, but I think it is important that we clearly establish the rationale/use-cases for each of the arguments to understand the design here.

  • math: mirrors the context manager for SDPA
  • enable_nested_tensor: gives the ability to override at runtime the TransformerEncoder(enable_nested_tensor) flag given at construction time
  • enable_mha: gives the ability to disable the MHA sparsity fast path at runtime (doesn't have a corresponding flag in the MHA constructor)
  • enable_encoder: disables the sparsity fast path for TransformerEncoderLayer at runtime (the naming seems like it might be confusing here)

What is the rationale for including both flags that override arguments to constructors of certain modules as well as other flags that directly disable the fast path for certain modules?

@mikekgfb
Contributor Author

mikekgfb commented Aug 17, 2023

What is the rationale for including both flags that override arguments to constructors of certain modules as well as other flags that directly disable the fast path for certain modules?

All flags disable the mode dynamically, starting and ending at the scope of the context manager, regardless of any other settings. In some ways you might think of this as similar to how the SDPA context manager works -- it enables or disables specific kernels, but those kernels can also be unavailable on some hardware, etc. So the context manager is one of many decision criteria, but it's a surefire way to disable a path. The context manager check is always done in forward, so its operation is completely dynamic based on scoping, just like the SDPA context manager. You can, for example, build a model with enable_nested_tensor=True that can run with nested tensors, but also turn that behavior off without rebuilding the model by using the context manager.
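This dynamic-scoping behavior can be sketched as follows. The sketch is a hypothetical illustration (a plain module-level flag standing in for the ATen context, and an `Encoder` class standing in for nn.TransformerEncoder): the construction-time setting is only a request, and forward() consults the ambient context flag on every call, so a scope can override it without rebuilding the module.

```python
from contextlib import contextmanager

# Hypothetical sketch: the flag is consulted inside forward(), so the
# fastpath can be turned off per-scope even for a module built with it on.
_fastpath_enabled = True

@contextmanager
def fastpath(enabled=True):
    global _fastpath_enabled
    saved = _fastpath_enabled
    _fastpath_enabled = enabled
    try:
        yield
    finally:
        _fastpath_enabled = saved  # restore at scope exit

class Encoder:
    def __init__(self, enable_nested_tensor=True):
        # construction-time request; the final decision is made per call
        self.enable_nested_tensor = enable_nested_tensor

    def forward(self, x):
        # check the ambient context on every call, as the SDPA manager does
        use_fast = self.enable_nested_tensor and _fastpath_enabled
        return ("fastpath" if use_fast else "slowpath", x)

enc = Encoder(enable_nested_tensor=True)
assert enc.forward(1)[0] == "fastpath"
with fastpath(enabled=False):
    assert enc.forward(1)[0] == "slowpath"  # overridden without rebuilding
assert enc.forward(1)[0] == "fastpath"     # restored after the scope
```

The same shape applies to the numerical-equivalence use case below: wrapping both the eval and no_grad runs in the same scope forces them onto the same kernels.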

reasons why users might want to disable fastpath:

1 - it's broken (hopefully not)

2 - performance is not good

3 - numerical equivalence is important (this was the original starting point for building this -- #106668 enables the fastpath for inputs that are not batch_first. However, by doing this, it breaks test_fsdp_core, which runs the model with model.eval() and with no_grad() and expects the results to be bit-accurate identical. This only works if we use the same kernels)

4 - users don't want the fastpath for other reasons, e.g., the recent #106824. We can keep adding additional conditions for each case, or just have a framework for user control.

The naming is an interesting question - fp stands for the (inference) "fastpath". We have captured all these features under Accelerated Transformers (previously known as Better Transformer) - so that's where the at comes from, making allowance for the fact that "fastpath" isn't super descriptive.

I think they fit reasonably well together, and give users control over an end-to-end path that consists of interlocking features. I can see other ways of dividing them, or I could also see a case for introducing this as _atfp_* first, as a non-public interface, to gain some experience. Wdyt?

PS: it's very deliberately modeled after https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html to give users a cognitive familiarity rather than introducing a new way of handling this.

@mikekgfb
Contributor Author

mikekgfb commented Sep 5, 2023 via email

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.


…nd manager (pytorch#107163)

Summary:

Create fastpath backend context manager, similar to SDPA kernel backend manager
ghstack-source-id: 208858046
exported-using-ghexport

Test Plan: sandcastle, github

Reviewed By: osalpekar

Differential Revision: D48325593

Labels

fb-exported
oncall: distributed (add this issue/PR to distributed oncall triage queue)
release notes: distributed (fsdp)
release notes: nn
Stale


7 participants