Enabling Transformer fast path for not batch_first (MHA, TE, TEL) #106668
mikekgfb wants to merge 2 commits into pytorch:main from
Conversation
|
This pull request was exported from Phabricator. Differential Revision: D48095703 |
Force-pushed: 63203ec to 3a59a69
Force-pushed: 3a59a69 to 3719ea0
|
@wconstab any suggestion on how to handle the FSDP failure? I think (i.e., conjecture) the underlying cause is a numerical difference between fastpath vs. standard execution -- I think we avoided running into this so far because this test ran with batch_first=False, and until this diff we used the fastpath with batch_first=True only. (The fastpath is only on for inference with no_grad(), so eval() vs eval()+no_grad() exercises two different paths.) |
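The dispatch condition described in the comment above can be sketched as follows. This is illustrative only, assuming a hypothetical `select_path` helper; the names do not correspond to PyTorch internals. The point is that the fast path requires both eval mode and autograd disabled, so `eval()` alone and `eval()` + `no_grad()` exercise different kernels.

```python
# Hypothetical sketch (not PyTorch internals): the fast path is taken
# only during inference, i.e. eval mode AND gradients disabled.
def select_path(training: bool, grad_enabled: bool) -> str:
    if not training and not grad_enabled:
        return "fastpath"   # fused inference kernel
    return "standard"       # reference (standard) implementation

# eval() sets training=False but leaves autograd enabled:
assert select_path(training=False, grad_enabled=True) == "standard"
# eval() + no_grad() disables autograd too, enabling the fast path:
assert select_path(training=False, grad_enabled=False) == "fastpath"
```

Because the two branches are different implementations, their outputs need not be bit-exact, which is what trips the FSDP comparison test.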
|
missed this comment before, but it sounds like you've found out that it's an issue of eval vs train accuracy for this model, is that right? |
Force-pushed: 3719ea0 to 79fee71
The underlying issue is that model.eval() vs model.eval() with no_grad() triggers different computational kernels. The only sane way to control this is via a backend context manager for choosing between these kernels. I added this in #107014
Force-pushed: 79fee71 to d8ee9bc
Yep - the way that "validation" is performed is that the test runs in train mode, and then with eval/no_grad -- the latter triggers the fastpath with the present patch, but because we're looking at different implementations, we can't expect bit-exact answers. One possible solution is a context manager that gives more control over the kernel chosen, similar to what we do for SDPA -- #107163 is an implementation of this context manager for backend selection |
Force-pushed: d8ee9bc to c7ecb35
Force-pushed: c7ecb35 to 21ea697
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Force-pushed: 21ea697 to ddd757f
Force-pushed: ddd757f to abdabaf
…nd manager (pytorch#107163)
Summary: Create fastpath backend context manager, similar to the SDPA kernel backend manager
Test Plan: sandcastle, github
Differential Revision: D48325593
…torch#106668)
Summary: The fast path for the `forward()` method in `MultiheadAttention`, `TE`, `TEL` only accepted `batch_first = True`. This diff enables the fast path for `batch_first=False` as well.
Test Plan: sandcastle, github CI/CD
Differential Revision: D48095703
Force-pushed: abdabaf to 05eb0fe
Summary: The fast path for the `forward()` method in `MultiheadAttention`, `TE`, `TEL` only accepted `batch_first = True`. This diff enables the fast path for `batch_first=False` as well.

Differential Revision: D48095703
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @d4l3k @LucasLLC