[SDPA] Guard mem efficient attention in deterministic mode#91979
Closed
drisspg wants to merge 1 commit into pytorch:master from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91979
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 4152ee6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cpuhrsch approved these changes on Jan 11, 2023.

drisspg (Contributor, Author):
@pytorchbot merge

Collaborator:
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
pytorchmergebot pushed a commit that referenced this pull request on Jan 13, 2023:
### Changelist

* Change the Windows `TORCH_CUDA_ARCH_LIST` from `7.0` to `8.6` to be compatible with the NVIDIA A10G GPU.
* Correctly disable some tests that require flash attention, which is not available on Windows at the moment. This has been fixed by #91979.
* The G5 runner has an `AMD EPYC 7R32` CPU, not an Intel one. This seems to change the behavior of `GetDefaultMobileCPUAllocator` in `cpu_profiling_allocator_test` and might need further investigation (TODO: TRACKING ISSUE). In the meantime, the test has been updated to correctly use `GetDefaultCPUAllocator` instead of `GetDefaultMobileCPUAllocator` for the mobile build.
* One periodic test, `test_cpu_gpu_parity_nn_Conv3d_cuda_float32`, fails with a tensor-not-close error when comparing grad tensors between CPU and GPU. This is fixed by turning off TF32 for the test.

### Performance gain

* (CURRENT) p3.2xlarge: https://hud.pytorch.org/tts shows that each Windows CUDA shard (1-5 + functorch) takes about 2 hours to finish.
* (NEW RUNNER) g5.4xlarge: a rough estimate of the duration is 1h30m per shard, a half-hour gain (**25%**).

### Pricing

On-demand hourly rates:

* (CURRENT) p3.2xlarge: $3.428. Total = total hours spent on Windows CUDA tests * 3.428
* (NEW RUNNER) g5.4xlarge: $2.36. Total = total hours spent on Windows CUDA tests * duration gain (0.75) * 2.36

So the current runner is not only more expensive but also slower. Switching to G5 runners for Windows should cut the cost by (3.428 - 0.75 * 2.36) / 3.428 = **~45%**.

### Rolling out

pytorch/test-infra#1376 needs to be reviewed and approved to ensure the capacity of the runner before this PR can be merged.

Pull Request resolved: #91727
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/seemethere
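The pricing estimate above can be checked with a few lines of arithmetic. The rates and the 0.75 duration ratio are taken directly from the figures in the commit message:

```python
# On-demand hourly rates quoted above (USD/hr)
p3_rate = 3.428   # p3.2xlarge (current runner)
g5_rate = 2.36    # g5.4xlarge (proposed runner)

# g5 is estimated to finish each shard in ~75% of the time (1h30m vs 2h)
duration_ratio = 0.75

# Fractional cost reduction per hour of current testing
savings = (p3_rate - duration_ratio * g5_rate) / p3_rate
print(f"{savings:.1%}")  # prints 48.4%, in the ballpark of the ~45% quoted above
```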
Summary
Memory-efficient attention is a non-deterministic algorithm.
This PR ensures that sdp_choice will still allow memory-efficient attention as the SDPA backend if we are in warn-only mode. Otherwise, if determinism is enabled and warn_only is set to False, sdp_choice will not return memory-efficient attention as the backend.
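A minimal sketch of the decision described above. The function name and signature are illustrative, not the actual C++ sdp_choice implementation; the two flags correspond to `torch.use_deterministic_algorithms(mode, warn_only=...)`:

```python
def allow_mem_efficient_attention(deterministic: bool, warn_only: bool) -> bool:
    """Illustrative model of the guard: memory-efficient attention is
    non-deterministic, so it is only an acceptable SDPA backend when
    determinism is off, or when determinism is enabled in warn-only mode."""
    if not deterministic:
        return True
    return warn_only

# Determinism off: mem-efficient attention is allowed
assert allow_mem_efficient_attention(False, False)
# Determinism on with warn_only=True: still allowed (a warning is emitted instead)
assert allow_mem_efficient_attention(True, True)
# Strict determinism: sdp_choice must pick a deterministic backend instead
assert not allow_mem_efficient_attention(True, False)
```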