Enable XPU path for FlexAttention #143553
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143553
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — There is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 4 Unrelated Failures — As of commit 29dbb36 with merge base d153af7:
- NEW FAILURE: The following job has failed.
- FLAKY: The following jobs failed but were likely due to flakiness present on trunk.
- BROKEN TRUNK: The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
To add the ciflow label, please first approve the workflows that are awaiting approval. This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
@pytorchbot rebase
You don't have permissions to rebase this PR since you are a first-time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.
enable flex attention TMA flag on xpu by default
Hi @liangan1 @hoshibara, thanks for mentioning this and for the refinements in this PR to the FlexAttention-related UTs on common devices.
@pytorchbot merge
Merge started — Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed — Reason: 1 job has failed; the first of them is: trunk / linux-jammy-rocm-py3.10 / build. (Details for Dev Infra team: raised by workflow job.)
```python
# USE_TMA = false by default
cur_kernel_options.setdefault("USE_TMA", False)
```

```python
if torch.xpu.is_available():
```
The tensor descriptor does not support every Q, K, V memory layout: Q, K, and V must be contiguous on the last dim to use the tensor descriptor.
Use `can_use_tma` in the condition, as in `if torch.xpu.is_available() and can_use_tma(query, key, value):`.
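To illustrate the reviewer's point, here is a minimal sketch of such a gating helper. `can_use_tma` below is a simplified stand-in, assuming the only requirement checked is last-dim contiguity (stride 1), which is the layout constraint the tensor descriptor imposes:

```python
import torch

def can_use_tma(*tensors: torch.Tensor) -> bool:
    """Simplified sketch: TMA tensor descriptors require each tensor to be
    contiguous along its last dimension, i.e. the last-dim stride is 1."""
    return all(t.stride(-1) == 1 for t in tensors)

# A contiguous attention-shaped tensor satisfies the check...
q = torch.randn(2, 8, 128, 64)
# ...but transposing the last two dims breaks last-dim contiguity.
k_t = torch.randn(2, 8, 128, 64).transpose(-1, -2)

print(can_use_tma(q))       # True
print(can_use_tma(q, k_t))  # False
```

With a gate like this, the XPU path falls back to the non-TMA kernel for strided Q/K/V instead of failing inside the tensor-descriptor setup.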
@pytorchbot label ciflow/xpu
@pytorchbot merge
Pull workflow has not been scheduled for the PR yet. It could be because the author doesn't have permissions to run those workflows, or because skip-checks keywords were added to the PR/commits; aborting merge. Please get/give approval for the workflows and/or remove skip-ci decorators before the next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.
@pytorchbot label ciflow/trunk |
To add these label(s) (ciflow/trunk) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page). This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows. |
This reverts commit e6ae1ed.
enable tma case for xpu
Hi @EikanWang |
@pytorchbot merge -i
Merge started — Your change will be merged while ignoring the following 4 checks: xpu / linux-jammy-xpu-n-py3.10 / test (default, 1, 8, linux.idc.xpu), xpu / linux-jammy-xpu-n-py3.10 / test (default, 5, 8, linux.idc.xpu), xpu / linux-jammy-xpu-n-py3.10 / test (default, 3, 8, linux.idc.xpu), xpu / linux-jammy-xpu-n-py3.10 / test (default, 6, 8, linux.idc.xpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
```diff
-        if not _has_sufficient_memory(_device, size_bytes):
+        # TODO: Memory availability checks for Intel GPU
+        if device != "xpu" and not _has_sufficient_memory(_device, size_bytes):
```
This changed the logic of largeTensorTest: it disabled the check on the XPU device, which results in the failure of `python test/dynamo/test_aot_autograd_cache.py AOTAutogradCacheTests.test_autograd_inductor_guards_device_xpu_float16_requires_grad_True`.
In e6ae1ed, we attempted to complete the sufficient-memory check for XPU, but it caused some previously skipped cases to fail.
This issue needs a new PR to fix.
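For context, the guard in the diff above can be sketched as plain logic. The function name and the `free_bytes` parameter here are hypothetical stand-ins for the real `_has_sufficient_memory` helper and whatever device-specific query a follow-up PR would add:

```python
def has_sufficient_memory(device: str, size_bytes: int, free_bytes: int) -> bool:
    """Hypothetical sketch of the guarded check: on XPU the availability
    check is bypassed entirely (the TODO in the diff), so large-tensor
    tests are never skipped there; other devices compare the requested
    size against the free bytes reported by a device-specific query."""
    if device == "xpu":
        # Mirrors `device != "xpu" and ...`: no memory check on XPU yet.
        return True
    return free_bytes >= size_bytes

print(has_sufficient_memory("xpu", 10**12, 0))  # True: check is skipped
print(has_sufficient_memory("cuda", 100, 50))   # False: not enough free
```

This is why previously skipped cases start failing on XPU: with the check bypassed, tests that genuinely exceed device memory run anyway instead of being skipped.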
#RFC153024
Motivation
What does this PR do?
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @yf225 @ColinPeppler @desertfire