[BE] [Inductor] Re-Land Support TMA before strict 3.4 cutoff #160747

Closed
njriasan wants to merge 1 commit into pytorch:main from njriasan:export-D80348643

@njriasan njriasan commented Aug 15, 2025

Summary: Inductor's 3.4 Triton release is the most commonly used variant of Triton, but if someone is working with an alternative version of Triton, this may not match. This moves the version check from 3.4 Triton to any variant that has support for the TMA APIs.

Test Plan:
Testing the previously failing test `inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_welford_non_block_pointer_cuda`

Rollback Plan:

Differential Revision: D80348643
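
The change described in the summary, gating on whether the TMA APIs exist rather than on the Triton version number, can be sketched as a feature probe. This is a minimal illustration, not the code in this PR: `has_api` and `has_tma_device_api` are hypothetical helper names, and the choice of `triton.language.make_tensor_descriptor` as the attribute to probe is an assumption about which symbol signals TMA support.

```python
import importlib


def has_api(module_name: str, attr_name: str) -> bool:
    """Return True if `module_name` imports and exposes `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its parents) is not installed.
        return False
    return hasattr(module, attr_name)


def has_tma_device_api() -> bool:
    """Hypothetical probe: treat the presence of the descriptor API as TMA support.

    Assumption: `triton.language.make_tensor_descriptor` is the symbol to check.
    """
    return has_api("triton.language", "make_tensor_descriptor")
```

A version comparison such as `>= 3.4` rejects forks or development builds that already carry the API, while a probe like this accepts any build that exposes it.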

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot bot commented Aug 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160747

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 922e96a with merge base 58f9a3d (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80348643

@njriasan njriasan added the topic: not user facing topic category label Aug 15, 2025
njriasan added a commit to njriasan/pytorch that referenced this pull request Aug 15, 2025

@njriasan njriasan added ciflow/h100 better-engineering Relatively self-contained tasks for better engineering contributors labels Aug 15, 2025
@njriasan njriasan requested a review from NikhilAPatel August 15, 2025 20:11
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 15, 2025
@njriasan
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

malfet commented Aug 17, 2025

@pytorchbot revert -m "Looks like this breaks rocm, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=rocm%20%2F%20linux-jammy-rocm-py3.10" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@njriasan your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Aug 17, 2025
@njriasan
Contributor Author

https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=rocm%20%2F%20linux-jammy-rocm-py3.10

Thanks! I'll rerun this test on rocm. I expect this PR shouldn't impact rocm because TMA isn't supported on AMD, but that probably points to an existing bug.

@njriasan
Contributor Author

Doing a deeper dive it looks like this test is just broken. In particular I believe this check is setting expected_num_block_pointers=6 because of TMA, which only works for Nvidia and not AMD. I'll update this test to work better.

@njriasan
Contributor Author

> Doing a deeper dive it looks like this test is just broken. In particular I believe this check is setting expected_num_block_pointers=6 because of TMA, which only works for Nvidia and not AMD. I'll update this test to work better.

Actually it seems like these tests may need a more rigorous check to disable them on AMD. I'll add that check.

pytorch-bot bot pushed a commit that referenced this pull request Aug 19, 2025

@njriasan
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@njriasan
Contributor Author

I see rocm failures on CI, but they seem unrelated and I'm not convinced these tests should be running. Before reverting this PR, can we try fixing the CI issues in a follow-up PR? What I saw with the last failure is that the test was already broken on main; it just didn't run for some reason.

@njriasan
Contributor Author

I added a fix here: #160974.

@jeffdaily jeffdaily left a comment

Please fix in follow-up PR or I will revert this. Breaking ROCm CI with this typo.

not (
    HAS_CUDA_AND_TRITON
    and torch.cuda.get_device_capability()[0] >= 9
    and torch.hip.version is None
Should be `torch.version.hip`, not `torch.hip.version`.
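
With the attribute path corrected, the guard discussed in this thread can be sketched as a small predicate. This is an illustration under stated assumptions rather than the test file's actual code: `should_run_tma_tests` and `should_run_tma_tests_here` are hypothetical names, and `torch.cuda.is_available()` stands in for the `HAS_CUDA_AND_TRITON` flag used in the snippet above. `torch.version.hip` is `None` on non-ROCm builds and a version string on ROCm builds.

```python
from typing import Optional, Tuple


def should_run_tma_tests(
    has_cuda_and_triton: bool,
    hip_version: Optional[str],
    capability: Optional[Tuple[int, int]],
) -> bool:
    """Return True only for a non-ROCm build on a device with major capability >= 9."""
    if not has_cuda_and_triton or capability is None:
        return False
    # A non-None HIP version means a ROCm build, where TMA is not available.
    if hip_version is not None:
        return False
    return capability[0] >= 9


def should_run_tma_tests_here() -> bool:
    """Evaluate the predicate against the local PyTorch install."""
    import torch  # imported lazily so the pure predicate has no torch dependency

    has_cuda = torch.cuda.is_available()  # assumption: stand-in for HAS_CUDA_AND_TRITON
    capability = torch.cuda.get_device_capability() if has_cuda else None
    return should_run_tma_tests(has_cuda, torch.version.hip, capability)
```

Passing the values in as arguments keeps the ROCm case checkable on any machine: for example, `should_run_tma_tests(True, "6.2.0", (9, 4))` models a hypothetical ROCm build whose device reports major version 9.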

Fixed in #160981.

jeffdaily added a commit to ROCm/pytorch that referenced this pull request Aug 19, 2025
broke rocm inductor tests
pytorchmergebot pushed a commit that referenced this pull request Aug 19, 2025
broke rocm inductor tests

Fixes #ISSUE_NUMBER

Pull Request resolved: #160981
Approved by: https://github.com/jeffdaily, https://github.com/Skylion007

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025
…#160747)

Summary: Inductor's 3.4 Triton release is the most commonly used variant of Triton, but if someone is working with an alternative version of Triton, this may not match. This moves the version check from 3.4 Triton to any variant that has support for the TMA APIs.

Test Plan:
Testing the previously failing test `inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_welford_non_block_pointer_cuda`

Rollback Plan:

Differential Revision: D80348643

Pull Request resolved: pytorch#160747
Approved by: https://github.com/NikhilAPatel
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…#160747)

Summary: Inductor's 3.4 Triton release is the most commonly used variant of Triton, but if someone is working with an alternative version of Triton, this may not match. This moves the version check from 3.4 Triton to any variant that has support for the TMA APIs.

Test Plan:
Testing the previously failing test `inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_welford_non_block_pointer_cuda`

Rollback Plan:

Differential Revision: D80348643

Pull Request resolved: pytorch#160747
Approved by: https://github.com/NikhilAPatel
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
broke rocm inductor tests

Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#160981
Approved by: https://github.com/jeffdaily, https://github.com/Skylion007

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Labels

better-engineering (Relatively self-contained tasks for better engineering contributors)
ci-no-td (Do not run TD on this PR)
ciflow/h100
ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
fb-exported
Merged
module: inductor
Reverted
topic: not user facing (topic category)
