[AARCH64][CD][CUDA13][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 #163988
nWEIdia wants to merge 2 commits into pytorch:main
Conversation
…UDA13 Wheel Build. See also pytorch#163972
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163988
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (3 Unrelated Failures) As of commit 1a3aea4 with merge base 5880996. FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Binary size information: (this PR's artifact, python3.12)
Check functionality: (e.g. on THOR)
Torch Compile now expects ptxas to be: /usr/local/lib/python3.12/dist-packages/torch/_inductor/bin/ptxas
I would just change the expected directory to be /usr/local/lib/python3.12/dist-packages/torch/bin/ptxas again to reduce packaging risks.
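For reference, a quick way to check which of the two candidate locations actually ships ptxas in an installed wheel (a verification sketch based on the paths in this thread, not code from the PR):

```python
# Verification sketch: check the two candidate ptxas locations discussed above.
import os
import subprocess

import torch

torch_dir = os.path.dirname(torch.__file__)
candidates = [
    os.path.join(torch_dir, "_inductor", "bin", "ptxas"),  # location torch.compile originally expected
    os.path.join(torch_dir, "bin", "ptxas"),               # location proposed in this thread
]
for path in candidates:
    if os.path.isfile(path):
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        print(path, "->", result.stdout.strip() or result.stderr.strip())
    else:
        print(path, "-> not present")
```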
Test Results on THOR with the latest wheels:
gh run download 18053880730 -n manywheel-py3_12-cuda-aarch64-13_0
root@:/workspace/pytorch# python test/inductor/test_control_flow.py CondTests.test_cond_mismatched_branch_output_size_device_cuda_dynamic_False
On the other device: warnings.warn(
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
@pytorchbot cherry-pick --onto release/2.9 --fixes "Critical CI fix" -c critical
…63988) See also #163972, which was intended to be this PR. Pull Request resolved: #163988 Approved by: https://github.com/atalman (cherry picked from commit 3b4ad4a)
Cherry picking #163988: The cherry pick PR is at #164236 and it is linked with issue Critical CI fix. The following tracker issues are updated. Details for Dev Infra team: Raised by workflow job
…64236) [AARCH64][CD][CUDA13][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 (#163988) Pull Request resolved: #163988 Approved by: https://github.com/atalman (cherry picked from commit 3b4ad4a) Co-authored-by: Wei Wang <weiwan@nvidia.com>
…4716) The ptxas bundling was introduced in #163988 to work around issues users may face due to #163801. Fortunately, on the triton upstream side, triton-lang/triton@884fdae finally landed, which means #163801 is permanently fixed. In addition, pytorch's triton commit pin has been updated via #178821. We can now roll back #163988. In between, we unified the arm sbsa build with x86, so a plain revert won't work; the export is reverted manually instead.

Test plan: download the wheels and check the binary size to confirm 1) ptxas is gone from both x86 and sbsa (even though it was initially added only to sbsa cu13) and 2) the unit tests that ran on #163988 still pass.

Pull Request resolved: #174716 Approved by: https://github.com/tinglvv, https://github.com/atalman
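One way to do that check on a downloaded wheel without installing it (an illustrative sketch; pass the wheel path as the first argument):

```python
# Inspect a wheel for a bundled ptxas without installing it.
import sys
import zipfile

with zipfile.ZipFile(sys.argv[1]) as wheel:
    hits = [name for name in wheel.namelist() if name.endswith("bin/ptxas")]

print(hits if hits else "no bundled ptxas in this wheel")
```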
See also #163972, which was intended to be this PR.
Triton (release/3.5.x) by default ships a CUDA 12.8 ptxas.
This PR bundles a CUDA 13 ptxas, so that it can help with #163801 when users run on new devices like THOR and Spark.
Fixes #163801
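As a rough illustration of what BUILD_BUNDLE_PTXAS=1 means at build time (a hypothetical sketch; the actual setup.py logic, paths, and guards may differ):

```python
# Hypothetical sketch of the bundling step enabled by BUILD_BUNDLE_PTXAS=1.
# Assumptions: CUDA_HOME and the torch/bin destination are illustrative, not exact.
import os
import shutil

if os.environ.get("BUILD_BUNDLE_PTXAS") == "1":
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    src = os.path.join(cuda_home, "bin", "ptxas")  # CUDA 13 ptxas from the build toolchain
    dst = os.path.join("torch", "bin", "ptxas")    # shipped inside the wheel
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)
```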
Test Plan:
Check the binary size increase against nightly or v2.9 RC (see the size-comparison sketch below).
Install a current binary on a working THOR and GB200/GH100 machine (reproducing the original issue first on THOR), then install the binary built from this PR; the issue should be gone without any additional user setup. Testing on GB200 ensures no regression.
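For the size check, comparing the two wheel files directly is enough (a sketch; pass the nightly/RC wheel and this PR's wheel as arguments):

```python
# Compare wheel sizes: baseline (nightly or v2.9 RC) vs. this PR's artifact.
import os
import sys


def size_mib(path: str) -> float:
    return os.path.getsize(path) / 2**20


baseline, candidate = sys.argv[1], sys.argv[2]
delta = size_mib(candidate) - size_mib(baseline)
print(f"baseline:  {size_mib(baseline):.1f} MiB")
print(f"candidate: {size_mib(candidate):.1f} MiB (delta {delta:+.1f} MiB)")
```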
Reference: #119750 and pytorch/builder@5c814e2
Note: with this PR, the pytorch world's torch.compile is supposed to find ptxas via "torch/_inductor/runtime/compile_tasks.py" and "_set_triton_ptxas_path". Use cases that do not go through "_set_triton_ptxas_path" may not be able to use the CUDA 13 ptxas binary.
However, as is, the Triton world does not know that this new CUDA 13 ptxas exists. So if a user, seeing that torch/bin/ptxas is already present, deletes the ptxas shipped with Triton, then https://github.com/triton-lang/triton/blob/c6ad34f7eb42630533412d93ca2cc00a4b4f8f3c/python/triton/knobs.py#L216 would still complain that ptxas was not found; Triton has no knowledge of the newly bundled copy.
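Mechanically, the hook described above amounts to something like the following (a behavioral sketch, not the exact source; it assumes the bundled binary ships at torch/bin/ptxas as discussed in this thread, and relies on Triton honoring the TRITON_PTXAS_PATH environment variable):

```python
# Behavioral sketch of _set_triton_ptxas_path (assumption: bundled binary at torch/bin/ptxas).
import os


def _set_triton_ptxas_path() -> None:
    """Point Triton at the bundled CUDA 13 ptxas via TRITON_PTXAS_PATH."""
    if os.environ.get("TRITON_PTXAS_PATH"):
        return  # respect an explicit user override
    import torch

    ptxas = os.path.join(os.path.dirname(torch.__file__), "bin", "ptxas")
    if os.path.isfile(ptxas) and os.access(ptxas, os.X_OK):
        os.environ["TRITON_PTXAS_PATH"] = ptxas
```

Anything that compiles Triton kernels without going through this hook keeps using whichever ptxas Triton discovers on its own, which is exactly the caveat above.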
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @ptrblck @eqy @tinglvv @atalman @malfet