[ROCm] Enable StaticCudaLauncher for ROCm #166492

Closed
chinmaydk99 wants to merge 7 commits into pytorch:main from chinmaydk99:ck-staticudalauncher

@chinmaydk99
Contributor

chinmaydk99 commented Oct 29, 2025

This PR enables ROCm/HIP support for PyTorch's StaticCudaLauncher, which provides static compilation and launching of Triton kernels. The implementation has been tested on AMD MI300 and MI200 hardware.

Changes

Python (torch/_inductor/runtime/)

  • static_cuda_launcher.py: Added ROCm detection, .hsaco binary support, and ROCm-specific scratch parameter handling
  • triton_heuristics.py: Updated device type checks to support both cuda and hip
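
As a rough illustration of the ROCm detection, binary selection, and device type check described above, the sketch below works from a HIP version string. The names `is_rocm`, `binary_extension`, `is_supported_device`, and `SUPPORTED_DEVICE_TYPES` are hypothetical and are not taken from the PR. The only PyTorch fact assumed is that `torch.version.hip` is `None` on CUDA builds and a version string on ROCm builds; the sketch takes that value as a parameter so it runs without PyTorch installed.

```python
from typing import Optional

# Assumption: the launcher accepts these device types after the change
# (the PR description mentions "cuda" and "hip").
SUPPORTED_DEVICE_TYPES = ("cuda", "hip")


def is_rocm(hip_version: Optional[str]) -> bool:
    """Treat any non-empty HIP version string as a ROCm build."""
    return bool(hip_version)


def binary_extension(hip_version: Optional[str]) -> str:
    """Pick the compiled-kernel extension: hsaco on ROCm, cubin on CUDA."""
    return "hsaco" if is_rocm(hip_version) else "cubin"


def is_supported_device(device_type: str) -> bool:
    """Accept both CUDA and HIP device types rather than CUDA only."""
    return device_type in SUPPORTED_DEVICE_TYPES
```

In a real build, a caller might pass `torch.version.hip` as `hip_version`; how the PR itself wires this up is not shown in this conversation.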

C++ (torch/csrc/)

  • Module.cpp: Enabled StaticCudaLauncher for ROCm builds
  • inductor/static_cuda_launcher.cpp: Added HIP API equivalents for all CUDA driver calls
  • inductor/static_cuda_launcher.h: Updated header guard
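
The PR description does not list the individual calls that were ported, so the table below is an assumption rather than a record of the change. It pairs a few CUDA driver API functions and types commonly used to load and launch precompiled kernels with their HIP counterparts. The helper `to_hip` and the dictionary name are hypothetical.

```python
# Assumed subset of CUDA driver names and their HIP equivalents.
# The PR may use a different or larger set of calls.
CUDA_TO_HIP = {
    # Functions
    "cuModuleLoad": "hipModuleLoad",
    "cuModuleLoadData": "hipModuleLoadData",
    "cuModuleGetFunction": "hipModuleGetFunction",
    "cuModuleUnload": "hipModuleUnload",
    "cuLaunchKernel": "hipModuleLaunchKernel",
    # Types
    "CUmodule": "hipModule_t",
    "CUfunction": "hipFunction_t",
    "CUdeviceptr": "hipDeviceptr_t",
    "CUstream": "hipStream_t",
    "CUresult": "hipError_t",
}


def to_hip(cuda_name: str) -> str:
    """Return the HIP name for a CUDA driver name, or raise KeyError."""
    return CUDA_TO_HIP[cuda_name]
```

Note that the launch call is not a simple prefix swap: `cuLaunchKernel` corresponds to `hipModuleLaunchKernel`, which is one reason a lookup table is clearer than a string replacement.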

Tests (test/inductor/)

  • test_static_cuda_launcher.py: Removed @skipIfRocm decorators and updated binary file handling

Enabled Unit Tests
All tests in test/inductor/test_static_cuda_launcher.py now pass on ROCm:

  1. Fixes Test: TestStaticCudaLauncher.test_basic #168654.
  2. Fixes Test: TestStaticCudaLauncher.test_unsigned_integers #168655.
  3. Fixes Test: TestStaticCudaLauncher.test_signed_integers #168656.
  4. Fixes Test: TestStaticCudaLauncher.test_basic_1arg #168657.
  5. Fixes Test: TestStaticCudaLauncher.test_constexpr #168658.
  6. Fixes Test: TestStaticCudaLauncher.test_implied_constant #168659.
  7. Fixes Test: TestStaticCudaLauncher.test_kernel_no_args #168660.
  8. Fixes Test: TestStaticCudaLauncher.test_high_shared_mem #168661.
  9. Fixes Test: TestStaticCudaLauncher.test_too_high_shared_mem #168662.
  10. Fixes Test: TestStaticCudaLauncher.test_kernel_empty_tensor #168663.
  11. Fixes Test: TestStaticCudaLauncher.test_kernel_many_args #168664.
  12. Fixes Test: TestStaticTritonCompileResult.test_basic_compile #168666.
  13. Fixes Test: TestStaticTritonCompileResult.test_incompatible_code #168667.
  14. Fixes Test: TestStaticTritonCompileResult.test_static_launch_user_defined_triton_kernels #168668.
  15. Fixes Test: TestStaticTritonCompileResult.test_empty_tensor #168669.
  16. Fixes Test: TestStaticTritonCompileResult.test_any #168670.
  17. Fixes Test: TestStaticTritonCompileResult.test_disable_static_cuda_launcher #168671.
  18. Fixes Test: TestFxGraphCache.test_remote_cache_load_function #168575.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @jerrymannil @xinyazhang @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @dllehr-amd @chenyang78

@pytorch-bot

pytorch-bot bot commented Oct 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166492

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Unrelated Failures

As of commit 92ae60a with merge base 823edb4:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 29, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

pytorch-bot bot added the labels "module: inductor", "module: rocm", "oncall: distributed", and "release notes: releng" on Oct 29, 2025
jataylo added the labels "ciflow/inductor", "ciflow/rocm", and "ciflow/inductor-rocm" on Oct 29, 2025
@pytorch-bot

pytorch-bot bot commented Oct 29, 2025

To add the ciflow label ciflow/inductor please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot

pytorch-bot bot commented Oct 29, 2025

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot

pytorch-bot bot commented Oct 29, 2025

To add the ciflow label ciflow/inductor-rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

pytorch-bot bot removed the labels "ciflow/inductor", "ciflow/rocm", and "ciflow/inductor-rocm" on Oct 29, 2025
pytorch-bot bot added the label "module: cpu" on Oct 29, 2025
jataylo added the labels "ciflow/periodic", "ciflow/inductor", "ciflow/rocm", "ciflow/inductor-rocm", and "ciflow/rocm-mi300" on Oct 29, 2025
pytorch-bot bot removed the labels "ciflow/periodic", "ciflow/inductor", "ciflow/rocm", "ciflow/inductor-rocm", and "ciflow/rocm-mi300" on Oct 29, 2025
@chinmaydk99
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/166492/head returned non-zero exit code 1

Rebasing (1/6)
Auto-merging torch/_inductor/runtime/static_cuda_launcher.py
CONFLICT (content): Merge conflict in torch/_inductor/runtime/static_cuda_launcher.py
Auto-merging torch/_inductor/runtime/triton_heuristics.py
Auto-merging torch/csrc/inductor/static_cuda_launcher.cpp
error: could not apply 700c9a8e466... Enabling StaticCudaLauncher for ROCm
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 700c9a8e466... # Enabling StaticCudaLauncher for ROCm

Raised by https://github.com/pytorch/pytorch/actions/runs/20824947914

@jeffdaily
Collaborator

@pytorchbot merge -f "only failures were flaky or broken trunk; meta internal diff also failed but we have approval to land anyway"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@yangw-dev
Contributor

@pytorchbot revert -m "sorry, a previous commit introduce the new test file breaks internal system, please rebase after #169121 's revert and reland" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@chinmaydk99 your PR has been successfully reverted.

@jeffdaily
Collaborator

@pytorchbot merge -f "linter infra error, other failures are known flaky"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: PR #166492 has not been reviewed yet

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@chinmaydk99
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch push -f https://github.com/chinmaydk99/pytorch.git pull/166492/head:ck-staticudalauncher returned non-zero exit code 128

remote: The 'AMD' enterprise forbids access via a personal access tokens (classic) if the token's lifetime is greater than 366 days. Please adjust your token's lifetime at the following URL: https://github.com/settings/tokens/779664343
fatal: unable to access 'https://github.com/chinmaydk99/pytorch.git/': The requested URL returned error: 403

Raised by https://github.com/pytorch/pytorch/actions/runs/21394151706

@jeffdaily
Collaborator

@pytorchbot merge -i

@pytorch-bot

pytorch-bot bot commented Feb 2, 2026

This PR needs to be approved by an authorized maintainer before merge.

@jeffdaily
Collaborator

@pytorchbot merge -i

@pytorchmergebot
Collaborator

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@jeffdaily
Collaborator

@pytorchbot merge -f "Dr CI unrelated failures; lint good"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here


Labels

  • ci-no-td (Do not run TD on this PR)
  • ciflow/inductor
  • ciflow/inductor-rocm-mi200 (Trigger "inductor" config CI on ROCm MI200)
  • ciflow/inductor-rocm-mi300 (Trigger "inductor" config CI on ROCm MI300/MI325)
  • ciflow/rocm-mi200 (Trigger "default" config CI on ROCm MI200)
  • ciflow/rocm-mi300 (Trigger "default" config CI on ROCm MI300)
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • keep-going (Don't stop on first failure, keep running tests until the end)
  • Merged
  • module: cpu (CPU specific problem (e.g., perf, algorithm))
  • module: inductor
  • module: rocm (AMD GPU support for Pytorch)
  • oncall: distributed (Add this issue/PR to distributed oncall triage queue)
  • open source
  • release notes: releng (release notes category)
  • Reverted
  • triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
