
At least one of ROCM_HOME or CUDA_HOME must be None#152236

Closed
jithunnair-amd wants to merge 1 commit into pytorch:main from ROCm:set_cuda_home

Conversation

@jithunnair-amd
Collaborator

@jithunnair-amd jithunnair-amd commented Apr 26, 2025

Copied description by @hj-wei from
ROCm#1809

Hi all, I manually generated an nvcc to bypass NVIDIA component checks (Megatron-LM); see
https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57

However, this can lead to an incorrect CUDA_HOME configuration, which can cause initialization anomalies in downstream libraries such as DeepSpeed.

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd
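
To illustrate why a hand-made nvcc stub corrupts the configuration: `torch.utils.cpp_extension`'s CUDA-home discovery falls back to locating `nvcc` on `PATH` when neither `CUDA_HOME` nor `CUDA_PATH` is set, so any executable named `nvcc` is trusted as evidence of a CUDA install. A minimal sketch of that fallback (simplified; this is not the exact PyTorch helper, which also probes default locations like `/usr/local/cuda`):

```python
import os
import shutil


def find_cuda_home_sketch():
    """Simplified sketch of CUDA_HOME discovery (assumption: mirrors the
    env-var -> PATH fallback order used by torch.utils.cpp_extension)."""
    # 1. Explicit environment variables win.
    cuda_home = os.environ.get('CUDA_HOME') or os.environ.get('CUDA_PATH')
    if cuda_home:
        return cuda_home
    # 2. Otherwise, any `nvcc` found on PATH is trusted -- including a
    #    hand-made stub placed there to fool component checks.
    nvcc = shutil.which('nvcc')
    if nvcc:
        # e.g. /opt/fake/bin/nvcc  ->  CUDA_HOME = /opt/fake
        return os.path.dirname(os.path.dirname(nvcc))
    return None
```

With a stub `nvcc` on `PATH`, this returns the stub's grandparent directory as a bogus CUDA root, which downstream consumers then treat as a real installation.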

@pytorch-bot

pytorch-bot bot commented Apr 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152236

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 294242e with merge base 1ff3c22:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm Trigger "default" config CI on ROCm module: rocm AMD GPU support for Pytorch labels Apr 26, 2025
@jeffdaily
Collaborator

What about the following instead?

diff --git a/torch/utils/cpp_extension.py b/torch/utils/cpp_extension.py
index adfa1e864a..13b0f0fd96 100644
--- a/torch/utils/cpp_extension.py
+++ b/torch/utils/cpp_extension.py
@@ -233,14 +233,14 @@ CUDA_NOT_FOUND_MESSAGE = '''
 CUDA was not found on the system, please set the CUDA_HOME or the CUDA_PATH
 environment variable or add NVCC to your system PATH. The extension compilation will fail.
 '''
-ROCM_HOME = _find_rocm_home()
+ROCM_HOME = _find_rocm_home() if (torch.cuda._is_compiled() and torch.version.hip) else None
 HIP_HOME = _join_rocm_home('hip') if ROCM_HOME else None
 IS_HIP_EXTENSION = True if ((ROCM_HOME is not None) and (torch.version.hip is not None)) else False
 ROCM_VERSION = None
 if torch.version.hip is not None:
     ROCM_VERSION = tuple(int(v) for v in torch.version.hip.split('.')[:2])

-CUDA_HOME = _find_cuda_home() if torch.cuda._is_compiled() else None
+CUDA_HOME = _find_cuda_home() if (torch.cuda._is_compiled() and torch.version.cuda) else None
 CUDNN_HOME = os.environ.get('CUDNN_HOME') or os.environ.get('CUDNN_PATH')
 SYCL_HOME = _find_sycl_home() if torch.xpu._is_compiled() else None
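The intent of this suggestion can be sketched without torch: gate each toolchain root on the version string the build actually reports, so at most one of the two homes is ever set. A hypothetical standalone rendering of that guard logic (names and defaults here are illustrative, not PyTorch's):

```python
def select_toolchain_homes(version_cuda, version_hip,
                           found_cuda='/usr/local/cuda',
                           found_rocm='/opt/rocm'):
    """Hypothetical standalone rendering of the suggested guards:
    ROCM_HOME is only probed for a HIP build and CUDA_HOME only for a
    CUDA build, so at least one of the two is always None."""
    rocm_home = found_rocm if version_hip else None   # torch.version.hip
    cuda_home = found_cuda if version_cuda else None  # torch.version.cuda
    # A single torch build reports at most one of the two version strings,
    # so the homes are mutually exclusive by construction.
    assert rocm_home is None or cuda_home is None
    return cuda_home, rocm_home
```

On a ROCm wheel (`torch.version.cuda is None`) a stray nvcc stub can no longer flip `CUDA_HOME` on, and symmetrically for CUDA wheels and ROCm.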

@jithunnair-amd jithunnair-amd marked this pull request as ready for review May 8, 2025 17:40
@jithunnair-amd
Collaborator Author

@pytorchbot merge -f "link lint failure is unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@jithunnair-amd jithunnair-amd deleted the set_cuda_home branch May 9, 2025 01:52

Labels

ciflow/rocm (Trigger "default" config CI on ROCm) · module: rocm (AMD GPU support for Pytorch) · Merged · open source · topic: not user facing
