FEAT:at least one of ROCM_HOME or CUDA_HOME must be None. by kairos-yu · Pull Request #1809 · ROCm/pytorch

kairos-yu · 2024-12-30T02:22:42Z

Hi all, I am manually generating nvcc to bypass NVIDIA component checks (Megatron-LM),
see https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57

but it can lead to incorrect CUDA_HOME configurations. This can cause initialization anomalies in downstream libraries like DeepSpeed.
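
To make the rule in the title concrete, here is a minimal sketch of the invariant, not the actual diff in this PR. The function name `resolve_toolkit_homes`, its parameters, and the example paths and version string are all hypothetical; the two finder callables stand in for whatever logic locates each toolkit. The assumption is that the build type (HIP or CUDA) decides which home may be set, so a hand-made nvcc on a ROCm machine cannot produce a CUDA home.

```python
from typing import Callable, Optional, Tuple


def resolve_toolkit_homes(
    hip_version: Optional[str],
    cuda_version: Optional[str],
    find_rocm_home: Callable[[], Optional[str]],
    find_cuda_home: Callable[[], Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (rocm_home, cuda_home) with at least one of them None.

    Hypothetical helper for illustration only; not a PyTorch API.
    """
    # Probe only for the toolkit that matches the build. On a HIP build
    # the CUDA finder is never called, even if an nvcc is on the machine.
    rocm_home = find_rocm_home() if hip_version else None
    cuda_home = find_cuda_home() if cuda_version and not hip_version else None

    # The invariant from the PR title.
    assert rocm_home is None or cuda_home is None
    return rocm_home, cuda_home


# Simulated ROCm build where a manually generated nvcc would otherwise
# make the CUDA probe succeed (paths and version are made up).
rocm_home, cuda_home = resolve_toolkit_homes(
    hip_version="6.2",
    cuda_version=None,
    find_rocm_home=lambda: "/opt/rocm",
    find_cuda_home=lambda: "/usr/local/cuda",
)
print(rocm_home, cuda_home)  # prints: /opt/rocm None
```

With this shape, a downstream library that branches on whether CUDA_HOME is None would see None on a ROCm build regardless of what is installed under the CUDA path.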

rocm-repo-management-api · 2024-12-30T02:25:48Z

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2025-01-02T20:25:40Z

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

This PR is a release/2.5-based version of #1809. Copied description by @hj-wei from #1809:
> Hi all, I am manually generating nvcc to bypass NVIDIA component checks (Megatron-LM), see https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57
> but it can lead to incorrect CUDA_HOME configurations. This can cause initialization anomalies in downstream libraries like DeepSpeed.

jithunnair-amd · 2025-01-06T16:30:40Z

@hj-wei Sorry, we do not merge PRs into the main branch of our ROCm fork, so as to keep it an exact replica of upstream. I have added this change to our release/2.5 branch: #1814, but I'd request that you file this PR on upstream (pytorch/pytorch) main, since that would allow community users to benefit from this change as well.

Copied description by @hj-wei from ROCm#1809 (same text as above), with the following trailers on the later commits:
(cherry picked from commit e814ee8)
(cherry picked from commit e814ee8) (cherry picked from commit 66dfe13)
Pull Request resolved: #152236 Approved by: https://github.com/jeffdaily

FEAT:at least one of ROCM_HOME or CUDA_HOME must be None.

cb064ae

jithunnair-amd changed the base branch from main to release/2.5 January 6, 2025 16:14

jithunnair-amd requested review from jataylo, jeffdaily, jithunnair-amd and pruthvistony as code owners January 6, 2025 16:14

jithunnair-amd changed the base branch from release/2.5 to main January 6, 2025 16:20

jithunnair-amd mentioned this pull request Jan 6, 2025

[release/2.5] FEAT:at least one of ROCM_HOME or CUDA_HOME must be None. #1814

Merged

jithunnair-amd closed this Jan 6, 2025

jithunnair-amd mentioned this pull request Apr 26, 2025

At least one of ROCM_HOME or CUDA_HOME must be None pytorch/pytorch#152236

Closed

kairos-yu wants to merge 1 commit into ROCm:main from kairos-yu:hjwei_dev
