FEAT: at least one of ROCM_HOME or CUDA_HOME must be None. #1809

Closed
kairos-yu wants to merge 1 commit into ROCm:main from kairos-yu:hjwei_dev

Conversation

@kairos-yu

Hi all, I manually generated a stub nvcc to bypass the NVIDIA component checks in Megatron-LM,
see https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57

However, doing so can lead to an incorrect CUDA_HOME configuration, which in turn can cause initialization anomalies in downstream libraries such as DeepSpeed.
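
A minimal sketch of the invariant named in the title, for illustration only and not the actual patch: when both a CUDA home and a ROCm home are detected, only the one matching the build is kept. The function `resolve_toolkit_homes`, its parameters, and the example paths are hypothetical names introduced here.

```python
from typing import Optional, Tuple


def resolve_toolkit_homes(
    cuda_home: Optional[str],
    rocm_home: Optional[str],
    is_hip_build: bool,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (cuda_home, rocm_home) with at most one of them set.

    Hypothetical helper: the real detection logic lives elsewhere and may
    differ; this only demonstrates the "at least one must be None" rule.
    """
    if cuda_home and rocm_home:
        # Both toolkits were detected; keep only the one matching the build.
        if is_hip_build:
            cuda_home = None
        else:
            rocm_home = None
    return cuda_home, rocm_home


# Assumed scenario: a stub nvcc under /opt/fake/bin makes detection report
# a CUDA home on a ROCm machine; the HIP build flag decides which survives.
print(resolve_toolkit_homes("/opt/fake", "/opt/rocm", is_hip_build=True))
```

With this rule in place, a downstream library that checks the CUDA home first would no longer see a spurious value on a ROCm build.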

@rocm-repo-management-api

rocm-repo-management-api bot commented Dec 30, 2024

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api

rocm-repo-management-api bot commented Jan 2, 2025

Jenkins build for cb064aefc88dae25e9f1e54eabb29ad83f23aeca commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jithunnair-amd jithunnair-amd changed the base branch from main to release/2.5 January 6, 2025 16:14
@jithunnair-amd jithunnair-amd changed the base branch from release/2.5 to main January 6, 2025 16:20
jithunnair-amd added a commit that referenced this pull request Jan 6, 2025
This PR is a release/2.5-based version of #1809

Copied description by @hj-wei from #1809

> Hi all, I manually generated a stub nvcc to bypass the NVIDIA component checks (Megatron-LM), see https://github.com/NVIDIA/Megatron-LM/blob/2da43ef4c1b9e76f03b7567360cf7390e877f1b6/megatron/legacy/fused_kernels/__init__.py#L57
>
> but it can lead to incorrect CUDA_HOME configurations. This can cause initialization anomalies in downstream libraries like DeepSpeed

@jithunnair-amd
Collaborator

@hj-wei Sorry, we do not merge PRs into the main branch of our ROCm fork, so as to keep it an exact replica of upstream. I have added this change to our release/2.5 branch in #1814, but I'd ask you to file this PR against upstream (pytorch/pytorch) main, so that community users can also benefit from the change.

rocm-mici pushed a commit that referenced this pull request Jan 6, 2025
jithunnair-amd added a commit that referenced this pull request Feb 19, 2025
(cherry picked from commit e814ee8)
jithunnair-amd added a commit that referenced this pull request Feb 20, 2025
(cherry picked from commit e814ee8)
jithunnair-amd added a commit that referenced this pull request Apr 23, 2025
(cherry picked from commit e814ee8)
(cherry picked from commit 66dfe13)
jithunnair-amd added a commit that referenced this pull request Apr 25, 2025
(cherry picked from commit e814ee8)
(cherry picked from commit 66dfe13)
jithunnair-amd added a commit that referenced this pull request Apr 26, 2025
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request May 8, 2025
Pull Request resolved: #152236
Approved by: https://github.com/jeffdaily