
[Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 (#34543)

Merged

vllm-bot merged 1 commit into vllm-project:main from ROCm:akaratza_fix_basic_tests on Feb 14, 2026

Conversation

@AndreasKaratzas (Collaborator) commented Feb 13, 2026

  • PR #32993 ([Feature] Support CPU Offloading without PyTorch Pinned Memory that leads to doubled allocation) added UVA (Unified Virtual Addressing) support for CPU weight offloading, but only included platform checks for CUDA and XPU in get_accelerator_view_from_cpu_tensor, causing a ValueError on ROCm.
  • The C++ layer already works on ROCm since cuda_view.cu is hipified automatically and the op registration is not ROCm-guarded. Only the Python-level platform check was missing.
  • Fix: add current_platform.is_rocm() to the existing CUDA branch, since ROCm shares the same device type and dispatch key (see the sketch after this list).
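
For context, here is a minimal sketch of the patched dispatch in vllm/utils/torch_utils.py. The branch structure follows the diff quoted in the review below; the CUDA-side op name and the fallback error are assumptions for illustration, not the verbatim upstream code.

```python
# Sketch only: the branch structure mirrors the PR diff; the CUDA-side
# op name and the fallback error text are illustrative assumptions.
import torch

from vllm.platforms import current_platform


def get_accelerator_view_from_cpu_tensor(cpu_tensor: torch.Tensor) -> torch.Tensor:
    """Return a zero-copy device view of a pinned CPU tensor via UVA."""
    assert cpu_tensor.is_pinned(), "CPU tensor must be pinned"
    if current_platform.is_xpu():
        return torch.ops._C.get_xpu_view_from_cpu_tensor(cpu_tensor)
    elif current_platform.is_cuda() or current_platform.is_rocm():
        # cuda_view.cu is hipified automatically on ROCm, so the same
        # custom op serves both CUDA and ROCm.
        return torch.ops._C.get_cuda_view_from_cpu_tensor(cpu_tensor)
    raise ValueError("UVA views are not supported on this platform")
```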

Test plan

  • tests/kernels/core/test_uva.py
  • tests/basic_correctness/test_cpu_offload.py
  • tests/quantization/test_cpu_offload.py
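
A minimal way to run this plan locally, assuming a vLLM dev checkout with pytest installed:

```python
# Hypothetical local runner for the PR's test plan; assumes a vLLM dev
# checkout with pytest available.
import pytest

pytest.main([
    "tests/kernels/core/test_uva.py",
    "tests/basic_correctness/test_cpu_offload.py",
    "tests/quantization/test_cpu_offload.py",
])
```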

…cpu_tensor

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses a bug that prevented ROCm from using Unified Virtual Addressing (UVA) for CPU weight offloading. The fix is correct and straightforward, adding the necessary platform check for ROCm. I've included one suggestion to use an existing helper function, which will improve code maintainability.

Comment thread on vllm/utils/torch_utils.py:

```diff
 assert cpu_tensor.is_pinned(), "CPU tensor must be pinned"
 return torch.ops._C.get_xpu_view_from_cpu_tensor(cpu_tensor)
-elif current_platform.is_cuda():
+elif current_platform.is_cuda() or current_platform.is_rocm():
```

Severity: high

For better maintainability and to align with existing patterns in this file (e.g., in aux_stream), it's preferable to use the is_cuda_alike() helper method. This method encapsulates the check for both CUDA and ROCm platforms, making the code cleaner and easier to update if more CUDA-like platforms are supported in the future.

Suggested change:

```diff
-elif current_platform.is_cuda() or current_platform.is_rocm():
+elif current_platform.is_cuda_alike():
```
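
For reference, a minimal sketch of the semantics is_cuda_alike() encapsulates; vLLM's actual Platform interface may implement this differently:

```python
# Minimal sketch of the semantics of is_cuda_alike(); vLLM's actual
# Platform interface may implement this differently.
class Platform:
    def is_cuda(self) -> bool:
        raise NotImplementedError

    def is_rocm(self) -> bool:
        raise NotImplementedError

    def is_cuda_alike(self) -> bool:
        """True for platforms that behave like CUDA (CUDA itself and ROCm)."""
        return self.is_cuda() or self.is_rocm()
```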

@AndreasKaratzas (author) replied:

This is more declarative. I think it's better.

A collaborator replied:

@AndreasKaratzas let's follow the gemini suggestion and use elif current_platform.is_cuda_alike():. It is a known convention in vLLM: if something works for both CUDA and ROCm, we use current_platform.is_cuda_alike().

@njhill (Member) left a comment

Thanks @AndreasKaratzas!

I was also wondering about is_cuda_alike() since we use that elsewhere. But don't feel too strongly and would be good to fix the CI asap!

@njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Feb 13, 2026
@njhill enabled auto-merge (squash) on February 13, 2026 at 23:51
@AndreasKaratzas (author) commented Feb 13, 2026

> Thanks @AndreasKaratzas!
>
> I was also wondering about is_cuda_alike() since we use that elsewhere. But don't feel too strongly and would be good to fix the CI asap!

Yeah, I don't feel that strongly either tbh 😅 It's just that I've sometimes found more explicitly spelled-out conditionals useful, although cuda_alike is pretty well defined (at least at the moment). Tough one 😅

@AndreasKaratzas (author) commented:

@njhill Entrypoints Integration (Responses API) is not affected by this PR. It is a flaky TG and is addressed in #33949.

Btw, if you get time to review/approve #33949 as well, that would be great 😅

@AndreasKaratzas (author) commented:

@njhill V1 e2e + Engine is also a known failure I think.

@vllm-bot vllm-bot merged commit a0638d0 into vllm-project:main Feb 14, 2026
49 of 53 checks passed
@github-project-automation (bot) moved this from Todo to Done in AMD on Feb 14, 2026
@AndreasKaratzas AndreasKaratzas deleted the akaratza_fix_basic_tests branch February 14, 2026 04:10
jiangkuaixue123 pushed a commit to jiangkuaixue123/vllm that referenced this pull request Apr 28, 2026

Labels

bug: Something isn't working
ready: ONLY add when PR is ready to merge/full CI is needed
rocm: Related to AMD ROCm

Projects

Status: Done


4 participants