
[Perf][1/n] Eliminate various GPU<->CPU syncs #41429

Merged
njhill merged 7 commits into vllm-project:main from njhill:fix-gpucpu-syncs1
May 11, 2026
@njhill njhill commented May 1, 2026

Member

Fix the first batch of unnecessary GPU/CPU syncs, found via #40561.

This should benefit the following features:

  • MRv1 specific logprob_token_ids impl
  • MRv1 bad_words sampling parameter impl
  • MRv1 fast prefill
  • MRv1 pooling (preprocessing)
  • GPU ngram spec decoding (batch reorder)
  • DP native all-reduce
  • MRv2 penalties
  • Mamba-1 prefill

Signed-off-by: Nick Hill <nickhill123@gmail.com>

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request implements several optimizations to reduce CPU-GPU synchronization across the vLLM codebase, primarily by moving tensor construction to the CPU and utilizing non-blocking transfers. Key improvements include the introduction of a platform-aware PIN_MEMORY constant to handle WSL environments, the use of index_fill_ and slice assignments to avoid implicit syncs, and the migration of sequence length processing to CPU-resident tensors. Review feedback highlights the need to use the existing pin_memory configuration in the Sampler class for WSL compatibility and recommends explicitly pinning tensors in the GPUModelRunner to ensure that non-blocking transfers are effective.
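The pattern the summary describes (construct on the CPU, then transfer without blocking) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `PIN_MEMORY_SKETCH` and `to_device_async` are hypothetical names, and the availability check is only a stand-in for the PR's platform-aware `PIN_MEMORY` constant, which also accounts for WSL.

```python
import torch

# Hypothetical stand-in for the PR's platform-aware PIN_MEMORY constant.
# Assumption: pinning is only attempted when CUDA is available; the real
# constant also accounts for WSL, which this sketch does not model.
PIN_MEMORY_SKETCH = torch.cuda.is_available()


def to_device_async(values: list[int], device: torch.device) -> torch.Tensor:
    """Build a tensor on the CPU, then copy it to `device` without blocking."""
    cpu_tensor = torch.tensor(
        values, dtype=torch.int64, device="cpu", pin_memory=PIN_MEMORY_SKETCH
    )
    # With a pinned source, non_blocking=True lets the host-to-device copy be
    # queued asynchronously; with pageable memory the copy may still block.
    return cpu_tensor.to(device, non_blocking=True)


# Falls back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
seq_lens = to_device_async([3, 7, 5], device)
```

The review feedback about explicitly pinning tensors corresponds to the `pin_memory` argument here: without a pinned source buffer, `non_blocking=True` is not guaranteed to make the copy asynchronous.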

Comment thread vllm/v1/sample/sampler.py
Comment thread vllm/v1/worker/gpu_model_runner.py Outdated
@njhill njhill requested review from tdoublep and tomeras91 as code owners May 1, 2026 00:20
@njhill njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 1, 2026
@njhill njhill requested a review from yewentao256 May 11, 2026 18:05

@yewentao256 yewentao256 left a comment

Member


LGTM, thanks for the work!

) -> None:
prompt_bin_mask[expanded_idx_mapping] = 0
output_bin_counts[expanded_idx_mapping] = 0
# Use index_fill_ instead of `tensor[idx] = 0` to avoid sync.

Member


Suggested change
# Use index_fill_ instead of `tensor[idx] = 0` to avoid sync.

This comment seems redundant.
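For context on the change under discussion, the code comment in the diff says `index_fill_` avoids a sync that `tensor[idx] = 0` incurs. The sketch below only checks that the two forms give the same result when zeroing whole rows; it does not measure synchronization. The tensor name `output_bin_counts` comes from the diff, while the shapes, dtypes, and index values are hypothetical.

```python
import torch

# Falls back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical shapes: 4 request slots, 6 vocabulary bins each.
output_bin_counts = torch.ones(4, 6, dtype=torch.int32, device=device)
reference = output_bin_counts.clone()

# Rows to reset, as a 1-D integer index tensor on the same device.
idx = torch.tensor([0, 2], dtype=torch.int64, device=device)

# Indexed assignment, the form the diff replaces.
reference[idx] = 0

# In-place fill along dim 0, the form named in the diff's code comment.
output_bin_counts.index_fill_(0, idx, 0)
```

Both tensors end up with rows 0 and 2 zeroed and the other rows untouched, so the replacement is a behavior-preserving change for this indexing pattern.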

@njhill njhill enabled auto-merge (squash) May 11, 2026 19:47
@njhill njhill merged commit bbee532 into vllm-project:main May 11, 2026
79 checks passed
@njhill njhill deleted the fix-gpucpu-syncs1 branch May 11, 2026 20:56

Labels

ready, speculative-decoding, v1
