[Perf][1/n] Eliminate various GPU<->CPU syncs by njhill · Pull Request #41429 · vllm-project/vllm

njhill · 2026-05-01T00:13:50Z

Fixes the first batch of unnecessary GPU<->CPU syncs, found via #40561.

This should benefit the following features:

MRv1 specific logprob_token_ids impl
MRv1 bad_words sampling parameter impl
MRv1 fast prefill
MRv1 pooling (preprocessing)
GPU ngram spec decoding (batch reorder)
DP native all-reduce
MRv2 penalties
Mamba-1 prefill

Signed-off-by: Nick Hill <nickhill123@gmail.com>


gemini-code-assist

Code Review

This pull request implements several optimizations to reduce CPU-GPU synchronization across the vLLM codebase, primarily by moving tensor construction to the CPU and utilizing non-blocking transfers. Key improvements include the introduction of a platform-aware PIN_MEMORY constant to handle WSL environments, the use of index_fill_ and slice assignments to avoid implicit syncs, and the migration of sequence length processing to CPU-resident tensors. Review feedback highlights the need to use the existing pin_memory configuration in the Sampler class for WSL compatibility and recommends explicitly pinning tensors in the GPUModelRunner to ensure that non-blocking transfers are effective.
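The pattern summarized in the review above (build the tensor on the CPU, pin it where the platform allows, then copy with a non-blocking transfer) can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: `PIN_MEMORY_SKETCH` and `to_device_async` are hypothetical names, and the real PIN_MEMORY constant also accounts for WSL, which this sketch does not attempt to detect.

```python
import torch

# Hypothetical stand-in for the PR's platform-aware PIN_MEMORY constant.
# Assumption: pinning is only attempted when CUDA is available; the WSL
# handling mentioned in the review is not reproduced here.
PIN_MEMORY_SKETCH = torch.cuda.is_available()


def to_device_async(values: list[int], device: torch.device) -> torch.Tensor:
    """Build a tensor on the CPU, then copy it to `device` without blocking."""
    # Constructing on the CPU avoids launching GPU work just to materialize
    # small host-side data such as sequence lengths.
    cpu_tensor = torch.tensor(
        values, dtype=torch.int64, device="cpu", pin_memory=PIN_MEMORY_SKETCH
    )
    # non_blocking=True only makes the host-to-device copy asynchronous
    # when the source tensor lives in pinned (page-locked) memory.
    return cpu_tensor.to(device, non_blocking=True)


if __name__ == "__main__":
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    seq_lens = to_device_async([5, 3, 8], dev)
    print(seq_lens.shape, seq_lens.device)
```

On a CPU-only machine the function degrades to an ordinary tensor construction, which is why the review's point about explicit pinning matters: without a pinned source, `non_blocking=True` is accepted but the copy is not expected to overlap with host execution.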


yewentao256

LGTM, thanks for the work!

yewentao256 · 2026-05-11T18:35:26Z

 ) -> None:
-    prompt_bin_mask[expanded_idx_mapping] = 0
-    output_bin_counts[expanded_idx_mapping] = 0
+    # Use index_fill_ instead of `tensor[idx] = 0` to avoid sync.


Suggested change

# Use index_fill_ instead of `tensor[idx] = 0` to avoid sync.

This comment seems redundant.
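For context on the change under discussion, the two forms in the diff can be compared in a small sketch. The tensor names mirror the diff, but the function name `reset_rows`, the shapes, and the dtypes are assumptions made for illustration. That the indexed assignment can trigger a sync is the PR's claim (per its code comment); the sketch only shows that both forms zero the same rows.

```python
import torch


def reset_rows(
    prompt_bin_mask: torch.Tensor,
    output_bin_counts: torch.Tensor,
    expanded_idx_mapping: torch.Tensor,
) -> None:
    """Zero the rows selected by `expanded_idx_mapping`, in place."""
    # In-place fill along dim 0. The diff replaces `tensor[idx] = 0`
    # with this form; the index is kept as int64 here to stay on the
    # conservatively supported path.
    prompt_bin_mask.index_fill_(0, expanded_idx_mapping, 0)
    output_bin_counts.index_fill_(0, expanded_idx_mapping, 0)


if __name__ == "__main__":
    mask = torch.ones(4, 3, dtype=torch.int32)
    counts = torch.full((4, 3), 7, dtype=torch.int32)
    idx = torch.tensor([0, 2], dtype=torch.int64)

    # Reference result using the original indexed-assignment form.
    expected_mask = mask.clone()
    expected_mask[idx] = 0

    reset_rows(mask, counts, idx)
    print(torch.equal(mask, expected_mask))
```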

Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>


[Perf] Eliminate various GPU<->CPU syncs

c786980

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill requested review from 22quinn, MatthewBonanni, WoosukKwon, benchislett, houseroad and luccafong as code owners May 1, 2026 00:13

claude Bot reviewed May 1, 2026


mergify Bot added speculative-decoding v1 labels May 1, 2026

gemini-code-assist Bot reviewed May 1, 2026


Comment thread vllm/v1/sample/sampler.py

Comment thread vllm/v1/worker/gpu_model_runner.py Outdated

add mamba_mixer.py

6cb67f6

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill requested review from tdoublep and tomeras91 as code owners May 1, 2026 00:20

address review comments

66b1578

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 1, 2026

njhill mentioned this pull request May 1, 2026

[Core][WIP] Check for GPU<->CPU sync during CI #40561

Closed

njhill added 3 commits May 5, 2026 11:02

Merge branch 'main' into fix-gpucpu-syncs1

50f0946

Merge branch 'main' into fix-gpucpu-syncs1

bcb9228

Merge branch 'main' into fix-gpucpu-syncs1

fd05e50

njhill requested a review from yewentao256 May 11, 2026 18:05

yewentao256 approved these changes May 11, 2026


Merge branch 'main' into fix-gpucpu-syncs1

46de02f

njhill enabled auto-merge (squash) May 11, 2026 19:47

njhill merged commit bbee532 into vllm-project:main May 11, 2026
79 checks passed

njhill deleted the fix-gpucpu-syncs1 branch May 11, 2026 20:56

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[Perf][1/n] Eliminate various GPU<->CPU syncs (vllm-project#41429)

2b48b5e

Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[Perf][1/n] Eliminate various GPU<->CPU syncs (vllm-project#41429)

b4d1d05

Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026

[Perf][1/n] Eliminate various GPU<->CPU syncs (vllm-project#41429)

0042336

Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026

[Perf][1/n] Eliminate various GPU<->CPU syncs (vllm-project#41429)

07565eb

Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>


njhill merged 7 commits into vllm-project:main from njhill:fix-gpucpu-syncs1
