[Core][WIP] Check for GPU<->CPU sync during CI by njhill · Pull Request #40561 · vllm-project/vllm

njhill · 2026-04-21T22:43:50Z

vLLM now uses asynchronous scheduling by default, and in the majority of cases. Performance relies on the absence of any GPU<->CPU synchronizations on the main CUDA stream, but such syncs can be opaque and can easily creep in accidentally.

This change adds a VLLM_GPU_SYNC_CHECK env var which enables torch.cuda.set_sync_debug_mode for the model forward pass and sampler, so that we can easily check for such syncs.
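To illustrate what the underlying PyTorch facility does, here is a minimal sketch, separate from the PR's code. The helper name item_call_is_flagged is made up for this example; only torch.cuda.set_sync_debug_mode and torch.cuda.get_sync_debug_mode are real PyTorch calls. It assumes that a .item() call on a CUDA tensor counts as a synchronizing operation, and it returns None on machines without PyTorch or a CUDA device.

```python
from typing import Optional

try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def item_call_is_flagged() -> Optional[bool]:
    """Return True if .item() raises under "error" mode, None without CUDA.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    x = torch.ones(8, device="cuda")
    previous = torch.cuda.get_sync_debug_mode()
    torch.cuda.set_sync_debug_mode("error")
    try:
        # .item() copies a scalar to the host, so the CPU must wait for the GPU.
        x.sum().item()
        return False
    except RuntimeError:
        return True
    finally:
        # Put back whatever mode was active before the experiment.
        torch.cuda.set_sync_debug_mode(previous)
```

With the mode set to "warn" instead of "error", PyTorch emits a warning at the synchronizing call rather than raising.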

I'm first trying to enable it globally in the CI to flush out syncs that either need to be fixed or are unavoidable, in which case the check needs to be suppressed. I will then probably split the fixes into separate PR(s).

Update

Started to open separate PRs fixing identified sync points:


gemini-code-assist

Code Review

This pull request introduces a GPU-CPU synchronization check mechanism via the VLLM_GPU_SYNC_CHECK environment variable, which is set to "error" by default in the Dockerfiles. The check is applied to the sample_tokens and execute_model methods in the V1 GPU worker using a new decorator. Feedback indicates that the with_gpu_sync_check decorator should be improved to restore the previous synchronization mode rather than resetting to default and should check the environment variable at runtime to support dynamic disabling.
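The two review points above (restore the previous mode, and read the environment variable at call time) could be sketched roughly as follows. This is an assumption about the shape of such a decorator, not the PR's actual implementation: the name with_gpu_sync_check and the variable VLLM_GPU_SYNC_CHECK come from this page, while the accepted values "warn" and "error" and the fallback to a plain call when CUDA is unavailable are choices made for this example.

```python
import functools
import os

try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None

ENV_VAR = "VLLM_GPU_SYNC_CHECK"  # name taken from the PR description
_CHECK_MODES = {"warn", "error"}  # assumed accepted values


def with_gpu_sync_check(fn):
    """Illustrative decorator: enable sync checking while fn runs."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Read the variable on every call so the check can be disabled at runtime.
        mode = os.environ.get(ENV_VAR, "").strip().lower()
        if (mode not in _CHECK_MODES or torch is None
                or not torch.cuda.is_available()):
            return fn(*args, **kwargs)
        previous = torch.cuda.get_sync_debug_mode()
        torch.cuda.set_sync_debug_mode(mode)
        try:
            return fn(*args, **kwargs)
        finally:
            # Restore the mode that was active before, not unconditionally "default".
            torch.cuda.set_sync_debug_mode(previous)

    return wrapper
```

Restoring the saved mode matters when decorated calls nest or when a caller has already selected a mode: resetting to "default" on exit would silently disable the outer check.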

Signed-off-by: Nick Hill <nickhill123@gmail.com>

mergify · 2026-05-19T03:08:05Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

hmellor · 2026-05-19T07:04:42Z

Thanks for discovering this!

Would it be possible to warn when a GPU sync is found like this? That way, Transformers backend users could request that it be fixed by the Transformers team.

njhill · May 19, 2026

@hmellor once this PR is merged, the CI will fail when such syncs occur.

The sync checker does have a warn mode, but I don't think we want to enable that at runtime since it may have some overhead.

However, we can always just add a logged warning next to any of the added gpu_sync_allowed() with-blocks like these.

hmellor · May 20, 2026

OK, in that case I think I'd prefer no warning over always warning. I'll add something to the Transformers backend docs explaining how users can enable this for development so that they can catch syncs in their models.
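For reference, the logged-warning idea from this thread could look roughly like the sketch below. The real signature of gpu_sync_allowed() is not shown on this page, so the reason and warn parameters here are hypothetical, as is the choice to suspend the check by switching to the "default" mode for the duration of the block.

```python
import logging
from contextlib import contextmanager

try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None

logger = logging.getLogger(__name__)


@contextmanager
def gpu_sync_allowed(reason: str, warn: bool = False):
    """Hypothetical stand-in: suspend sync checking inside the block."""
    if warn:
        # Optional log line so users can see where a known sync happens.
        logger.warning("GPU<->CPU sync permitted here: %s", reason)
    if torch is None or not torch.cuda.is_available():
        yield
        return
    previous = torch.cuda.get_sync_debug_mode()
    torch.cuda.set_sync_debug_mode("default")
    try:
        yield
    finally:
        # Re-arm the check with the mode that was active on entry.
        torch.cuda.set_sync_debug_mode(previous)
```

A call site would wrap only the unavoidable operation, for example `with gpu_sync_allowed("data-dependent shape"): n = mask.sum().item()`, where mask is a placeholder tensor.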

njhill · 2026-05-19T15:49:51Z

Replacing with #43107.

njhill requested review from ProExpertProg, WoosukKwon, gshtras, hmellor, houseroad, mgoin, robertgshaw2-redhat, tjtanaa, tlrmchlsmth, yewentao256 and youkaichao as code owners April 21, 2026 22:43

claude Bot reviewed Apr 21, 2026


mergify Bot added the ci/build, rocm (Related to AMD ROCm), and v1 labels Apr 21, 2026

github-project-automation Bot added this to AMD Apr 21, 2026

github-project-automation Bot moved this to Todo in AMD Apr 21, 2026

gemini-code-assist Bot reviewed Apr 21, 2026


Comment thread vllm/v1/worker/utils.py Outdated

njhill force-pushed the sync-check branch from 3c59f62 to 4a8fe98 Compare April 21, 2026 22:51

njhill added and removed the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 21, 2026

njhill requested review from 22quinn, LucasWilkinson, MatthewBonanni, jeejeelee and pavanimajety as code owners April 22, 2026 14:54

njhill added and removed the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 22, 2026

njhill added 18 commits April 30, 2026 09:16

lora load adapter (e5b45df)
idefics2 (ccb7c03)
phi4mm_audio (e509ea8)
temp (24afeb6)
temp2 (bd27f57)
temp3 (fb51bda)
fix custom lp (e44ac81)
temp4 (ceda006)
temp5 (a7f931f)
h2d util (9822304)
use async_tensor_h2d utility function (93ac29e)
avoid circular import (be7b548)
typo (4805d7b)
qwen recompute_mrope_positions; qwen3_vl updates (ddc75a8)
switch gpu_sync_allowed count to first_only bool (610ff40)
remove now-redundant guards in grouped_topk_router.py (b3007c0)
qwen3_asr (bcbbb93)
post-rebase fixups (a926b48)

Each commit carries: Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill force-pushed the sync-check branch from 74d9ecd to a926b48 Compare April 30, 2026 17:20

This was referenced May 1, 2026

[Perf][1/n] Eliminate various GPU<->CPU syncs #41429

Merged

[Perf][2/n] Eliminate GPU<->CPU syncs in pooling code #41433

Merged

[Perf][3/n] Eliminate GPU<->CPU syncs in attention impls #41434

Merged

njhill mentioned this pull request May 11, 2026

[Perf][4/n] Eliminate various GPU<->CPU syncs #42347

Merged

mergify Bot added the needs-rebase label May 19, 2026

hmellor reviewed May 19, 2026


njhill mentioned this pull request May 19, 2026

[Core][WIP] Check for GPU<->CPU sync during CI #43107

Open

njhill closed this May 19, 2026

github-project-automation Bot moved this from Todo to Done in AMD May 19, 2026

njhill wants to merge 71 commits into vllm-project:main from njhill:sync-check
