Skip to content

[Core][WIP] Check for GPU<->CPU sync during CI#40561

Closed
njhill wants to merge 71 commits into
vllm-project:mainfrom
njhill:sync-check
Closed

[Core][WIP] Check for GPU<->CPU sync during CI#40561
njhill wants to merge 71 commits into
vllm-project:mainfrom
njhill:sync-check

Conversation

@njhill

@njhill njhill commented Apr 21, 2026

Copy link
Copy Markdown
Member

vLLM now uses asynchronous scheduling by default and in the majority of cases. Performance relies on the absence of any gpu<->cpu synchronizations on the main cuda stream, but such syncs can be opaque and it is easy for them to creep in accidentally.

This change adds a VLLM_GPU_SYNC_CHECK env var which enables torch.cuda.set_sync_debug_mode for the model forward pass and sampler, so that we can easily check for such syncs.

I'm trying first to enable it globally in the CI to flush out syncs that need to be fixed or where they are unavoidable and the check needs to be suppressed. Will then probably split the fixes into separate PR(s).

Update

Started to open separate PRs fixing identified sync points:

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added ci/build rocm Related to AMD ROCm v1 labels Apr 21, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 21, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a GPU-CPU synchronization check mechanism via the VLLM_GPU_SYNC_CHECK environment variable, which is set to "error" by default in the Dockerfiles. The check is applied to the sample_tokens and execute_model methods in the V1 GPU worker using a new decorator. Feedback indicates that the with_gpu_sync_check decorator should be improved to restore the previous synchronization mode rather than resetting to default and should check the environment variable at runtime to support dynamic disabling.

Comment thread vllm/v1/worker/utils.py Outdated
@njhill njhill added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Apr 21, 2026
@njhill njhill added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Apr 22, 2026
njhill added 18 commits April 30, 2026 09:16
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@mergify

mergify Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 19, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for discovering this!

Would it be possible to warn when a GPU sync is found like this? This way Transformers backend users can request that it be fixed by the Transformers team?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hmellor once this PR is merged, the CI will fail when such syncs occur.

The sync checker does have a warn mode but I don't think we want to enable that at runtime since it may have some overhead.

However we can always just add a logged warning next to any of the added with gpu_sync_allowed()'s like these.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in that case I think I'd prefer no warning instead of always warning. I'll add something to the Transformers backend docs explaining how users can enable this for development so that they can catch syncs in their models.

@njhill

njhill commented May 19, 2026

Copy link
Copy Markdown
Member Author

Replacing with #43107.

@njhill njhill closed this May 19, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build kv-connector multi-modality Related to multi-modality (#4194) needs-rebase nvidia qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm speculative-decoding v1

Projects

Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants