[Core] Avoid seq_lens_cpu GPU->CPU sync by njhill · Pull Request #40654 · vllm-project/vllm

njhill · 2026-04-22T22:04:24Z

With help from claude

Signed-off-by: Nick Hill <nickhill123@gmail.com>

gemini-code-assist

Code Review

This pull request introduces seq_lens_cpu_upper_bound to CommonAttentionMetadata to provide a CPU-side upper bound for sequence lengths, aiming to eliminate blocking GPU-to-CPU synchronizations during model execution, particularly in speculative decoding scenarios. The field is integrated across various attention backends and model runners to facilitate kernel dispatch and workspace sizing using optimistic bounds. A review comment identifies a critical issue in vllm/v1/spec_decode/eagle.py, where subtracting num_rejected_tokens from the CPU upper bound could trigger a synchronization or lead to device mismatches, potentially undermining the performance benefits of the change.
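As an illustration of the idea summarized above, here is a minimal sketch of how a CPU-side upper bound could be maintained and used to size a workspace without reading anything back from the GPU. All names other than `seq_lens_cpu_upper_bound` are hypothetical, and the class is a toy stand-in, not vLLM's `CommonAttentionMetadata`. It assumes the bound is formed optimistically, as if every scheduled (draft) token were accepted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyAttentionMetadata:
    # Hypothetical stand-in for illustration; not vLLM's CommonAttentionMetadata.
    # Plain Python ints, so reading them can never touch the GPU.
    seq_lens_cpu_upper_bound: list[int]


def optimistic_upper_bound(
    num_computed_upper: list[int], num_scheduled: list[int]
) -> list[int]:
    """Assume every scheduled token is accepted (assumption of this sketch).

    The true lengths may be smaller once rejected draft tokens are known,
    but that information lives on the GPU; the bound needs only CPU state.
    """
    return [c + s for c, s in zip(num_computed_upper, num_scheduled)]


def workspace_tokens(meta: ToyAttentionMetadata, block_size: int) -> int:
    """Size a workspace from the bound, rounded up to whole blocks."""
    max_len = max(meta.seq_lens_cpu_upper_bound, default=0)
    num_blocks = -(-max_len // block_size)  # ceiling division
    return num_blocks * block_size


bound = optimistic_upper_bound([10, 7], [4, 4])
meta = ToyAttentionMetadata(seq_lens_cpu_upper_bound=bound)
print(bound, workspace_tokens(meta, block_size=16))
```

Because the bound can only overestimate the true lengths, a workspace sized this way is never too small; the cost is some over-allocation when draft tokens are rejected.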

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

benchislett · 2026-04-22T23:30:57Z

@njhill we already have something similar here in V1 with optimistic_seq_lens_cpu. What is the goal of this PR?

benchislett · 2026-04-22T23:31:26Z

Why is it modifying the vLLM V1 specdec implementation? Is this change not intended to be isolated to MRV2?

njhill · 2026-04-22T23:40:13Z

@benchislett the primary motivation is eliminating the current prefill CPU sync for DS3.2. This generalizes the V1-specific optimistic_seq_lens_cpu via an explicit field in CommonAttentionMetadata that can be used in more places across the various attention backends. It also extends it to MRV2.

njhill · 2026-04-22T23:46:37Z

Also aiming to avoid use of common_attn_metadata.seq_lens_cpu in general, since it is deprecated and can implicitly sync from the GPU; the preference is to use the new field consistently, which is guaranteed to be CPU-only.
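To make the hazard concrete, the sketch below shows one way an implicit sync can hide behind an innocent-looking attribute, next to an explicit CPU-only field that never triggers one. This is a pure-Python simulation under stated assumptions: `FakeDeviceTensor`, `ToyMetadata`, and the lazy-property mechanism are hypothetical and are not taken from vLLM's code.

```python
class FakeDeviceTensor:
    """Toy stand-in for a GPU tensor; counts blocking host copies."""

    def __init__(self, values):
        self._values = list(values)
        self.sync_count = 0

    def to_host(self):
        # In a real framework this would block until the GPU catches up.
        self.sync_count += 1
        return list(self._values)


class ToyMetadata:
    """Hypothetical metadata object with both access paths."""

    def __init__(self, seq_lens_device, seq_lens_cpu_upper_bound):
        self.seq_lens_device = seq_lens_device
        # Explicit CPU-only field: plain ints, never touches the device.
        self.seq_lens_cpu_upper_bound = list(seq_lens_cpu_upper_bound)
        self._seq_lens_cpu = None

    @property
    def seq_lens_cpu(self):
        # Lazily materialized: the first access hides a device-to-host copy.
        if self._seq_lens_cpu is None:
            self._seq_lens_cpu = self.seq_lens_device.to_host()
        return self._seq_lens_cpu


device = FakeDeviceTensor([12, 9])
meta = ToyMetadata(device, seq_lens_cpu_upper_bound=[14, 11])

max_bound = max(meta.seq_lens_cpu_upper_bound)  # no sync
syncs_before = device.sync_count
exact = meta.seq_lens_cpu  # triggers the hidden copy
syncs_after = device.sync_count
print(max_bound, syncs_before, exact, syncs_after)
```

The caller cannot tell from the attribute name alone that `seq_lens_cpu` may block, which is the argument for routing dispatch and sizing decisions through a field that is CPU-only by construction.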

benchislett · 2026-04-23T00:00:21Z

@LucasWilkinson @MatthewBonanni for review

mergify Bot added speculative-decoding v1 labels Apr 22, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync

e4da3cc

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill force-pushed the seq_lens_cpu_nosync branch from 6e33c79 to e4da3cc Compare April 22, 2026 22:05

gemini-code-assist Bot reviewed Apr 22, 2026

Comment thread vllm/v1/spec_decode/eagle.py Outdated

njhill marked this pull request as ready for review April 22, 2026 22:12

njhill requested review from LucasWilkinson, MatthewBonanni, WoosukKwon, alexm-redhat, benchislett, luccafong, pavanimajety, youkaichao and zhuohan123 as code owners April 22, 2026 22:12

claude Bot reviewed Apr 22, 2026

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 22, 2026

fix tests

6f5024d

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill added 2 commits April 23, 2026 10:09

mla_attention

835e24b

Signed-off-by: Nick Hill <nickhill123@gmail.com>

reduce comments

a80a5b1

Signed-off-by: Nick Hill <nickhill123@gmail.com>

WoosukKwon approved these changes Apr 23, 2026

Merge remote-tracking branch 'origin/main' into seq_lens_cpu_nosync

eb8eff7

# Conflicts: # vllm/v1/spec_decode/eagle.py Signed-off-by: Nick Hill <nickhill123@gmail.com>

WoosukKwon enabled auto-merge (squash) April 23, 2026 23:34

WoosukKwon merged commit fe85a92 into vllm-project:main Apr 24, 2026
76 checks passed

njhill deleted the seq_lens_cpu_nosync branch April 24, 2026 00:38

voipmonitor mentioned this pull request Apr 24, 2026

[Kimi] Track Kimi K2.5/K2.6 MLA + EAGLE serving on Blackwell (DCP4/DCP8, FP8 KV, draft backend split) #40608

Open

WoosukKwon mentioned this pull request Apr 24, 2026

[Bugfix] Fix IMA in DSA + MTP #40772

Merged

ignaciosica mentioned this pull request Apr 24, 2026

[Bugfix] add seq_lens_cpu_upper_bound to CommonAttentionMetadata in mla_runner.py #40844

Merged

avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

660517d

Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

9cd24b8

Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Adrian <info@zzit.ch>

shen-shanshan mentioned this pull request May 6, 2026

[Misc][Main2Main] Upgrade vLLM to 0427 vllm-project/vllm-ascend#8899

Merged

Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

6afc04c

Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

njhill mentioned this pull request May 10, 2026

[Model Runner V2] Fix seq_lens_cpu_upper_bound #42202

Merged

weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

a2a3de4

Signed-off-by: Nick Hill <nickhill123@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

27eefb2

Signed-off-by: Nick Hill <nickhill123@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

4f099a2

Signed-off-by: Nick Hill <nickhill123@gmail.com>

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

2f35c55

Signed-off-by: Nick Hill <nickhill123@gmail.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

1cfdfdc

Signed-off-by: Nick Hill <nickhill123@gmail.com>

izhuhaoran mentioned this pull request May 29, 2026

[Model Runner V2] Use actual batch max_seq_len for attn metadata #43991

Merged

brian-dellabetta pushed a commit to neuralmagic/vllm that referenced this pull request May 29, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

fcbd482

Signed-off-by: Nick Hill <nickhill123@gmail.com>

mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026

[Core] Avoid seq_lens_cpu GPU->CPU sync (vllm-project#40654)

52db28b

Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

[Core] Avoid seq_lens_cpu GPU->CPU sync #40654
WoosukKwon merged 5 commits into vllm-project:main from njhill:seq_lens_cpu_nosync

3 participants
