[Model Runner V2] Fix `test_spec_decode_acceptance_length` Acceptance length regression by yewentao256 · Pull Request #43685 · vllm-project/vllm

yewentao256 · 2026-05-26T15:24:31Z

Purpose

Fixes https://buildkite.com/vllm/ci/builds/67897#019e5781-68a4-4544-a7c6-04ba535ea293

VLLM_USE_V2_MODEL_RUNNER=1 bash tests/v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh

Originally

        # ── Extract metrics from decode server ────────────────────────────
        n_drafts = _fetch_metric("vllm:spec_decode_num_drafts_total")
        n_accepted = _fetch_metric("vllm:spec_decode_num_accepted_tokens_total")
    
        assert n_drafts > 0, "No spec-decode drafts were generated"
    
        acceptance_length = 1 + (n_accepted / n_drafts)
        expected = config.expected_acceptance_length
    
        print(
            f"\n{config.id}: acceptance_length={acceptance_length:.3f} "
            f"(expected={expected:.3f})"
        )
        print(f"  Drafts: {n_drafts:.0f}, Accepted: {n_accepted:.0f}")
    
        # ── Assert acceptance length (all methods) ────────────────────────
        rel_error = abs(acceptance_length - expected) / expected
>       assert rel_error <= rtol, (
            f"Acceptance length regression for {config.id}! "
            f"Expected: {expected:.3f}, "
            f"Got: {acceptance_length:.3f}, "
            f"Relative error: {rel_error:.2%} (tolerance: {rtol:.0%})"
        )
E       AssertionError: Acceptance length regression for llama3-8b-eagle3! Expected: 2.600, Got: 1.961, Relative error: 24.58% (tolerance: 5%)
E       assert 0.24578209894195022 <= 0.05

tests/v1/kv_connector/nixl_integration/test_spec_decode_acceptance.py:183: AssertionError
====================================== warnings summary =======================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: 14 warnings
  /home/yewentao256/.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================== short test summary info ===================================
FAILED tests/v1/kv_connector/nixl_integration/test_spec_decode_acceptance.py::test_spec_decode_acceptance_length - AssertionError: Acceptance length regression for llama3-8b-eagle3! Expected: 2.600, Got: 1...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================= 1 failed, 16 warnings in 124.15s (0:02:04) ==========================

Now

================== 1 passed, 16 warnings in 97.33s (0:01:37) ==================

This PR fixes the issue (the drafter doesn't get the kv correctly)

For eagle3 + kv connector, we did it in an order like pre_forward -> target forward -> post_forward -> speculator.propose This makes the drafter can't get the correct kv.

We now do it in order as pre_forward -> target forward -> speculator.propose -> post_forward

Signed-off-by: yewentao256 <zhyanwentao@126.com>

gemini-code-assist

Code Review

This pull request defers the KV connector finalization in the GPU model runner until the drafter has written its KV cache. This is achieved by storing the scheduler output in _deferred_kv_connector_scheduler_output during execute_model under specific conditions (last PP rank, speculator present, and not a dummy run) and executing the post-forward step during sample_tokens. There are no review comments to address, and I have no additional feedback to provide.

njhill · 2026-05-27T02:15:55Z

Thanks @yewentao256! I included this in a slightly bigger refactor in #43719, included you as co-author.

yewentao256

Close this PR as #43719 landed

fix test_spec_decode_acceptance_length for mrv2

7f73e49

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from WoosukKwon and njhill as code owners May 26, 2026 15:24

yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label May 26, 2026

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

yewentao256 mentioned this pull request May 26, 2026

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

30 tasks

mergify Bot added the v1 label May 26, 2026

njhill mentioned this pull request May 26, 2026

[MRV2][BugFix] Fix KV connector handling in spec decode case #43719

Merged

yewentao256 commented May 27, 2026

View reviewed changes

yewentao256 closed this May 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Model Runner V2] Fix `test_spec_decode_acceptance_length` Acceptance length regression#43685

[Model Runner V2] Fix `test_spec_decode_acceptance_length` Acceptance length regression#43685
yewentao256 wants to merge 1 commit into
mainfrom
wentao-fix-v2-test_spec_decode_acceptance_length

yewentao256 commented May 26, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

njhill commented May 27, 2026

Uh oh!

yewentao256 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

yewentao256 commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

njhill commented May 27, 2026

Uh oh!

yewentao256 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yewentao256 commented May 26, 2026 •

edited

Loading