[MRV2] Also enable MRV2 for Llama and Mistral dense models #43458

Merged

WoosukKwon merged 14 commits into vllm-project:main from njhill:mrv2-migration-more-dense

Jun 2, 2026

njhill commented May 23, 2026 (edited)

Member

This combines @yewentao256's #42665 with additional fixes made while iterating on the CI failures.

Opened to check which CI issues remain.

njhill and others added 5 commits

May 22, 2026 16:47


          mr v2 migration, more dense models

1fe48b9

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Signed-off-by: Nick Hill <nickhill123@gmail.com>


          revert unnecessary changes

2dcf80a

Signed-off-by: Nick Hill <nickhill123@gmail.com>


          force v1 runner for tests

ae29c01

Signed-off-by: yewentao256 <zhyanwentao@126.com>


          revert unnecessary changes

e47c6c4

Signed-off-by: Nick Hill <nickhill123@gmail.com>


          fix to test_forward_error.py

f76d2c9

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners

May 23, 2026 00:10

njhill added the ready label

mergify Bot added llama mistral v1 labels

gemini-code-assist Bot reviewed

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request expands the default V2 model runner support to include Llama and Mistral architectures and introduces a force_v1_runner flag in test utilities to isolate correctness tests from model runner changes. Feedback was provided regarding the implementation of the force_v1_runner flag, noting that the current dictionary unpacking order allows existing environment variables to override the forced V1 setting, and a code suggestion was offered to ensure strict enforcement.

tests/utils.py
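The ordering issue raised in the review can be illustrated with a minimal sketch. It is not the code in tests/utils.py: the helper names below are hypothetical, and using VLLM_USE_V2_MODEL_RUNNER as the controlling variable is an assumption based on the LMCache commit message quoted later in this thread. The point is only that, when two dictionaries are unpacked into one, the value unpacked last wins.

```python
# Hypothetical sketch; the real force_v1_runner handling in tests/utils.py may differ.
# The variable name is taken from the LMCache commit message quoted in this thread.
RUNNER_VAR = "VLLM_USE_V2_MODEL_RUNNER"


def build_env_overridable(base_env, force_v1_runner):
    """Forced value is unpacked first, so a same-named key in base_env wins."""
    forced = {RUNNER_VAR: "0"} if force_v1_runner else {}
    return {**forced, **base_env}


def build_env_strict(base_env, force_v1_runner):
    """Forced value is unpacked last, so it always overrides base_env."""
    forced = {RUNNER_VAR: "0"} if force_v1_runner else {}
    return {**base_env, **forced}


# An environment that already opts in to the V2 runner.
existing = {RUNNER_VAR: "1", "OTHER_SETTING": "x"}

print(build_env_overridable(existing, True)[RUNNER_VAR])  # prints 1
print(build_env_strict(existing, True)[RUNNER_VAR])       # prints 0
```

With the strict ordering, a test that asks for the V1 runner gets it regardless of what the surrounding environment sets.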

njhill added the ready-run-all-tests label

njhill mentioned this pull request

[Model Runner V2] Migration from v1 to v2, with more Llama and Mistral dense models [2/N] #42665

Closed


          fix to test_abort_final_step.py

35d2675

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill force-pushed the mrv2-migration-more-dense branch from 123fe8b to 35d2675 Compare

May 25, 2026 18:31

njhill added 2 commits

May 27, 2026 07:18


          Merge remote-tracking branch 'origin/main' into mrv2-migration-more-dense

893a60d


          Merge branch 'main' into mrv2-migration-more-dense

c8df3ae

njhill changed the title ~~[MRV2] Also enable for Llama and Mistral dense models~~ [MRV2] Also enable MRV2 for Llama and Mistral dense models

njhill added 2 commits

May 28, 2026 09:37


          Merge branch 'main' into mrv2-migration-more-dense

e69e6d8


          final fix test_abort_final_step.py

57cf69d

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill removed the ready label


          Merge remote-tracking branch 'origin/main' into mrv2-migration-more-dense

47f66e7

njhill added 3 commits

June 1, 2026 23:05


          patch MRV2 with new apply_sparse_weight_patches

b43cb1b

Signed-off-by: Nick Hill <nickhill123@gmail.com>


          Merge remote-tracking branch 'origin/main' into mrv2-migration-more-dense

134f37f


          fix precommit

b3d4178

Signed-off-by: Nick Hill <nickhill123@gmail.com>

vllm-project deleted a comment from mergify Bot

njhill commented Jun 2, 2026

Member Author

The remaining test failures are unrelated to this change: they are environment-based and also occur elsewhere.

WoosukKwon approved these changes

WoosukKwon merged commit da107a5 into vllm-project:main

231 of 236 checks passed

njhill deleted the mrv2-migration-more-dense branch

June 2, 2026 18:19

yewentao256 mentioned this pull request

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

30 tasks

sammshen mentioned this pull request

[Chore][CI] Force vLLM Model Runner V1 in the PD comprehensive test LMCache/LMCache#3502

Merged

2 tasks

deng451e pushed a commit to LMCache/LMCache that referenced this pull request


          [Chore][CI] Force vLLM Model Runner V1 in the PD comprehensive test (#3502)

9fbd742

[CI/CD] Force vLLM Model Runner V1 in the PD comprehensive test

The PD comprehensive test (k3-comprehensive-test) started timing out on
2026-06-02 when the floating vLLM nightly bumped from 0.22.1rc1.dev91 to
.dev95. That delta includes vllm-project/vllm#43458, which enables Model
Runner V2 (MRV2) by default for Llama/Mistral dense models -- including
meta-llama/Llama-3.2-1B-Instruct, the model used by pd.yaml.

MRV2 is incompatible with KV cache connectors, so the prefiller/decoder
(both driven through LMCacheConnectorV1) hang on the first request and the
job is killed by the 30-minute timeout. The vLLM servers start fine; only
the disaggregated request path hangs.

Pin the PD prefiller and decoder to MRV1 via VLLM_USE_V2_MODEL_RUNNER=0
until MRV2 supports KV connectors / reliably falls back to MRV1.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
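As a rough illustration of the pinning described in this commit message, the sketch below launches a child process with VLLM_USE_V2_MODEL_RUNNER=0 in its environment. The helper names are hypothetical, and the probe command merely echoes the variable; it stands in for the actual prefiller and decoder launch commands, which are not shown in this thread.

```python
import os
import subprocess
import sys


def pinned_env(base=None):
    """Return a copy of the environment with the V2 model runner disabled."""
    env = dict(os.environ if base is None else base)
    env["VLLM_USE_V2_MODEL_RUNNER"] = "0"  # variable name from the commit message
    return env


def run_with_mrv1(cmd):
    """Run cmd in a child process whose environment pins the V1 runner."""
    return subprocess.run(
        cmd, env=pinned_env(), capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # Stand-in for a server launch: the child just reports what it sees.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['VLLM_USE_V2_MODEL_RUNNER'])",
    ]
    print(run_with_mrv1(probe).stdout.strip())
```

Setting the variable on the copied environment, rather than on os.environ itself, keeps the pin scoped to the launched processes.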

Lyj1007 pushed a commit to Lyj1007/LMCache that referenced this pull request


          [Chore][CI] Force vLLM Model Runner V1 in the PD comprehensive test (LMCache#3502)

de8f4c1

mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request


          [MRV2] Also enable MRV2 for Llama and Mistral dense models (vllm-project#43458)

c99430b

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request


          [MRV2] Also enable MRV2 for Llama and Mistral dense models (vllm-project#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>

andakai pushed a commit to andakai/vllm that referenced this pull request


          [MRV2] Also enable MRV2 for Llama and Mistral dense models (vllm-project#43458)

9c773ea

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>

JisoLya pushed a commit to JisoLya/vllm that referenced this pull request


          [MRV2] Also enable MRV2 for Llama and Mistral dense models (vllm-project#43458)

7b03ad6

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: JisoLya <523420504@qq.com>

knight0528 pushed a commit to knight0528/vllm that referenced this pull request


          [MRV2] Also enable MRV2 for Llama and Mistral dense models (vllm-project#43458)

ec3965d

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>

micah-wil mentioned this pull request

[ROCm][V2] Fix failed assertion in Llama models when using EAGLE with ROCM_AITER_FA #44936

Merged

waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request


          [MRV2] Also enable MRV2 for Llama and Mistral dense models (vllm-project#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Reviewers

WoosukKwon approved these changes

youkaichao: awaiting requested review (code owner)

robertgshaw2-redhat: awaiting requested review (code owner)

mgoin: awaiting requested review (code owner)

tlrmchlsmth: awaiting requested review (code owner)

houseroad: awaiting requested review (code owner)

hmellor: awaiting requested review

yewentao256: awaiting requested review (code owner)

ProExpertProg: awaiting requested review (code owner)

+1 more reviewer

gemini-code-assist[bot] left review comments

Labels

llama mistral ready-run-all-tests v1