[MRV2] Also enable MRV2 for Llama and Mistral dense models #43458

Merged: WoosukKwon merged 14 commits into vllm-project:main from njhill:mrv2-migration-more-dense on Jun 2, 2026

@njhill njhill commented May 23, 2026

Member

This is a combination of @yewentao256's #42665 with additional fixes after iterating on the CI failures.

Opened to check which CI issues remain.

njhill and others added 5 commits May 22, 2026 16:47
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 23, 2026
@mergify mergify Bot added the llama (Related to Llama models), mistral (Related to Mistral models), and v1 labels May 23, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request expands the default V2 model runner support to include Llama and Mistral architectures and introduces a force_v1_runner flag in test utilities to isolate correctness tests from model runner changes. Feedback was provided regarding the implementation of the force_v1_runner flag, noting that the current dictionary unpacking order allows existing environment variables to override the forced V1 setting, and a code suggestion was offered to ensure strict enforcement.
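The ordering concern in the review summary can be illustrated with a minimal sketch. The helper names `build_env_overridable` and `build_env_strict` are hypothetical and are not the actual code in tests/utils.py. The sketch also assumes the flag works by setting VLLM_USE_V2_MODEL_RUNNER to "0". When two dictionaries are unpacked into one literal, the later one wins on duplicate keys, so the forced value has to come last to be strictly enforced.

```python
V2_RUNNER_VAR = "VLLM_USE_V2_MODEL_RUNNER"


def build_env_overridable(env_dict, force_v1_runner):
    """Hypothetical helper: forced value is unpacked first, so a caller value wins."""
    forced = {V2_RUNNER_VAR: "0"} if force_v1_runner else {}
    return {**forced, **(env_dict or {})}


def build_env_strict(env_dict, force_v1_runner):
    """Hypothetical helper: forced value is unpacked last, so it always wins."""
    forced = {V2_RUNNER_VAR: "0"} if force_v1_runner else {}
    return {**(env_dict or {}), **forced}


# A caller that already sets the variable to "1" (assumed value for illustration).
caller_env = {V2_RUNNER_VAR: "1"}
print(build_env_overridable(caller_env, True)[V2_RUNNER_VAR])  # prints 1
print(build_env_strict(caller_env, True)[V2_RUNNER_VAR])  # prints 0
```

With the strict ordering, a test that passes the flag runs on the V1 runner regardless of what the caller's environment dictionary contains.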

Comment thread tests/utils.py
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill force-pushed the mrv2-migration-more-dense branch from 123fe8b to 35d2675 on May 25, 2026 18:31
@njhill njhill changed the title from "[MRV2] Also enable for Llama and Mistral dense models" to "[MRV2] Also enable MRV2 for Llama and Mistral dense models" May 27, 2026
@njhill njhill removed the ready label (ONLY add when PR is ready to merge/full CI is needed) May 31, 2026
njhill added 3 commits June 1, 2026 23:05
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@vllm-project vllm-project deleted a comment from mergify Bot Jun 2, 2026
njhill commented Jun 2, 2026

Member Author

Remaining test failures are unrelated/environment-based and occurring elsewhere.

@WoosukKwon WoosukKwon merged commit da107a5 into vllm-project:main Jun 2, 2026
231 of 236 checks passed
@njhill njhill deleted the mrv2-migration-more-dense branch June 2, 2026 18:19
deng451e pushed a commit to LMCache/LMCache that referenced this pull request Jun 3, 2026
…3502)

[CI/CD] Force vLLM Model Runner V1 in the PD comprehensive test

The PD comprehensive test (k3-comprehensive-test) started timing out on
2026-06-02 when the floating vLLM nightly bumped from 0.22.1rc1.dev91 to
.dev95. That delta includes vllm-project/vllm#43458, which enables Model
Runner V2 (MRV2) by default for Llama/Mistral dense models -- including
meta-llama/Llama-3.2-1B-Instruct, the model used by pd.yaml.

MRV2 is incompatible with KV cache connectors, so the prefiller/decoder
(both driven through LMCacheConnectorV1) hang on the first request and the
job is killed by the 30-minute timeout. The vLLM servers start fine; only
the disaggregated request path hangs.

Pin the PD prefiller and decoder to MRV1 via VLLM_USE_V2_MODEL_RUNNER=0
until MRV2 supports KV connectors / reliably falls back to MRV1.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
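The pin described in this commit message amounts to setting one environment variable for the server processes. A minimal sketch, assuming the servers are launched from Python; the helper `mrv1_pinned_env` is hypothetical and is not taken from the LMCache CI scripts:

```python
import os

V2_RUNNER_VAR = "VLLM_USE_V2_MODEL_RUNNER"


def mrv1_pinned_env(base_env=None):
    """Return a copy of the environment with the V2 model runner disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env[V2_RUNNER_VAR] = "0"  # assigned last so it overrides any inherited value
    return env


# Example with an explicit base environment that had the variable set to "1".
env = mrv1_pinned_env({"PATH": "/usr/bin", V2_RUNNER_VAR: "1"})
print(env[V2_RUNNER_VAR])  # prints 0
```

The resulting dictionary can be passed as the `env` argument of `subprocess.Popen` when starting the prefiller and the decoder, so both run on MRV1 while the rest of the job keeps its normal environment.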
Lyj1007 pushed a commit to Lyj1007/LMCache that referenced this pull request Jun 3, 2026
…MCache#3502)

mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…ect#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Jun 4, 2026
…ect#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
andakai pushed a commit to andakai/vllm that referenced this pull request Jun 4, 2026
…ect#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
…ect#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: JisoLya <523420504@qq.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
…ect#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026
…ect#43458)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Labels

llama: Related to Llama models
mistral: Related to Mistral models
ready-run-all-tests: Trigger CI with all tests for wide-ranging PRs
v1

3 participants