[Model Runner V2] Migration from v1 to v2, with more Llama and Mistral dense models [2/N] by yewentao256 · Pull Request #42665 · vllm-project/vllm

yewentao256 · 2026-05-14T18:26:35Z

Purpose

Makes progress on #41286.

Based on #39337 (review) by @NickLucche, we first expand to Llama and Mistral; if this works well, we will expand to all dense models later.

Test

Covered in CI

Signed-off-by: yewentao256 <zhyanwentao@126.com>


gemini-code-assist

Code Review

This pull request expands the DEFAULT_V2_MODEL_RUNNER_ARCHITECTURES in vllm/config/vllm.py to include LlamaForCausalLM and MistralForCausalLM. Additionally, it adds corresponding test cases for Llama 3.2 and Mistral 7B models in tests/test_config.py to ensure correct environment configuration. I have no feedback to provide as there were no review comments to evaluate.

mgoin · 2026-05-14T20:54:43Z

+                architectures=["LlamaForCausalLM"],
+                runner_type="generate",
+                is_moe=False,
+                is_quantized=False,


Why should quantization matter at all to mrv2? This might be too incremental an approach

We are very conservative about adding supported features, so let's do it step by step. As you can see in #41286, we fixed a bunch of issues just for the Qwen dense model. We don't want to introduce too many possible CI failures in one PR.
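The incremental gating discussed in this thread can be pictured as a conservative predicate: the v2 runner is chosen by default only when every condition has been vetted. The sketch below is an illustration under assumptions, not vLLM's actual implementation. The field names mirror the keyword arguments in the test diff above, while `ModelTraits`, `defaults_to_v2`, and the contents of `V2_ARCHITECTURES` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical allowlist; only these two names are taken from this PR.
V2_ARCHITECTURES = frozenset({"LlamaForCausalLM", "MistralForCausalLM"})


@dataclass(frozen=True)
class ModelTraits:
    """Hypothetical bundle of the properties the test diff passes in."""
    architectures: tuple[str, ...]
    runner_type: str
    is_moe: bool
    is_quantized: bool


def defaults_to_v2(traits: ModelTraits) -> bool:
    """Conservative gate: any unvetted trait keeps the model on v1."""
    return (
        traits.runner_type == "generate"
        and not traits.is_moe
        and not traits.is_quantized
        and any(arch in V2_ARCHITECTURES for arch in traits.architectures)
    )
```

Under this shape, relaxing one condition (for example the quantization check questioned above) would be a one-line change that can go through CI on its own.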

Signed-off-by: yewentao256 <zhyanwentao@126.com>

njhill · 2026-05-22T16:11:27Z

+    # Note: torch stock compile is not supported in v2, so keep
+    # the runner fixed across compile modes so this test only checks
+    # compilation correctness
+    base_env = {"VLLM_USE_V2_MODEL_RUNNER": "0"}


Can we not add this to the oracle? We should have it fall back automatically I think?

This should be solved in #43233. It is related to the CI failures; there are a lot of similar issues.
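For readers unfamiliar with the pattern in the diff above, here is a minimal sketch of pinning the runner through a base environment that is merged into each compile mode's settings. Only `VLLM_USE_V2_MODEL_RUNNER` comes from the diff; `runner_flag_under` and the `HYPOTHETICAL_COMPILE_MODE` variable are made up for illustration, and the real test is structured differently.

```python
import os
from unittest import mock

# Taken from the diff: force the v1 runner regardless of compile mode.
BASE_ENV = {"VLLM_USE_V2_MODEL_RUNNER": "0"}


def runner_flag_under(mode_env: dict[str, str]) -> str:
    """Return the runner flag seen while a mode's env overrides are active."""
    # patch.dict restores os.environ to its previous state on exit.
    with mock.patch.dict(os.environ, {**BASE_ENV, **mode_env}):
        return os.environ["VLLM_USE_V2_MODEL_RUNNER"]


# The mode-specific variable name below is hypothetical.
flag = runner_flag_under({"HYPOTHETICAL_COMPILE_MODE": "1"})
```

Because the base environment is applied in every mode, differences between runs can be attributed to compilation rather than to a change of runner.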

njhill · 2026-05-22T16:16:53Z

+            # TODO: ngram / ngram_gpu / eagle are not supported by the v2
+            # model runner yet.
            if speculative_config.method in ("ngram", "ngram_gpu"):
                unsupported.append("ngram/ngram_gpu speculative decoding")
-            elif speculative_config.method not in ("eagle", "eagle3", "mtp"):
+            elif speculative_config.method not in ("eagle3", "mtp"):


What is the reason for this? MRV2 does support eagle.

We talked about this before; it is related to an OOM issue: https://buildkite.com/vllm/ci/builds/66718#019e3bb8-eba0-4c70-8abd-eab98274facf

I temporarily removed support for eagle, as there is a behavior change between v1 and v2.

#35214 might be related, as it removes the full fp32 allocation, which saves memory.

OK thanks, I think we should understand/fix the behavior change instead
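To make the effect of the one-line change in this thread concrete, the sketch below reproduces the branch structure from the diff as a standalone function. It is a simplified assumption: `unsupported_spec_features` and the `allow_eagle` switch are hypothetical, and the message in the `elif` branch is a placeholder because the diff excerpt does not show the real one.

```python
def unsupported_spec_features(method: str, allow_eagle: bool) -> list[str]:
    """Collect reasons a speculative method would block the v2 runner."""
    # allow_eagle=True models the original tuple; False models the PR's change.
    supported = ("eagle", "eagle3", "mtp") if allow_eagle else ("eagle3", "mtp")
    unsupported: list[str] = []
    if method in ("ngram", "ngram_gpu"):
        unsupported.append("ngram/ngram_gpu speculative decoding")
    elif method not in supported:
        # Placeholder text; the actual message is not visible in the excerpt.
        unsupported.append(f"speculative method {method!r}")
    return unsupported
```

With `allow_eagle=False`, a request using `eagle` is reported as unsupported, which is the behavior the reviewer objects to here.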

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill · 2026-05-24T01:01:53Z

@yewentao256 fyi I pushed another small test change that's needed

Signed-off-by: yewentao256 <zhyanwentao@126.com>

njhill · 2026-06-02T18:21:34Z

Thanks @yewentao256, have now merged your commits here + additional fixes in #43458.

mr v2 migration, more dense models

ecff0d3

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth and youkaichao as code owners May 14, 2026 18:26

claude Bot reviewed May 14, 2026

View reviewed changes

yewentao256 added the ready (ONLY add when PR is ready to merge/full CI is needed) label May 14, 2026

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

yewentao256 mentioned this pull request May 14, 2026

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

30 tasks

yewentao256 changed the title ~~[Model Runner V2] Migration from v1 to v2, with more dense models [2/N]~~ [Model Runner V2] Migration from v1 to v2, with more Llama and Mistral dense models [2/N] May 14, 2026

mergify Bot added the llama (Related to Llama models) and mistral (Related to Mistral models) labels May 14, 2026

Merge branch 'main' into wentao-mrv2-migration-more-dense

50c7f28

mgoin reviewed May 14, 2026

View reviewed changes

yewentao256 added 2 commits May 15, 2026 12:42

Merge branch 'main' into wentao-mrv2-migration-more-dense

2d9b699

Merge branch 'main' into wentao-mrv2-migration-more-dense

cd8d973

yewentao256 added the ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs) label May 18, 2026

yewentao256 added 3 commits May 18, 2026 21:52

remove support for eagle

2eb258f

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Merge branch 'main' into wentao-mrv2-migration-more-dense

2c83b3f

fix torch compile test

29b96ef

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 mentioned this pull request May 20, 2026

[Model Runner v2] Force v1 runner for tests #43233

Merged

njhill added the v2 label May 20, 2026

njhill reviewed May 22, 2026

View reviewed changes

njhill mentioned this pull request May 23, 2026

[MRV2] Also enable MRV2 for Llama and Mistral dense models #43458

Merged

yewentao256 and others added 2 commits May 23, 2026 20:32

Merge branch 'main' into wentao-mrv2-migration-more-dense

a73a58b

update

918a278

Signed-off-by: yewentao256 <zhyanwentao@126.com>

fix to test_forward_error.py

185796a

Signed-off-by: Nick Hill <nickhill123@gmail.com>

mergify Bot added the v1 label May 24, 2026

njhill reviewed May 24, 2026

View reviewed changes

Comment thread vllm/config/vllm.py Outdated

yewentao256 and others added 4 commits May 26, 2026 14:01

temply convert back eagle

5517c96

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Merge branch 'main' into wentao-mrv2-migration-more-dense

485fb52

Merge branch 'main' into wentao-mrv2-migration-more-dense

91c3db9

Merge branch 'main' into wentao-mrv2-migration-more-dense

b1d07cb

njhill closed this Jun 2, 2026

yewentao256 wants to merge 14 commits into main from wentao-mrv2-migration-more-dense

3 participants
