
[Model Runner V2] Migration from v1 to v2, with more Llama and Mistral dense models [2/N]#42665

Closed · yewentao256 wants to merge 14 commits into main from wentao-mrv2-migration-more-dense

@yewentao256 (Member)

Purpose

Makes progress on #41286.

Based on the review in #39337 by @NickLucche, we first expand to Llama and Mistral. If this works well, we will expand to all dense models later.

Test

Covered in CI

Signed-off-by: yewentao256 <zhyanwentao@126.com>

@claude (bot) left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@yewentao256 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on May 14, 2026

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request expands the DEFAULT_V2_MODEL_RUNNER_ARCHITECTURES in vllm/config/vllm.py to include LlamaForCausalLM and MistralForCausalLM. Additionally, it adds corresponding test cases for Llama 3.2 and Mistral 7B models in tests/test_config.py to ensure correct environment configuration. I have no feedback to provide as there were no review comments to evaluate.
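The sketch below shows the kind of default-runner gating this change extends. It is an illustration under stated assumptions and does not reproduce vLLM's code. `V2_ARCHITECTURES_SKETCH`, `ModelInfo`, and `should_use_v2_runner` are hypothetical names. The sketch assumes three things. First, the real DEFAULT_V2_MODEL_RUNNER_ARCHITECTURES also holds earlier entries, which are omitted here. Second, MoE and quantized models stay on v1 for now, as the `is_moe=False` / `is_quantized=False` test fields suggest. Third, an explicit `VLLM_USE_V2_MODEL_RUNNER` value overrides the default, as the test's `base_env` implies.

```python
import os
from dataclasses import dataclass

# Hypothetical stand-in for DEFAULT_V2_MODEL_RUNNER_ARCHITECTURES: only the
# two entries added by this PR are listed; earlier entries are omitted.
V2_ARCHITECTURES_SKETCH = frozenset({"LlamaForCausalLM", "MistralForCausalLM"})


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical summary of the model properties the new tests check."""
    architectures: tuple[str, ...]
    runner_type: str
    is_moe: bool
    is_quantized: bool


def should_use_v2_runner(model: ModelInfo, env=None) -> bool:
    """Decide whether to default to the v2 model runner (sketch only)."""
    env = os.environ if env is None else env
    # Assumption: an explicit VLLM_USE_V2_MODEL_RUNNER setting always wins.
    override = env.get("VLLM_USE_V2_MODEL_RUNNER")
    if override is not None:
        return override == "1"
    # Conservative default: dense, unquantized, generative models whose
    # architectures are all on the allowlist.
    return (
        model.runner_type == "generate"
        and not model.is_moe
        and not model.is_quantized
        and bool(model.architectures)
        and all(a in V2_ARCHITECTURES_SKETCH for a in model.architectures)
    )


llama = ModelInfo(("LlamaForCausalLM",), "generate", False, False)
print(should_use_v2_runner(llama, env={}))  # True
print(should_use_v2_runner(llama, env={"VLLM_USE_V2_MODEL_RUNNER": "0"}))  # False
```

An allowlist keyed on architecture name lets each model family be opted in one PR at a time. Any model outside the list keeps its existing v1 behavior.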

@yewentao256 changed the title from "[Model Runner V2] Migration from v1 to v2, with more dense models [2/N]" to "[Model Runner V2] Migration from v1 to v2, with more Llama and Mistral dense models [2/N]" on May 14, 2026
@mergify (bot) added the "llama" (Related to Llama models) and "mistral" (Related to Mistral models) labels on May 14, 2026
Comment thread tests/test_config.py
architectures=["LlamaForCausalLM"],
runner_type="generate",
is_moe=False,
is_quantized=False,

Member

Why should quantization matter to MRV2 at all? This might be too incremental an approach.

@yewentao256 (Member Author) · May 14, 2026

We are being very conservative about adding supported features, so let's do it step by step. As you can see in #41286, we fixed a number of issues just for the Qwen dense model. We don't want to introduce too many possible CI failures in one PR.

@yewentao256 added the "ready-run-all-tests" label (Trigger CI with all tests for wide-ranging PRs) on May 18, 2026
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
# Note: stock torch compile is not supported in v2, so pin the runner
# to v1 across all compile modes; this test should only check
# compilation correctness.
base_env = {"VLLM_USE_V2_MODEL_RUNNER": "0"}

Member

Can we not add this to the oracle instead? I think it should fall back automatically.

Member Author

This should be solved in #43233. It is related to the CI failures, and there are many similar issues.

Comment thread vllm/config/vllm.py Outdated
Comment on lines +1991 to +1995
 # TODO: ngram / ngram_gpu / eagle are not supported by the v2
 # model runner yet.
 if speculative_config.method in ("ngram", "ngram_gpu"):
     unsupported.append("ngram/ngram_gpu speculative decoding")
-elif speculative_config.method not in ("eagle", "eagle3", "mtp"):
+elif speculative_config.method not in ("eagle3", "mtp"):
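The diff follows a "collect unsupported features, then fall back" pattern, sketched below with the post-change method set (eagle removed, eagle3 and mtp kept). This is an illustration under assumptions and is not vLLM's code. `SpecConfigSketch`, `collect_unsupported`, and `pick_runner` are hypothetical. The message for the `elif` branch is also made up, because the diff does not show what that branch appends. Only the method strings and the ngram message come from the diff.

```python
from dataclasses import dataclass
from typing import Optional

# Speculative decoding methods the diff keeps on the v2 path after this change.
V2_SPEC_METHODS = ("eagle3", "mtp")


@dataclass(frozen=True)
class SpecConfigSketch:
    """Hypothetical stand-in for a speculative decoding config."""
    method: str


def collect_unsupported(spec: Optional[SpecConfigSketch]) -> list[str]:
    """Return every reason the v2 runner can't be used (sketch only)."""
    unsupported: list[str] = []
    if spec is not None:
        if spec.method in ("ngram", "ngram_gpu"):
            unsupported.append("ngram/ngram_gpu speculative decoding")
        elif spec.method not in V2_SPEC_METHODS:
            # Hypothetical message; the real wording is not shown in the diff.
            unsupported.append(f"{spec.method} speculative decoding")
    return unsupported


def pick_runner(spec: Optional[SpecConfigSketch]) -> str:
    """Fall back to v1 automatically when anything is unsupported."""
    return "v1" if collect_unsupported(spec) else "v2"


print(pick_runner(SpecConfigSketch("eagle3")))  # v2
print(pick_runner(SpecConfigSketch("eagle")))   # v1
print(pick_runner(None))                        # v2
```

The sketch gathers every reason into a list instead of returning on the first failure. That way a fallback message can name all blockers at once.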

Member

What is the reason for this? MRV2 does support eagle.

Member Author

We talked about this before; it is related to an OOM issue: https://buildkite.com/vllm/ci/builds/66718#019e3bb8-eba0-4c70-8abd-eab98274facf

I temporarily removed eagle support because there is a behavior change between v1 and v2.

#35214 might be related, as it removes the full fp32 allocation, which saves memory.

Member

OK, thanks. I think we should understand and fix the behavior change instead.

yewentao256 and others added 2 commits May 23, 2026 20:32
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill (Member) commented on May 24, 2026

@yewentao256 FYI, I pushed another small test change that's needed.

@mergify (bot) added the "v1" label on May 24, 2026
Comment thread vllm/config/vllm.py Outdated
@njhill (Member) commented on Jun 2, 2026

Thanks @yewentao256, have now merged your commits here + additional fixes in #43458.

@njhill closed this on Jun 2, 2026

Labels

llama (Related to Llama models) · mistral (Related to Mistral models) · ready (ONLY add when PR is ready to merge/full CI is needed) · ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs) · v1 · v2

3 participants