[Model Runner V2][Bugfix] Fix MRV2 LoRA warmup by jeejeelee · Pull Request #35536 · vllm-project/vllm

jeejeelee · 2026-02-27T18:15:44Z

Purpose

WIP

Fix LoRA warmup
LoRA Verification

Test Plan

export VLLM_USE_V2_MODEL_RUNNER=1
pytest vllm/tests/lora/test_llm_with_multi_loras.py -v -s

Test Result

tests/lora/test_llm_with_multi_loras.py::test_multiple_lora_requests
tests/lora/test_llm_with_multi_loras.py::test_load_inplace_offline_reload
tests/lora/test_llm_with_multi_loras.py::test_load_inplace_false_no_reload
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ 4 passed, 20 warnings in 101.24s (0:01:41) =============================================================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

gemini-code-assist

Code Review

This pull request aims to fix the LoRA warmup in the v2 model runner by wrapping the execute_model call in _dummy_run with the maybe_dummy_run_with_lora context manager. This correctly sets up dummy LoRA adapters for the warmup. However, there is a critical issue in how maybe_dummy_run_with_lora is called, where a boolean value is passed instead of the expected numpy array for num_scheduled_tokens. This will cause a TypeError during execution.
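
The failure mode described in this review can be sketched as follows. This is a minimal, dependency-free illustration and not vLLM's implementation: `dummy_run_with_lora` is a hypothetical stand-in for `maybe_dummy_run_with_lora`, and a plain list stands in for the NumPy array of per-request token counts.

```python
from contextlib import contextmanager


@contextmanager
def dummy_run_with_lora(num_scheduled_tokens):
    # Hypothetical stand-in: the helper needs per-request token counts,
    # so it measures the sequence before yielding to the warmup body.
    num_reqs = len(num_scheduled_tokens)  # raises TypeError for a bool
    total_tokens = sum(num_scheduled_tokens)
    yield num_reqs, total_tokens


def warmup(num_scheduled_tokens):
    # Stand-in for wrapping the dummy execute_model call in the context manager.
    with dummy_run_with_lora(num_scheduled_tokens) as (num_reqs, total):
        return num_reqs, total


ok = warmup([4, 4, 5])  # three requests, 13 tokens in total

try:
    warmup(True)  # a flag passed where the token counts were expected
    failed_with_type_error = False
except TypeError:
    failed_with_type_error = True
```

The error surfaces as soon as the context manager is entered, which is why a wrong argument type would break warmup rather than silently skip the LoRA setup.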

mergify · 2026-03-10T10:39:48Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-05-22T18:20:34Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

WoosukKwon

@jeejeelee Apologies for the late review and thanks again for the PR!

I think the PR is clean overall. I left some stylistic questions. Please take a look.

WoosukKwon · 2026-06-01T16:21:50Z

    num_reqs: int,
    num_tokens: int,
    uniform_token_count: int | None,
+    num_active_loras: int = 0,


nit: Can we remove the default value (0) for num_active_loras across all functions? I think requiring it to be passed explicitly would make the call sites clearer, and it looks like all functions are already called with an explicit num_active_loras argument anyway.
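
The suggestion amounts to the pattern below. The function name `make_dummy_batch` and its body are hypothetical; only the parameter names come from the diff above.

```python
from __future__ import annotations


def make_dummy_batch(
    num_reqs: int,
    num_tokens: int,
    uniform_token_count: int | None,
    num_active_loras: int,  # no default: every caller must pass it
) -> dict:
    # Hypothetical body: echo the arguments back so the call site is visible.
    return {
        "num_reqs": num_reqs,
        "num_tokens": num_tokens,
        "uniform_token_count": uniform_token_count,
        "num_active_loras": num_active_loras,
    }


batch = make_dummy_batch(2, 16, None, num_active_loras=0)

try:
    make_dummy_batch(2, 16, None)  # omitting the argument is now an error
    missing_arg_rejected = False
except TypeError:
    missing_arg_rejected = True
```

With the default removed, a call site that forgets `num_active_loras` fails immediately instead of quietly running with zero adapters.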

WoosukKwon · 2026-06-01T16:23:47Z

+    return [0, lora_config.max_loras + 1]
+
+
+def resolve_effective_num_active_loras(


This looks a bit inefficient and verbose to me. Why don't we pre-compute the dispatch mapping like how we handle num_tokens?
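
One way to read this suggestion is to build a lookup table once, so that per-step dispatch becomes an index rather than a search. The sketch below is an assumption about what such a table could look like, not the PR's code: `build_lora_dispatch_table` is a hypothetical name, and rounding each count up to the nearest captured value is an assumed policy. The two-element input mirrors the `[0, lora_config.max_loras + 1]` return in the diff.

```python
from __future__ import annotations


def build_lora_dispatch_table(capture_cases: list[int], max_loras: int) -> list[int]:
    # Hypothetical helper: table[n] is the smallest captured case >= n.
    cases = sorted(set(capture_cases))
    if not cases or cases[-1] < max_loras:
        raise ValueError("largest capture case must cover max_loras")
    table = []
    idx = 0
    for n in range(max_loras + 1):
        # Advance to the first captured case that can serve n active LoRAs.
        while cases[idx] < n:
            idx += 1
        table.append(cases[idx])
    return table


max_loras = 4
table = build_lora_dispatch_table([0, max_loras + 1], max_loras)
effective = table[2]  # per-step dispatch is a single list index
```

Here every nonzero count maps to the single LoRA-enabled case and zero maps to the no-LoRA case, so the table is built once at startup and read on every step.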

WoosukKwon · 2026-06-01T16:34:12Z

        device: torch.device,
        cudagraph_mode: CUDAGraphMode,
        decode_query_len: int,
+        lora_capture_cases: list[int] | None = None,


just curious: why cases instead of sizes?

jeejeelee · 2026-06-09

It's just that I used "case" 😅

WoosukKwon · 2026-06-01T16:37:25Z

+    def hook(num_active_loras: int, num_reqs: int, num_tokens: int) -> None:
+        num_scheduled = np.full(num_reqs, num_tokens // num_reqs, dtype=np.int32)
+        num_scheduled[-1] += num_tokens % num_reqs
+        with runner.maybe_select_dummy_loras(
+            lora_config, num_scheduled, num_active_loras=num_active_loras
+        ):
+            pass


It shouldn't be in this PR, but it'd be nice if we can think about how to avoid this hack (if you agree it's a hack 😅). I guess we may need to refactor LoRAModelRunnerMixin?

jeejeelee · 2026-06-10

I'll dig into it in a follow-up PR.
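
For reference, the token split inside the hook can be spelled out on its own. This pure-Python sketch mirrors the arithmetic of the `np.full` call and the remainder adjustment in the diff; `split_tokens` is a hypothetical helper name, and a list replaces the `np.int32` array to keep the example dependency-free.

```python
from __future__ import annotations


def split_tokens(num_reqs: int, num_tokens: int) -> list[int]:
    # Give every request the floor share, then add the remainder to the last one.
    if num_reqs <= 0:
        raise ValueError("num_reqs must be positive")
    per_req = [num_tokens // num_reqs] * num_reqs
    per_req[-1] += num_tokens % num_reqs
    return per_req


even = split_tokens(4, 16)    # divides cleanly
uneven = split_tokens(3, 10)  # the remainder goes to the last request
```

The counts always sum to `num_tokens`, which is the property the dummy LoRA selection relies on when it is handed the per-request array.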

mergify · 2026-06-09T12:41:28Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-06-12T05:32:08Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

WoosukKwon

LGTM! Thanks again for the PR!

Init

667a33b

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee requested review from WoosukKwon and njhill as code owners February 27, 2026 18:15

jeejeelee marked this pull request as draft February 27, 2026 18:15

mergify Bot added the v1 and bug labels Feb 27, 2026

gemini-code-assist Bot reviewed Feb 27, 2026

Comment thread vllm/v1/worker/gpu/model_runner.py Outdated

Fix

d5a7dc2

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee force-pushed the fix-mrv2-lora branch from e2f25ff to d5a7dc2 Compare February 28, 2026 00:46

jeejeelee added 3 commits February 28, 2026 08:46

Merge branch 'main' into fix-mrv2-lora

05eecf9

Fix

1e3bf93

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Dond

4ad1346

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

mergify Bot added the nvidia label Feb 28, 2026

github-project-automation Bot added this to NVIDIA Feb 28, 2026

jeejeelee marked this pull request as ready for review February 28, 2026 13:22

jeejeelee added 4 commits February 28, 2026 21:22

Merge branch 'main' into fix-mrv2-lora

c0fc477

Merge branch 'vllm-project:main' into fix-mrv2-lora

e2cefa7

OPT

17e8b0b

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Merge branch 'main' into fix-mrv2-lora

9d6fcf6

mergify Bot added the needs-rebase label Mar 10, 2026

Fix

0433835

mergify Bot removed the needs-rebase label Mar 10, 2026

jeejeelee added 3 commits March 10, 2026 15:59

Fix

a410a20

Add text

46cc8e0

Merge branch 'main' into fix-mrv2-lora

0f79272

njhill reviewed Mar 12, 2026

Comment thread vllm/v1/worker/gpu/model_runner.py Outdated

Comment thread vllm/v1/worker/gpu/model_runner.py Outdated

Comment thread vllm/v1/worker/gpu/lora_utils.py Outdated

jeejeelee and others added 3 commits March 13, 2026 19:25

Update vllm/v1/worker/gpu/model_runner.py

65e35bc

Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Update vllm/v1/worker/gpu/model_runner.py

d85392b

Co-authored-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

OPT

9271139

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee added 2 commits March 14, 2026 16:25

Merge branch 'main' into fix-mrv2-lora

09c01a4

Merge branch 'main' into fix-mrv2-lora

0e87213

njhill mentioned this pull request May 19, 2026

[Model Runner V2] Fix lora Triton Error [CUDA]: device-side assert triggered #43139

Merged

njhill added the v2 label May 22, 2026

mergify Bot added the needs-rebase label May 22, 2026

Rebase

e78528d

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

mergify Bot added the qwen label May 23, 2026

Merge branch 'main' into fix-mrv2-lora

279028c

mergify Bot removed the needs-rebase label May 23, 2026

Merge branch 'main' into fix-mrv2-lora

6b56c42

WoosukKwon added the ready label Jun 1, 2026

WoosukKwon reviewed Jun 1, 2026

Update vllm/v1/worker/gpu/model_runner.py

dd5eb5b

Co-authored-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

mergify Bot added the needs-rebase label Jun 9, 2026

jeejeelee and others added 4 commits June 9, 2026 20:47

Update vllm/v1/worker/gpu/lora_utils.py

dd1f609

Co-authored-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

Update vllm/v1/worker/gpu/model_runner.py

ffb97cd

Co-authored-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

Optimize code

2ece812

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Address conflict

351f3bb

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

mergify Bot removed the needs-rebase label Jun 10, 2026

Merge branch 'main' into fix-mrv2-lora

5edfdb8

jeejeelee requested a review from WoosukKwon June 10, 2026 09:38

mergify Bot added the needs-rebase label Jun 12, 2026

WoosukKwon approved these changes Jun 15, 2026

github-project-automation Bot moved this to Ready in NVIDIA Jun 15, 2026

Merge branch 'main' into fix-mrv2-lora

ed3ec7c

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

WoosukKwon requested a review from yewentao256 as a code owner June 15, 2026 17:02

mergify Bot removed the needs-rebase label Jun 15, 2026
