[Model Runner v2] Support reload weights (sleep mode) by yewentao256 · Pull Request #42673 · vllm-project/vllm

yewentao256 · 2026-05-14T19:24:24Z

Purpose

VLLM_USE_V2_MODEL_RUNNER=1 pytest tests/basic_correctness/test_cumem.py::test_deep_sleep

Originally

(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]   File "/home/yewentao256/vllm-source/vllm/v1/executor/uniproc_executor.py", line 93, in collective_rpc
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]   File "/home/yewentao256/vllm-source/vllm/v1/serial_utils.py", line 510, in run_method
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]     return func(*args, **kwargs)
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu_worker.py", line 351, in reload_weights
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]     self.model_runner.reload_weights(*args, **kwargs)
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=2663266) ERROR 05-14 19:13:43 [core.py:1360] AttributeError: 'GPUModelRunner' object has no attribute 'reload_weights'

Now

======================================== 1 passed, 17 warnings in 45.19s =========================================
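
The failure in the original traceback can be sketched in miniature. The classes `ToyWorker` and `ToyRunnerWithoutReload` and the simplified `run_method` helper below are hypothetical stand-ins, not vLLM's real implementations. The sketch only illustrates how a worker that forwards `reload_weights` to a runner lacking that method ends in the `AttributeError` shown in the log.

```python
class ToyRunnerWithoutReload:
    """Hypothetical stand-in for a model runner that lacks reload_weights."""


class ToyWorker:
    """Hypothetical stand-in for the worker; it forwards the call to its runner."""

    def __init__(self, model_runner):
        self.model_runner = model_runner

    def reload_weights(self, *args, **kwargs):
        # Same forwarding shape as the gpu_worker.py frame in the traceback.
        return self.model_runner.reload_weights(*args, **kwargs)


def run_method(obj, method, args, kwargs):
    """Simplified dispatch by method name (an assumption, not vLLM's helper)."""
    func = getattr(obj, method)
    return func(*args, **kwargs)


def try_reload(worker):
    """Return the error message if the reload call fails, else None."""
    try:
        run_method(worker, "reload_weights", (), {})
    except AttributeError as exc:
        return str(exc)
    return None


error_message = try_reload(ToyWorker(ToyRunnerWithoutReload()))
print(error_message)
```

The lookup succeeds on the worker and fails one level down, on the runner, which matches the order of frames in the log.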

Signed-off-by: yewentao256 <zhyanwentao@126.com>

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

gemini-code-assist

Code Review

This pull request introduces a reload_weights method to the GPU model runner in the v1 worker, which delegates the reloading process to the GPUModelRunnerV1 implementation. Feedback indicates that the method should also reset the encoder and multimodal caches to ensure that stale embeddings are not used following a weight update.
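
As a rough illustration of the pattern this review describes, the sketch below shows one runner class reusing another's `reload_weights` by an explicit call and then clearing its caches, as the review suggested. `ToyRunnerV1`, `ToyRunnerV2`, `weights_version`, and the dictionary-based caches are hypothetical; only the method names `reload_weights`, `reset_encoder_cache`, and `reset_mm_cache` come from this PR, and the real vLLM code may be structured differently.

```python
class ToyRunnerV1:
    """Hypothetical stand-in for the v1 runner that already knows how to reload."""

    def __init__(self):
        self.weights_version = 0

    def reload_weights(self, *args, **kwargs):
        # Pretend to reload: bump a counter instead of touching real tensors.
        self.weights_version += 1


class ToyRunnerV2:
    """Hypothetical v2 runner; not a subclass, it borrows the v1 logic."""

    def __init__(self):
        self.weights_version = 0
        # Toy caches holding embeddings computed with the old weights.
        self.encoder_cache = {"image-0": "embedding from old weights"}
        self.mm_cache = {"image-0": "processed input"}

    def reset_encoder_cache(self):
        self.encoder_cache.clear()

    def reset_mm_cache(self):
        self.mm_cache.clear()

    def reload_weights(self, *args, **kwargs):
        # Delegate to the v1 implementation, passing this object as `self`.
        ToyRunnerV1.reload_weights(self, *args, **kwargs)
        # The review's suggestion: drop cached results tied to the old weights.
        self.reset_encoder_cache()
        self.reset_mm_cache()


runner = ToyRunnerV2()
runner.reload_weights()
print(runner.weights_version, runner.encoder_cache, runner.mm_cache)
```

Calling the v1 function with a v2 instance works in this toy because the borrowed method only touches attributes that both classes define.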

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

njhill · 2026-05-15T16:48:38Z

@yewentao256 I didn't realize you had added these until after the PR was merged. I don't think we should change this behavior. Folks using this API would already be resetting the caches separately when needed.

        self.reset_encoder_cache()
        self.reset_mm_cache()
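
The alternative described in this comment, where the caller decides when to reset caches, might look like the sketch below. `ToyEngine` and its dictionary caches are hypothetical and exist only for illustration; the actual user-facing vLLM API for these operations is not shown in this thread.

```python
class ToyEngine:
    """Hypothetical stand-in where reload_weights leaves caches untouched."""

    def __init__(self):
        self.weights_version = 0
        self.encoder_cache = {"image-0": "embedding"}
        self.mm_cache = {"image-0": "processed input"}

    def reload_weights(self):
        # Reload only; no implicit cache reset.
        self.weights_version += 1

    def reset_encoder_cache(self):
        self.encoder_cache.clear()

    def reset_mm_cache(self):
        self.mm_cache.clear()


engine = ToyEngine()

# Step 1: reload weights. The caches survive this call.
engine.reload_weights()
caches_after_reload = (len(engine.encoder_cache), len(engine.mm_cache))

# Step 2: the caller resets caches explicitly, only when it needs to.
engine.reset_encoder_cache()
engine.reset_mm_cache()
caches_after_reset = (len(engine.encoder_cache), len(engine.mm_cache))

print(caches_after_reload, caches_after_reset)
```

Keeping the two steps separate lets a caller skip the reset when the cached entries are still valid for its use case.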

mrv2 reload weights

d7eaada

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from WoosukKwon and njhill as code owners May 14, 2026 19:24

claude Bot reviewed May 14, 2026

yewentao256 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) May 14, 2026

yewentao256 mentioned this pull request May 14, 2026

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

30 tasks

mergify Bot added the v1 label May 14, 2026

gemini-code-assist Bot reviewed May 14, 2026

Comment thread vllm/v1/worker/gpu/model_runner.py

njhill approved these changes May 14, 2026

yewentao256 and others added 2 commits May 14, 2026 16:15

Update vllm/v1/worker/gpu/model_runner.py

cb1b48b

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Merge branch 'main' into wentao-mrv2-reload-weights

6e47344

yewentao256 mentioned this pull request May 15, 2026

Revert "[Model Runner v2] Oracle for model runner v2 - qwen3 dense model by default [1/N]" (#39337) #42698

Closed

njhill enabled auto-merge (squash) May 15, 2026 16:38

njhill merged commit 6147c70 into main May 15, 2026
71 checks passed

njhill deleted the wentao-mrv2-reload-weights branch May 15, 2026 16:41

yewentao256 mentioned this pull request May 15, 2026

[Bug] Migrate Reset cache for both v2 and v1 model runner #42759

Merged

mgoin mentioned this pull request May 15, 2026

[Model Runner v2] Support update_config #42783

Merged

njhill added the v2 label May 20, 2026

njhill merged 3 commits into main from wentao-mrv2-reload-weights
