[BugFix] Fix initialization of draft model. #29319
tlrmchlsmth merged 3 commits into vllm-project:main
Conversation
…ulations. Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
Code Review
This pull request addresses a bug related to the initialization of draft models with Mixture of Experts (MoE) layers, ensuring compatibility with the high-throughput backend. The fix involves correctly calling prepare_communication_buffer_for_model on the draft model. My review includes a suggestion to make the condition for this call more robust to prevent potential NoneType errors during model loading.
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
dc0f71d to 68cab0e
vllm/v1/worker/gpu_model_runner.py
Outdated
```python
if (drafter := getattr(self, "drafter", None)) and (
    drafter_model := getattr(drafter, "model", None)
):
    prepare_communication_buffer_for_model(drafter_model)
```
I think we need a way to share the All2All state between self.model and the drafter -- this may duplicate state
@bnellnm @varun-sundar-rabindranath could you take a look?
From an offline conversation: the DeepEP buffers will be cached, so this won't involve any extra state for those All2All backends. This might not be the case for the FlashInfer All2Alls -- @pavanimajety, would you know if this causes any overhead in that case?
I think we should go ahead and land this for now to get MTP working on main
We cache the all2all handles for DeepEP high-throughput here.
This, however, hashes on some model and DP/EP properties like hidden_size, num_local_experts, and num_global_experts. This means that if the draft model's properties differ from the base model's, which is likely, we will create a new all2all handle. But this is necessary.
I think this is good to land to unbreak main.
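The caching behavior described above can be sketched with a handle cache keyed on the relevant properties. The function name and key choice here are illustrative assumptions, not vLLM's actual API; in vLLM the cached object would hold real DeepEP buffers:

```python
from functools import lru_cache

# Hypothetical stand-in for the DeepEP handle cache: handles are keyed
# on model/parallel properties, so a draft model whose shapes differ
# from the base model's gets its own handle, while identical shapes
# reuse the cached one.
@lru_cache(maxsize=None)
def get_all2all_handle(hidden_size: int, num_local_experts: int,
                       num_global_experts: int) -> tuple:
    # Real code would allocate communication buffers here; we return a
    # token tuple so cache hits can be observed via object identity.
    return ("handle", hidden_size, num_local_experts, num_global_experts)

base = get_all2all_handle(7168, 8, 256)   # first allocation
same = get_all2all_handle(7168, 8, 256)   # cache hit, same object
draft = get_all2all_handle(2048, 8, 256)  # different key, new handle
```

This matches the comment: preparing the draft model creates one extra handle only when its properties differ from the base model's.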
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Thanks for the fix @halyavin!
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
This initialization is needed to make MTP for DeepSeek V3 work with the high-throughput backend.
The DeepSeek V3 draft model also has a MoE layer. The method `DeviceCommunicatorBase.prepare_communication_buffer_for_model` calls the `FusedMoEMethodBase.init_prepare_finalize` method, which in turn sets the `fused_experts` field of the layer. During MTP calculation without this field, the `FusedMoE.forward_impl` method sees that the `using_modular_kernel` property is false, sets the `do_naive_dispatch_combine` flag, and as a consequence calls `get_ep_group().dispatch()`. But the `dispatch` method is not implemented in the `DeepEPHTAll2AllManager` class, which throws an exception.
Calling `prepare_communication_buffer_for_model` on the draft model makes this exception go away and makes MTP work.
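The failure path described above can be sketched as follows. The classes are heavily simplified stand-ins for the real vLLM components (the real `forward_impl` does far more), but the control flow mirrors the description: an unset `fused_experts` field routes the forward pass to naive dispatch, which the high-throughput manager does not implement:

```python
class DeepEPHTAll2AllManager:
    """Simplified stand-in: the HT backend has no naive dispatch."""
    def dispatch(self, *args):
        raise NotImplementedError("naive dispatch not supported")

class FusedMoELayer:
    """Simplified stand-in for FusedMoE."""
    def __init__(self):
        # Set by init_prepare_finalize() via
        # prepare_communication_buffer_for_model(); None until then.
        self.fused_experts = None

    @property
    def using_modular_kernel(self):
        return self.fused_experts is not None

    def forward_impl(self, ep_group):
        if not self.using_modular_kernel:
            # do_naive_dispatch_combine path: raises on the HT backend.
            return ep_group.dispatch()
        return "modular-kernel path"
```

Without the fix, the draft model's layer stays in the `fused_experts is None` state and every MTP step hits the `NotImplementedError`; after preparation, the modular-kernel path is taken instead.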