[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) by LucasWilkinson · Pull Request #30910 · vllm-project/vllm

LucasWilkinson · 2025-12-17T23:28:28Z

Partially revert #29558, as it broke the H200 tests:

https://buildkite.com/vllm/ci/builds/43863#019b29e9-5c1b-4eff-83f7-c8304f774aa7

i.e.

VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048

There seem to be multiple issues here, so we will try to follow up with a proper fix to restore the PIECEWISE CG support:

torch.compile does not support torch.Size as an output, meaning the pattern:

orig_shape = hidden_states.shape
...
final_hidden_states = self.experts(  # <=== Splitting op!
    hidden_states=hidden_states, router_logits=router_logits
)
...
return final_hidden_states.view(orig_shape)

breaks torch.compile, and this pattern is common in many MoE model definitions.

Reading the shape after the splitting op instead:

final_hidden_states = self.experts(  # <=== Splitting op!
    hidden_states=hidden_states, router_logits=router_logits
)
...
orig_shape = hidden_states.shape
return final_hidden_states.view(orig_shape)

can fix this, but it requires updating all the MoE definitions (and creates a footgun).

Separately, the outputs do not seem to have consistent addresses, leading to garbage outputs.
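To make the ordering issue concrete, here is a toy model of piecewise splitting. It is only a sketch: `Op` and `values_crossing_split` are hypothetical names invented for illustration, not part of torch.compile or vLLM. It assumes that any value produced before the splitting op and consumed after it has to be returned from the first sub-graph, which is where a `torch.Size` value would become an unsupported output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for one node of a traced graph."""
    name: str                 # name of the value this op produces
    inputs: tuple             # names of the values it consumes
    splitting: bool = False   # True if the graph is cut at this op


def values_crossing_split(ops):
    """Names produced before the splitting op and consumed after it.

    In this toy model, these are the values the first sub-graph would
    have to return as outputs.
    """
    split_idx = next(i for i, op in enumerate(ops) if op.splitting)
    defined_before = {op.name for op in ops[:split_idx]}
    used_after = {name for op in ops[split_idx + 1:] for name in op.inputs}
    return sorted(defined_before & used_after)


experts = Op("final_hidden_states", ("hidden_states", "router_logits"), splitting=True)
shape = Op("orig_shape", ("hidden_states",))
view = Op("out", ("final_hidden_states", "orig_shape"))

# Shape read before the splitting op: orig_shape must cross the split.
shape_first = [shape, experts, view]
# Shape read after the splitting op: nothing produced earlier crosses it.
shape_after = [experts, shape, view]

print(values_crossing_split(shape_first))
print(values_crossing_split(shape_after))
```

In this model only the first ordering forces `orig_shape` across the split, which matches the failure described above. Graph inputs such as `hidden_states` are treated as available to every piece.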

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

gemini-code-assist

Code Review

This pull request correctly addresses a bug introduced in a previous change by partially reverting it. The original change, which enabled piecewise CUDA graphs for the DeepEP high-throughput backend, caused issues with H200 tests. The fix is to disable CUDA graphs entirely for this specific configuration (deepep_high_throughput with data parallelism > 1), which is a safe and effective solution. The corresponding tests for the reverted feature have also been removed. The changes are clear and well-justified. I have one minor suggestion to fix a typo in a log message for better clarity.
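A minimal sketch of the kind of guard described in this summary, under stated assumptions: `CudaGraphMode` and `resolve_cudagraph_mode` are hypothetical names used only for illustration and are not the actual code in vllm/config/compilation.py.

```python
from enum import Enum


class CudaGraphMode(Enum):
    """Hypothetical stand-in enum, not vLLM's actual type."""
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL = "full"


def resolve_cudagraph_mode(requested, all2all_backend, data_parallel_size):
    """Return the CUDA graph mode to use for a given configuration.

    Assumption taken from the summary above: CUDA graphs are disabled
    entirely when the DeepEP high-throughput backend is combined with
    data parallelism > 1. Everything else keeps the requested mode.
    """
    if all2all_backend == "deepep_high_throughput" and data_parallel_size > 1:
        return CudaGraphMode.NONE
    return requested


mode = resolve_cudagraph_mode(
    CudaGraphMode.PIECEWISE, "deepep_high_throughput", data_parallel_size=2
)
print(mode.name)
```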

vllm/config/compilation.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

mergify · 2025-12-17T23:34:34Z

Hi @LucasWilkinson, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

yewentao256 · 2025-12-18T01:55:16Z

Another fix: #30914

yewentao256

Thanks for catching this! We can land this first to unblock CI, and I can fix this issue thoroughly in #30914 later.

…30910) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> (cherry picked from commit 30bb19a)

…upport) (#30910)" This reverts commit 30bb19a. Signed-off-by: yewentao256 <zhyanwentao@126.com>

…CG support) (vllm-project#30910) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…CG support) (vllm-project#30910) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>

…CG support) (vllm-project#30910) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

LucasWilkinson added 2 commits December 17, 2025 23:20

partial revert

18986f4

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

wip

ef34bf9

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

LucasWilkinson requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners December 17, 2025 23:28

gemini-code-assist bot reviewed Dec 17, 2025

vllm/config/compilation.py (outdated, resolved)

Update vllm/config/compilation.py

417e0dc

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

LucasWilkinson added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Dec 17, 2025

remove added

9388297

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

LucasWilkinson changed the title ~~[BugFix] Partial revert of https://github.com/vllm-project/vllm/pull/29558~~ [BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) Dec 17, 2025

cleanup

ee7bd13

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

BoyuanFeng approved these changes Dec 17, 2025

Merge branch 'main' into lwilkinson/partial-revert

4f1fca1

yewentao256 approved these changes Dec 18, 2025

khluu merged commit 30bb19a into vllm-project:main Dec 18, 2025
48 checks passed

yewentao256 deleted the lwilkinson/partial-revert branch December 18, 2025 14:45

yewentao256 added a commit that referenced this pull request Dec 18, 2025

Revert "[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG s…

80a35c0

…upport) (#30910)" This reverts commit 30bb19a. Signed-off-by: yewentao256 <zhyanwentao@126.com>

BoyuanFeng mentioned this pull request Dec 22, 2025

[CI Failure]: distributed-tests-h200 #30889

Open

3 tasks

khluu merged 6 commits into vllm-project:main from neuralmagic:lwilkinson/partial-revert

4 participants
