[MoE] Move triton experts to fused_moe/experts/ by bnellnm · Pull Request #41976 · vllm-project/vllm

bnellnm · 2026-05-07T16:08:43Z

Purpose

Extract TritonExperts and TritonWNA16Experts from fused_moe.py into a new experts/triton_moe.py module. Update all references across the codebase (source, tests, C++ comment, docs).
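
As a rough illustration of what "update all references" involves, the sketch below rewrites single-line imports of the two moved classes. It is not the tooling used in this PR. The new dotted path `vllm.model_executor.layers.fused_moe.experts.triton_moe` is an assumption inferred from the PR title, and `rewrite_import` is a hypothetical helper that ignores parenthesized multi-line imports.

```python
import re

OLD_MODULE = "vllm.model_executor.layers.fused_moe.fused_moe"
# Assumed new location, inferred from the PR title (fused_moe/experts/).
NEW_MODULE = "vllm.model_executor.layers.fused_moe.experts.triton_moe"
MOVED_NAMES = {"TritonExperts", "TritonWNA16Experts"}

# Matches only simple one-line "from <old module> import a, b" statements.
IMPORT_RE = re.compile(
    rf"^from {re.escape(OLD_MODULE)} import (?P<names>[\w, ]+)$"
)


def rewrite_import(line: str) -> list[str]:
    """Split one import line into old-module and new-module imports."""
    match = IMPORT_RE.match(line)
    if match is None:
        return [line]  # not an import we know how to rewrite
    names = [n.strip() for n in match.group("names").split(",") if n.strip()]
    kept = [n for n in names if n not in MOVED_NAMES]
    moved = [n for n in names if n in MOVED_NAMES]
    out = []
    if kept:
        out.append(f"from {OLD_MODULE} import {', '.join(kept)}")
    if moved:
        out.append(f"from {NEW_MODULE} import {', '.join(moved)}")
    return out


before = f"from {OLD_MODULE} import TritonExperts, try_get_optimal_moe_config"
for new_line in rewrite_import(before):
    print(new_line)
```

Names that stay in `fused_moe.py` keep their original import line, so a mixed import is split in two rather than silently redirected.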

Forked from #40570

cc: @Jackmin801 , @robertgshaw2-redhat

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Extract TritonExperts and TritonWNA16Experts from fused_moe.py into a new experts/triton_moe.py module. Update all references across the codebase (source, tests, C++ comment, docs). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Jackmin801 <ongjackm@gmail.com>

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>

…experts Signed-off-by: Jackmin801 <ongjackm@gmail.com> # Conflicts: # vllm/model_executor/layers/fused_moe/__init__.py

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>

…experts Signed-off-by: Jackmin801 <ongjackm@gmail.com> # Conflicts: # vllm/lora/layers/fused_moe.py # vllm/model_executor/layers/fused_moe/fused_moe.py

…perts Signed-off-by: Bill Nell <bnell@redhat.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify · 2026-05-07T16:09:58Z

Documentation preview: https://vllm--41976.org.readthedocs.build/en/41976/

gemini-code-assist

Code Review

This pull request refactors the MoE implementation by moving the Triton-based expert classes, TritonExperts and TritonWNA16Experts, from fused_moe.py to a new dedicated module, experts/triton_moe.py. All associated imports, tests, and documentation have been updated to reflect this change. Feedback highlights a critical circular dependency introduced in the new module; it is recommended to move shared utility functions to a common file and consolidate Triton-specific kernels within triton_moe.py to ensure a clean dependency graph.

gemini-code-assist · 2026-05-07T16:12:07Z

+from vllm.model_executor.layers.fused_moe.fused_moe import (
+    _prepare_expert_assignment,
+    invoke_fused_moe_triton_kernel,
+    invoke_fused_moe_wna16_triton_kernel,
+    try_get_optimal_moe_config,
+)


This import from vllm.model_executor.layers.fused_moe.fused_moe creates a circular dependency at the package level. The vllm.model_executor.layers.fused_moe package's __init__.py imports this file (triton_moe.py), which in turn imports fused_moe.py from the same package. While this might not break immediately due to Python's import caching, it is fragile and can lead to ImportError in the future if dependencies change.
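
The fragility described here can be reproduced with a toy package. This is a minimal sketch and does not use vLLM code: the package names `demo_ok` and `demo_cycle` and the modules `experts.py` and `core.py` are hypothetical stand-ins for `triton_moe.py` and `fused_moe.py`. In both packages `__init__.py` imports `experts.py`, which imports `core.py`. The only difference is that in `demo_cycle`, `core.py` also imports a name back from the package, which is the kind of later dependency change the comment warns about.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root: Path, name: str, back_edge: bool) -> None:
    """Write a package whose __init__ imports experts.py, which imports core.py."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(f"from {name}.experts import Experts\n")
    (pkg / "experts.py").write_text(
        f"from {name}.core import helper\n"
        "\n"
        "class Experts:\n"
        "    def run(self):\n"
        "        return helper()\n"
    )
    # Optional back edge: core.py imports a name from the half-initialized package.
    extra = f"from {name} import Experts\n" if back_edge else ""
    (pkg / "core.py").write_text(extra + "def helper():\n    return 'ok'\n")


def try_import(name: str) -> str:
    """Import the package and report either the result or the failure type."""
    try:
        return importlib.import_module(name).Experts().run()
    except ImportError:
        return "ImportError"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_package(root, "demo_ok", back_edge=False)
    build_package(root, "demo_cycle", back_edge=True)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        results = {name: try_import(name) for name in ("demo_ok", "demo_cycle")}
    finally:
        sys.path.remove(tmp)

print(results)  # {'demo_ok': 'ok', 'demo_cycle': 'ImportError'}
```

The first package imports cleanly because the partially initialized package is already in `sys.modules` and nothing reads a name from it too early. The second fails because `Experts` is requested from the package before `__init__.py` has finished binding it.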

To resolve this, I recommend a more complete refactoring to break the cycle:

1. Move generic helpers: Functions like _prepare_expert_assignment and try_get_optimal_moe_config are used by both fused_moe.py and triton_moe.py. They could be moved to a shared utility file (e.g., vllm/model_executor/layers/fused_moe/utils.py).

2. Centralize Triton code: Move the Triton-specific kernels (fused_moe_kernel, fused_moe_kernel_gptq_awq) and their invoker functions (invoke_fused_moe_triton_kernel, invoke_fused_moe_wna16_triton_kernel) from fused_moe.py into this file (triton_moe.py). This would consolidate all Triton-related MoE code in one place.

3. Update imports: The fused_experts_impl function in fused_moe.py (which appears to be a legacy entry point) can then import the necessary Triton kernel invokers from this file.

This will result in a cleaner dependency graph where fused_moe.py depends on triton_moe.py, but not vice-versa, thus breaking the circular dependency.

bnellnm · 2026-05-08T20:20:05Z

Combined into one PR #41979

Jackmin801 and others added 6 commits April 22, 2026 02:34

updated

90a3a4b

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>

Merge remote-tracking branch 'upstream/main' into move-triton-moe-to-…

431e7a4

…experts Signed-off-by: Jackmin801 <ongjackm@gmail.com> # Conflicts: # vllm/model_executor/layers/fused_moe/__init__.py

Merge branch 'main' into move-triton-moe-to-experts

c195ac0

Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>

Merge remote-tracking branch 'upstream/main' into move-triton-moe-to-…

7cbeba6

…experts Signed-off-by: Jackmin801 <ongjackm@gmail.com> # Conflicts: # vllm/lora/layers/fused_moe.py # vllm/model_executor/layers/fused_moe/fused_moe.py

Merge remote-tracking branch 'origin/main' into move-triton-moe-to-ex…

b51d152

…perts Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm requested review from WoosukKwon, mgoin, pavanimajety, tlrmchlsmth and yewentao256 as code owners May 7, 2026 16:08

claude Bot reviewed May 7, 2026

mergify Bot added the documentation and nvidia labels May 7, 2026

github-project-automation Bot added this to NVIDIA May 7, 2026

bnellnm changed the title ~~Move triton moe to experts~~ [MoE] Move triton experts to fused_moe/experts/ May 7, 2026

gemini-code-assist Bot reviewed May 7, 2026

yewentao256 mentioned this pull request May 7, 2026

[MoE] Move various experts classes to fused_moe/experts/ #41979

Merged

4 tasks

bnellnm closed this May 8, 2026

github-project-automation Bot moved this to Done in NVIDIA May 8, 2026

bnellnm wants to merge 6 commits into vllm-project:main from neuralmagic:move-triton-moe-to-experts

3 participants
