[MoE] Migrate W4A8 CT to Oracle Structure by robertgshaw2-redhat · Pull Request #39197 · vllm-project/vllm

robertgshaw2-redhat · 2026-04-07T14:34:22Z

Purpose

This PR refactors the W4A8 MoE quantization method to use the new modular kernel oracle pattern, improving code organization and maintainability. The changes extract W4A8-specific logic into a dedicated oracle module (w4a8.py) and simplify the main quantization method class by delegating kernel construction and weight processing to reusable helper functions.

Key improvements:

- Extract W4A8 backend selection, kernel creation, and weight conversion logic into vllm/model_executor/layers/fused_moe/oracle/w4a8.py
- Simplify CompressedTensorsW4A8Fp8MoEMethod by removing low-level kernel construction details
- Consolidate stride computation and weight processing into dedicated functions
- Update CutlassExpertsW4A8Fp8 to compute strides internally from config dimensions instead of accepting them as parameters
- Align W4A8 MoE with the modular kernel architecture used by other MoE implementations
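
The stride change listed above can be pictured with a small sketch. Everything in it is hypothetical: `MoEDimsSketch`, `ExpertsSketch`, and the tensor shapes are illustrative stand-ins, not the actual `CutlassExpertsW4A8Fp8` constructor or the stride layout CUTLASS expects (4-bit packing is ignored entirely). It only shows the idea of deriving strides from dimensions held in a config object instead of asking every caller to pass them in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEDimsSketch:
    """Hypothetical stand-in for the dimensions a MoE config carries."""

    num_experts: int
    hidden_size: int
    intermediate_size: int


def row_major_strides(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Element strides of a contiguous row-major tensor with this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


class ExpertsSketch:
    """Hypothetical experts class that derives its strides from the config."""

    def __init__(self, dims: MoEDimsSketch) -> None:
        # Assumption: w13 holds the fused gate and up projections,
        # hence 2 * intermediate_size rows per expert.
        self.w13_strides = row_major_strides(
            (dims.num_experts, 2 * dims.intermediate_size, dims.hidden_size)
        )
        self.w2_strides = row_major_strides(
            (dims.num_experts, dims.hidden_size, dims.intermediate_size)
        )


experts = ExpertsSketch(
    MoEDimsSketch(num_experts=8, hidden_size=16, intermediate_size=32)
)
```

With this shape, callers construct the experts object from the config alone, so the strides cannot drift out of sync with the dimensions they describe.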

Test Plan

Existing unit tests for W4A8 MoE quantization should pass. The refactoring maintains functional equivalence while reorganizing code structure. CI tests will verify:

- W4A8 MoE weight loading and processing
- Kernel selection and initialization
- Forward pass execution with quantized weights

Test Result

N/A - This is a refactoring that maintains functional equivalence. Existing test coverage validates the changes.

https://claude.ai/code/session_017178oZ2UoCasfwjjB3zmdR

Move kernel selection and weight conversion logic for W4A8 FP8 MoE into a dedicated oracle module, matching the pattern established by the FP8 and NvFP4 oracles. This centralizes backend selection, weight format conversion, quant config creation, and modular kernel construction in oracle/w4a8.py.

- Create vllm/model_executor/layers/fused_moe/oracle/w4a8.py with select_w4a8_moe_backend, convert_to_w4a8_moe_kernel_format, make_w4a8_moe_quant_config, and make_w4a8_moe_kernel
- Update CutlassExpertsW4A8Fp8 to compute strides from moe_config and implement _supports_* methods for oracle compatibility
- Refactor CompressedTensorsW4A8Fp8MoEMethod to use oracle functions and delegate to moe_kernel.apply()

Co-authored-by: Claude
https://claude.ai/code/session_017178oZ2UoCasfwjjB3zmdR
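
The four oracle responsibilities named in this commit message (backend selection, weight conversion, quant config creation, kernel construction) can be sketched as a small pipeline. This is a toy model under stated assumptions: the `*_sketch` names, their signatures, and the list-reversal "conversion" are all invented for illustration and are not the functions in oracle/w4a8.py, whose real signatures are not shown on this page.

```python
from dataclasses import dataclass
from enum import Enum


class BackendSketch(Enum):
    """Hypothetical backend choices; the real oracle's enum is not shown here."""

    CUTLASS = "cutlass"


@dataclass(frozen=True)
class QuantConfigSketch:
    weight_bits: int
    activation_bits: int


def select_backend_sketch(cutlass_available: bool) -> BackendSketch:
    # Step 1: pick a backend, failing loudly when none is usable.
    if not cutlass_available:
        raise NotImplementedError("no W4A8 MoE backend is usable in this sketch")
    return BackendSketch.CUTLASS


def convert_weights_sketch(backend: BackendSketch, weights: list[int]) -> list[int]:
    # Step 2: put weights into the layout the backend expects.
    # Reversal is only a placeholder for a real, backend-specific reordering.
    if backend is BackendSketch.CUTLASS:
        return list(reversed(weights))
    return list(weights)


def make_quant_config_sketch() -> QuantConfigSketch:
    # Step 3: describe the scheme: 4-bit weights, 8-bit activations.
    return QuantConfigSketch(weight_bits=4, activation_bits=8)


def make_kernel_sketch(backend: BackendSketch, quant_config: QuantConfigSketch):
    # Step 4: build the callable that the quantization method delegates to.
    def apply(weights: list[int]) -> str:
        return (
            f"{backend.value}:w{quant_config.weight_bits}"
            f"a{quant_config.activation_bits}:{len(weights)} weights"
        )

    return apply


class MethodSketch:
    """Hypothetical quantization method that only wires the steps together."""

    def __init__(self, cutlass_available: bool) -> None:
        self.backend = select_backend_sketch(cutlass_available)
        self.weights: list[int] = []
        self.kernel = None

    def process_weights_after_loading(self, raw_weights: list[int]) -> None:
        self.weights = convert_weights_sketch(self.backend, raw_weights)
        self.kernel = make_kernel_sketch(self.backend, make_quant_config_sketch())

    def apply(self) -> str:
        if self.kernel is None:
            raise RuntimeError("call process_weights_after_loading first")
        return self.kernel(self.weights)


method = MethodSketch(cutlass_available=True)
method.process_weights_after_loading([1, 2, 3])
summary = method.apply()
```

The point of the arrangement is that the method class holds no backend-specific logic; swapping or adding a backend would touch only the oracle-style helper functions.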

mergify · 2026-04-07T14:39:06Z

Hi @robertgshaw2-redhat, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

gemini-code-assist

Code Review

This pull request refactors the W4A8 MoE implementation to use a modular kernel architecture. It introduces a new oracle module for W4A8 to centralize weight conversion, reordering for CUTLASS, and kernel configuration. The CutlassExpertsW4A8Fp8 class now computes strides internally and implements various support check methods. CompressedTensorsW4A8Fp8MoEMethod has been updated to leverage these modular components, simplifying its weight processing and application logic. Feedback suggests removing the maybe_make_prepare_finalize method instead of raising a ValueError to maintain a cleaner interface.

gemini-code-assist · 2026-04-07T14:40:01Z

+        raise ValueError(
+            f"{self.__class__.__name__} uses the new modular kernel initialization "
+            "logic. This function should not be called."
+        )


The maybe_make_prepare_finalize method raises a ValueError to prevent its usage. If this method is no longer intended to be used, it should be removed entirely from the class to avoid confusion and maintain a cleaner interface.
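
The trade-off behind this suggestion can be sketched in plain Python. The sketch assumes, without confirmation from this page, that a base class supplies a default `maybe_make_prepare_finalize`; all class names below are hypothetical. Under that assumption, deleting the override does not remove the method from the subclass, it silently restores the inherited behavior, whereas the raising override fails loudly.

```python
class BaseMethodSketch:
    """Hypothetical base class whose hook has a default implementation."""

    def maybe_make_prepare_finalize(self) -> str:
        return "legacy prepare/finalize"


class RaisingMethodSketch(BaseMethodSketch):
    """Keeps the override and rejects any call, as in the diff above."""

    def maybe_make_prepare_finalize(self) -> str:
        raise ValueError(
            f"{self.__class__.__name__} uses the new modular kernel initialization "
            "logic. This function should not be called."
        )


class InheritingMethodSketch(BaseMethodSketch):
    """Override removed: calls fall through to the base class."""


def call_hook(method: BaseMethodSketch) -> str:
    """Call the hook and report whether it was rejected."""
    try:
        return method.maybe_make_prepare_finalize()
    except ValueError as exc:
        return f"rejected: {exc}"


raising_result = call_hook(RaisingMethodSketch())
inheriting_result = call_hook(InheritingMethodSketch())
```

If the base class defines no such default, removing the method as the review suggests would instead surface an `AttributeError` at the call site, which is also a loud failure.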

Pre-register w13_weight_chan_scale and w2_weight_chan_scale on the layer in create_weights so all parameter registrations live on the layer. The oracle's convert_to_w4a8_moe_kernel_format now uses replace_parameter to update them after load-time computation.

Co-authored-by: Claude
https://claude.ai/code/session_017178oZ2UoCasfwjjB3zmdR

mergify · 2026-04-07T14:55:27Z

Hi @robertgshaw2-redhat, the pre-commit checks have failed.

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>

mergify · 2026-04-07T17:36:22Z

Hi @robertgshaw2-redhat, the pre-commit checks have failed.

mergify · 2026-05-23T08:03:04Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

robertgshaw2-redhat requested review from mgoin, pavanimajety, tlrmchlsmth and yewentao256 as code owners April 7, 2026 14:34

robertgshaw2-redhat changed the title from "Refactor W4A8 MoE to use modular kernel oracle pattern" to "[MoE] Refactor W4A8 MoE to use modular kernel oracle pattern" Apr 7, 2026

robertgshaw2-redhat added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Apr 7, 2026

mergify Bot added the nvidia label Apr 7, 2026

github-project-automation Bot added this to NVIDIA Apr 7, 2026

gemini-code-assist Bot reviewed Apr 7, 2026


robertgshaw2-redhat changed the title from "[MoE] Refactor W4A8 MoE to use modular kernel oracle pattern" to "[MoE] Migrate W4A8 CT to Oracle Structure" Apr 7, 2026

pre-commit

bd39c0a

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>

bedeks mentioned this pull request May 18, 2026

[MoE] Migrate W4A8 CT to oracle kernel setup #42680

Merged

mergify Bot added the needs-rebase label May 23, 2026

robertgshaw2-redhat wants to merge 3 commits into vllm-project:main from robertgshaw2-redhat:claude/refactor-moe-oracle-selection-QBXyi
