
[MoE] Migrate W4A8 CT to Oracle Structure #39197

Open
robertgshaw2-redhat wants to merge 3 commits into vllm-project:main from robertgshaw2-redhat:claude/refactor-moe-oracle-selection-QBXyi


@robertgshaw2-redhat

Collaborator

Purpose

This PR refactors the W4A8 MoE quantization method to use the new modular kernel oracle pattern, improving code organization and maintainability. The changes extract W4A8-specific logic into a dedicated oracle module (w4a8.py) and simplify the main quantization method class by delegating kernel construction and weight processing to reusable helper functions.

Key improvements:

  • Extract W4A8 backend selection, kernel creation, and weight conversion logic into vllm/model_executor/layers/fused_moe/oracle/w4a8.py
  • Simplify CompressedTensorsW4A8Fp8MoEMethod by removing low-level kernel construction details
  • Consolidate stride computation and weight processing into dedicated functions
  • Update CutlassExpertsW4A8Fp8 to compute strides internally from config dimensions instead of accepting them as parameters
  • Align W4A8 MoE with the modular kernel architecture used by other MoE implementations
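The stride change in the fourth bullet can be illustrated with a minimal sketch. This is not the vLLM code: `ToyMoEConfig`, `compute_toy_strides`, and `ToyExperts` are hypothetical names, and the sketch assumes row-major operands whose stride equals the leading dimension, plus a fused gate/up projection of width `2 * intermediate_size`. It only shows the shape of the idea: once the experts class holds the config, callers no longer need to pass strides in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMoEConfig:
    # Hypothetical stand-in for the MoE config; field names are assumptions.
    num_experts: int
    hidden_size: int
    intermediate_size: int


def compute_toy_strides(cfg: ToyMoEConfig) -> dict[str, list[int]]:
    """Derive one leading-dimension stride per expert for each GEMM operand."""
    e = cfg.num_experts
    # Assumption: w13 fuses the gate and up projections, so its output is
    # twice the intermediate size.
    fused_width = 2 * cfg.intermediate_size
    return {
        "a1": [cfg.hidden_size] * e,        # input activations of GEMM 1
        "c1": [fused_width] * e,            # output of GEMM 1
        "a2": [cfg.intermediate_size] * e,  # input activations of GEMM 2
        "c2": [cfg.hidden_size] * e,        # output of GEMM 2
    }


class ToyExperts:
    """Computes strides from the config instead of accepting them as arguments."""

    def __init__(self, cfg: ToyMoEConfig) -> None:
        self.strides = compute_toy_strides(cfg)
```

With this shape, the quantization method class only has to hand over the config; the stride layout stays private to the experts implementation.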

Test Plan

Existing unit tests for W4A8 MoE quantization should pass. The refactoring maintains functional equivalence while reorganizing code structure. CI tests will verify:

  • W4A8 MoE weight loading and processing
  • Kernel selection and initialization
  • Forward pass execution with quantized weights

Test Result

N/A - This is a refactoring that maintains functional equivalence. Existing test coverage validates the changes.

https://claude.ai/code/session_017178oZ2UoCasfwjjB3zmdR

Move kernel selection and weight conversion logic for W4A8 FP8 MoE
into a dedicated oracle module, matching the pattern established by
the FP8 and NvFP4 oracles. This centralizes backend selection,
weight format conversion, quant config creation, and modular kernel
construction in oracle/w4a8.py.

- Create vllm/model_executor/layers/fused_moe/oracle/w4a8.py with
  select_w4a8_moe_backend, convert_to_w4a8_moe_kernel_format,
  make_w4a8_moe_quant_config, and make_w4a8_moe_kernel
- Update CutlassExpertsW4A8Fp8 to compute strides from moe_config
  and implement _supports_* methods for oracle compatibility
- Refactor CompressedTensorsW4A8Fp8MoEMethod to use oracle functions
  and delegate to moe_kernel.apply()

Co-authored-by: Claude

https://claude.ai/code/session_017178oZ2UoCasfwjjB3zmdR
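The division of labor described in this commit message can be sketched roughly as follows. Every name here (`ToyBackend`, `ToyCutlassExperts`, `select_toy_backend`, `convert_toy_weights`, `make_toy_kernel`, `ToyMoEMethod`) is a hypothetical stand-in, not one of the functions in `oracle/w4a8.py`, whose real signatures are not shown in this thread. The capability threshold is arbitrary, and the quant config step is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class ToyBackend(Enum):
    CUTLASS = "cutlass"


class ToyCutlassExperts:
    backend = ToyBackend.CUTLASS

    @staticmethod
    def _supports_device(capability: int) -> bool:
        # Arbitrary threshold, for illustration only.
        return capability >= 90


def select_toy_backend(capability: int) -> ToyBackend:
    """Pick the first candidate whose support check passes."""
    for experts_cls in (ToyCutlassExperts,):
        if experts_cls._supports_device(capability):
            return experts_cls.backend
    raise ValueError(f"no toy backend supports capability {capability}")


def convert_toy_weights(backend: ToyBackend, weights: dict) -> dict:
    """Stand-in for repacking: only tags each weight with the target layout."""
    return {name: (backend.value, w) for name, w in weights.items()}


@dataclass
class ToyKernel:
    backend: ToyBackend
    weights: dict

    def apply(self, x: float) -> float:
        # Stand-in for the fused expert computation.
        _, scale = self.weights["w13"]
        return x * scale


def make_toy_kernel(backend: ToyBackend, weights: dict) -> ToyKernel:
    """Build the kernel object from the selected backend and converted weights."""
    return ToyKernel(backend, weights)


class ToyMoEMethod:
    """The method class delegates selection, conversion, and construction."""

    def __init__(self, capability: int) -> None:
        self.backend = select_toy_backend(capability)
        self.kernel = None

    def process_weights_after_loading(self, weights: dict) -> None:
        converted = convert_toy_weights(self.backend, weights)
        self.kernel = make_toy_kernel(self.backend, converted)

    def apply(self, x: float) -> float:
        return self.kernel.apply(x)
```

The design point is that the method class contains no backend-specific branches; adding a backend would mean adding a candidate to the selection loop rather than editing the method class.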
robertgshaw2-redhat changed the title from "Refactor W4A8 MoE to use modular kernel oracle pattern" to "[MoE] Refactor W4A8 MoE to use modular kernel oracle pattern" on Apr 7, 2026
robertgshaw2-redhat added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Apr 7, 2026
mergify Bot added the "nvidia" label on Apr 7, 2026

mergify Bot commented Apr 7, 2026

Contributor

Hi @robertgshaw2-redhat, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request refactors the W4A8 MoE implementation to use a modular kernel architecture. It introduces a new oracle module for W4A8 to centralize weight conversion, reordering for CUTLASS, and kernel configuration. The CutlassExpertsW4A8Fp8 class now computes strides internally and implements various support check methods. CompressedTensorsW4A8Fp8MoEMethod has been updated to leverage these modular components, simplifying its weight processing and application logic. Feedback suggests removing the maybe_make_prepare_finalize method instead of raising a ValueError to maintain a cleaner interface.

Comment on lines +193 to +196

    raise ValueError(
        f"{self.__class__.__name__} uses the new modular kernel initialization "
        "logic. This function should not be called."
    )

Contributor


high

The maybe_make_prepare_finalize method raises a ValueError to prevent its usage. If this method is no longer intended to be used, it should be removed entirely from the class to avoid confusion and maintain a cleaner interface.
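Whether removal is the cleaner option depends on whether a parent class also defines this method, which this thread does not show. A minimal sketch of the trade-off, assuming a hypothetical base class `ToyMethodBase` that provides a legacy default:

```python
class ToyMethodBase:
    def maybe_make_prepare_finalize(self) -> str:
        # Hypothetical legacy default provided by the base class.
        return "legacy-prepare-finalize"


class RaisingMethod(ToyMethodBase):
    def maybe_make_prepare_finalize(self) -> str:
        # Mirrors the approach in the diff: fail loudly if anything calls this.
        raise ValueError(
            f"{self.__class__.__name__} uses the new modular kernel "
            "initialization logic. This function should not be called."
        )


class RemovedOverrideMethod(ToyMethodBase):
    # No override: a stray call silently falls back to the inherited default.
    pass
```

Under that assumption, deleting the override does not make the method disappear; it changes a stray call from an explicit error into a silent use of the legacy path. If no parent defines the method, removal simply yields an `AttributeError` on a stray call.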

robertgshaw2-redhat changed the title from "[MoE] Refactor W4A8 MoE to use modular kernel oracle pattern" to "[MoE] Migrate W4A8 CT to Oracle Structure" on Apr 7, 2026
Pre-register w13_weight_chan_scale and w2_weight_chan_scale on the
layer in create_weights so all parameter registrations live on the
layer. The oracle's convert_to_w4a8_moe_kernel_format now uses
replace_parameter to update them after load-time computation.

Co-authored-by: Claude

https://claude.ai/code/session_017178oZ2UoCasfwjjB3zmdR
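The pre-register-then-replace pattern from this commit can be sketched without any framework. `ToyLayer`, `toy_replace_parameter`, `toy_create_weights`, and `toy_convert_after_load` are hypothetical stand-ins (the real `replace_parameter` helper and its signature are not shown in this thread); only the two parameter names come from the commit message. Plain lists stand in for tensors.

```python
class ToyLayer:
    """Minimal stand-in for a layer that owns named parameters."""

    def __init__(self) -> None:
        self._params: dict[str, list[float]] = {}

    def register_parameter(self, name: str, value: list[float]) -> None:
        self._params[name] = value

    def get_parameter(self, name: str) -> list[float]:
        return self._params[name]


def toy_replace_parameter(layer: ToyLayer, name: str, value: list[float]) -> None:
    """Swap in a new value, but only for a name that was registered up front."""
    if name not in layer._params:
        raise KeyError(f"{name} was never registered on the layer")
    layer._params[name] = value


def toy_create_weights(layer: ToyLayer, num_experts: int) -> None:
    # Register placeholders so every parameter name is declared in one place.
    for name in ("w13_weight_chan_scale", "w2_weight_chan_scale"):
        layer.register_parameter(name, [1.0] * num_experts)


def toy_convert_after_load(layer: ToyLayer, computed: dict[str, list[float]]) -> None:
    # After loading, overwrite the placeholders with the computed scales.
    for name, value in computed.items():
        toy_replace_parameter(layer, name, value)
```

Keeping all registrations in the weight-creation step means the set of parameter names is visible before any load-time computation runs, and a typo in the conversion step fails immediately instead of creating a new attribute.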

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>


mergify Bot commented May 23, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added the "needs-rebase" label on May 23, 2026

Labels

needs-rebase, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)

2 participants