
[Refactor] Move FusedMoE hidden_size roundup to quant_method#34285

Merged
vllm-bot merged 8 commits into vllm-project:main from BowenBao:bowenbao/move_mxfp4_moe_roundup on Mar 27, 2026

Conversation

@BowenBao
Contributor

@BowenBao BowenBao commented Feb 10, 2026

  • Refactor hidden_size and intermediate_size roundup logic to be handled by the quant method.
  • Store padded and unpadded sizes under MoeConfig.
  • Update ROCm padding logic to improve performance on MI300X. Thanks to @Rohan138 for the suggestion and evaluation. Moved to a separate PR: [ROCm] gpt-oss fusion/padding fixes #38043.
  • Enable Quark MXFP4 MoE with the aiter backend running with a padded intermediate_size.
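The refactor described above can be sketched roughly as follows. This is an illustrative sketch only: the class and method names and the 256-element alignment are assumptions for the example, not vLLM's actual API.

```python
# Illustrative sketch: names and the 256 alignment are assumptions,
# not vLLM's actual API. The point is that padding decisions move
# from the generic layer into the quant method.

def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple

class FusedMoEMethodBase:
    """Base quant method: no padding required by default."""

    def maybe_round_up_hidden_size(self, hidden_size: int) -> int:
        return hidden_size

class Mxfp4MoEMethod(FusedMoEMethodBase):
    """MXFP4-style method whose kernels need block-aligned sizes."""

    ALIGN = 256  # assumed alignment, for illustration only

    def maybe_round_up_hidden_size(self, hidden_size: int) -> int:
        return round_up(hidden_size, self.ALIGN)

# The layer records both sizes: weights are allocated at the padded
# width, while activations can be sliced back to the unpadded width.
quant_method = Mxfp4MoEMethod()
unpadded_hidden_size = 2880  # e.g. gpt-oss-120b hidden size
padded_hidden_size = quant_method.maybe_round_up_hidden_size(unpadded_hidden_size)
assert padded_hidden_size == 3072
```

With this shape, layer.py no longer needs model-specific alignment checks; each quant method declares its own requirement and the default is a no-op.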

@BowenBao BowenBao changed the title Refactor FusedMoE hidden_size roundup [Refactor] Move FusedMoE hidden_size roundup to quant_method Feb 10, 2026
@BowenBao
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the logic for rounding up the hidden_size in FusedMoE layers, moving the responsibility from the generic FusedMoE layer to the specific quantization methods. This is a good architectural improvement. My main feedback is about code duplication and a potential bug in QuarkOCP_MX_MoEMethod where the roundup logic is applied unconditionally for gpt_oss models, even for non-MXFP4 quantization types.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the logic for rounding up the hidden size in FusedMoE layers by moving it from the generic layer.py to the specific quant_method implementations. This is a good architectural improvement, as it places quantization-specific logic where it belongs. The changes in fused_moe_method_base.py and layer.py are correct. However, this refactoring has introduced code duplication in mxfp4.py and quark_moe.py for handling gpt_oss models. I've added comments with suggestions to address this.

@BowenBao
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the hidden_size roundup logic for FusedMoE layers by moving it into the quant_method. This is a good architectural improvement as it localizes quantization-specific logic. The changes are well-structured. I've found one issue where a function is called with incorrect arguments, which I've detailed in a specific comment.

@mergify

mergify bot commented Feb 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @BowenBao.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 11, 2026
@Rohan138
Contributor

FYI #32307 might be relevant; I'm not sure what the pad size for gpt-oss on MI300 should be, i.e. 128 or 256. It needs further investigation; unfortunately I haven't had time to run proper perf.

@BowenBao
Contributor Author

/gemini review

@mergify mergify bot removed the needs-rebase label Feb 16, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the hidden_size and intermediate_size rounding logic in the FusedMoE layer by moving it into the quant_method. This is a significant improvement in maintainability as it centralizes quantization-specific alignment requirements (especially for MXFP4 backends) within the quantization methods themselves, rather than having brittle model-type checks in the core layer logic. The changes ensure that both the moe_config and the actual weight tensors are created with consistent, correctly padded dimensions across different hardware platforms and quantization schemes.

@robertgshaw2-redhat
Collaborator

this is a nice simplification. I wonder if we can go even further by making the layer just unaware of the hidden size / intermediate size? WDYT?

@BowenBao
Contributor Author

I think it should be doable, see #34285 (comment), unless there are other use cases of layer.hidden_size that I'm unaware of.
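A minimal sketch of the size-unaware layer being discussed here, under stated assumptions: MoEConfig's fields, QuantMethod, and FusedMoELayer below are hypothetical names for illustration, not vLLM's actual classes.

```python
# Hypothetical sketch of the idea in this thread: the layer holds no
# size fields of its own, and all size bookkeeping (including padding)
# lives in the quant method. Names are illustrative, not vLLM's API.
from dataclasses import dataclass

@dataclass
class MoEConfig:
    hidden_size: int          # unpadded, as declared by the model
    padded_hidden_size: int   # after quant-method alignment

class QuantMethod:
    """Owns all size bookkeeping, including padding."""

    def __init__(self, hidden_size: int, align: int = 256):
        padded = -(-hidden_size // align) * align  # ceil to a multiple
        self.moe_config = MoEConfig(hidden_size, padded)

class FusedMoELayer:
    """Keeps no size fields; sizes are read through the quant method."""

    def __init__(self, quant_method: QuantMethod):
        self.quant_method = quant_method

    @property
    def hidden_size(self) -> int:
        # Existing callers of layer.hidden_size keep working, but the
        # value is sourced from the quant method's config.
        return self.quant_method.moe_config.hidden_size

layer = FusedMoELayer(QuantMethod(hidden_size=2880))
assert layer.hidden_size == 2880
assert layer.quant_method.moe_config.padded_hidden_size == 3072
```

The property keeps backward compatibility for any remaining layer.hidden_size users while leaving a single source of truth in the quant method.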

@mergify

mergify bot commented Feb 24, 2026

Hi @BowenBao, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@Rohan138 Rohan138 mentioned this pull request Feb 24, 2026
@BowenBao BowenBao force-pushed the bowenbao/move_mxfp4_moe_roundup branch from daa5f22 to 7a71c20 on March 26, 2026 00:28
@mergify mergify bot removed the needs-rebase label Mar 26, 2026
@mergify

mergify bot commented Mar 26, 2026

Hi @BowenBao, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@BowenBao
Contributor Author

BowenBao commented Mar 26, 2026

On latest commit

MiniMax TP4 GSM8K Results:

  • flexible-extract: 90.98%
  • strict-match: 91.05%

gpt-oss-120b TP4 GPQA: 65.72%
gpt-oss-120b-w-mxfp4-a-fp8 TP4 GPQA: 66.04%

If no more concerns, @tjtanaa could you approve the PR?

@AndreasKaratzas
Collaborator

Kernels (B200) is a known test failure as of 3/25 nightly: https://buildkite.com/vllm/ci/builds/58103/steps/canvas?sid=019d26cd-7ecd-40ff-b3d4-b07dadd7578b&tab=output

Collaborator

@gshtras gshtras left a comment


The padding change has moved to its own PR at #38043. Could you please remove it here to avoid conflicts? I think it's GTG otherwise.

@BowenBao
Contributor Author

@gshtras done

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Mar 26, 2026
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Mar 26, 2026
@vllm-bot vllm-bot merged commit 0ae89f1 into vllm-project:main Mar 27, 2026
66 of 71 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Mar 27, 2026
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Mar 27, 2026
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
bhargav-patel-29 pushed a commit to Bharatgen-Tech/vllm that referenced this pull request Apr 1, 2026
liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026
rishitdholakia13 pushed a commit to rishitdholakia13/vllm that referenced this pull request Apr 7, 2026
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026

Labels

  • ci/build
  • gpt-oss: Related to GPT-OSS models
  • nvidia
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • rocm: Related to AMD ROCm

Projects

Status: Done
