[NVFP4] NVFP4 MOE emulation fallback for H100/MI300/MI350, standardize TritonExperts usage for OCP MX emulation#35737
Merged
vllm-bot merged 65 commits on Apr 22, 2026
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
…vfp4-simulation-support-moe
Code Review
This pull request introduces support for NVFP4 MOE models on a wider range of hardware, including AMD Instinct, Nvidia Ampere, and Hopper, through an emulation backend. The changes are extensive, touching quantization layers, model execution, and tests to accommodate this new emulation path. The implementation appears solid and well-integrated. I've found one critical issue that needs to be addressed.
fxmarty-amd
commented
Mar 2, 2026
mgoin
reviewed
Apr 16, 2026
Hi @fxmarty-amd, the pre-commit checks have failed. Please run:
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
mgoin
approved these changes
Apr 17, 2026
Author
Failing tests are: which look unrelated?
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request on Apr 23, 2026:
…e `TritonExperts` usage for OCP MX emulation (vllm-project#35737)
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Signed-off-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Purpose
This PR enables running NVFP4 MOE models on Hopper and on AMD Instinct MI300 and MI350 through an emulation fallback.
This is useful for researchers, anybody trying out microscaling formats, and people who would like to run e.g. https://huggingface.co/nvidia/Qwen3-30B-A3B-NVFP4 or https://huggingface.co/RedHatAI/Qwen3-30B-A3B-NVFP4 on non-Blackwell devices.
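For readers new to the format, here is a minimal sketch of what emulating NVFP4 can look like: the 4-bit E2M1 codes are decoded to ordinary floats and rescaled, so that the matmul can then run in a higher precision on hardware without native FP4 support. This is illustrative only and is not the code from this PR. The function names, the low-nibble-first packing order, and the choice to multiply by the global scale are assumptions made for the example, and the per-block scales (stored as FP8 E4M3 in the real format) are taken as already-decoded floats.

```python
# Illustrative sketch only: not vLLM's implementation.
# Magnitudes of the 8 E2M1 (FP4) code points: 2 exponent bits, 1 mantissa bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale per block of 16 elements


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (0..15); bit 3 is the sign bit."""
    magnitude = E2M1_MAGNITUDES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def unpack_nibbles(packed: bytes) -> list[int]:
    """Split each byte into two 4-bit codes (assumed order: low nibble first)."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes


def dequantize_nvfp4(
    packed: bytes, block_scales: list[float], global_scale: float
) -> list[float]:
    """Dequantize a flat NVFP4 tensor to Python floats.

    block_scales: one already-decoded scale per 16-element block.
    global_scale: per-tensor scale (assumed here to be multiplied in).
    """
    codes = unpack_nibbles(packed)
    if len(codes) != BLOCK_SIZE * len(block_scales):
        raise ValueError("expected one block scale per 16 elements")
    return [
        decode_e2m1(code) * block_scales[i // BLOCK_SIZE] * global_scale
        for i, code in enumerate(codes)
    ]


# One block of 16 elements: each byte 0x21 packs the codes 1 (0.5) and 2 (1.0).
values = dequantize_nvfp4(bytes([0x21] * 8), block_scales=[2.0], global_scale=0.5)
print(values[:4])
```

A real fallback would do this on tensors, per expert, before handing the dequantized weights to a regular MOE kernel; the scalar version above only shows the arithmetic.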
This PR also refactors `quark_moe.py` to stop using the functional `fused_experts` function for OCP MX quantization emulation, and instead rely purely on `TritonExperts`.
Test Plan
See:
- `CUDA_VISIBLE_DEVICES="0,1" pytest -s -v tests/evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-mi3xx.txt` running on AMD Instinct MI325X (MXFP4 & NVFP4 emulation fallback) and passing.
- `pytest tests/models/quantization/test_nvfp4.py -s -vvvvv -k "test_nvfp4_moe"` running on MI355X.
- `CUDA_VISIBLE_DEVICES="6,7" pytest tests/quantization/test_quark.py -s -vvvvv -k "test_ocp_mx_wikitext_correctness"` (testing MXFP4/MXFP6 Qwen MOE emulation)

And see as of 1e1d139
giving:
And
gives