[Misc] [MXFP8] Drop sm100 mxfp8 warning #21881

Merged
b8zhong merged 6 commits into sgl-project:main from zianglih:mxfp8-sm100
Apr 11, 2026
@zianglih
Contributor

zianglih commented Apr 1, 2026

Motivation

@HumansAnd

sm100 mxfp8 now has optimized kernels. Drop the warning.

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@zianglih changed the title from "Drop sm100 mxfp8 warning" to "[Misc] [MXFP8] Drop sm100 mxfp8 warning" Apr 1, 2026
@zianglih
Contributor Author

zianglih commented Apr 1, 2026

@b8zhong could you take a look? Thanks!

Collaborator

@b8zhong b8zhong left a comment

Btw, could we change the default in this PR? (Otherwise, it will still use Triton, right?)

@zianglih
Contributor Author

zianglih commented Apr 2, 2026

MoE mxfp8 backend is already resolved to flashinfer_trtllm if not specified:

    if self.quantization == "mxfp8":
        if self.moe_runner_backend == "auto":
            self.moe_runner_backend = "flashinfer_trtllm"
        elif self.moe_runner_backend not in [
            "cutlass",
            "flashinfer_trtllm",
            "flashinfer_trtllm_routed",
        ]:
            logger.warning(
                "mxfp8 quantization supports only cutlass, flashinfer_trtllm, "
                "or flashinfer_trtllm_routed backends. "
                f"Overriding {self.moe_runner_backend!r}."
            )
            self.moe_runner_backend = "flashinfer_trtllm"
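
For anyone who wants to check the resolution behavior in isolation, here is a minimal standalone sketch of the branch quoted above. The function name resolve_mxfp8_moe_backend and the constant SUPPORTED_MXFP8_MOE_BACKENDS are hypothetical and exist only for illustration; the real logic lives on the server-args object and mutates self.moe_runner_backend in place.

```python
import logging

logger = logging.getLogger(__name__)

# Backends accepted for mxfp8 MoE, copied from the quoted snippet.
SUPPORTED_MXFP8_MOE_BACKENDS = (
    "cutlass",
    "flashinfer_trtllm",
    "flashinfer_trtllm_routed",
)


def resolve_mxfp8_moe_backend(quantization, moe_runner_backend):
    """Hypothetical helper (not part of sglang) mirroring the quoted branch."""
    if quantization != "mxfp8":
        # The quoted branch only applies to mxfp8; leave other settings alone.
        return moe_runner_backend
    if moe_runner_backend == "auto":
        # Unspecified backend resolves to flashinfer_trtllm.
        return "flashinfer_trtllm"
    if moe_runner_backend not in SUPPORTED_MXFP8_MOE_BACKENDS:
        # Unsupported backend: warn and override, as in the snippet.
        logger.warning(
            "mxfp8 quantization supports only cutlass, flashinfer_trtllm, "
            "or flashinfer_trtllm_routed backends. "
            f"Overriding {moe_runner_backend!r}."
        )
        return "flashinfer_trtllm"
    return moe_runner_backend
```

Under these assumptions, "auto" and any unsupported value such as "triton" both come back as "flashinfer_trtllm", while an explicit "cutlass" is left unchanged.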

@zianglih
Contributor Author

zianglih commented Apr 2, 2026

Hi @b8zhong, I tried in 16091fb, but it requires broader refactoring. The trtllm backend requires weight shuffling and scaling-factor swizzling, and that code currently cannot be reached with the auto backend.

@b8zhong
Collaborator

b8zhong commented Apr 2, 2026

Sure, sounds good. I think the dense GEMM in MoE is a very small part of E2E anyway.

@b8zhong
Collaborator

b8zhong commented Apr 2, 2026

/tag-and-rerun-ci again

@github-actions github-actions Bot added the run-ci label Apr 2, 2026
@b8zhong b8zhong enabled auto-merge (squash) April 3, 2026 20:05
@b8zhong b8zhong merged commit 78043d4 into sgl-project:main Apr 11, 2026
102 of 112 checks passed
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request Apr 11, 2026
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
@zianglih zianglih deleted the mxfp8-sm100 branch May 5, 2026 22:37