Refactor Cutlass MoE runner integration #12023

Open
jonahbernard wants to merge 72 commits into sgl-project:main from jonahbernard:refactor/cutlass-moe-runner-integration

@jonahbernard jonahbernard commented Oct 23, 2025

Motivation

Refactor the Cutlass MoE runner integration into cutlass.py, per #8715.

Modifications

Implemented CutlassRunnerInput, CutlassRunnerOutput, CutlassMoeQuantInfo, CutlassRunnerCore, pre_permute_standard_to_cutlass, and post_permute_cutlass_to_standard
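
For readers unfamiliar with the permute/un-permute pattern that the two helper names suggest, here is a minimal pure-Python sketch of the general idea: group token rows by expert before the grouped GEMM, then scatter the expert outputs back into token order with the top-k weights. This is not the code in this PR. The names `RunnerInputSketch`, `pre_permute_sketch`, and `post_permute_sketch`, along with all fields and signatures, are hypothetical and assume a standard top-k routing layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunnerInputSketch:
    """Hypothetical stand-in for a runner input; fields are assumptions."""
    permuted_tokens: List[List[float]]  # token rows grouped by expert
    expert_offsets: List[int]           # group boundaries, length num_experts + 1
    src_index: List[int]                # flat (token * top_k + slot) per permuted row


def pre_permute_sketch(hidden, topk_ids, num_experts):
    """Group (token, slot) pairs by expert id using a stable sort."""
    top_k = len(topk_ids[0])
    pairs = [
        (topk_ids[t][k], t * top_k + k)
        for t in range(len(hidden))
        for k in range(top_k)
    ]
    pairs.sort(key=lambda p: p[0])  # stable: keeps token order within an expert
    src_index = [flat for _, flat in pairs]
    permuted = [list(hidden[flat // top_k]) for flat in src_index]

    # Prefix sums of per-expert row counts give each expert's slice.
    counts = [0] * num_experts
    for expert, _ in pairs:
        counts[expert] += 1
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    return RunnerInputSketch(permuted, offsets, src_index)


def post_permute_sketch(expert_out, src_index, topk_weights):
    """Scatter expert outputs back to token order, weighted by routing weights."""
    top_k = len(topk_weights[0])
    dim = len(expert_out[0])
    out = [[0.0] * dim for _ in range(len(topk_weights))]
    for row, flat in zip(expert_out, src_index):
        t, k = divmod(flat, top_k)
        for d in range(dim):
            out[t][d] += topk_weights[t][k] * row[d]
    return out
```

With identity experts and routing weights that sum to 1 per token, feeding `permuted_tokens` straight into `post_permute_sketch` should reproduce the original hidden states, which makes a convenient round-trip check for this kind of layout conversion.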

Accuracy Tests

FP4:
test_deepseek_v3_fp4_cutlass_moe.py -v PASS

W4A8:
No DEEPEP:
TestMoERunner4GPU.test_moe_runner_cutlass_w4a8 PASS
test_cutlass_w4a8_moe.py PASS

DEEPEP_LL: TestMoERunner4GPU.test_moe_runner_cutlass_w4a8_deepep_ll PASS
DEEPEP_NORMAL: TestMoERunner4GPU.test_moe_runner_cutlass_w4a8_deepep_normal PASS

FP8:
TestMoERunner.test_moe_runner_cutlass_fp8 PASS
test_cutlass_moe.py PASS

Benchmarking and Profiling

N/A

Checklist

@ch-wan ch-wan mentioned this pull request Oct 23, 2025
66 tasks
@b8zhong b8zhong added the run-ci label Oct 28, 2025
@github-actions github-actions Bot added the quant LLM Quantization label Nov 6, 2025
Labels

blackwell SM100/SM120 quant LLM Quantization run-ci

3 participants