
Add pre-shuffle weight for new aiter MoE support. #12908

Merged
HaiShaw merged 10 commits into sgl-project:main from sogalin:preshuffle-moe
Nov 10, 2025

Conversation

@sogalin
Contributor

@sogalin sogalin commented Nov 9, 2025

Motivation

This PR introduces a pre-shuffle step for MoE weights to improve runtime performance and memory access efficiency.

Modifications

Added shuffle_weight from aiter.ops.shuffle to pre-shuffle w13_weight and w2_weight at a (16, 16) granularity that matches the aiter kernel tile size, improving locality and performance.
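For illustration, a minimal sketch of what such a load-time pre-shuffle can look like. The shuffle_weight import, the (16, 16) layout, and the w13_weight/w2_weight attribute names follow the description above; the helper function itself and the exact keyword signature are assumptions, not the merged code:

```python
# Sketch only: pre-shuffle the MoE expert weights into the (16, 16) tile layout
# that the aiter fused-MoE kernel reads, once, right after weight loading.
# Attribute names (w13_weight, w2_weight) follow the PR description; the helper
# itself is illustrative rather than the merged implementation.
import torch
from aiter.ops.shuffle import shuffle_weight


def preshuffle_moe_weights(layer: torch.nn.Module) -> None:
    for name in ("w13_weight", "w2_weight"):
        w = getattr(layer, name)
        # Re-layout once at load time so the kernel reads tiles contiguously.
        shuffled = shuffle_weight(w.data, layout=(16, 16))
        setattr(layer, name, torch.nn.Parameter(shuffled, requires_grad=False))
```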

Accuracy Tests

Model: amd/DeepSeek-R1-MXFP4-Preview

python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --port 8000
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [01:50<00:00, 11.96it/s]
Accuracy: 0.946
Invalid: 0.000
Latency: 110.434 s
Output throughput: 1180.643 token/s

Benchmarking and Profiling

Machine: 8 × MI355 GPUs
Docker Image: rocm/sgl-dev:v0.5.4.post3-rocm700-mi35x-20251106

Command:
SGLANG_USE_AITER=1 RCCL_MSCCL_ENABLE=0 SGLANG_INT4_WEIGHT=0 SGLANG_MOE_PADDING=1 SGLANG_USE_ROCM700A=1 SGLANG_SET_CPU_AFFINITY=1 SGLANG_ROCM_FUSED_DECODE_MLA=1 python3 -m sglang.launch_server --model-path /data2/deepseek-ai/DeepSeek-R1-MXFP4-Preview/ --tensor-parallel-size 8 --trust-remote-code --chunked-prefill-size 131072 --host 0.0.0.0 --port 8000 --log-requests --disable-radix-cache --mem-fraction-static 0.5 --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4

| Prefill | 1 | 8 | 16 |
| --- | --- | --- | --- |
| W/o MoE update | 0.0109 | 0.6524 | 1.2924 |
| With MoE update | 0.1071 | 0.6505 | 1.2965 |

| Decode | 1 | 8 | 16 |
| --- | --- | --- | --- |
| W/o MoE update | 0.0109 | 0.0123 | 0.0139 |
| With MoE update | 0.0103 | 0.0118 | 0.0133 |

Checklist

@github-actions github-actions Bot added the amd label Nov 9, 2025
@sogalin sogalin marked this pull request as ready for review November 9, 2025 06:34
@HaiShaw HaiShaw added the run-ci label Nov 10, 2025
)

_is_hip = is_hip()
_is_shuffle_moe = get_bool_env_var("AITER_MXFP4_MOE_SF") and _is_hip
Collaborator

suggest change to _is_shuffle_moe_mxfp4

logger = logging.getLogger(__name__)

_is_hip = is_hip()
_is_shuffle_moe = get_bool_env_var("AITER_MXFP4_MOE_SF") and _is_hip
Collaborator

change to _is_shuffle_moe_mxfp4
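Applied to the gate quoted above, the suggested rename would read roughly like this (a sketch; only the variable name changes, and the value is still driven by the AITER_MXFP4_MOE_SF environment variable on HIP, using the same is_hip/get_bool_env_var helpers as in the quoted snippet):

```python
# Sketch of the suggested rename: same gate, name now calls out the MXFP4 path.
_is_hip = is_hip()
_is_shuffle_moe_mxfp4 = get_bool_env_var("AITER_MXFP4_MOE_SF") and _is_hip
```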

@HaiShaw
Collaborator

HaiShaw commented Nov 10, 2025

@BowenBao @kkHuang-amd please have a review.

layer.w2_weight_scale.data = w2_weight_scale.view(s0, s1, -1)

# Pre-shuffle weight
if _is_shuffle_moe_mxfp4:
Collaborator

@sogalin for my understanding, why would this not need any changes to the kernel call code down in the apply method?

@HaiShaw Would it make sense to keep shuffling as default since it has better perf?

Collaborator

@BowenBao it is set in the Docker ENV

@HaiShaw HaiShaw merged commit 661c1c9 into sgl-project:main Nov 10, 2025
10 of 36 checks passed