Support moe topk sigmoid kernel #13049
Merged
ispobock merged 13 commits into sgl-project:main from Nov 19, 2025
Conversation
Signed-off-by: xuebi <xuebi@minimaxi.com>
Collaborator
Cool
Collaborator
Thanks a lot, I have some questions about
Contributor
Author
I ran the comparison against the lmsysorg/sglang:dev image. On GSM8K, the accuracy was 0.9249 with the dev image and 0.9295 with this patch. For AIME2025, my measured accuracy is 0.803, while the official MiniMax-M2 report is 0.78.

lm_eval --model local-completions \
    --model_args base_url=http://localhost:8000/v1/completions,tokenizer=/model,model=/model \
    --tasks gsm8k_cot \
    --batch_size 128 \
    --num_fewshot 5
# topk_sigmoid
local-completions (base_url=http://localhost:8000/v1/completions,tokenizer=/model,model=/model), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 128
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot| 3|flexible-extract| 5|exact_match|↑ |0.9295|± |0.0071|
| | |strict-match | 5|exact_match|↑ |0.9158|± |0.0076|
# lmsysorg/sglang:dev
local-completions (base_url=http://localhost:8000/v1/completions,tokenizer=/model,model=/model), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 128
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot| 3|flexible-extract| 5|exact_match|↑ |0.9249|± |0.0073|
| | |strict-match | 5|exact_match|↑ |0.9113|± |0.0078|
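As a quick sanity check on the GSM8K numbers, the 0.9295 vs 0.9249 gap is well within the reported standard errors, so the kernel change looks accuracy-neutral. A rough two-sample z-score (assuming the two runs are independent) bears this out:

```python
import math

# Flexible-extract accuracies and standard errors from the two runs above.
acc_patch, se_patch = 0.9295, 0.0071
acc_dev, se_dev = 0.9249, 0.0073

# Two-sample z-score for the difference, assuming independent runs.
z = (acc_patch - acc_dev) / math.sqrt(se_patch**2 + se_dev**2)
print(f"z = {z:.2f}")  # well below 1.96, so the difference is not significant
```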
Signed-off-by: xuebi <xuebi@minimaxi.com>
FlamingoPg
approved these changes
Nov 13, 2025
Collaborator
Please add a kernel micro benchmark, refer to https://github.com/sgl-project/sglang/tree/main/sgl-kernel/benchmark
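The sgl-kernel benchmarks time each configuration with `triton.testing.do_bench` on GPU; as a simplified, CPU-runnable sketch of the same sweep structure (the function and helper names here are illustrative, not the actual benchmark code), one can iterate over (num_tokens, num_experts, topk) shapes and time a torch reference:

```python
import itertools
import time

import torch


def torch_topk_sigmoid(logits: torch.Tensor, topk: int):
    # Reference op: sigmoid gating followed by per-token top-k selection.
    scores = torch.sigmoid(logits.float())
    return torch.topk(scores, k=topk, dim=-1)


def bench(fn, *args, iters=20):
    # Wall-clock timing in microseconds; the real benchmarks use
    # triton.testing.do_bench with CUDA events instead.
    fn(*args)  # warmup
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters * 1e6


results = []
for num_tokens, num_experts, topk in itertools.product([128, 512], [32, 64], [1, 2]):
    logits = torch.randn(num_tokens, num_experts)
    results.append((num_tokens, num_experts, topk,
                    bench(torch_topk_sigmoid, logits, topk)))

for row in results:
    print(row)
```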
BBuf
approved these changes
Nov 16, 2025
Contributor
Author
I added micro-benchmarks, and below are the test results:

python sgl-kernel/benchmark/bench_moe_topk_sigmoid.py
✅ Torch and SGLang topk_sigmoid implementations match
✅ Torch and SGLang topk_sigmoid implementations match
✅ Torch and SGLang topk_sigmoid implementations match
✅ Torch and SGLang topk_sigmoid implementations match
✅ Torch and SGLang topk_sigmoid implementations match
✅ Torch and SGLang topk_sigmoid implementations match
topk-sigmoid-performance:
num_tokens num_experts topk SGLang Torch
0 128.0 32.0 1.0 1.606173 14.432703
1 128.0 32.0 2.0 1.979544 18.997198
2 128.0 32.0 4.0 2.210692 20.242306
3 128.0 32.0 8.0 3.096960 21.853833
4 128.0 64.0 1.0 1.695568 14.660788
5 128.0 64.0 2.0 2.082407 19.980444
6 128.0 64.0 4.0 2.433674 19.924131
7 128.0 64.0 8.0 3.535085 21.541494
8 128.0 128.0 1.0 1.832434 14.898603
9 128.0 128.0 2.0 2.137683 19.689093
10 128.0 128.0 4.0 2.564278 21.878372
11 128.0 128.0 8.0 3.888136 23.623317
12 128.0 256.0 1.0 2.091580 15.850833
13 128.0 256.0 2.0 2.345080 21.415036
14 128.0 256.0 4.0 2.740070 21.677535
15 128.0 256.0 8.0 4.214144 24.138816
16 128.0 12.0 1.0 3.363476 11.810906
17 128.0 12.0 2.0 4.086299 19.232911
18 128.0 12.0 4.0 5.196735 19.951743
19 128.0 12.0 8.0 7.932992 21.698050
20 128.0 512.0 1.0 3.917015 19.575853
21 128.0 512.0 2.0 4.752946 24.443684
22 128.0 512.0 4.0 6.529157 25.821924
23 128.0 512.0 8.0 10.555618 28.067657
24 512.0 32.0 1.0 1.638882 17.125492
25 512.0 32.0 2.0 1.999735 23.120132
26 512.0 32.0 4.0 2.247950 23.036299
27 512.0 32.0 8.0 3.127355 23.925367
28 512.0 64.0 1.0 1.740753 15.580275
29 512.0 64.0 2.0 2.179911 23.328847
30 512.0 64.0 4.0 2.509109 24.131689
31 512.0 64.0 8.0 3.622789 24.281412
32 512.0 128.0 1.0 2.239927 18.465516
33 512.0 128.0 2.0 2.488771 26.375712
34 512.0 128.0 4.0 2.808165 26.076528
35 512.0 128.0 8.0 4.503539 26.507244
36 512.0 256.0 1.0 2.379934 24.298310
37 512.0 256.0 2.0 2.612163 31.943957
38 512.0 256.0 4.0 3.031515 32.432928
39 512.0 256.0 8.0 4.867126 32.959052
40 512.0 12.0 1.0 4.009180 15.938749
41 512.0 12.0 2.0 4.951622 22.493005
42 512.0 12.0 4.0 6.883740 22.376390
43 512.0 12.0 8.0 10.963751 22.862425
44 512.0 512.0 1.0 4.885562 36.050339
45 512.0 512.0 2.0 6.499027 42.411182
46 512.0 512.0 4.0 9.952232 45.588691
47 512.0 512.0 8.0 18.722834 46.178278
48 1024.0 32.0 1.0 1.679261 18.880756
49 1024.0 32.0 2.0 2.042411 24.436893
50 1024.0 32.0 4.0 2.285168 23.827322
51 1024.0 32.0 8.0 3.163615 24.795837
52 1024.0 64.0 1.0 1.885573 19.821977
53 1024.0 64.0 2.0 2.291389 25.667794
54 1024.0 64.0 4.0 2.631438 26.108969
55 1024.0 64.0 8.0 3.871169 26.612751
56 1024.0 128.0 1.0 2.505455 25.683555
57 1024.0 128.0 2.0 2.839051 31.938334
58 1024.0 128.0 4.0 3.360038 32.290966
59 1024.0 128.0 8.0 5.098808 32.825842
60 1024.0 256.0 1.0 2.739490 36.194292
61 1024.0 256.0 2.0 3.116406 41.909735
62 1024.0 256.0 4.0 3.740577 44.474649
63 1024.0 256.0 8.0 5.827517 44.637636
64 1024.0 12.0 1.0 5.902994 18.819952
65 1024.0 12.0 2.0 7.453297 23.324255
66 1024.0 12.0 4.0 10.854953 23.461952
67 1024.0 12.0 8.0 18.510209 23.939933
68 1024.0 512.0 1.0 7.175567 58.191771
69 1024.0 512.0 2.0 9.891369 64.094858
70 1024.0 512.0 4.0 16.385534 66.769785
71 1024.0 512.0 8.0 32.351776 69.287520
72 2048.0 32.0 1.0 1.829207 21.068421
73 2048.0 32.0 2.0 2.278805 26.714469
74 2048.0 32.0 4.0 2.528243 27.339196
75 2048.0 32.0 8.0 3.466488 27.450389
76 2048.0 64.0 1.0 2.053316 25.465890
77 2048.0 64.0 2.0 2.552917 31.511882
78 2048.0 64.0 4.0 3.000597 31.757535
79 2048.0 64.0 8.0 4.563727 33.727956
80 2048.0 128.0 1.0 2.777304 35.636103
81 2048.0 128.0 2.0 3.413895 43.372510
82 2048.0 128.0 4.0 4.428421 43.499526
83 2048.0 128.0 8.0 7.382649 44.146279
84 2048.0 256.0 1.0 3.026781 52.738649
85 2048.0 256.0 2.0 3.894977 64.701569
86 2048.0 256.0 4.0 5.129523 67.221177
87 2048.0 256.0 8.0 8.545173 67.414403
88 2048.0 12.0 1.0 9.092818 18.424893
89 2048.0 12.0 2.0 12.036246 26.188230
90 2048.0 12.0 4.0 18.287509 26.539472
91 2048.0 12.0 8.0 31.996807 26.236377
92 2048.0 512.0 1.0 11.394274 93.481671
93 2048.0 512.0 2.0 16.339157 112.272819
94 2048.0 512.0 4.0 28.238972 111.998075
95 2048.0 512.0 8.0 57.703463 117.038518
96 4096.0 32.0 1.0 1.909912 24.866872
97 4096.0 32.0 2.0 2.409515 34.820380
98 4096.0 32.0 4.0 2.832310 34.684684
99 4096.0 32.0 8.0 4.041181 34.478628
100 4096.0 64.0 1.0 2.338186 36.132783
101 4096.0 64.0 2.0 3.062580 44.554263
102 4096.0 64.0 4.0 4.054049 45.675747
103 4096.0 64.0 8.0 6.600514 44.666798
104 4096.0 128.0 1.0 3.378300 53.901834
105 4096.0 128.0 2.0 4.670523 65.184914
106 4096.0 128.0 4.0 6.755106 68.121627
107 4096.0 128.0 8.0 11.770620 68.404000
108 4096.0 256.0 1.0 4.146615 94.787196
109 4096.0 256.0 2.0 5.715296 108.005861
110 4096.0 256.0 4.0 8.249024 112.639070
111 4096.0 256.0 8.0 14.230330 114.896427
112 4096.0 12.0 1.0 14.855152 22.875291
113 4096.0 12.0 2.0 20.332816 32.244339
114 4096.0 12.0 4.0 32.310028 33.572924
115 4096.0 12.0 8.0 58.242789 33.343229
116 4096.0 512.0 1.0 19.630381 157.883435
117 4096.0 512.0 2.0 29.160358 165.134546
118 4096.0 512.0 4.0 51.915062 166.312485
119 4096.0 512.0 8.0 108.370604 166.565762
120 8192.0 32.0 1.0 2.202498 31.234844
121 8192.0 32.0 2.0 2.978675 51.025664
122 8192.0 32.0 4.0 3.775391 49.964385
123 8192.0 32.0 8.0 5.863031 50.203381
124 8192.0 64.0 1.0 2.968514 57.305209
125 8192.0 64.0 2.0 4.251871 68.014053
126 8192.0 64.0 4.0 6.086570 69.243657
127 8192.0 64.0 8.0 10.536131 71.471305
128 8192.0 128.0 1.0 5.247310 93.226201
129 8192.0 128.0 2.0 7.593929 111.472821
130 8192.0 128.0 4.0 11.353238 117.777406
131 8192.0 128.0 8.0 20.460854 121.017956
132 8192.0 256.0 1.0 6.502393 172.724941
133 8192.0 256.0 2.0 9.149885 187.754140
134 8192.0 256.0 4.0 13.808958 208.527293
135 8192.0 256.0 8.0 24.761531 207.168873
136 8192.0 12.0 1.0 26.413523 33.157307
137 8192.0 12.0 2.0 37.361001 48.213042
138 8192.0 12.0 4.0 60.033314 47.779363
139 8192.0 12.0 8.0 110.748711 46.231309
140 8192.0 512.0 1.0 36.897926 280.690457
141 8192.0 512.0 2.0 55.704671 292.949430
142 8192.0 512.0 4.0 100.482323 293.560653
143 8192.0 512.0 8.0 212.114406 295.395413
144 16384.0 32.0 1.0 2.796739 48.714223
145 16384.0 32.0 2.0 3.976485 77.360224
146 16384.0 32.0 4.0 5.612998 79.568379
147 16384.0 32.0 8.0 9.495054 80.312729
148 16384.0 64.0 1.0 4.511705 94.415055
149 16384.0 64.0 2.0 6.764167 114.123263
150 16384.0 64.0 4.0 10.206392 125.649587
151 16384.0 64.0 8.0 18.366090 117.413979
152 16384.0 128.0 1.0 8.200936 166.785992
153 16384.0 128.0 2.0 12.688144 204.176349
154 16384.0 128.0 4.0 20.394364 203.288089
155 16384.0 128.0 8.0 37.700544 220.648766
156 16384.0 256.0 1.0 10.139636 330.431987
157 16384.0 256.0 2.0 15.292813 376.930517
158 16384.0 256.0 4.0 24.716366 384.972801
159 16384.0 256.0 8.0 45.595276 413.776336
160 16384.0 12.0 1.0 49.194846 34.271523
161 16384.0 12.0 2.0 70.368763 76.725981
162 16384.0 12.0 4.0 116.357078 74.920762
163 16384.0 12.0 8.0 215.335563 71.006188
164 16384.0 512.0 1.0 90.523115 543.311063
165 16384.0 512.0 2.0 125.340486 559.867136
166 16384.0 512.0 4.0 212.382089 562.431008
167 16384.0 512.0 8.0 432.574569 579.504967
168 32768.0 32.0 1.0 4.206019 96.088441
169 32768.0 32.0 2.0 6.262916 138.751874
170 32768.0 32.0 4.0 9.389842 136.735505
171 32768.0 32.0 8.0 16.656640 143.173997
172 32768.0 64.0 1.0 7.315950 176.322663
173 32768.0 64.0 2.0 11.443072 223.520559
174 32768.0 64.0 4.0 18.351364 225.893009
175 32768.0 64.0 8.0 33.928980 227.137949
176 32768.0 128.0 1.0 14.121456 289.869092
177 32768.0 128.0 2.0 22.853408 372.613327
178 32768.0 128.0 4.0 38.377109 414.211812
179 32768.0 128.0 8.0 72.507660 420.528702
180 32768.0 256.0 1.0 17.926437 663.191171
181 32768.0 256.0 2.0 27.857530 755.398127
182 32768.0 256.0 4.0 46.754758 780.346909
183 32768.0 256.0 8.0 87.830342 821.121382
184 32768.0 12.0 1.0 95.320391 75.039072
185 32768.0 12.0 2.0 138.106252 131.030634
186 32768.0 12.0 4.0 227.255174 122.265509
187 32768.0 12.0 8.0 425.842783 127.552931
188 32768.0 512.0 1.0 183.148865 1143.182137
189 32768.0 512.0 2.0 249.909328 1172.065973
190 32768.0 512.0 4.0 422.945727 1174.436033
191 32768.0 512.0 8.0 859.151301 1187.168956
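For reference, the torch baseline that the ✅ checks above validate against presumably looks like the sketch below (whether the fused kernel renormalizes the selected top-k weights is an assumption here, not confirmed by this thread):

```python
import torch


def topk_sigmoid_ref(gating_logits: torch.Tensor, topk: int, renormalize: bool = True):
    # Sigmoid gating scores each expert independently (no cross-expert
    # softmax competition), then the top-k experts are selected per token.
    scores = torch.sigmoid(gating_logits.float())
    topk_weights, topk_ids = torch.topk(scores, k=topk, dim=-1)
    if renormalize:
        # Renormalize so each token's selected expert weights sum to 1.
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids


logits = torch.tensor([[1.0, 2.0, 0.5, 3.0]])
weights, ids = topk_sigmoid_ref(logits, topk=2)
print(ids)      # experts 3 and 1 have the largest sigmoid scores
print(weights)  # weights sum to 1 after renormalization
```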
yukavio pushed a commit to yukavio/sglang that referenced this pull request on Nov 25, 2025
* Support moe topk sigmoid kernel (sgl-project#13049) Co-authored-by: xuebi <xuebi@minimaxi.com>
liusy58 <xiehang.lsy@alibaba-inc.com> Co-authored-by: Yuan Luo <yuan.luo@hotmail.com> Co-authored-by: Hao Chen <cighao@gmail.com> Co-authored-by: Morpheus Guo <yuechao.guo@amd.com> Co-authored-by: yuechguo <yuechguo@amd.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: sglang-bot <sglangbot@gmail.com> Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com> Co-authored-by: Glen Liu <62917497+glenliu21@users.noreply.github.com> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: gongwei-130 <56567052+gongwei-130@users.noreply.github.com> Co-authored-by: Baidu-AIAK <Baidu_AIAK@163.com> Co-authored-by: Chen Haozhe <c-34@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: ykwd <oneday117@qq.com> Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com> Co-authored-by: Even Zhou <even.y.zhou@outlook.com> Co-authored-by: Roger Young <42564206+rogeryoungh@users.noreply.github.com> Co-authored-by: xuebi <xuebi@minimaxi.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Sehoon Kim <sehoon@x.ai> Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Yuhao Yang <yhyang201@gmail.com> Co-authored-by: StonyPort <157573149+zhooooong@users.noreply.github.com> Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com> Co-authored-by: Zeyu Li <li_zeyu@pku.edu.cn> Co-authored-by: iLeGend <824040212@qq.com> Co-authored-by: joesun <shauntajoesph@gmail.com> Co-authored-by: Thomas Wang <1am9trash@gmail.com> Co-authored-by: sogalin <39478626+sogalin@users.noreply.github.com> Co-authored-by: DarkSharpness <76582120+DarkSharpness@users.noreply.github.com> Co-authored-by: yctseng0211 <yctseng@amd.com> Co-authored-by: root <root@smci355-ccs-aus-m12-17.cs-aus.dcgpu> Co-authored-by: jacky.cheng <yichiche@amd.com> Co-authored-by: 
Lzhang-hub <57925599+Lzhang-hub@users.noreply.github.com> Co-authored-by: YanbingJiang <yanbing.jiang@intel.com> Co-authored-by: Fan Yin <1106310035@qq.com> Co-authored-by: YAMY <74099316+YAMY1234@users.noreply.github.com> Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com> Co-authored-by: Stefan He <hebiaobuaa@gmail.com> Co-authored-by: Ke Bao <ISPObaoke@163.com> Co-authored-by: Oasis-Git <ayw.sirius19@gmail.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Co-authored-by: zyksir <zhuyikai.zyk@gmail.com> Co-authored-by: Zhuqi Li <zhli@x.ai> Co-authored-by: Michele Marzollo <37903931+michelemarzollo@users.noreply.github.com> Co-authored-by: ZeldaHuang <hzm414167@alibaba-inc.com> Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: weibingo <weibing_lai@163.com> Co-authored-by: Jiajun Li <48857426+guapisolo@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com> Co-authored-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: roikoren755 <26850796+roikoren755@users.noreply.github.com> Co-authored-by: Shu Wang <shuw@nvidia.com> Co-authored-by: cctry <shiyang@x.ai> Co-authored-by: Trevor Morris <tmorris@nvidia.com> Co-authored-by: Yijie Zhu <762412795@qq.com> Co-authored-by: ZhengdQin <zhengdqin@gmail.com> Co-authored-by: richhuan <huan_rz@qq.com> Co-authored-by: ZhengdQin <46387172+ZhengdQin@users.noreply.github.com> Co-authored-by: yinghui <32845984+cicirori@users.noreply.github.com> Co-authored-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com> Co-authored-by: ErsongWang <158176536+ErsongWang@users.noreply.github.com> Co-authored-by: Peiqi Yin <60515999+yinpeiqi@users.noreply.github.com> Co-authored-by: Swipe4057 <106391009+Swipe4057@users.noreply.github.com> Co-authored-by: liuhuijiayou <46172426+liuhuijiayou@users.noreply.github.com> 
Co-authored-by: Tiance Wang <wangtiance@gmail.com> Co-authored-by: wangtiance <tiancew@qq.com> Co-authored-by: Xu Yongfei <xuyongfei.xyf@antgroup.com> Co-authored-by: gaopengff <pengfei.gao@intel.com> Co-authored-by: ant-yy <vito.yy@antgroup.com> Co-authored-by: Zhi Yiliu <2584074296@qq.com> Co-authored-by: lzy <tomlzy213@gmail.com> Co-authored-by: Xinyue Zhang <xinyue.zhang@oracle.com> Co-authored-by: Yuhao Yao <37280700+yuhyao@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com> Co-authored-by: Hanming Lu <hanming@x.ai> Co-authored-by: c30031083 <chenxu140@huawei.com> Co-authored-by: Nicolas Castet <26874160+nvcastet@users.noreply.github.com> Co-authored-by: Sam Li <lsam@nvidia.com> Co-authored-by: jackeyhua <jackeyhuasjtu@gmail.com> Co-authored-by: Siyuan Chen <41201609+SYChen123@users.noreply.github.com> Co-authored-by: Yibo Cai <cyb70289@gmail.com> Co-authored-by: Yibo Cai <yibo.cai@arm.com> Co-authored-by: Zaili Wang <109502517+ZailiWang@users.noreply.github.com> Co-authored-by: josephyou <josephyou@tencent.com>
Motivation
This PR introduces a `topk_sigmoid` CUDA kernel to support models such as MiniMax-M2 that require sigmoid-based expert routing. Our previous workaround was to use `grouped_topk` with `group_size=1`.

- Previous: 8 kernels were launched.
- Now: only 1 kernel is launched.
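As a rough illustration of what the fused kernel computes, here is a minimal NumPy reference (function name and shapes are illustrative, not the actual kernel API): apply a sigmoid gate to the router logits, then select the top-k experts per token in a single pass.

```python
import numpy as np

def topk_sigmoid_ref(logits, k):
    # Sigmoid gate over router logits: [num_tokens, num_experts]
    scores = 1.0 / (1.0 + np.exp(-logits))
    # Indices of the k highest-scoring experts per token
    topk_ids = np.argsort(-scores, axis=-1)[:, :k]
    # Gather the corresponding routing weights
    topk_weights = np.take_along_axis(scores, topk_ids, axis=-1)
    return topk_weights, topk_ids

logits = np.array([[0.0, 2.0, -1.0, 1.0]], dtype=np.float32)
weights, ids = topk_sigmoid_ref(logits, 2)  # picks experts 1 and 3
```

The fused CUDA kernel performs the gate, selection, and gather in one launch instead of the eight launches the `grouped_topk(group_size=1)` path required.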
Modifications
Accuracy Tests
We have validated the correctness of this change on MiniMax-M2, achieving an accuracy of 0.93 on GSM8K and 0.803 on AIME2025.
Benchmarking and Profiling
Performance benchmarks on the MiniMax-M2 model confirm that this optimization improves overall throughput by approximately 10%.
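For reference, a micro-benchmark of the routing step has roughly the following shape (this is a CPU-side NumPy sketch for illustration only; the actual GPU benchmark lives in `sgl-kernel/benchmark/bench_moe_topk_sigmoid.py`, and the batch/expert sizes below are arbitrary):

```python
import time
import numpy as np

def sigmoid_topk(logits, k):
    # Reference routing: sigmoid gate, then top-k experts per token.
    s = 1.0 / (1.0 + np.exp(-logits))
    ids = np.argsort(-s, axis=-1)[:, :k]
    return np.take_along_axis(s, ids, axis=-1), ids

def bench(fn, *args, iters=20):
    fn(*args)  # warm-up call before timing
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters * 1e6  # mean microseconds per call

logits = np.random.randn(1024, 256).astype(np.float32)  # tokens x experts
print(f"sigmoid+topk reference: {bench(sigmoid_topk, logits, 8):.1f} us/iter")
```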
Checklist