Update benchmark scripts by merrymercy · Pull Request #8 · sgl-project/sglang

merrymercy · 2024-01-16T00:12:51Z

No description provided.

* Use fused_experts_cpu and add weight packing
* add check on whether AMX is supported
* move utils to cpu_utils.py
* address comment
* no need to pass in is_vnni since it's True by default; change inplace to True
* refactor prepack_weight_if_needed
* Only import sgl_kernel.cpu once


Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Add wave extend attention kernel
Signed-off-by: Harsh Menon <harsh@nod-labs.com>

[Wave] Adding logit_cap and layer scaling to API
Also add support for the wave backend to the model runner. And use Triton decode kernels for now.

[Wave] Run chunked prefill for perf comparison on Wave test
Need to rename the non chunked/regular prefill version because otherwise rpd will treat it as the same kernel
Signed-off-by: Stanley Winata <stanley.winata@amd.com>

[Wave] Cache the function that loads the wave kernel
Also maintain a global kernel hash to avoid recomputing the hash on every call.

[Wave] Don't specify block size and enable buffer ops

[Wave] Enable wave runtime and update scheduling API

[Wave] Update API to use wave_compile & WaveCompileOptions

[Wave] Update wave backend and extend attention to latest

[Wave] Add speculative decode kernel
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>

cache kernels using lru_cache

Update WaveBackend to use Wave Decode (sgl-project#6)
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Revert "Update WaveBackend to use Wave Decode (sgl-project#6)" (sgl-project#7)
This reverts commit eac4599.

Wave Backend decode (sgl-project#8)
* align shapes
  Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
* fix
  Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
---------
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Wave backend fixes (sgl-project#10)
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

More fixes to Wave decode (sgl-project#12)
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

is_causal
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Enable the grok in3 model (sgl-project#14)

Set unique cache dir for each worker (sgl-project#16)

update kernel (sgl-project#18)
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

updated spec decode test as per wave
Signed-off-by: xintin <gaurav.verma@amd.com>

fix extend (sgl-project#23)
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Refactor paged decode intermediate arrays shapes (sgl-project#24)
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

remove dyn symbols (sgl-project#26)
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

cleanup shapes (sgl-project#27)
Some fields were removed from `paged_decode_attention_shape`.
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>

Remove `mha` param from Wave decode attention kernel (sgl-project#28)
Depends on iree-org/iree-turbine#1039
Signed-off-by: Paul Zhang <paul.zhang@amd.com>

nfc: fix problems reported by linting

update references from iree.turbine to wave_lang

# This is the 1st commit message: rebase
# This is the commit message sgl-project#2: remove duplicated code
# This is the commit message sgl-project#3: add type hints
# This is the commit message sgl-project#4: add clear cache for benchmark alignment
# This is the commit message sgl-project#5: remove unuse arg
# This is the commit message sgl-project#6: clear cache once
# This is the commit message sgl-project#7: simplified VAE cache logic for qwenimage and wan
# This is the commit message sgl-project#8: remove duplicated code


merrymercy added 9 commits January 15, 2024 18:58

update benchmark

66ce775

update benchmark

41ceb4b

rename

83af48a

improve benchmark

86c7e9f

update

f5b3484

improve

0046417

update

06e3258

add articles

c28aedd

update

9edab30

merrymercy merged commit 70359bf into main Jan 16, 2024

merrymercy deleted the benchmark branch January 16, 2024 00:13

wonderisland mentioned this pull request Sep 19, 2024

[Bug] illegal memory access encountered #1467

Closed

5 tasks

CSEEduanyu mentioned this pull request Jan 26, 2025

[Bug] NCCL Crash with SIGSEGV Frequently when deploying deepseek v3 #2803

Closed

5 tasks

lambert0312 mentioned this pull request Feb 18, 2025

Support NextN (MTP) speculative decoding for DeepSeek-V3/R1 #3582

Merged

ToughK mentioned this pull request Feb 18, 2025

[Bug] sglang crashed when use enable_dp_attention running DeepSeekV3 on 2x8xH100 #3658

Closed

5 tasks

mahaocong90 mentioned this pull request Feb 26, 2025

[Bug] H20 8 gpu x 2 with --enable-dp-attention occurred CUDA error: an illegal memory access #3892

Closed

5 tasks

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

Update benchmark scripts (sgl-project#8)

9ae401c

This was referenced Apr 16, 2025

enable ci test: upstream ci for XPU DiweiSun/sglang#4

Closed

Enable CPU CI: upstream CI enabling with github workflow DiweiSun/sglang#3

Closed

Update README.md DiweiSun/sglang#1

Closed

riou-chen mentioned this pull request Apr 17, 2025

[Bug] run eagle3 failed #5448

Closed

ch-wan pushed a commit to ch-wan/sglang that referenced this pull request Apr 25, 2025

Merge pull request sgl-project#8 from xutizhou/deepgemm

84a25cd

support fp8 dispatch

ericschreiber mentioned this pull request Aug 13, 2025

[Bug] CUDA error: uncorrectable ECC error encountered when using HiCache with xPyD disaggregation. #9151

Closed

5 tasks

gaolaobao mentioned this pull request Aug 25, 2025

[Bug] RTX 5060: RMSNorm failed, same as the #7249 issue, when running qwen2.5-0.5b-instruct model. #9600

Closed

5 tasks

Xia-Weiwen pushed a commit to Xia-Weiwen/sglang that referenced this pull request Sep 5, 2025

Fix llama acc regression (sgl-project#8)

66ea73e

* set a higher timeout threshold to prevent forced terminated
* disable rope kernel to address the accuracy regression in llama

Johnsonms mentioned this pull request Oct 2, 2025

Support DeepSeek V3.2 Exp #11061

Merged

kalyank007 pushed a commit to kalyank007/sglang that referenced this pull request Nov 7, 2025

Update rotary_embedding.py (sgl-project#8)

f0c2b27

Johnsonms mentioned this pull request Nov 10, 2025

[Bug] DeepSeek V32 CUDA error: an illegal memory access was encountered #12893

Closed

5 tasks

0xymoro mentioned this pull request Nov 10, 2025

[Bug] 0.5.5 custom all reduce crashing #13016

Closed

5 tasks

amd-youchen referenced this pull request in amd-youchen/sglang Nov 13, 2025

Merge pull request Yuechguo#8 from zhyajie/dev/perf

f101705

add pd disaggregation best practices

RahulB200 mentioned this pull request Nov 13, 2025

[Bug] Kimi K2 Thinking Marlin Kernel Crash #13234

Closed

5 tasks

yhyang201 pushed a commit that referenced this pull request Dec 13, 2025

clean code for mm_cache && add para-check (#8)

02d5dfc

fstandhartinger pushed a commit to fstandhartinger/sglang that referenced this pull request Jan 13, 2026

Merge pull request sgl-project#8 from chutesai/deepseek-v32

62dc133

Deepseek V3.2 support

tpoisonooo pushed a commit to tpoisonooo/sglang that referenced this pull request Feb 12, 2026

Merge pull request sgl-project#8 from GitHubstart0916/dense_as_sparse

3b31271

[FEAT] Support dense&sparse together

Martion-z mentioned this pull request Feb 13, 2026

[Bug] CUDA error: an illegal memory access was encountered with SGLang v0.5.8 + HiCache #18785

Closed

5 tasks

chenkaiyue mentioned this pull request Feb 28, 2026

Fix: Cuda Graph + HiCache + Speculative Decoding Working Together were giving Cuda Illegal memory access error. #19177

Open

alisonshao mentioned this pull request Mar 1, 2026

Upgrade transformers==5.3.0 #17784

Merged

21 tasks

putdanil mentioned this pull request Mar 4, 2026

[Bug] FLUX.2-dev FP8 transformer crashes with 4 reference images during denoising #19873

Closed

5 tasks

0xymoro mentioned this pull request Mar 6, 2026

[Bug] Illegal memory access on 0.5.9 nvfp4 #20011

Closed

5 tasks

Estrella-xx added a commit to Estrella-xx/sglang that referenced this pull request Mar 10, 2026

[npu]adaptation to deterministic inference (sgl-project#8)

4636a29

* [npu]adaptation to deterministic inference
* modify review comments

alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Mar 14, 2026

Merge pull request sgl-project#8 from alphabetc1/fix/benchmark

7e0a78c

MMuzzammil1 added a commit to MMuzzammil1/sglang that referenced this pull request Mar 16, 2026

Merge pull request sgl-project#8 from DIA/muz/log-session-id

1a00988

Log Session Id Patch

lviy mentioned this pull request Mar 26, 2026

[Bug] Enablling DP-Attention cause 'nan' of 'inf' in prob tensor #21460

Open

5 tasks

mmangkad pushed a commit to mmangkad-dev/sglang that referenced this pull request Apr 3, 2026

Merge pull request sgl-project#8 from pyc96/kp/gemma4-audio

f16b722

31B dense model with bidirectional attention fix

twb1235 mentioned this pull request Apr 7, 2026

[Bug] I noticed that with the node 2 and pp 2 tp8 setup, the workers don't exit on their own when the master goes down. I have to kill them manually #22227

Open

5 tasks

samuellees mentioned this pull request Apr 8, 2026

fix: enable custom all-reduce coexistence with NCCL symmetric memory #22354

Closed

5 tasks

silencejade mentioned this pull request Apr 25, 2026

[NPU] Fix mrope_position computation in Eagle Worker v2 with PlanStream #23423

Open

5 tasks

JackLeeHal mentioned this pull request May 9, 2026

[Question] running DeepSeek-V4-Pro on B300 #24776

Open
