
[sgl-kernel] chore: update deepgemm version #13402

Merged
Fridge003 merged 3 commits into sgl-project:main from FlamingoPg:upd-deepgemm-1117 on Dec 19, 2025

Conversation

@FlamingoPg
Collaborator

Motivation

As titled.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @FlamingoPg, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the version of the DeepGEMM dependency used within the "sgl-kernel" component. By updating the Git tag reference in the CMake configuration, the project will now build against a more recent commit of the DeepGEMM library, ensuring access to the latest developments and improvements from that external module.

Highlights

  • Dependency Update: The DeepGEMM submodule reference has been updated to a newer commit ("92585b702909c4576efed6fada640bb37a7c5b50") in "sgl-kernel/CMakeLists.txt".

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command              Description
Code Review           /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary      Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the deepgemm dependency to a new commit. The change is straightforward. For improved maintainability, I've added a suggestion to include a comment in CMakeLists.txt specifying the version or a brief description corresponding to the new commit hash. It would also be beneficial to summarize the dependency's changes in the pull request description for future updates.

Comment thread: sgl-kernel/CMakeLists.txt (Outdated)

  repo-deepgemm
  GIT_REPOSITORY https://github.com/sgl-project/DeepGEMM
- GIT_TAG        f4adba8a6695e635b0106ce3dae3202016ad0ee5
+ GIT_TAG        92585b702909c4576efed6fada640bb37a7c5b50


Severity: medium

For better maintainability, it's good practice to add a comment indicating what this commit hash corresponds to (e.g., a version number or a brief description of the change). This makes it easier to track dependencies and understand why a particular version is being used.

    GIT_TAG        92585b702909c4576efed6fada640bb37a7c5b50 # TODO: Add version/description for this commit
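For context, the pinned tag above sits inside a Git-fetched dependency declaration in sgl-kernel/CMakeLists.txt. A minimal sketch of the pattern, assuming CMake's FetchContent module (the actual file may wire DeepGEMM up differently):

```cmake
include(FetchContent)

# Pin DeepGEMM to an exact commit so builds are reproducible.
# A trailing comment on GIT_TAG (as suggested above) records what
# the opaque hash corresponds to.
FetchContent_Declare(
  repo-deepgemm
  GIT_REPOSITORY https://github.com/sgl-project/DeepGEMM
  GIT_TAG        92585b702909c4576efed6fada640bb37a7c5b50 # TODO: add version/description
)
FetchContent_MakeAvailable(repo-deepgemm)
```

Pinning a full commit hash (rather than a branch name) guarantees that a later force-push or branch move cannot silently change what gets built.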

@hlu1
Collaborator

hlu1 commented Nov 18, 2025

cc @YAMY1234 to verify perf improvement to dsv32

@YAMY1234
Collaborator

YAMY1234 commented Nov 19, 2025

Verified around a 10% improvement for the deep_gemm.fp8_mqa_logits kernel on DeepSeek V3.2:

python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention --disable-cuda-graph --disable-radix-cache

Profiling results at 8192 input sequence length (ISL):

Before: [profiler trace screenshot]

After: [profiler trace screenshot]

Adding GPQA Acc:

====================                                                                                                      | 24/198 [19:31<2:01:39, 41.95s/it]
Repeat: 8, mean: 0.789
Scores: ['0.758', '0.808', '0.778', '0.798', '0.778', '0.803', '0.793', '0.798']
====================
Writing report to /tmp/gpqa_deepseek-ai_DeepSeek-V3.2-Exp.html
{'chars': np.float64(15109.676767676769), 'chars:std': np.float64(12031.954168701903), 'score:std': np.float64(0.4015072103909452), 'score': np.float64(0.797979797979798)}
Writing results to /tmp/gpqa_deepseek-ai_DeepSeek-V3.2-Exp.json
Total latency: 1130.090 s
Score: 0.798
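The reported mean follows directly from the eight per-repeat scores above; a quick sanity check (a minimal sketch, not part of the eval harness):

```python
# Sanity-check the reported GPQA summary from the eight per-repeat scores.
scores = [0.758, 0.808, 0.778, 0.798, 0.778, 0.803, 0.793, 0.798]

mean = sum(scores) / len(scores)
print(f"Repeat: {len(scores)}, mean: {mean:.3f}")  # Repeat: 8, mean: 0.789
```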

@hlu1
Collaborator

hlu1 commented Nov 19, 2025

@FlamingoPg There is one more update coming: deepseek-ai/DeepGEMM#230

@Fridge003
Collaborator

@hlu1 @YAMY1234 DeepSeek V3.2 fails on CI with some shape errors: https://github.com/sgl-project/sglang/actions/runs/19419989490/job/56168731348?pr=13402

@YAMY1234
Collaborator

@Fridge003 @FlamingoPg

deepseek-ai/DeepGEMM@fdf1622 may fix the issue; maybe we can wait until that commit gets merged and then update the hash accordingly.

@Fridge003
Collaborator

@FlamingoPg Can we pull the latest main branch into the sgl-release branch and change the tag here?

@FlamingoPg
Collaborator Author

> @FlamingoPg Can we pull the latest main branch into the sgl-release branch and change the tag here?

Done

@Fridge003 Fridge003 merged commit 65c0985 into sgl-project:main Dec 19, 2025
645 of 686 checks passed
xiaobaicxy added a commit to xiaobaicxy/sglang that referenced this pull request Dec 19, 2025
* 'main' of https://github.com/sgl-project/sglang: (136 commits)
  fix: unreachable error check in retraction (sgl-project#15433)
  [sgl-kernel] chore: update deepgemm version (sgl-project#13402)
  [diffusion] multi-platform: support diffusion on amd and fix encoder loading on MI325 (sgl-project#13760)
  [amd] Add deterministic all-reduce kernel for AMD (ROCm) (sgl-project#15340)
  [diffusion] refactor: refactor _build_req_from_sampling to use shallow_asdict (sgl-project#13782)
  Add customized sampler registration (sgl-project#15423)
  Update readme (sgl-project#15425)
  Fix Mindspore model import warning (sgl-project#15287)
  [Feature] Xiaomi `MiMo-V2-Flash` day0 support (sgl-project#15207)
  [diffusion] profiling: add bench_serving.py and VBench (sgl-project#15410)
  [DLLM] Fix dLLM regression (sgl-project#15371)
  [Deepseek V3.2] Fix Deepseek MTP in V1 mode (sgl-project#15429)
  chore: update CI_PERMISSIONS (sgl-project#15431)
  [DLLM] Add CI for diffusion LLMs (sgl-project#14723)
  Support using different attention backend for draft decoding. (sgl-project#14843)
  feat(dsv32): better error handling for DeepSeek-v3.2 encoder (sgl-project#14353)
  tiny fix lint on main (sgl-project#15424)
  multimodal: precompute hash for MultimodalDataItem (sgl-project#14354)
  [AMD] Clear pre-built AITER kernels and warmup to prevent segfaults and test timeouts (sgl-project#15318)
  [Performance] optimize NSA backend metadata computation for multi-step speculative decoding (sgl-project#14781)
  ...
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
@trevor-m
Collaborator

FYI, I recently started seeing some DeepGEMM errors (a file-not-found error from cuModuleLoad); reverting this PR appeared to fix it:

  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/linear.py", line 260, in forward
    output = self.quant_method.apply(self, x, bias)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8.py", line 545, in apply
    return self.w8a8_block_fp8_linear(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_utils.py", line 411, in deepgemm_w8a8_block_fp8_linear_with_fallback
    output = w8a8_block_fp8_matmul_deepgemm(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_kernel.py", line 1071, in w8a8_block_fp8_matmul_deepgemm
    deep_gemm_fp8_fp8_bf16_nt(A, As, B, Bs, C)
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_kernel.py", line 105, in deep_gemm_fp8_fp8_bf16_nt
    deep_gemm_wrapper.gemm_nt_f8f8bf16((A, As), (B, Bs), C)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/entrypoint.py", line 97, in gemm_nt_f8f8bf16
    with compile_utils.deep_gemm_execution_hook(m, n, k, num_groups, kernel_type):
  File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 295, in deep_gemm_execution_hook
    _maybe_compile_deep_gemm_one_type_all(kernel_type, n, k, num_groups)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 110, in _maybe_compile_deep_gemm_one_type_all
    _compile_deep_gemm_one_type_all(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 170, in _compile_deep_gemm_one_type_all
    executor.execute(m=m)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 247, in execute
    deep_gemm.fp8_gemm_nt(
  File "/usr/local/lib/python3.12/dist-packages/deep_gemm/__init__.py", line 50, in _fn
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA driver error (_deps/repo-deepgemm-src/csrc/apis/../jit_kernels/impls/../../jit/handle.hpp:120): 301 (CUDA_ERROR_FILE_NOT_FOUND, file not found)
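The wrapper named in the traceback (deepgemm_w8a8_block_fp8_linear_with_fallback) suggests a guarded-fallback pattern around the DeepGEMM call. A minimal sketch of such a guard, using hypothetical stand-in callables `run_deepgemm` and `run_triton_fallback` (this is not SGLang's actual implementation):

```python
# Hedged sketch: fall back to a portable kernel when the DeepGEMM JIT
# artifact is missing or broken (e.g. CUDA_ERROR_FILE_NOT_FOUND from a
# stale JIT cache). `run_deepgemm` and `run_triton_fallback` are
# hypothetical stand-ins for the fast and fallback matmul paths.
def matmul_with_fallback(a, b, run_deepgemm, run_triton_fallback):
    try:
        return run_deepgemm(a, b)
    except RuntimeError as e:
        if "CUDA_ERROR_FILE_NOT_FOUND" in str(e):
            # Stale or missing JIT artifact; take the portable path.
            return run_triton_fallback(a, b)
        raise  # Re-raise anything we don't recognize.
```

In practice one would also log the failure and possibly clear the JIT cache directory, so the fast path can be retried on the next run.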
