
[sgl-kernel] chore: update deepgemm version #13402

Merged
Fridge003 merged 3 commits into sgl-project:main from FlamingoPg:upd-deepgemm-1117 on Dec 19, 2025

Conversation

@FlamingoPg
Collaborator

Motivation

As titled.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @FlamingoPg, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the version of the DeepGEMM dependency used within the "sgl-kernel" component. By updating the Git tag reference in the CMake configuration, the project will now build against a more recent commit of the DeepGEMM library, ensuring access to the latest developments and improvements from that external module.

Highlights

  • Dependency Update: The DeepGEMM submodule reference has been updated to a newer commit ("92585b702909c4576efed6fada640bb37a7c5b50") in "sgl-kernel/CMakeLists.txt".

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command              Description
Code Review           /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary      Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the deepgemm dependency to a new commit. The change is straightforward. For improved maintainability, I've added a suggestion to include a comment in CMakeLists.txt specifying the version or a brief description corresponding to the new commit hash. It would also be beneficial to summarize the dependency's changes in the pull request description for future updates.

Comment thread: sgl-kernel/CMakeLists.txt (Outdated)

  repo-deepgemm
  GIT_REPOSITORY https://github.com/sgl-project/DeepGEMM
- GIT_TAG        f4adba8a6695e635b0106ce3dae3202016ad0ee5
+ GIT_TAG        92585b702909c4576efed6fada640bb37a7c5b50


Severity: medium

For better maintainability, it's good practice to add a comment indicating what this commit hash corresponds to (e.g., a version number or a brief description of the change). This makes it easier to track dependencies and understand why a particular version is being used.

    GIT_TAG        92585b702909c4576efed6fada640bb37a7c5b50 # TODO: Add version/description for this commit
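For context, the pinned tag above sits inside a Git-fetched dependency declaration in sgl-kernel/CMakeLists.txt. A minimal sketch of the pattern, assuming CMake's FetchContent module (the actual file may wire DeepGEMM up differently):

```cmake
include(FetchContent)

# Pin DeepGEMM to an exact commit so builds are reproducible.
# A trailing comment on GIT_TAG (as suggested above) records what
# the opaque hash corresponds to.
FetchContent_Declare(
  repo-deepgemm
  GIT_REPOSITORY https://github.com/sgl-project/DeepGEMM
  GIT_TAG        92585b702909c4576efed6fada640bb37a7c5b50 # TODO: add version/description
)
FetchContent_MakeAvailable(repo-deepgemm)
```

Pinning a full commit hash (rather than a branch name) guarantees that a later force-push or branch move cannot silently change what gets built.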

@hlu1
Collaborator

hlu1 commented Nov 18, 2025

cc @YAMY1234 to verify perf improvement to dsv32

@YAMY1234
Collaborator

YAMY1234 commented Nov 19, 2025

Verified around a 10% improvement for the deep_gemm.fp8_mqa_logits kernel on DeepSeek V3.2:

python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --enable-dp-attention --disable-cuda-graph --disable-radix-cache

Profiling results at 8192 input sequence length (ISL):

Before: [profiler trace screenshot]

After: [profiler trace screenshot]

Adding GPQA Acc:

====================                                                                                                      | 24/198 [19:31<2:01:39, 41.95s/it]
Repeat: 8, mean: 0.789
Scores: ['0.758', '0.808', '0.778', '0.798', '0.778', '0.803', '0.793', '0.798']
====================
Writing report to /tmp/gpqa_deepseek-ai_DeepSeek-V3.2-Exp.html
{'chars': np.float64(15109.676767676769), 'chars:std': np.float64(12031.954168701903), 'score:std': np.float64(0.4015072103909452), 'score': np.float64(0.797979797979798)}
Writing results to /tmp/gpqa_deepseek-ai_DeepSeek-V3.2-Exp.json
Total latency: 1130.090 s
Score: 0.798
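The reported mean follows directly from the eight per-repeat scores above; a quick sanity check (a minimal sketch, not part of the eval harness):

```python
# Sanity-check the reported GPQA summary from the eight per-repeat scores.
scores = [0.758, 0.808, 0.778, 0.798, 0.778, 0.803, 0.793, 0.798]

mean = sum(scores) / len(scores)
print(f"Repeat: {len(scores)}, mean: {mean:.3f}")  # Repeat: 8, mean: 0.789
```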

@hlu1
Collaborator

hlu1 commented Nov 19, 2025

@FlamingoPg There is one more update coming: deepseek-ai/DeepGEMM#230

@Fridge003
Collaborator

@hlu1 @YAMY1234 DeepSeek V3.2 fails on CI with some shape errors: https://github.com/sgl-project/sglang/actions/runs/19419989490/job/56168731348?pr=13402

@YAMY1234
Collaborator

@Fridge003 @FlamingoPg

deepseek-ai/DeepGEMM@fdf1622 may fix the issue; maybe we can wait until that commit gets merged and then update the hash accordingly.

@Fridge003
Collaborator

@FlamingoPg Can we pull the latest main branch into the sgl-release branch and change the tag here?

@FlamingoPg
Collaborator Author

> @FlamingoPg Can we pull the latest main branch into the sgl-release branch and change the tag here?

Done

@Fridge003 Fridge003 merged commit 65c0985 into sgl-project:main Dec 19, 2025
645 of 686 checks passed
xiaobaicxy added a commit to xiaobaicxy/sglang that referenced this pull request Dec 19, 2025
* 'main' of https://github.com/sgl-project/sglang: (136 commits)
  fix: unreachable error check in retraction (sgl-project#15433)
  [sgl-kernel] chore: update deepgemm version (sgl-project#13402)
  [diffusion] multi-platform: support diffusion on amd and fix encoder loading on MI325 (sgl-project#13760)
  [amd] Add deterministic all-reduce kernel for AMD (ROCm) (sgl-project#15340)
  [diffusion] refactor: refactor _build_req_from_sampling to use shallow_asdict (sgl-project#13782)
  Add customized sampler registration (sgl-project#15423)
  Update readme (sgl-project#15425)
  Fix Mindspore model import warning (sgl-project#15287)
  [Feature] Xiaomi `MiMo-V2-Flash` day0 support (sgl-project#15207)
  [diffusion] profiling: add bench_serving.py and VBench (sgl-project#15410)
  [DLLM] Fix dLLM regression (sgl-project#15371)
  [Deepseek V3.2] Fix Deepseek MTP in V1 mode (sgl-project#15429)
  chore: update CI_PERMISSIONS (sgl-project#15431)
  [DLLM] Add CI for diffusion LLMs (sgl-project#14723)
  Support using different attention backend for draft decoding. (sgl-project#14843)
  feat(dsv32): better error handling for DeepSeek-v3.2 encoder (sgl-project#14353)
  tiny fix lint on main (sgl-project#15424)
  multimodal: precompute hash for MultimodalDataItem (sgl-project#14354)
  [AMD] Clear pre-built AITER kernels and warmup to prevent segfaults and test timeouts (sgl-project#15318)
  [Performance] optimize NSA backend metadata computation for multi-step speculative decoding (sgl-project#14781)
  ...
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
@trevor-m
Collaborator

FYI, I recently started seeing some DeepGEMM errors (a file-not-found error from cuModuleLoad); reverting this PR appeared to fix it:

  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/linear.py", line 260, in forward
    output = self.quant_method.apply(self, x, bias)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8.py", line 545, in apply
    return self.w8a8_block_fp8_linear(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_utils.py", line 411, in deepgemm_w8a8_block_fp8_linear_with_fallback
    output = w8a8_block_fp8_matmul_deepgemm(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_kernel.py", line 1071, in w8a8_block_fp8_matmul_deepgemm
    deep_gemm_fp8_fp8_bf16_nt(A, As, B, Bs, C)
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_kernel.py", line 105, in deep_gemm_fp8_fp8_bf16_nt
    deep_gemm_wrapper.gemm_nt_f8f8bf16((A, As), (B, Bs), C)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/entrypoint.py", line 97, in gemm_nt_f8f8bf16
    with compile_utils.deep_gemm_execution_hook(m, n, k, num_groups, kernel_type):
  File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 295, in deep_gemm_execution_hook
    _maybe_compile_deep_gemm_one_type_all(kernel_type, n, k, num_groups)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 110, in _maybe_compile_deep_gemm_one_type_all
    _compile_deep_gemm_one_type_all(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 170, in _compile_deep_gemm_one_type_all
    executor.execute(m=m)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/deep_gemm_wrapper/compile_utils.py", line 247, in execute
    deep_gemm.fp8_gemm_nt(
  File "/usr/local/lib/python3.12/dist-packages/deep_gemm/__init__.py", line 50, in _fn
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA driver error (_deps/repo-deepgemm-src/csrc/apis/../jit_kernels/impls/../../jit/handle.hpp:120): 301 (CUDA_ERROR_FILE_NOT_FOUND, file not found)
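The wrapper named in the traceback (deepgemm_w8a8_block_fp8_linear_with_fallback) suggests a guarded-fallback pattern around the DeepGEMM call. A minimal sketch of such a guard, using hypothetical stand-in callables `run_deepgemm` and `run_triton_fallback` (this is not SGLang's actual implementation):

```python
# Hedged sketch: fall back to a portable kernel when the DeepGEMM JIT
# artifact is missing or broken (e.g. CUDA_ERROR_FILE_NOT_FOUND from a
# stale JIT cache). `run_deepgemm` and `run_triton_fallback` are
# hypothetical stand-ins for the fast and fallback matmul paths.
def matmul_with_fallback(a, b, run_deepgemm, run_triton_fallback):
    try:
        return run_deepgemm(a, b)
    except RuntimeError as e:
        if "CUDA_ERROR_FILE_NOT_FOUND" in str(e):
            # Stale or missing JIT artifact; take the portable path.
            return run_triton_fallback(a, b)
        raise  # Re-raise anything we don't recognize.
```

In practice one would also log the failure and possibly clear the JIT cache directory, so the fast path can be retried on the next run.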
