
Add SGLang CUDA crash API logging inspired by FlashInfer #20910

Merged
BBuf merged 24 commits into main from add_sglang_cuda_crash_debug_2 on Mar 22, 2026
Conversation

@BBuf (Collaborator) commented Mar 19, 2026

Motivation

This PR adds SGLang-native API-level CUDA crash logging for LLM and diffusion kernel call boundaries.

The implementation is inspired by FlashInfer's API logging utility:
https://github.com/flashinfer-ai/flashinfer/blob/main/flashinfer/api_logging.py

This version keeps the scope focused on crash debugging and level-10 dump capture. Replay-related code was intentionally not included so the implementation stays smaller and aligned with the actual SGLang debugging workflow.

This PR also adds a skill for debugging CUDA crashes, similar to FlashInfer's.
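As a rough illustration of the idea, a level-gated wrapper can log each kernel call boundary so that the last "enter" line without a matching "exit" identifies the kernel that crashed the CUDA context. This is a hypothetical sketch: the names `maybe_wrap_jit_kernel_debug` and `SGLANG_KERNEL_API_LOGLEVEL` follow identifiers that appear in this PR, but the merged `python/sglang/api_logging.py` may differ in details.

```python
import functools
import logging
import os
import sys

logger = logging.getLogger("sglang.api_logging")

# Read the log level once at import time; 0 disables wrapping entirely.
_LOG_LEVEL = int(os.environ.get("SGLANG_KERNEL_API_LOGLEVEL", "0"))


def maybe_wrap_jit_kernel_debug(op_name):
    """Log a kernel call boundary; the last 'enter' record without a
    matching 'exit' points at the kernel that took down the process."""

    def decorator(fn):
        if _LOG_LEVEL == 0:
            return fn  # no wrapper, zero overhead when logging is off

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            logger.info("enter %s", op_name)
            # Flush so the record survives a hard abort of the process.
            sys.stderr.flush()
            out = fn(*args, **kwargs)
            logger.info("exit %s", op_name)
            return out

        return wrapper

    return decorator
```

Kernel entry points would then be wrapped as `@maybe_wrap_jit_kernel_debug(op_name="jit_kernel.diffusion.triton.rms_norm_fn")`, matching the call sites visible in the review diffs.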

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.


github-actions bot added the documentation (Improvements or additions to documentation), deepseek, and diffusion (SGLang Diffusion) labels on Mar 19, 2026
@BBuf (Collaborator, Author) commented Mar 19, 2026

/tag-and-rerun-ci

Comment thread python/sglang/jit_kernel/debug_utils.py Outdated
Comment thread docs/references/environment_variables.md Outdated
Comment thread python/sglang/api_logging.py Outdated
Comment thread python/sglang/srt/layers/quantization/bitsandbytes.py Outdated
Comment thread python/sglang/srt/models/kimi_vl_moonvit.py Outdated
Comment thread sgl-kernel/python/sgl_kernel/flash_attn.py Outdated
@BBuf (Collaborator, Author) commented Mar 20, 2026

/tag-and-rerun-ci

@@ -0,0 +1,657 @@
---
name: debug-cuda-crash
description: Tutorial for debugging CUDA crashes in SGLang using kernel API logging
Contributor commented:

This should tell when to call the skill, e.g.:

description: Call this skill when you need to debug CUDA crashes using kernel API logging.

Collaborator (Author) replied:
Done in 42beeb2
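For illustration, the adjusted skill front-matter would presumably read along these lines, combining the `name` from the diff above with the reviewer's suggested `description` (the exact wording lives in commit 42beeb2):

```yaml
---
name: debug-cuda-crash
description: Call this skill when you need to debug CUDA crashes in SGLang using kernel API logging.
---
```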

):
return norm_infer_native(x, weight, bias, eps, is_rms_norm, out)

@maybe_wrap_jit_kernel_debug(op_name="jit_kernel.diffusion.triton.rms_norm_fn")
Contributor commented:
why do we need to manually give it a name? It should be able to auto infer the op name

Collaborator (Author) replied:
Done



@maybe_wrap_jit_kernel_debug(
op_name="jit_kernel.diffusion.triton.fuse_residual_layernorm_scale_shift_gate_select01_kernel"
Contributor commented:
this is too tedious. The decorator should auto infer the name

Collaborator (Author) replied:
Done

Comment thread python/sglang/srt/models/minimax_m2.py Outdated
tl.store(out2_row + offsets2, out2, mask=mask2)


@debug_kernel_api(op_name="MiniMaxM2.rms_sumsq_serial")
Contributor commented:
it should auto infer.
The decorator can look at the file name and function name.

Collaborator (Author) replied:
Done
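The auto-inference the reviewers ask for can be done by reading the module and function names off the function object itself, so call sites never pass a hand-written `op_name`. A minimal sketch with hypothetical helper names (the merged decorator may differ):

```python
import functools


def infer_op_name(fn):
    # Derive e.g. "minimax_m2.rms_sumsq_serial" from the function object:
    # the last component of the dotted module path (the file name) plus
    # the qualified function name.
    module = fn.__module__.rsplit(".", 1)[-1]
    return f"{module}.{fn.__qualname__}"


def debug_kernel_api(fn):
    # Bare decorator form: the op name is inferred, not passed by hand.
    op_name = infer_op_name(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.last_op = op_name  # stand-in for the real logging call
        return fn(*args, **kwargs)

    return wrapper
```

With this, the earlier `@debug_kernel_api(op_name="MiniMaxM2.rms_sumsq_serial")` collapses to a plain `@debug_kernel_api` on the kernel function.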

@BBuf (Collaborator, Author) commented Mar 21, 2026

/tag-and-rerun-ci

@BBuf BBuf merged commit 766d225 into main Mar 22, 2026
76 of 105 checks passed
@BBuf BBuf deleted the add_sglang_cuda_crash_debug_2 branch March 22, 2026 08:39
OrangeRedeng pushed a commit to OrangeRedeng/sglang that referenced this pull request Mar 22, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
empty-quiver added a commit to empty-quiver/sglang-turboquant that referenced this pull request Apr 28, 2026
The patched __init__.py imports maybe_wrap_debug_kernel from
sgl_kernel.debug_utils for the SGLANG_KERNEL_API_LOGLEVEL machinery.
This file exists in upstream sgl-kernel feature branches (PR sgl-project#20910)
but never landed on the kvcache-ai fork's main. The old PyPI-installed
sglang-kt 0.6.1 image happened to bundle it; our source build does not.

Drop a verbatim copy from PR sgl-project#20910 (BBuf/Xiaoyu Zhang) into the
in-tree sgl-kernel source so the wheel we build packages it. Without
this file the source-built sgl-kernel raises ModuleNotFoundError at
import time.

Labels

blackwell (SM100/SM120), deepseek, diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), hicache (Hierarchical Caching for SGLang), jit-kernel, quant (LLM Quantization), run-ci, sgl-kernel


3 participants