
[JIT sgl-kernel] Jit support per tensor quant (#15709)

Merged: BBuf merged 28 commits into main from jit_support_per_tensor_quant on Dec 25, 2025
Conversation

@BBuf (Collaborator) commented on Dec 24, 2025

Motivation

➜  jit_kernel python3 tests/test_per_tensor_quant_fp8.py
====================================================== test session starts ======================================================
platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /home/lmsys/bbuf/sglang/python
configfile: pyproject.toml
plugins: hydra-core-1.3.2, anyio-4.11.0, typeguard-4.4.4
collected 9 items                                                                                                               

tests/test_per_tensor_quant_fp8.py .........                                                                              [100%]

======================================================= warnings summary ========================================================
../../../../../../../usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1290
  /usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1290: PytestAssertRewriteWarning: Module already imported so cannot be rewritten; anyio
    self._mark_plugins_for_rewrite(hook, disable_autoload)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================= 9 passed, 1 warning in 13.14s ================================================
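For context, the per-tensor FP8 (e4m3) quantization exercised by these tests can be sketched in plain Python: one scale, derived from the absolute maximum of the whole tensor, is applied to every element. This is an illustrative reference only, not the PR's CUDA implementation; the final round-to-nearest-e4m3 step is omitted for brevity.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3


def per_tensor_quant_fp8(values, scale=None):
    """Reference semantics for per-tensor FP8 quantization.

    If no scale is given (dynamic quantization), a single scale is
    derived from the absolute maximum of the whole tensor.
    """
    if scale is None:
        scale = max(abs(v) for v in values) / FP8_E4M3_MAX
    # Scale into the representable range and clamp; a real kernel would
    # additionally round each element to the nearest e4m3 value here.
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return q, scale


q, s = per_tensor_quant_fp8([-900.0, 1.5, 448.0, 0.25])
```

With this input, the dynamic scale is 900/448, and the out-of-range element -900.0 maps to the e4m3 limit of -448.0 after scaling.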
  • sgl-kernel
 jit_kernel python3 /home/lmsys/bbuf/sglang/sgl-kernel/benchmark/bench_per_tensor_quant_fp8.py
⚠️ vLLM not available, skipping comparison
per-tensor-quant-fp8-performance:
    batch_size  seq_len   SGL Kernel
0         16.0     64.0    10.249762
1         16.0    128.0    21.541752
2         16.0    256.0    48.441215
3         16.0    512.0   106.394665
4         16.0   1024.0   206.246530
5         16.0   2048.0   403.524995
6         32.0     64.0    21.571416
7         32.0    128.0    48.642592
8         32.0    256.0   106.193720
9         32.0    512.0   206.433368
10        32.0   1024.0   403.411667
11        32.0   2048.0   884.436687
12        64.0     64.0    48.301857
13        64.0    128.0   106.155326
14        64.0    256.0   206.425190
15        64.0    512.0   403.736333
16        64.0   1024.0   884.741346
17        64.0   2048.0  1761.169354
18       128.0     64.0   105.996946
19       128.0    128.0   206.263826
20       128.0    256.0   403.728982
21       128.0    512.0   884.528001
22       128.0   1024.0  1761.481285
23       128.0   2048.0  3514.272054
  • jit kernel (PR)
vLLM not available, skipping comparison
per-tensor-quant-fp8-performance:
    batch_size  seq_len   SGL Kernel
0         16.0     64.0    10.227294
1         16.0    128.0    21.485734
2         16.0    256.0    48.617873
3         16.0    512.0   106.206471
4         16.0   1024.0   206.348592
5         16.0   2048.0   403.680662
6         32.0     64.0    21.730780
7         32.0    128.0    49.148478
8         32.0    256.0   106.192177
9         32.0    512.0   206.320342
10        32.0   1024.0   403.442999
11        32.0   2048.0   884.408673
12        64.0     64.0    48.800135
13        64.0    128.0   106.235504
14        64.0    256.0   206.406806
15        64.0    512.0   403.754671
16        64.0   1024.0   884.618640
17        64.0   2048.0  1760.879993
18       128.0     64.0   106.232718
19       128.0    128.0   206.548824
20       128.0    256.0   403.857668
21       128.0    512.0   884.276668
22       128.0   1024.0  1760.829290
23       128.0   2048.0  3514.077346
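A quick spot-check of the two tables above shows the JIT path matches the AOT sgl-kernel within measurement noise; for example, taking the largest (batch_size=128, seq_len=2048) entries copied from each table:

```python
# Benchmark parity spot-check, using the batch_size=128, seq_len=2048
# row from the two tables above (AOT sgl-kernel vs. the JIT kernel).
aot = 3514.272054
jit = 3514.077346

ratio = jit / aot
# Within 1%: the JIT kernel reproduces the AOT kernel's performance.
assert abs(ratio - 1.0) < 0.01
```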

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented:

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

github-actions bot added the quant (LLM Quantization) label on Dec 24, 2025
BBuf requested a review from DarkSharpness on December 24, 2025 02:31
BBuf marked this pull request as ready for review on December 24, 2025 05:04
Comment thread python/sglang/jit_kernel/csrc/gemm/per_tensor_quant_fp8.cuh Outdated
Comment thread python/sglang/jit_kernel/include/sgl_kernel/fp8_utils.cuh Outdated
Comment thread python/sglang/jit_kernel/csrc/gemm/per_tensor_quant_fp8.cuh Outdated
BBuf force-pushed the jit_support_per_tensor_quant branch from 5f32ae5 to fea8cb7 on December 24, 2025 09:03
Comment thread python/sglang/jit_kernel/csrc/gemm/per_tensor_quant_fp8.cuh Outdated
BBuf merged commit de2f288 into main on Dec 25, 2025 (16 of 28 checks passed)
BBuf deleted the jit_support_per_tensor_quant branch on December 25, 2025 08:24
Leoyzen pushed a commit to Leoyzen/sglang that referenced this pull request Dec 25, 2025
Leoyzen pushed a commit to Leoyzen/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

Labels: quant (LLM Quantization)

Projects: none yet


3 participants