
[JIT sgl-kernel] Jit support per tensor quant (#15709)

Merged: BBuf merged 28 commits into main from jit_support_per_tensor_quant on Dec 25, 2025
Conversation

@BBuf (Collaborator) commented on Dec 24, 2025

Motivation

➜  jit_kernel python3 tests/test_per_tensor_quant_fp8.py
====================================================== test session starts ======================================================
platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /home/lmsys/bbuf/sglang/python
configfile: pyproject.toml
plugins: hydra-core-1.3.2, anyio-4.11.0, typeguard-4.4.4
collected 9 items                                                                                                               

tests/test_per_tensor_quant_fp8.py .........                                                                              [100%]

======================================================= warnings summary ========================================================
../../../../../../../usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1290
  /usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1290: PytestAssertRewriteWarning: Module already imported so cannot be rewritten; anyio
    self._mark_plugins_for_rewrite(hook, disable_autoload)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================= 9 passed, 1 warning in 13.14s ================================================
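For context, the per-tensor FP8 (e4m3) quantization exercised by these tests can be sketched in plain Python: one scale, derived from the absolute maximum of the whole tensor, is applied to every element. This is an illustrative reference only, not the PR's CUDA implementation; the final round-to-nearest-e4m3 step is omitted for brevity.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3


def per_tensor_quant_fp8(values, scale=None):
    """Reference semantics for per-tensor FP8 quantization.

    If no scale is given (dynamic quantization), a single scale is
    derived from the absolute maximum of the whole tensor.
    """
    if scale is None:
        scale = max(abs(v) for v in values) / FP8_E4M3_MAX
    # Scale into the representable range and clamp; a real kernel would
    # additionally round each element to the nearest e4m3 value here.
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return q, scale


q, s = per_tensor_quant_fp8([-900.0, 1.5, 448.0, 0.25])
```

With this input, the dynamic scale is 900/448, and the out-of-range element -900.0 maps to the e4m3 limit of -448.0 after scaling.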
  • sgl-kernel
 jit_kernel python3 /home/lmsys/bbuf/sglang/sgl-kernel/benchmark/bench_per_tensor_quant_fp8.py
⚠️ vLLM not available, skipping comparison
per-tensor-quant-fp8-performance:
    batch_size  seq_len   SGL Kernel
0         16.0     64.0    10.249762
1         16.0    128.0    21.541752
2         16.0    256.0    48.441215
3         16.0    512.0   106.394665
4         16.0   1024.0   206.246530
5         16.0   2048.0   403.524995
6         32.0     64.0    21.571416
7         32.0    128.0    48.642592
8         32.0    256.0   106.193720
9         32.0    512.0   206.433368
10        32.0   1024.0   403.411667
11        32.0   2048.0   884.436687
12        64.0     64.0    48.301857
13        64.0    128.0   106.155326
14        64.0    256.0   206.425190
15        64.0    512.0   403.736333
16        64.0   1024.0   884.741346
17        64.0   2048.0  1761.169354
18       128.0     64.0   105.996946
19       128.0    128.0   206.263826
20       128.0    256.0   403.728982
21       128.0    512.0   884.528001
22       128.0   1024.0  1761.481285
23       128.0   2048.0  3514.272054
  • jit kernel (PR)
vLLM not available, skipping comparison
per-tensor-quant-fp8-performance:
    batch_size  seq_len   SGL Kernel
0         16.0     64.0    10.227294
1         16.0    128.0    21.485734
2         16.0    256.0    48.617873
3         16.0    512.0   106.206471
4         16.0   1024.0   206.348592
5         16.0   2048.0   403.680662
6         32.0     64.0    21.730780
7         32.0    128.0    49.148478
8         32.0    256.0   106.192177
9         32.0    512.0   206.320342
10        32.0   1024.0   403.442999
11        32.0   2048.0   884.408673
12        64.0     64.0    48.800135
13        64.0    128.0   106.235504
14        64.0    256.0   206.406806
15        64.0    512.0   403.754671
16        64.0   1024.0   884.618640
17        64.0   2048.0  1760.879993
18       128.0     64.0   106.232718
19       128.0    128.0   206.548824
20       128.0    256.0   403.857668
21       128.0    512.0   884.276668
22       128.0   1024.0  1760.829290
23       128.0   2048.0  3514.077346
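A quick spot-check of the two tables above shows the JIT path matches the AOT sgl-kernel within measurement noise; for example, taking the largest (batch_size=128, seq_len=2048) entries copied from each table:

```python
# Benchmark parity spot-check, using the batch_size=128, seq_len=2048
# row from the two tables above (AOT sgl-kernel vs. the JIT kernel).
aot = 3514.272054
jit = 3514.077346

ratio = jit / aot
# Within 1%: the JIT kernel reproduces the AOT kernel's performance.
assert abs(ratio - 1.0) < 0.01
```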

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented:

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

github-actions bot added the quant (LLM Quantization) label on Dec 24, 2025
BBuf requested a review from DarkSharpness on December 24, 2025 02:31
BBuf marked this pull request as ready for review on December 24, 2025 05:04
Comment thread python/sglang/jit_kernel/csrc/gemm/per_tensor_quant_fp8.cuh Outdated
Comment thread python/sglang/jit_kernel/include/sgl_kernel/fp8_utils.cuh Outdated
Comment thread python/sglang/jit_kernel/csrc/gemm/per_tensor_quant_fp8.cuh Outdated
BBuf force-pushed the jit_support_per_tensor_quant branch from 5f32ae5 to fea8cb7 on December 24, 2025 09:03
Comment thread python/sglang/jit_kernel/csrc/gemm/per_tensor_quant_fp8.cuh Outdated
BBuf merged commit de2f288 into main on Dec 25, 2025 (16 of 28 checks passed)
BBuf deleted the jit_support_per_tensor_quant branch on December 25, 2025 08:24
Leoyzen pushed a commit to Leoyzen/sglang that referenced this pull request Dec 25, 2025
Leoyzen pushed a commit to Leoyzen/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

Labels: quant (LLM Quantization)

Projects: none yet


3 participants