[1/2] Optimizations and refactors about quant kernel #9534
ispobock merged 594 commits into sgl-project:main
…fzyzcjy/sglang into feat/opt_quant_extracted_kernel
@fzyzcjy Can you share kernel-level benchmark results (e.g., memory bandwidth) for this optimized kernel compared with the previous one?
@ispobock I did this two months ago, so I do not remember clearly... Below is the e2e speedup. Also, IIRC, when looking at the profile, the kernel is much faster than before.
@fzyzcjy I see. The E2E test and profile results are also good. For kernel optimization, I just think it's better to have some kernel-level benchmark results (TFLOPS for a compute-bound kernel, memory bandwidth for a memory-bound kernel), especially for different problem sizes. That helps us understand how close we are to the hardware limit and makes sure the optimization benefits both small and large batch sizes.
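As a rough illustration of the kernel-level metric requested above, the following sketch estimates the achieved memory bandwidth of a per-token-group quantization kernel from a measured latency. The function names, the dtype sizes (2-byte input, 1-byte quantized output, 4-byte scale per group), and the assumption that every element is read and written exactly once are illustrative choices, not taken from this PR.

```python
def quant_bytes_moved(num_tokens, hidden_dim, group_size,
                      in_bytes=2, out_bytes=1, scale_bytes=4):
    """Estimate the bytes read plus written by one quantization call.

    Assumes (hypothetically) a 2-byte input dtype, a 1-byte quantized
    output, and one 4-byte scale per group of `group_size` elements.
    """
    if hidden_dim % group_size != 0:
        raise ValueError("hidden_dim must be divisible by group_size")
    num_groups = num_tokens * (hidden_dim // group_size)
    bytes_read = num_tokens * hidden_dim * in_bytes
    bytes_written = num_tokens * hidden_dim * out_bytes + num_groups * scale_bytes
    return bytes_read + bytes_written


def achieved_bandwidth_gbps(total_bytes, elapsed_us):
    """Convert bytes moved and latency in microseconds to GB/s (1 GB = 1e9 bytes)."""
    if elapsed_us <= 0:
        raise ValueError("elapsed_us must be positive")
    # bytes per microsecond equals MB/s * 1e0; dividing by 1e3 gives GB/s.
    return total_bytes / elapsed_us / 1e3


def bandwidth_utilization(achieved_gbps, peak_gbps):
    """Fraction of the device's peak memory bandwidth that was reached."""
    if peak_gbps <= 0:
        raise ValueError("peak_gbps must be positive")
    return achieved_gbps / peak_gbps
```

Running this over a sweep of `num_tokens` values, with latencies measured on the target GPU and `peak_gbps` taken from the device's specification, would show whether both small and large batches approach the hardware limit.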
Definitely. Let me find some (old) logs.
Here are the baseline vs sgl kernel results. I could not find the previous result logs though... IIRC, when I started optimizing, the bench logic was wrong (things were wrongly kept inside the L2 cache) and some things could not even run. IIRC, months ago I asked @Alcanderian after my optimization; we had a discussion and the conclusion was that yes, the code is near the limit and can hardly be pushed further. old log 1 (I think it is the latest) old log 2 (may be near the latest)
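The L2-cache pitfall mentioned above is a common one when benchmarking memory-bound kernels: if the same small input is reused every iteration, it may be served from cache and the measured bandwidth looks better than what DRAM can deliver. One generic mitigation, sketched below, is to rotate through enough distinct input buffers that their combined size exceeds the cache. This is not necessarily what was done in this PR; the helper name and the safety factor of 2 are assumptions for illustration.

```python
import math


def num_rotating_buffers(bytes_per_buffer, l2_cache_bytes, safety_factor=2.0):
    """Return how many distinct input buffers to cycle through.

    The combined size of the buffers is chosen to be at least
    `safety_factor` times the L2 cache size, so that a buffer is
    unlikely to still be cached when it is reused.
    `safety_factor=2.0` is an illustrative default, not a measured value.
    """
    if bytes_per_buffer <= 0 or l2_cache_bytes <= 0:
        raise ValueError("sizes must be positive")
    needed = math.ceil(safety_factor * l2_cache_bytes / bytes_per_buffer)
    # Always benchmark with at least one buffer.
    return max(1, needed)
```

In a benchmark loop, iteration `i` would then use buffer `i % num_rotating_buffers(...)`, with `l2_cache_bytes` taken from the target device's specification.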
oops let me have a check |
…rnel (sgl-project#9534)" (sgl-project#10292) (cherry picked from commit 6d55f60) # Conflicts: # python/sglang/srt/layers/quantization/fp8_kernel.py # sgl-kernel/tests/test_per_token_group_quant_8bit.py
…quant kernel (sgl-project#9534)" (sgl-project#10292)" This reverts commit bcfdb49.
…gl-project#9534)" (sgl-project#10292)" This reverts commit 6d55f60.

Motivation
to review this, just look at #7601
remarks about CI
thus, we need to check
then it is safe to merge
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist