fix Sliding Window and Sink Token Support in Unified Kernel by zminglei · Pull Request #11634 · sgl-project/sglang

zminglei · 2025-10-14T21:47:30Z

Motivation

The unified kernel currently handles models with sliding window attention and sink tokens incorrectly.
This PR fixes the kernel for such models, for example gpt-oss-20b.

python3 -m sglang.launch_server --model-path /shared/public/elr-models/openai/gpt-oss-20b/6cd4d0ffba39483fe4fb0f5637831f717dafca35/ --attention-backend triton --enable-deterministic-inference
Before:

lm_eval --model local-chat-completions --model_args model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks gsm8k --batch_size 1024 --apply_chat_template --num_fewshot 1 --limit 200

INFO:lm_eval.loggers.evaluation_tracker:Output path not provided, skipping saving results aggregated
local-chat-completions (model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048), gen_kwargs: (None), limit: 200.0, num_fewshot: 1, batch_size: 1024
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     1|exact_match|↑  |    0|±  |     0|
|     |       |strict-match    |     1|exact_match|↑  |    0|±  |     0|

After:

lm_eval --model local-chat-completions --model_args model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048 --tasks gsm8k --batch_size 1024 --apply_chat_template --num_fewshot 1 --limit 200

INFO:lm_eval.loggers.evaluation_tracker:Output path not provided, skipping saving results aggregated
local-chat-completions (model=gpt-oss,base_url=http://127.0.0.1:30000/v1/chat/completions,num_concurrent=128,timeout=999999,max_gen_toks=2048), gen_kwargs: (None), limit: 200.0, num_fewshot: 1, batch_size: 1024
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     1|exact_match|↑  | 0.84|±  |0.0260|
|     |       |strict-match    |     1|exact_match|↑  | 0.03|±  |0.0121|

Modifications

Sink Token Fix
The unified deterministic kernel (_fwd_kernel_unified) defined the HAS_SINK parameter but never used it, so the softmax was computed incorrectly when sink tokens were present.
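
To illustrate why ignoring the sink changes the result, here is a minimal pure-Python sketch of a softmax with a sink logit. It assumes the gpt-oss-style convention in which the sink contributes only to the softmax denominator and has no value vector of its own. The function name `softmax_with_sink` is hypothetical and is not taken from the kernel; the Triton code in this PR is not reproduced here.

```python
import math


def softmax_with_sink(scores, sink=None):
    """Return attention weights for `scores`.

    If `sink` is given, exp(sink) is added to the denominator only,
    so the returned weights sum to less than 1.
    """
    # Subtract the running maximum for numerical stability.
    m = max(scores) if sink is None else max(max(scores), sink)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    if sink is not None:
        # The step a kernel skips if it declares a sink flag but never reads it.
        denom += math.exp(sink - m)
    return [e / denom for e in exps]


without_sink = softmax_with_sink([0.0, 0.0])
with_sink = softmax_with_sink([0.0, 0.0], sink=0.0)
```

With two equal scores and an equal sink logit, each key gets weight 1/3 instead of 1/2, so dropping the sink term rescales every attention output for that head.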

Sliding Window Fix
The sliding window attention mask compared unified array indices against absolute sequence positions, which produced incorrect attention masking.
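
The effect of mixing index spaces can be sketched in plain Python. This is an illustration under assumptions, not the kernel's code: the layout (6 cached prefix tokens followed by 2 new tokens), the window convention (a key is visible when `0 <= q_pos - k_pos < window`), and the name `sliding_window_mask` are all hypothetical. Which side of the comparison used the wrong index space in the actual kernel is not stated in this description.

```python
def sliding_window_mask(q_positions, k_positions, window):
    """mask[i][j] is True when key j is visible to query i.

    Both arguments must hold absolute sequence positions,
    not indices into a packed or unified array.
    """
    return [
        [0 <= q - k < window for k in k_positions]
        for q in q_positions
    ]


def visible(mask_row):
    """Indices of the keys a query can attend to."""
    return [j for j, ok in enumerate(mask_row) if ok]


prefix_len, extend_len, window = 6, 2, 3
k_abs = list(range(prefix_len + extend_len))              # keys at positions 0..7
q_abs = [prefix_len + i for i in range(extend_len)]       # queries at positions 6, 7
q_local = list(range(extend_len))                         # array-local indices 0, 1

correct = sliding_window_mask(q_abs, k_abs, window)
mixed = sliding_window_mask(q_local, k_abs, window)       # index spaces mixed up

correct_visible = [visible(row) for row in correct]
mixed_visible = [visible(row) for row in mixed]
```

With absolute positions, the two queries see keys `[4, 5, 6]` and `[5, 6, 7]`. With array-local query indices compared against absolute key positions, they see `[0]` and `[0, 1]`, so the queries attend to the wrong keys entirely.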

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".

zminglei added 5 commits October 10, 2025 23:59

change the server_args condition

b5f0dbd

fix lint

5b061df

Merge remote-tracking branch 'upstream/bhe/1_stage_triton_kernel' int…

e4893cc

…o triton

fix sliding window

9658a6c

lint

05b0167

zminglei marked this pull request as ready for review October 14, 2025 21:50

zminglei requested review from BBuf, Edwardf0t1, HaiShaw, Ying1123, ch-wan, ispobock, kushanam, merrymercy and zhyncs as code owners October 14, 2025 21:50

hebiao064 merged commit ec2a21c into sgl-project:bhe/1_stage_triton_kernel Oct 14, 2025
2 checks passed

hebiao064 merged 5 commits into sgl-project:bhe/1_stage_triton_kernel from zminglei:triton-sliding-window
