Convert cu_seqlens to CPU for npu_flash_attention_unpad operator#15434

Merged
iforgetmyname merged 15 commits into sgl-project:main from xiaobaicxy:main
Jan 4, 2026
Conversation

@xiaobaicxy (Contributor) commented Dec 19, 2025

Motivation

To improve the performance of VisionAscendAttention, we convert cu_seqlens to CPU once, before the first transformer layer: converting it per layer would interrupt operator dispatch and cause kernel bubbles.
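The pattern can be sketched as follows. Note that `forward_vision_blocks`, `blocks`, and the `on_npu` flag are hypothetical stand-ins for illustration, not the PR's actual function names:

```python
import torch

def forward_vision_blocks(x, cu_seqlens, blocks, on_npu=True):
    # Hoist the device-to-host copy out of the per-layer loop: on NPU the
    # npu_flash_attention_unpad operator consumes cu_seqlens from the host,
    # and copying it inside every layer would stall operator dispatch.
    if on_npu:
        cu_seqlens = cu_seqlens.to("cpu")  # one synchronous copy, up front
    for blk in blocks:
        x = blk(x, cu_seqlens)  # every layer reuses the host-side tensor
    return x
```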

Modifications

Accuracy Tests

None of the modifications affect precision.

Benchmarking and Profiling

Checklist


@github-actions github-actions Bot added the Multi-modal multi-modal language model label Dec 19, 2025
Comment thread python/sglang/srt/models/qwen2_5_vl.py Outdated
self.act = ACT2FN[hidden_act]
self.hidden_act = hidden_act
if self.hidden_act == "silu":
    from sglang.srt.layers.activation import SiluAndMul
@yuan-luo (Collaborator) commented Dec 19, 2025

Move imports to the top of the file.

@xiaobaicxy (Contributor, Author) replied:

done
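For context, the SiluAndMul activation referenced in the diff fuses the gate activation with the elementwise multiply of a gated MLP. A minimal functional sketch, assumed equivalent to the class in sglang.srt.layers.activation (this is an illustrative reimplementation, not the project's code):

```python
import torch
import torch.nn.functional as F

def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # Split the last dimension in half: the first half is the gate
    # projection, the second half is the up projection.
    d = x.shape[-1] // 2
    return F.silu(x[..., :d]) * x[..., d:]
```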

* 'main' of https://github.com/sgl-project/sglang: (136 commits)
  fix: unreachable error check in retraction (sgl-project#15433)
  [sgl-kernel] chore: update deepgemm version (sgl-project#13402)
  [diffusion] multi-platform: support diffusion on amd and fix encoder loading on MI325 (sgl-project#13760)
  [amd] Add deterministic all-reduce kernel for AMD (ROCm) (sgl-project#15340)
  [diffusion] refactor: refactor _build_req_from_sampling to use shallow_asdict (sgl-project#13782)
  Add customized sampler registration (sgl-project#15423)
  Update readme (sgl-project#15425)
  Fix Mindspore model import warning (sgl-project#15287)
  [Feature] Xiaomi `MiMo-V2-Flash` day0 support (sgl-project#15207)
  [diffusion] profiling: add bench_serving.py and VBench (sgl-project#15410)
  [DLLM] Fix dLLM regression (sgl-project#15371)
  [Deepseek V3.2] Fix Deepseek MTP in V1 mode (sgl-project#15429)
  chore: update CI_PERMISSIONS (sgl-project#15431)
  [DLLM] Add CI for diffusion LLMs (sgl-project#14723)
  Support using different attention backend for draft decoding. (sgl-project#14843)
  feat(dsv32): better error handling for DeepSeek-v3.2 encoder (sgl-project#14353)
  tiny fix lint on main (sgl-project#15424)
  multimodal: precompute hash for MultimodalDataItem (sgl-project#14354)
  [AMD] Clear pre-built AITER kernels and warmup to prevent segfaults and test timeouts (sgl-project#15318)
  [Performance] optimize NSA backend metadata computation for multi-step speculative decoding (sgl-project#14781)
  ...
@JustinTong0323 (Collaborator) commented:

Please fix lint by running pre-commit run -a ~

@xiaobaicxy xiaobaicxy closed this Dec 23, 2025
@xiaobaicxy xiaobaicxy reopened this Dec 23, 2025
iforgetmyname added a commit that referenced this pull request Dec 26, 2025
Liwansi added a commit to iforgetmyname/sglang that referenced this pull request Dec 29, 2025
…glang into eagle-sche

* 'ifmn/eagle-dp-attn' of https://github.com/sgl-project/sglang: (22 commits)
  dp scheduler enhance support with chunked prefill (sgl-project#16071)
  modify suffix decoding
  CI dependency update (sgl-project#16063)
  fix rotary_embedding init npu (sgl-project#16011)
  feat: bugfix and accuracy fix for stablelm2_1_6b (sgl-project#15932)
  Update model and feature support for Ascend NPU (sgl-project#16005)
  Bugfix for Llama4 (sgl-project#15929)
  Bugfix for ds-vl2 (sgl-project#15894)
  gme qwen vl runners fix (sgl-project#15899)
  add profiling in scheduler (sgl-project#15876)
  llama use triton rope op (sgl-project#15855)
  suffix decoding adapt npu
  suffix decoding adapt npu
  Add suffix decoding speculative algorithm from feature 13553
  cherry sgl-project#15434: qwen3 vl performance update
  cherry sgl-project#15597: fix Qwen3-VL-30B-A3B-Instruct accuracy loss
  [Schedule] bug fix for schedule enhancer (sgl-project#15834)
  minilb support roundrobin (sgl-project#15824)
  fix torchair compile issue
  cherry sgl-project#15187: lora fix
  ...

# Conflicts:
#	python/sglang/srt/managers/scheduler.py
#	python/sglang/srt/managers/scheduler_enhancer.py
@iforgetmyname (Collaborator) commented:

/tag-and-rerun-ci

Comment thread python/sglang/srt/models/qwen3_vl.py Outdated
Comment on lines +490 to +493
if is_npu():
    cu_seqlens = cu_seqlens.to("cpu")
else:
    cu_seqlens = cu_seqlens.to(self.device, non_blocking=True)
A Collaborator commented:
Suggested change
- if is_npu():
-     cu_seqlens = cu_seqlens.to("cpu")
- else:
-     cu_seqlens = cu_seqlens.to(self.device, non_blocking=True)
+ if not is_npu():
+     xxx
+ else:
+     xxx

@xiaobaicxy (Contributor, Author) replied:

done
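The accepted suggestion only reorders the branch so the common (non-NPU) path comes first. A sketch of the resulting logic, with `place_cu_seqlens` and the `on_npu` flag as hypothetical names for illustration:

```python
import torch

def place_cu_seqlens(cu_seqlens, device, on_npu):
    # Common path first: asynchronous copy onto the compute device.
    if not on_npu:
        return cu_seqlens.to(device, non_blocking=True)
    # NPU path: npu_flash_attention_unpad reads cu_seqlens from host memory.
    return cu_seqlens.to("cpu")
```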

@xiaobaicxy xiaobaicxy changed the title Qwen2.5-vl support SiluAndMul/GeluAndMul & Convert cu_seqlens to CPU for npu_flash_attention_unpad operator Convert cu_seqlens to CPU for npu_flash_attention_unpad operator Jan 3, 2026
@iforgetmyname iforgetmyname merged commit 25fa2ac into sgl-project:main Jan 4, 2026
153 of 168 checks passed
JiaruiChang5268 pushed a commit to JiaruiChang5268/sglang that referenced this pull request Jan 10, 2026

Labels

Multi-modal (multi-modal language model), run-ci


4 participants