llama use triton rope op #15855

Merged
iforgetmyname merged 1 commit into sgl-project:ifmn/eagle-dp-attn from iforgetmyname:llama_rope
Dec 26, 2025

Conversation

@Liwansi Liwansi commented Dec 26, 2025

Motivation

Add the Triton split_qkv_rmsnorm_rope op for Llama.

Modifications

Remove npu_apply_rotary_pos_emb and use the Triton op instead.
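
For readers unfamiliar with the op, here is a rough pure-Python reference of what a fused split-QKV + RMSNorm + RoPE step might compute for a single token. This is a sketch under assumptions, not the Triton kernel from this PR: the function names, the flat `[q | k | v]` layout, the per-head RMSNorm on q and k, and the rotate-half (NeoX-style) rotary convention are all guesses based on the op's name. The real op may use a different layout or rotary style, or skip the norm for Llama.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    # Scale x by the reciprocal of its root-mean-square, then by weight.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope_rotate_half(x, pos, base=10000.0):
    # Rotate-half rotary embedding for one head vector (assumed convention).
    dim = len(x)
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        theta = pos * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i + half] * c + x[i] * s
    return out


def split_qkv_rmsnorm_rope_ref(qkv, pos, num_q_heads, num_kv_heads,
                               head_dim, q_weight, k_weight):
    # Hypothetical reference: split a flat [q | k | v] vector for one token,
    # then apply per-head RMSNorm and RoPE to q and k. v passes through.
    q_size = num_q_heads * head_dim
    kv_size = num_kv_heads * head_dim
    q = qkv[:q_size]
    k = qkv[q_size:q_size + kv_size]
    v = qkv[q_size + kv_size:q_size + 2 * kv_size]

    def per_head(flat, weight):
        out = []
        for start in range(0, len(flat), head_dim):
            head = rms_norm(flat[start:start + head_dim], weight)
            out.extend(rope_rotate_half(head, pos))
        return out

    return per_head(q, q_weight), per_head(k, k_weight), v
```

A fused kernel would do the same work in one pass over the QKV buffer instead of separate split, norm, and rotary launches, which is presumably where the speedup in the profiling section comes from.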

Accuracy Tests

qwen3-32b-w8a8:
image

qwen3-30b:
image

Benchmarking and Profiling

32b-w8a8 + eagle step 1

before:
image

after:
image

Checklist

@iforgetmyname iforgetmyname merged commit 66c34a4 into sgl-project:ifmn/eagle-dp-attn Dec 26, 2025
1 check passed
@iforgetmyname iforgetmyname deleted the llama_rope branch December 26, 2025 07:56
Liwansi added a commit to iforgetmyname/sglang that referenced this pull request Dec 29, 2025
…glang into eagle-sche

* 'ifmn/eagle-dp-attn' of https://github.com/sgl-project/sglang: (22 commits)
  dp scheduler enhance support with chunked prefill (sgl-project#16071)
  modify suffix decoding
  CI dependency update (sgl-project#16063)
  fix rotary_embedding init npu (sgl-project#16011)
  feat: bugfix and accuracy fix for stablelm2_1_6b (sgl-project#15932)
  Update model and feature support for Ascend NPU (sgl-project#16005)
  Bugfix for Llama4 (sgl-project#15929)
  Bugfix for ds-vl2 (sgl-project#15894)
  gme qwen vl runners fix (sgl-project#15899)
  add profiling in scheduler (sgl-project#15876)
  llama use triton rope op (sgl-project#15855)
  suffix decoding adapt npu
  suffix decoding adapt npu
  Add suffix decoding speculative algorithm from feature 13553
  cherry sgl-project#15434: qwen3 vl performance update
  cherry sgl-project#15597: fix Qwen3-VL-30B-A3B-Instruct accuracy loss
  [Schedule] bug fix for schedule enhancer (sgl-project#15834)
  minilb support roundrobin (sgl-project#15824)
  fix torchair compile issue
  cherry sgl-project#15187: lora fix
  ...

# Conflicts:
#	python/sglang/srt/managers/scheduler.py
#	python/sglang/srt/managers/scheduler_enhancer.py
JiaruiChang5268 pushed a commit to JiaruiChang5268/sglang that referenced this pull request Jan 9, 2026
JiaruiChang5268 pushed a commit to JiaruiChang5268/sglang that referenced this pull request Jan 15, 2026