Fused two elementwise kernels for k_nope and k_pe concat #14862

Merged
HaiShaw merged 3 commits into sgl-project:main from HaiShaw:fused-concat-k-nope-pe
Dec 15, 2025

@kkHuang-amd
Collaborator

Motivation

Reduce the time spent concatenating k_nope and k_pe before MHA attention.

Modifications

Replace the naive torch operations with a Triton kernel.
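
For context, the unfused path can be sketched roughly as below. This is a minimal reference, not the code from this PR: the function name `concat_k_naive`, the tensor shapes, and the assumption that k_pe carries a single head that is broadcast across all heads are illustrative. It shows why the naive approach launches two elementwise copy kernels, which a fused kernel can perform in one pass.

```python
import torch


def concat_k_naive(k_nope: torch.Tensor, k_pe: torch.Tensor) -> torch.Tensor:
    """Reference concat of k_nope and k_pe along the last dim (hypothetical helper).

    Assumed shapes (illustrative only):
      k_nope: [num_tokens, num_heads, nope_dim]
      k_pe:   [num_tokens, 1, rope_dim]  (shared across heads)
    """
    num_tokens, num_heads, nope_dim = k_nope.shape
    rope_dim = k_pe.shape[-1]
    k = k_nope.new_empty(num_tokens, num_heads, nope_dim + rope_dim)
    k[..., :nope_dim] = k_nope  # first elementwise copy
    k[..., nope_dim:] = k_pe    # second elementwise copy, broadcast over heads
    return k


# Small CPU example with made-up sizes.
k_nope = torch.randn(4, 8, 128)
k_pe = torch.randn(4, 1, 64)
k = concat_k_naive(k_nope, k_pe)
print(k.shape)
```

A fused kernel would write both slices of k in a single launch; this sketch is only useful as a correctness reference to compare against.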

Accuracy Tests

root@mia1-p01-g07:/sgl-workspace/sglang# python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --port 8000
100%|██████████| 1319/1319 [00:57<00:00, 23.06it/s]
Accuracy: 0.939
Invalid: 0.000
Latency: 57.372 s
Output throughput: 2335.740 token/s

Benchmarking and Profiling

Before fusing the two elementwise kernels:
(image)

This PR:
(image)

The time cost drops from 273 us (169 us + 104 us for the two separate kernels) to 128 us for the fused kernel.
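
As a quick check of what these numbers imply, the snippet below works out the saving and the speedup from the three timings quoted above. The variable names are illustrative.

```python
# Timings quoted in this PR description, in microseconds.
before_us = 169 + 104  # two separate elementwise kernels
after_us = 128         # single fused kernel

saved_us = before_us - after_us
reduction_pct = 100 * saved_us / before_us
speedup = before_us / after_us

print(f"before: {before_us} us, after: {after_us} us")
print(f"saved: {saved_us} us ({reduction_pct:.1f}% less time, {speedup:.2f}x faster)")
```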

attn_dtype = k_nope.dtype
k = k_nope.new_empty(*k_shape, dtype=attn_dtype)
concat_and_cast_mha_k_triton(k, k_nope, k_pe)
elif _is_hip and self.current_attention_backend == "aiter":
Collaborator

Please confirm whether this should check _is_hip or _is_gfx95_supported.

Collaborator Author

_is_hip is enough; this optimization applies to any ROCm platform.

@HaiShaw
Collaborator

HaiShaw commented Dec 11, 2025

/tag-and-rerun-ci

@HaiShaw HaiShaw merged commit 2ea844e into sgl-project:main Dec 15, 2025
82 of 89 checks passed
Liwansi added a commit to iforgetmyname/sglang that referenced this pull request Dec 15, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 17, 2025
…#14862)

1 TC failure to check, but irrelevant to this code change
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
…#14862)

1 TC failure to check, but irrelevant to this code change