
perf(kimi_linear): replace einops rearrange with native torch ops in Kimi-Linear KDA path #20396

Merged
ispobock merged 8 commits into sgl-project:main from vedantjh2:einops-to-native-kimi
Mar 20, 2026

Conversation

@vedantjh2
Contributor

@vedantjh2 vedantjh2 commented Mar 12, 2026

Motivation

einops.rearrange adds Python-level overhead (pattern parsing, backend dispatch, shape validation) on every call. In the Kimi-Linear-48B model's KimiDeltaAttention hot path, this is called thousands of times per forward pass.

This PR replaces all 8 einops.rearrange calls with equivalent native PyTorch operations (unflatten, unsqueeze, squeeze, flatten) across 2 files, removing the einops import entirely from both.

Files Changed

  • kimi_linear.py — 2 rearrange calls replaced in KimiDeltaAttention.forward()
  • kda_backend.py — 6 rearrange calls replaced in forward_decode() and forward_extend()
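For illustration, a representative replacement (hypothetical shapes; the exact patterns vary across the eight call sites in the PR):

```python
import torch

# hypothetical shape: (batch, seq, heads * head_dim)
x = torch.randn(2, 7, 8 * 16)

# einops: rearrange(x, "b s (h d) -> b s h d", d=16)
y = x.unflatten(-1, (-1, 16))
assert y.shape == (2, 7, 8, 16)

# inverse, einops: rearrange(y, "b s h d -> b s (h d)")
z = y.flatten(-2)
assert torch.equal(z, x)
```

`unflatten`/`flatten` are plain view operations, so they skip einops' pattern parsing and backend dispatch on every call.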

Profiling Results

Profiled with Kimi-Linear-48B-A3B-Instruct on 2x H100 80GB (TP=2), --disable-cuda-graph, 10 prompts (512 input / 128 output tokens). Results scoped to nn.Module: KimiDeltaAttention_* spans via Perfetto SQL.

python -m sglang.launch_server \
  --model-path /shared/public/elr-models/moonshotai/Kimi-Linear-48B-A3B-Instruct \
  --port 30000 --tp 2 --disable-cuda-graph --trust-remote-code

Baseline (einops)

SELECT
  name,
  COUNT(*) AS calls,
  ROUND(AVG(dur) / 1e3, 4) AS avg_us,
  ROUND(SUM(dur) / 1e6, 2) AS total_ms
FROM slice
WHERE name = 'einops/einops.py(561): rearrange'
  AND id IN (
    SELECT s.id FROM slice s
    JOIN ancestor_slice(s.id) a ON a.name GLOB 'nn.Module: KimiDeltaAttention_*'
  )
GROUP BY name;

(Perfetto `dur` is in ns, so `/ 1e3` yields µs for the average and `/ 1e6` yields ms for the total, matching the table's units.)
| name | calls | avg (us) | total (ms) |
|---|---|---|---|
| einops::rearrange | 12,600 | 15.77 | 198.70 |

Optimized (native torch)

SELECT
  name,
  COUNT(*) AS calls,
  ROUND(AVG(dur) / 1e3, 4) AS avg_us,
  ROUND(SUM(dur) / 1e6, 2) AS total_ms
FROM slice
WHERE name IN ('aten::unflatten', 'aten::flatten', 'aten::unsqueeze', 'aten::squeeze')
  AND id IN (
    SELECT s.id FROM slice s
    JOIN ancestor_slice(s.id) a ON a.name GLOB 'nn.Module: KimiDeltaAttention_*'
  )
GROUP BY name;
| name | calls | avg (us) | total (ms) |
|---|---|---|---|
| aten::unflatten | 10,080 | 3.41 | 34.41 |
| aten::unsqueeze | 12,604 | 2.62 | 33.06 |
| aten::squeeze | 7,520 | 5.29 | 39.80 |
| aten::flatten | 2,520 | 2.57 | 6.47 |
| **Total** | | | **113.74** |

~1.75x reduction in reshape-related CPU overhead (198.70ms -> 113.74ms).
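As a sanity check on the totals above:

```python
# per-op totals (ms) from the optimized table
optimized_ms = 34.41 + 33.06 + 39.80 + 6.47
baseline_ms = 198.70

assert round(optimized_ms, 2) == 113.74
assert round(baseline_ms / optimized_ms, 2) == 1.75
```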

*(4 Perfetto trace screenshots attached, captured 2026-03-11.)*

Correctness

python benchmark/gsm8k/bench_sglang.py --data-path /shared/public/data/gsm8k/test.jsonl
| Metric | Baseline (einops) | Optimized (native torch) |
|---|---|---|
| Accuracy | 0.915 | 0.910 |
| Invalid | 0.000 | 0.000 |
| Latency | 41.6 s | 40.0 s |
| Throughput | 471.9 tok/s | 490.7 tok/s |

Also ran the same test as registered/models/test_kimi_linear_models.py: launched a local SGLang server with the command:

python -m sglang.launch_server   --model-path /shared/public/elr-models/moonshotai/Kimi-Linear-48B-A3B-Instruct   --port 30000 --tp 2 --trust-remote-code

then launched the test with the following command and result:

python -m sglang.test.few_shot_gsm8k --num-shots 5 --num-questions 200 --max-new-tokens 512 --parallel 128 --port 30000 --data-path /shared/public/data/gsm8k/test.jsonl
100%|██████████| 200/200 [00:09<00:00, 21.65it/s]
Accuracy: 0.900
Invalid: 0.000
Latency: 9.288 s
Output throughput: 2059.970 token/s

All replacements produce bitwise identical outputs to the original einops operations, verified with torch.equal across multiple tensor shapes, dtypes, and contiguity states.
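A minimal sketch of such a check (hypothetical shapes and dtypes; the einops pattern is shown only in a comment so the script needs just torch):

```python
import torch

def native_split_heads(x: torch.Tensor, head_dim: int) -> torch.Tensor:
    # einops equivalent: rearrange(x, "b s (h d) -> b s h d", d=head_dim)
    return x.unflatten(-1, (-1, head_dim))

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    for b, s, h, d in [(1, 1, 4, 16), (2, 33, 8, 64)]:
        x = torch.randn(b, s, h * d).to(dtype)
        assert torch.equal(native_split_heads(x, d), x.reshape(b, s, h, d))

        # non-contiguous input: strided view along the seq dimension
        nc = torch.randn(b, 2 * s, h * d).to(dtype)[:, ::2, :]
        assert torch.equal(native_split_heads(nc, d), nc.reshape(b, s, h, d))
```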

…Kimi-Linear KDA path

Replace all 8 einops.rearrange calls with native torch operations in the
Kimi-Linear-48B model's KimiDeltaAttention hot path:

- kimi_linear.py: 2 rearrange calls → unflatten + squeeze/flatten
- kda_backend.py: 6 rearrange calls → unflatten + unsqueeze

Profiled with Kimi-Linear-48B-A3B-Instruct (TP=2, 2xH100):
- Baseline: 12,600 einops::rearrange calls, avg 15.77us, total 198.7ms
- Optimized: 0 rearrange calls; replaced by:
    aten::unflatten  10,080 calls avg 3.41us  total 34.4ms
    aten::flatten     2,520 calls avg 2.57us  total  6.5ms
    aten::unsqueeze  12,604 calls avg 2.62us  total 33.1ms
    aten::squeeze     7,520 calls avg 5.29us  total 39.8ms

E2E throughput unchanged (2.65 vs 2.72 tok/s, within noise).
Mean TPOT: 71.48ms vs 71.81ms baseline.
@vedantjh2 force-pushed the einops-to-native-kimi branch from 5792f26 to 247f8e3 on March 12, 2026 at 02:21
@ispobock
Collaborator

ispobock commented Mar 12, 2026

/tag-and-rerun-ci

@vedantjh2
Contributor Author

/rerun-failed-ci

1 similar comment
@vedantjh2
Contributor Author

/rerun-failed-ci

@vedantjh2
Contributor Author

/rerun-failed-ci

@vedantjh2
Contributor Author

/rerun-failed-ci

@ispobock
Collaborator

/tag-and-rerun-ci

@vedantjh2
Contributor Author

/rerun-failed-ci

@ispobock ispobock merged commit db995fb into sgl-project:main Mar 20, 2026
220 of 248 checks passed
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
3 participants