[Bug Fix] Resolve EAGLE cuda graph IMA under PD + DP + MTP with GLM-5.1 #23037
Qiaolin-Yu merged 12 commits into sgl-project:main
Conversation
Code Review
This pull request modifies the eagle_worker.py file to explicitly clone the topk_p, topk_index, and hidden_states tensors from logits_output before assigning them to forward_batch.spec_info. This ensures that the speculative information maintains independent copies of these tensors, preventing potential issues related to shared memory or in-place modifications. I have no feedback to provide as no review comments were submitted.
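A minimal sketch of the change the review describes, assuming the surrounding code in eagle_worker.py looks roughly like this (the field and object names come from the review itself; everything else is an assumption):

    # Sketch only: the exact surrounding code in eagle_worker.py is assumed.
    # .clone() gives spec_info its own storage, so the next cuda graph
    # replay() cannot overwrite the tensors it still holds.
    forward_batch.spec_info.topk_p = logits_output.topk_p.clone()
    forward_batch.spec_info.topk_index = logits_output.topk_index.clone()
    forward_batch.spec_info.hidden_states = logits_output.hidden_states.clone()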
#15457 seems to mention this bug as well.
JustinTong0323 left a comment
Nice catch on the aliasing + stale padding. Two comments inline; also worth squashing the commits (Update eagle_draft_cuda_graph_runner.py etc.) before merge.
    buffers.topk_p.zero_()
    buffers.topk_index.zero_()
    buffers.hidden_states.zero_()
    buffers.req_pool_indices.zero_()
I think this is correct, but I'm unsure about the extra overhead.
Only hits the padding branch (bs != raw_bs), so the cost is bounded by padding size — hidden_states memset dominates and it's bandwidth-bound, one shot per replay. The alternative (stale OOB indices in padded slots) is what triggered the original IMA, so I'd rather eat this than make every downstream gather padding-aware.
Could tighten to a [raw_bs:bs] tail-only memset as a follow-up if it shows up in traces — happy to do that in a separate PR.
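For reference, that tail-only variant would look something like this (a hypothetical follow-up, not part of this PR; the assumption that batch is dim 0 of each buffer is mine):

    # Hypothetical follow-up sketch, not in this PR: zero only the padded
    # tail [raw_bs:bs] instead of the full buffers. Assumes batch is dim 0.
    buffers.topk_p[raw_bs:bs].zero_()
    buffers.topk_index[raw_bs:bs].zero_()
    buffers.hidden_states[raw_bs:bs].zero_()
    buffers.req_pool_indices[raw_bs:bs].zero_()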
/tag-and-rerun-ci

/rerun-failed-ci
    buffers.positions.zero_()
    buffers.topk_p.zero_()
    buffers.topk_index.zero_()
    buffers.hidden_states.zero_()
Resetting the index tensor to 0 makes sense to me, since the padded part might cause an IMA. But why do the hidden states need to be reset? My understanding is that the padded portion is discarded anyway, so it shouldn't have any impact at all.
The reason I still zero hidden_states is about what happens inside the captured graph, not the final output:
The graph is captured at the padded shape (bs, not raw_bs). The captured kernels — RMSNorm, the EagleDraftInput projection, the attention that consumes hidden_states — read the full bs rows, not a sliced view. Garbage in padded rows is fine on its own.
But if the previous replay left NaN/Inf in those padded slots (which does happen on GLM-5.1 + PD + DP + MTP — that's how I tracked this bug down), the captured RMSNorm reduces over a NaN row and contaminates scratch/shared buffers that valid lanes read in some configs. The padded output is discarded; the side effects on shared scratch inside the captured graph are not.
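A standalone toy repro of the first point (assumptions: plain PyTorch, not sglang code; torch >= 2.4 for F.rms_norm) — a graph captured at the padded shape replays its kernels over all bs rows, stale padding included:

    import torch
    import torch.nn.functional as F

    # Toy demonstration, not sglang code: kernels captured at padded shape
    # bs read every row on replay, including whatever the padded slots hold.
    bs, raw_bs, hidden = 8, 5, 16
    x = torch.zeros(bs, hidden, device="cuda")

    F.rms_norm(x, (hidden,))              # warmup before capture
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        y = F.rms_norm(x, (hidden,))      # captured over the full bs rows

    x[raw_bs:].fill_(float("nan"))        # stale garbage in the padded slots
    g.replay()
    torch.cuda.synchronize()
    print(y[raw_bs:].isnan().any())       # tensor(True): padded rows were read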
Under long-running PD + DP + MTP, decode crashes with vectorized_gather_kernel: index out of bounds. Two paths feed garbage into the next cuda graph replay:

1. Shared buffer view. logits_output fields are views into the cuda graph's pre-allocated output buffers. Assigning them to spec_info directly means the next replay() overwrites the data spec_info still holds.
2. Padded region staleness. EAGLEDraftCudaGraphRunner only overwrites [:raw_bs] of its input buffers before replay. Padded slots [raw_bs:bs] retain stale values (e.g. topk_index = -4886055425978319163) and get gathered as OOB indices inside the graph.

All guards are no-ops on healthy data.
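To make path 1 concrete, a toy sketch (assumption: plain PyTorch, not the actual sglang buffers) of how an alias into a graph's output buffer goes stale on the next replay while a clone survives:

    import torch

    # Toy illustration of path 1, not sglang code: a view into the graph's
    # output buffer is rewritten by the next replay(); a clone() is not.
    buf = torch.ones(4, device="cuda")
    _ = buf * 2                            # warmup before capture
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        out = buf * 2                      # lives in the graph's memory pool

    g.replay()                             # out == 2.0
    alias, copy = out, out.clone()         # spec_info-style view vs. the fix
    buf.fill_(3.0)
    g.replay()                             # rewrites out in place: out == 6.0
    print(alias[0].item(), copy[0].item())  # 6.0 2.0 — the alias went stale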