[Speculative] Fix Eagle3/DFLASH aux hidden state capture during CUDA graph init by merrymercy · Pull Request #22836 · sgl-project/sglang

merrymercy · 2026-04-15T01:10:57Z

Summary

- Move `set_eagle3_layers_to_capture()` and `set_dflash_layers_to_capture()` into a new `init_aux_hidden_state_capture()` method.
- Call it BEFORE `init_device_graphs()` in `ModelRunner.initialize()` so that CUDA graphs are captured with the aux hidden state paths enabled.
- Remove the redundant calls from `CudaGraphRunner.__init__()` and `_dummy_run()`.
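The new initialization order can be sketched with toy stand-ins. Everything below (`ToyTargetModel`, `ToyModelRunner`, the `spec_algorithm` strings, and the snapshot that stands in for graph capture) is hypothetical and only mirrors the method names listed above; it is not sglang's actual `ModelRunner`.

```python
from typing import List, Optional, Tuple


class ToyTargetModel:
    """Hypothetical target model exposing the two setters named in this PR."""

    def __init__(self) -> None:
        self.aux_mode: Optional[str] = None
        self.layers_to_capture: List[int] = []

    def set_eagle3_layers_to_capture(self, layer_ids: List[int]) -> None:
        self.aux_mode = "eagle3"
        self.layers_to_capture = list(layer_ids)

    def set_dflash_layers_to_capture(self, layer_ids: List[int]) -> None:
        self.aux_mode = "dflash"
        self.layers_to_capture = list(layer_ids)


class ToyModelRunner:
    """Hypothetical runner: only the call order is meant to match the PR."""

    def __init__(self, model: ToyTargetModel, spec_algorithm: Optional[str],
                 aux_layer_ids: List[int]) -> None:
        self.model = model
        self.spec_algorithm = spec_algorithm
        self.aux_layer_ids = aux_layer_ids
        self.graph_snapshot: Optional[Tuple[Optional[str], List[int]]] = None
        self.order: List[str] = []

    def init_aux_hidden_state_capture(self) -> None:
        # Single place that configures aux hidden state capture.
        self.order.append("init_aux_hidden_state_capture")
        if self.spec_algorithm == "eagle3":
            self.model.set_eagle3_layers_to_capture(self.aux_layer_ids)
        elif self.spec_algorithm == "dflash":
            self.model.set_dflash_layers_to_capture(self.aux_layer_ids)

    def init_device_graphs(self) -> None:
        # Stand-in for CUDA graph capture: record the state the graph would see.
        self.order.append("init_device_graphs")
        self.graph_snapshot = (self.model.aux_mode, list(self.model.layers_to_capture))

    def initialize(self) -> None:
        # Configure aux capture first, then capture graphs.
        self.init_aux_hidden_state_capture()
        self.init_device_graphs()


runner = ToyModelRunner(ToyTargetModel(), spec_algorithm="eagle3", aux_layer_ids=[1, 15, 28])
runner.initialize()
print(runner.order)           # ['init_aux_hidden_state_capture', 'init_device_graphs']
print(runner.graph_snapshot)  # ('eagle3', [1, 15, 28])
```

Keeping both setters behind one method gives a single place that decides which layers to capture, and `initialize()` only has to guarantee that this method runs before graph capture.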

Motivation

Previously, `set_eagle3_layers_to_capture()` was called AFTER CUDA graph capture in `initialize()`, so the captured graphs ran without aux hidden state capture enabled. For Eagle3, this caused a zero acceptance length at runtime. The `CudaGraphRunner` had a workaround that called `set_eagle3_layers_to_capture()` without arguments, which used default layer IDs instead of the config-specified ones, breaking models with custom `eagle_aux_hidden_state_layer_ids`.
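Why the ordering matters: CUDA graph replay re-launches the recorded kernels without re-executing the Python code that issued them, so any Python-level branch taken during capture is effectively frozen. The toy below loosely mimics that with a closure that snapshots the capture flags at "capture" time. `ToyTarget`, `run_layers`, `eager_forward`, and `capture_graph` are hypothetical names, and each "layer" is just a scalar addition.

```python
from typing import Callable, Dict, List


class ToyTarget:
    """Hypothetical target model: each 'layer' just adds 1.0 to a scalar."""

    def __init__(self, num_layers: int) -> None:
        self.num_layers = num_layers
        self.capture_aux_hidden_states = False
        self.layers_to_capture: List[int] = []


def run_layers(num_layers: int, enabled: bool, layers: List[int], x: float) -> Dict[str, object]:
    aux: List[float] = []
    h = x
    for i in range(num_layers):
        if enabled and i in layers:
            aux.append(h)  # hidden state entering layer i
        h = h + 1.0
    return {"hidden": h, "aux": aux}


def eager_forward(model: ToyTarget, x: float) -> Dict[str, object]:
    # Eager mode re-reads the model's flags on every call.
    return run_layers(model.num_layers, model.capture_aux_hidden_states,
                      model.layers_to_capture, x)


def capture_graph(model: ToyTarget) -> Callable[[float], Dict[str, object]]:
    # "Capture" freezes the flags as they are right now; replay never re-reads them.
    enabled = model.capture_aux_hidden_states
    layers = list(model.layers_to_capture)
    return lambda x: run_layers(model.num_layers, enabled, layers, x)


config_layer_ids = [1, 3]

# Old order: capture graphs first, enable aux capture afterwards.
old_model = ToyTarget(num_layers=4)
old_graph = capture_graph(old_model)
old_model.capture_aux_hidden_states = True
old_model.layers_to_capture = config_layer_ids
old_replay_aux = old_graph(0.0)["aux"]                # []
old_eager_aux = eager_forward(old_model, 0.0)["aux"]  # [1.0, 3.0]

# New order: enable aux capture first, then capture graphs.
new_model = ToyTarget(num_layers=4)
new_model.capture_aux_hidden_states = True
new_model.layers_to_capture = config_layer_ids
new_graph = capture_graph(new_model)
new_replay_aux = new_graph(0.0)["aux"]                # [1.0, 3.0]
```

In the old order the replayed graph returns no aux hidden states even though the eager path would, which is consistent with the draft model receiving nothing useful and the zero acceptance length described above.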

Test plan

- Tested Eagle3 speculative decoding with Llama-3.1-8B-Instruct and CUDA graphs: non-zero acceptance length confirmed.
- Tested with custom `eagle_aux_hidden_state_layer_ids` from the config.

…graph init Move `set_eagle3_layers_to_capture()` and `set_dflash_layers_to_capture()` into a new `init_aux_hidden_state_capture()` method and call it BEFORE `init_device_graphs()` in `ModelRunner.initialize()`. Previously these were called AFTER CUDA graph capture, so the captured graphs ran without aux hidden state capture enabled. For Eagle3 this caused zero acceptance length at runtime. The `CudaGraphRunner` had a workaround that called `set_eagle3_layers_to_capture()` without args, which used default layer IDs instead of the config-specified ones. Remove the redundant aux hidden state setup from `CudaGraphRunner.__init__()` and `_dummy_run()`.

gemini-code-assist · 2026-04-15T01:11:01Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

merrymercy · 2026-04-15T01:14:43Z


        self.tbo_plugin = TboCudaGraphRunnerPlugin()

-        # Speculative_inference


We should call this in `model_runner.py::initialize` instead of in the CUDA graph runner.

The old code called similar code twice with different arguments, which is wrong.
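A similar toy illustrates the double-call problem. Assuming, as the motivation above describes, that the graph runner's no-argument call ran before capture and the config-driven call ran after it, the captured graph and the non-graph path end up with different layer lists. `ToyCaptureState`, `NUM_LAYERS`, `old_flow`, `new_flow`, and the default-layer formula are all hypothetical, not sglang's code.

```python
from typing import List, Optional, Tuple

NUM_LAYERS = 32  # hypothetical model depth


class ToyCaptureState:
    def __init__(self) -> None:
        self.layers_to_capture: List[int] = []

    def set_eagle3_layers_to_capture(self, layer_ids: Optional[List[int]] = None) -> None:
        if layer_ids is None:
            # Hypothetical default; not claimed to be sglang's exact choice.
            self.layers_to_capture = [2, NUM_LAYERS // 2, NUM_LAYERS - 3]
        else:
            self.layers_to_capture = list(layer_ids)


def old_flow(config_ids: List[int]) -> Tuple[List[int], List[int]]:
    state = ToyCaptureState()
    state.set_eagle3_layers_to_capture()            # no-arg call inside the graph runner
    graph_layers = list(state.layers_to_capture)    # frozen into the captured graph
    state.set_eagle3_layers_to_capture(config_ids)  # second call, after capture
    return graph_layers, list(state.layers_to_capture)


def new_flow(config_ids: List[int]) -> Tuple[List[int], List[int]]:
    state = ToyCaptureState()
    state.set_eagle3_layers_to_capture(config_ids)  # single call, before capture
    graph_layers = list(state.layers_to_capture)
    return graph_layers, list(state.layers_to_capture)


print(old_flow([5, 10, 20]))  # ([2, 16, 29], [5, 10, 20]): graph and eager disagree
print(new_flow([5, 10, 20]))  # ([5, 10, 20], [5, 10, 20])
```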

merrymercy · 2026-04-15T01:15:08Z

/tag-and-rerun-ci

…graph init (sgl-project#22836)

…graph init (sgl-project#22836) (cherry picked from commit 43925d1)

merrymercy requested review from Fridge003, Ying1123, hnyls2002 and ispobock as code owners April 15, 2026 01:10

hnyls2002 assigned Qiaolin-Yu Apr 15, 2026

merrymercy commented Apr 15, 2026


Comment thread on python/sglang/srt/model_executor/cuda_graph_runner.py (outdated)

Apply suggestions from code review

4d925d1

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

merrymercy commented Apr 15, 2026


github-actions Bot added the run-ci label Apr 15, 2026

Fix pre-commit formatting: remove extra blank line

74e1eab

merrymercy added the high priority label Apr 15, 2026

merrymercy merged commit 43925d1 into main Apr 15, 2026
340 of 443 checks passed

merrymercy deleted the lianmin/fix-eagle-capture branch April 15, 2026 21:04

jmamou pushed a commit to jmamou/sglang that referenced this pull request Apr 20, 2026

[Speculative] Fix Eagle3/DFLASH aux hidden state capture during CUDA …

5509e43

…graph init (sgl-project#22836)

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

[Speculative] Fix Eagle3/DFLASH aux hidden state capture during CUDA …

ef4daf9

…graph init (sgl-project#22836)

zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026

[Speculative] Fix Eagle3/DFLASH aux hidden state capture during CUDA …

6ba8053

…graph init (sgl-project#22836)

kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026

[Speculative] Fix Eagle3/DFLASH aux hidden state capture during CUDA …

a496c53

…graph init (sgl-project#22836)

empty-quiver pushed a commit to empty-quiver/sglang-turboquant that referenced this pull request Apr 28, 2026

[Speculative] Fix Eagle3/DFLASH aux hidden state capture during CUDA …

ad646e4

…graph init (sgl-project#22836) (cherry picked from commit 43925d1)

empty-quiver pushed a commit to empty-quiver/sglang-turboquant that referenced this pull request Apr 28, 2026

[Speculative] Fix Eagle3/DFLASH aux hidden state capture during CUDA …

a761228

…graph init (sgl-project#22836) (cherry picked from commit 43925d1)

merrymercy merged 3 commits into main from lianmin/fix-eagle-capture

2 participants

