Fix hybrid_linear_attn_backend crash with ngram speculation by he-yufeng · Pull Request #20739 · sgl-project/sglang

he-yufeng · 2026-03-17T03:51:55Z

Problem

hybrid_linear_attn_backend accesses spec_info.topk at runtime during target_verify mode, but NgramVerifyInput doesn't define topk, causing an AttributeError crash with --speculative-algo NGRAM.
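The failure mode can be reproduced in isolation. The sketch below is not sglang code: `EagleVerifyInputStub`, `NgramVerifyInputStub`, `read_topk_at_runtime`, and `try_read` are hypothetical stand-ins that only mirror the pattern described above, where one spec-input type defines `topk` and the other does not.

```python
from dataclasses import dataclass


@dataclass
class EagleVerifyInputStub:
    # Hypothetical stand-in for a verify input that carries topk.
    draft_token_num: int
    topk: int


@dataclass
class NgramVerifyInputStub:
    # Hypothetical stand-in for a verify input with no topk field.
    draft_token_num: int


def read_topk_at_runtime(spec_info) -> int:
    # Mirrors the failing pattern: unconditional attribute access.
    return spec_info.topk


def try_read(spec_info) -> str:
    # Report either the value or the error instead of crashing.
    try:
        return f"topk={read_topk_at_runtime(spec_info)}"
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    print(try_read(EagleVerifyInputStub(draft_token_num=4, topk=2)))
    print(try_read(NgramVerifyInputStub(draft_token_num=4)))
```

The second call reports an `AttributeError`, which is the same class of error the issue describes.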

Fix

Read topk from server_args.speculative_eagle_topk at init time instead of from spec_info at runtime. This avoids the dependency on SpecInput subtypes all defining topk, and is consistent with how the backend reads other config (pad_slot_id, device, etc.).

For ngram, speculative_eagle_topk is set to speculative_ngram_max_bfs_breadth in server_args, so tree attention branches execute correctly.
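A minimal sketch of the init-time read, under stated assumptions: `ServerArgsStub` and `HybridBackendStub` are hypothetical stand-ins, the `__post_init__` hook only imitates the ngram mapping described above, and the `topk > 1` check is an assumed way of choosing the tree-attention branch rather than the backend's actual condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerArgsStub:
    # Hypothetical stand-in for the relevant server_args fields.
    speculative_algorithm: Optional[str] = None
    speculative_eagle_topk: Optional[int] = None
    speculative_ngram_max_bfs_breadth: int = 10

    def __post_init__(self):
        # Per the PR description: for ngram, speculative_eagle_topk
        # is set to speculative_ngram_max_bfs_breadth.
        if self.speculative_algorithm == "NGRAM":
            self.speculative_eagle_topk = self.speculative_ngram_max_bfs_breadth


class HybridBackendStub:
    # Hypothetical stand-in for an attention backend.
    def __init__(self, server_args: ServerArgsStub):
        # Read topk once at construction; fall back to 0 when unset.
        self.topk = server_args.speculative_eagle_topk or 0

    def uses_tree_attention(self, spec_info) -> bool:
        # spec_info is never inspected for topk, so its concrete type
        # no longer matters. The "> 1" threshold is an assumption.
        return self.topk > 1


if __name__ == "__main__":
    backend = HybridBackendStub(ServerArgsStub(speculative_algorithm="NGRAM"))
    print(backend.topk, backend.uses_tree_attention(spec_info=object()))
```

Because the value is captured in the constructor, any spec-input object can be passed at verify time without the backend depending on its attributes.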

Fixes #20721

Attention backends (hybrid_linear_attn_backend, etc.) access spec_info.topk unconditionally during target_verify, but NgramVerifyInput never sets it. This crashes at server startup when using --speculative-algo NGRAM. Add topk=1 to NgramVerifyInput since ngram speculation doesn't use tree attention (unlike Eagle which has topk>1). Fixes sgl-project#20721

kpham-sgl

The actual fix should be to propagate speculative_eagle_topk to NgramVerifyInput. It's actually already set in

sglang/python/sglang/srt/server_args.py

Line 2987 in 9419453

self.speculative_eagle_topk = self.speculative_ngram_max_bfs_breadth

Conceptually, Ngram does build a spec tree (see https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/speculative/cpp_ngram/ngram.cpp#L257 and https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/speculative/cpp_ngram/ngram.cpp#L296).
The parameters that control the tree breadth and depth are:

  # Tree breadth:
  --speculative-ngram-min-bfs-breadth (default: 1)
  --speculative-ngram-max-bfs-breadth (default: 10)

  # Match window (tree depth):
  --speculative-ngram-min-match-window-size (default: 1)
  --speculative-ngram-max-match-window-size (default: 12)

  # Other NGRAM params:
  --speculative-ngram-branch-length (default: 18)
  --speculative-ngram-match-type (BFS or PROB, default: BFS)

… server_args hybrid_linear_attn_backend was the only attention backend accessing spec_info.topk at runtime. All other backends read topk from server_args.speculative_eagle_topk during __init__. This makes hybrid_linear_attn_backend consistent and removes the hardcoded self.topk = 1 from NgramVerifyInput that was papering over the issue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kpham-sgl · 2026-03-26T18:42:39Z

To be consistent with other attention backends, the simpler fix is to read topk directly from server args:

sglang/python/sglang/srt/layers/attention/flashattention_backend.py

Line 367 in 3867c64

self.topk = model_runner.server_args.speculative_eagle_topk or 0

sglang/python/sglang/srt/layers/attention/nsa_backend.py

Line 339 in 3867c64

self.topk = model_runner.server_args.speculative_eagle_topk or 0

Triton instantiates its own self.topk, which can also be traced back to server_args.speculative_eagle_topk.
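A short sketch of what the `or 0` in the quoted lines buys, assuming the setting is `None` when no value has been configured (an assumption about the default, not something confirmed in this thread). `resolve_topk` is a hypothetical helper name.

```python
from typing import Optional


def resolve_topk(configured: Optional[int]) -> int:
    # Same expression shape as the quoted backend lines:
    # any falsy value (None or 0) collapses to the integer 0.
    return configured or 0


def compare_raw(configured: Optional[int]) -> str:
    # Without the fallback, an integer comparison on None fails.
    try:
        return str(configured > 1)
    except TypeError:
        return "TypeError"


if __name__ == "__main__":
    print(resolve_topk(None), resolve_topk(10))
    print(compare_raw(None), resolve_topk(None) > 1)
```

Normalizing to an integer at init time means later comparisons on `self.topk` do not need to guard against `None`.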

@he-yufeng lmk what you think

kpham-sgl · 2026-03-26T19:07:04Z

/tag-and-rerun-ci

hnyls2002 · 2026-04-08T19:34:46Z

/rerun-test test_hybrid_attn_backend.py test_ngram_speculative_decoding.py
(2 tries)

github-actions · 2026-04-08T19:35:44Z

✅ 1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

github-actions · 2026-04-08T19:38:10Z

✅ 1-gpu-h100 (2 tests): View workflow run

cd test/ && python3 registered/attention/test_hybrid_attn_backend.py
cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

…ect#20739) Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>

he-yufeng requested review from Ying1123, hnyls2002 and merrymercy as code owners March 17, 2026 03:51

Merge branch 'main' into fix/ngram-missing-topk

2e73812

Qiaolin-Yu assigned kpham-sgl Mar 17, 2026

kpham-sgl requested changes Mar 18, 2026

View reviewed changes

hnyls2002 self-assigned this Mar 22, 2026

kpham-sgl requested review from hanming-lu, hebiao064, yizhang2077 and yuan-luo as code owners March 26, 2026 18:37

github-actions Bot added the run-ci label Mar 26, 2026

Merge branch 'main' into fix/ngram-missing-topk

cea0aaf

kpham-sgl approved these changes Apr 7, 2026

View reviewed changes

hnyls2002 approved these changes Apr 8, 2026

View reviewed changes

Merge branch 'main' into fix/ngram-missing-topk

ea7cdec

hnyls2002 changed the title ~~Fix NgramVerifyInput missing topk attribute~~ Fix hybrid_linear_attn_backend crash with ngram speculation Apr 8, 2026

hnyls2002 merged commit c89afae into sgl-project:main Apr 8, 2026
56 of 95 checks passed

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Fix hybrid_linear_attn_backend crash with ngram speculation (sgl-proj…

9c37b44

…ect#20739) Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>

hnyls2002 merged 5 commits into sgl-project:main from he-yufeng:fix/ngram-missing-topk


3 participants
