[Spec] Split `accept_length` into `num_accepted_drafts` and `num_accepted_tokens` by hnyls2002 · Pull Request #23962 · sgl-project/sglang

hnyls2002 · 2026-04-28T20:57:15Z

Follows up #23530.

Summary

Split the ambiguous accept_length into two explicit fields on EagleDraftInput / NgramVerifyInput: num_accepted_drafts (strict drafts-only) and num_accepted_tokens (includes the bonus token; equals drafts + 1 per req)
Decouple the accept_length.add_(1) in-place mutation that flipped the variable's semantics mid-function
Match the accept/draft naming convention from "[Spec] Fix spec_accept_rate and unify accept/draft naming" (#23530): a name containing draft is drafts-only; a name containing accept without draft includes the bonus token
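
The naming rule in the last bullet can be sketched as a small predicate. This is only an illustration: `includes_bonus` is a hypothetical helper, not something in sglang, and the substring checks are an assumption about how the rule would be applied mechanically.

```python
def includes_bonus(name: str) -> bool:
    """Apply the accept/draft naming rule to an identifier.

    Returns False for drafts-only names and True for names that count the
    bonus token. Raises ValueError when the rule does not cover the name.
    """
    lowered = name.lower()
    if "draft" in lowered:
        return False  # e.g. num_accepted_drafts: strict drafts-only
    if "accept" in lowered:
        return True  # e.g. num_accepted_tokens, accept_lens: drafts + 1
    raise ValueError(f"naming rule does not apply to {name!r}")


for identifier in ("num_accepted_drafts", "num_accepted_tokens", "accept_lens"):
    print(identifier, includes_bonus(identifier))
```

The check for draft runs first, so a name that contains both accept and draft is treated as drafts-only, which is what the rule states.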

Why dual-tensor

Eliminates + 1 patterns scattered across attention backends and CUDA graph runners
Each consumer reads the field that matches its semantics, with no derivation at the call site
Cost is one extra bs-sized int32 tensor per spec_info (a few KB), which is negligible
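
For a rough sense of the cost claim, the arithmetic is batch size times 4 bytes per int32 element. The batch sizes below are assumed for illustration and are not taken from the PR.

```python
INT32_BYTES = 4  # one int32 element


def extra_tensor_bytes(batch_size: int) -> int:
    """Bytes used by one additional bs-sized int32 tensor."""
    return batch_size * INT32_BYTES


# Hypothetical batch sizes, chosen only to show the order of magnitude.
for bs in (64, 256, 1024):
    print(f"bs={bs}: {extra_tensor_bytes(bs)} bytes")
```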

Changes

Spec info classes

EagleDraftInput.num_accepted_tokens: torch.Tensor and num_accepted_tokens_cpu: List[int] added alongside the drafts-only fields
NgramVerifyInput.num_accepted_tokens added alongside num_accepted_drafts
Set both fields together at every write site (verify kernel output, recompute on finish, V2 worker assignment, CUDA graph alias)
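
A minimal sketch of "set both fields together", assuming a single helper is used at each write site. `DraftInputSketch` and `set_accepted` are hypothetical names, and plain Python lists stand in for the torch tensors so the example runs without dependencies; the real classes are EagleDraftInput and NgramVerifyInput.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftInputSketch:
    """Hypothetical stand-in for a spec info class (not the sglang class)."""

    num_accepted_drafts: List[int] = field(default_factory=list)
    num_accepted_tokens: List[int] = field(default_factory=list)

    def set_accepted(self, drafts: List[int]) -> None:
        """Write both fields in one place so they cannot drift apart."""
        self.num_accepted_drafts = list(drafts)
        self.num_accepted_tokens = [n + 1 for n in drafts]

    def is_consistent(self) -> bool:
        """Check the invariant: tokens == drafts + 1 for every request."""
        return len(self.num_accepted_drafts) == len(self.num_accepted_tokens) and all(
            t == d + 1
            for d, t in zip(self.num_accepted_drafts, self.num_accepted_tokens)
        )


info = DraftInputSketch()
info.set_accepted([0, 3, 1])
print(info.num_accepted_drafts, info.num_accepted_tokens, info.is_consistent())
```

Routing every write through one helper is one way to keep the pair in sync; the PR itself only states that both fields are set at each write site.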

Lifecycle decoupling

eagle_info_v2.py:sample() returns num_accepted_drafts + 1 out-of-place instead of .add_(1) mutation
eagle_info.py:prepare_extend_after_decode() no longer mutates self.num_accepted_drafts; uses local extend_lens for the kernel call
eagle_worker_v2.py V2 path sets both fields from batch_result.accept_lens (includes bonus) and accept_lens - 1 (drafts-only)
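
The difference between the in-place and out-of-place forms can be illustrated without torch. In this sketch, plain lists stand in for tensors and the two helper functions are hypothetical; `add_one_in_place` mimics the effect of `.add_(1)` on a shared object.

```python
from typing import List


def add_one_in_place(values: List[int]) -> List[int]:
    """Mimics tensor.add_(1): mutates the caller's object and returns it."""
    for i in range(len(values)):
        values[i] += 1
    return values


def add_one_out_of_place(values: List[int]) -> List[int]:
    """Mimics tensor + 1: returns a new object and leaves the input alone."""
    return [v + 1 for v in values]


# Out-of-place: the drafts-only value keeps its meaning.
num_accepted_drafts = [0, 2, 3]
accept_lens = add_one_out_of_place(num_accepted_drafts)

# In-place: every alias of the object silently becomes bonus-included,
# even though it is still reached through a drafts-only name.
shared = [0, 2, 3]
alias_named_drafts = shared
add_one_in_place(shared)

print(num_accepted_drafts, accept_lens, alias_named_drafts)
```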

CUDA graph runners

EagleDraftExtendInputBuffers and MultiLayerEagleDraftExtendInputBuffers add a parallel num_accepted_tokens buffer; copy and alias both fields during replay
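
A rough sketch of the parallel-buffer idea, under stated assumptions: `ExtendBuffersSketch`, `copy_from`, and `alias_into` are hypothetical names, lists stand in for preallocated tensors, and the padding fill values (0 drafts, 1 token) are assumed rather than taken from the PR.

```python
from types import SimpleNamespace
from typing import List


class ExtendBuffersSketch:
    """Hypothetical fixed-size buffers standing in for preallocated tensors."""

    def __init__(self, max_bs: int) -> None:
        self.num_accepted_drafts = [0] * max_bs
        # Assumed padding value: 0 drafts + 1 bonus token.
        self.num_accepted_tokens = [1] * max_bs

    def copy_from(self, drafts: List[int], tokens: List[int]) -> None:
        """Copy both fields into the leading slots before a replay."""
        bs = len(drafts)
        if bs != len(tokens) or bs > len(self.num_accepted_drafts):
            raise ValueError("mismatched or oversized batch")
        self.num_accepted_drafts[:bs] = drafts
        self.num_accepted_tokens[:bs] = tokens

    def alias_into(self, spec_info: SimpleNamespace) -> None:
        """Point both spec_info fields at the buffers (no copy)."""
        spec_info.num_accepted_drafts = self.num_accepted_drafts
        spec_info.num_accepted_tokens = self.num_accepted_tokens


buffers = ExtendBuffersSketch(max_bs=4)
buffers.copy_from(drafts=[2, 0], tokens=[3, 1])
spec_info = SimpleNamespace()
buffers.alias_into(spec_info)
print(spec_info.num_accepted_drafts, spec_info.num_accepted_tokens)
```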

Attention backends

aiter, flashattention, trtllm_mha, nsa, nsa_backend_mtp_precompute, wave, triton: read spec_info.num_accepted_tokens (or _cpu) directly, removing the explicit + 1
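
From a consumer's point of view, the change looks roughly like the following. The two functions are hypothetical and do not reproduce any backend's code; `SimpleNamespace` stands in for spec_info, and the old field is modeled as a drafts-only list named accept_length.

```python
from types import SimpleNamespace
from typing import List


def extend_lens_before(spec_info: SimpleNamespace) -> List[int]:
    """Old pattern: derive the bonus-included length at the call site."""
    return [n + 1 for n in spec_info.accept_length]


def extend_lens_after(spec_info: SimpleNamespace) -> List[int]:
    """New pattern: read the field whose meaning already includes the bonus."""
    return list(spec_info.num_accepted_tokens_cpu)


spec_info_example = SimpleNamespace(
    accept_length=[0, 2],  # drafts-only, as the old field was meant to be
    num_accepted_tokens_cpu=[1, 3],  # drafts + 1, stored explicitly
)
print(extend_lens_before(spec_info_example), extend_lens_after(spec_info_example))
```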

Mechanical rename

accept_length → num_accepted_drafts across spec workers, info classes, attention backends, cuda graph runners, tests
Local variables holding bonus-included values renamed back to accept_lens / accept_len (the rule applies to the value, not just the previous name)
SpeculativeMetrics.accept_length retained (vLLM-compatible metric, includes bonus)

Breaking

None at user-facing API level. All Prometheus metric names, meta_info keys, and CLI args unchanged.

Test plan

test_eagle_infer_a.py, test_eagle_infer_b.py, test_eagle_infer_beta.py
test_ngram_speculative_decoding.py, test_dflash.py, test_standalone_speculative_decoding.py
test_eagle_dp_attention.py (multi_layer_eagle_worker path)
DeepSeek V3.2 / NSA backend tests (covers nsa_backend_mtp_precompute extend path)

…pt-length # Conflicts: # python/sglang/srt/managers/io_struct.py # python/sglang/srt/managers/scheduler_output_processor_mixin.py # python/sglang/srt/managers/tokenizer_manager.py # python/sglang/srt/managers/utils.py # python/sglang/srt/speculative/dflash_info.py # python/sglang/srt/speculative/dflash_worker.py # python/sglang/srt/speculative/eagle_worker.py # python/sglang/srt/speculative/multi_layer_eagle_worker.py # python/sglang/srt/speculative/ngram_info.py # python/sglang/srt/speculative/ngram_worker.py

hnyls2002 · 2026-04-29T06:21:58Z

/rerun-stage stage-c-test-8-gpu-h20

hnyls2002 · 2026-04-29T06:22:17Z

/rerun-stage stage-c-test-4-gpu-h100

github-actions · 2026-04-29T06:22:24Z

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run

github-actions · 2026-04-29T06:22:49Z

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies). View workflow run

hnyls2002 and others added 8 commits April 22, 2026 21:39

clarify

e89a771

rename

e0594c0

rename

a85e6a0

tiny cleanup

6df91cf

Merge branch 'main' into lsyin/spec-metrics-rename

74e9d27

Merge branch 'main' into lsyin/spec-metrics-rename

536e6bf

split accept_length.add_(1); spec_info.accept_length always drafts-only

18b7d89

rename accept_length -> num_accepted_drafts

fb3c22e

hnyls2002 requested review from 1am9trash, BBuf, Fridge003, HaiShaw, Qiaolin-Yu, Ying1123, ch-wan, fzyzcjy, hebiao064, hlu1, hubertlu-tw, ispobock, kkHuang-amd, merrymercy, sufeng-buaa and xiezhq-hermann as code owners April 28, 2026 20:57

hnyls2002 requested a review from Edwardf0t1 as a code owner April 28, 2026 20:57

github-actions Bot added the blackwell SM100/SM120 label Apr 28, 2026

hnyls2002 added 2 commits April 28, 2026 14:00

fix nsa mtp precompute extend_seq_lens missing +1

0ca242f

fix rename: caller local vars holding bonus-included values

43ea111

Base automatically changed from lsyin/spec-metrics-rename to main April 28, 2026 21:40

hnyls2002 added 3 commits April 28, 2026 14:44

store num_accepted_tokens alongside num_accepted_drafts

5d0f6a7

add num_accepted_tokens_cpu list field

3150100

hnyls2002 changed the title ~~[Spec] Make spec_info.accept_length always drafts-only; rename to num_accepted_drafts~~ [Spec] Split accept_length into num_accepted_drafts and num_accepted_tokens Apr 28, 2026

hnyls2002 added high priority run-ci labels Apr 28, 2026

hnyls2002 and others added 6 commits April 28, 2026 15:15

Merge branch 'main' into lsyin/spec-split-accept-length

67817d4

fix stale comments in aiter backend

c01d217

remove stale +1 comments (num_accepted_tokens replaces them)

a9d5da1

restore TODO about cpu vs gpu max for max_extend_len

7b7843a

revert bench_serving rename: accept_length holds bonus-included value

ea14337

revert test rename: accept_length holds bonus-included value

801e6f7

hnyls2002 merged commit bd448e5 into main Apr 29, 2026
181 of 210 checks passed

hnyls2002 deleted the lsyin/spec-split-accept-length branch April 29, 2026 07:02

hnyls2002 mentioned this pull request Apr 30, 2026

fix: rename mimo spec threshold attr to num_accepted_drafts_thres #24118

Merged

vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026

[Spec] Split accept_length into num_accepted_drafts and `num_acce…

65fba43

…pted_tokens` (sgl-project#23962)

[Spec] Split `accept_length` into `num_accepted_drafts` and `num_accepted_tokens` #23962
hnyls2002 merged 19 commits into main from lsyin/spec-split-accept-length
