
[Spec] Fix spec_accept_rate and unify accept/draft naming #23530

Merged
hnyls2002 merged 6 commits into main from lsyin/spec-metrics-rename on Apr 28, 2026



@hnyls2002 (Collaborator) commented Apr 23, 2026

Summary

  • Fix spec_accept_rate bias (was (accepted+1)/(proposed+1) instead of accepted/proposed)
  • Establish an internal naming convention: a name containing both accept and draft is a strict drafts-only count; a name containing accept without draft includes the bonus token
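The convention can be illustrated with a minimal Python sketch. It assumes that each verify step of a request emits its accepted drafts plus exactly one bonus token; the VerifyStep class and both helper functions are hypothetical and are not SGLang types.

```python
from dataclasses import dataclass


@dataclass
class VerifyStep:
    """Hypothetical per-request record of one draft/verify forward."""
    proposed_drafts: int  # drafts proposed by the draft model (k)
    accepted_drafts: int  # drafts accepted by the target model (0 <= m <= k)


def accepted_drafts(steps: list[VerifyStep]) -> int:
    # Name contains "accept" and "draft": strict drafts-only count.
    return sum(s.accepted_drafts for s in steps)


def accepted_tokens(steps: list[VerifyStep]) -> int:
    # Name contains "accept" but not "draft": also counts the one bonus token per step.
    return sum(s.accepted_drafts + 1 for s in steps)


steps = [VerifyStep(4, 0), VerifyStep(4, 2), VerifyStep(4, 4)]
print(accepted_drafts(steps))  # 6
print(accepted_tokens(steps))  # 9
```

Under this assumption the two counts always differ by the number of verify steps, which is why mixing them up silently inflates any ratio built on them.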

Bug Fix

  • Old formula (num_accepted_drafts + bs) / (bs * num_draft_tokens) had two errors:
    • Numerator included per-req bonus tokens (so zero-accept yielded 1/num_draft_tokens instead of 0)
    • Denominator used num_draft_tokens, which also counts the current-token tree slot, so it was one larger per request than the number of proposed drafts
  • The two errors canceled at full accept (rate = 1) but biased every lower rate upward by (k-m)/(k(k+1)), where k is the number of drafts proposed and m the mean number of drafts accepted per request
  • New formula num_accepted_drafts / (bs * num_proposed_drafts) is strict drafts-accepted / drafts-proposed
  • Matches the per-request meta_info["spec_accept_rate"] computation
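The old and new batch formulas can be compared with a small Python sketch. It assumes, as described above, that num_draft_tokens equals the number of proposed drafts plus one (the current-token slot), and it uses a batch of one request so that m is simply that request's accepted-draft count. Exact fractions make the bias identity checkable without rounding; the function names are illustrative, not SGLang's.

```python
from fractions import Fraction


def old_accept_rate(num_accepted_drafts: int, bs: int, num_draft_tokens: int) -> Fraction:
    # Buggy: adds one bonus token per request to the numerator and divides by
    # num_draft_tokens, which includes the current-token tree slot.
    return Fraction(num_accepted_drafts + bs, bs * num_draft_tokens)


def new_accept_rate(num_accepted_drafts: int, bs: int, num_proposed_drafts: int) -> Fraction:
    # Fixed: strict drafts-accepted / drafts-proposed.
    return Fraction(num_accepted_drafts, bs * num_proposed_drafts)


k = 4  # drafts proposed per request, so num_draft_tokens = k + 1
for m in range(k + 1):  # drafts accepted by the single request in the batch
    old = old_accept_rate(m, 1, k + 1)
    new = new_accept_rate(m, 1, k)
    # The gap is exactly the bias term, and it vanishes only at m == k.
    assert old - new == Fraction(k - m, k * (k + 1))
    print(m, old, new)
```

With k = 4, zero acceptance reports 1/5 under the old formula and 0 under the new one, while full acceptance reports 1 under both.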

Renames (drafts-only counts)

  • GenerationBatchResult.num_accepted_tokens → num_accepted_drafts
  • Req.spec_accepted_tokens → Req.spec_accepted_drafts
  • BatchTokenIDOutput.spec_accepted_tokens → spec_accepted_drafts (IPC dataclass field)
  • update_spec_metrics(num_accepted_tokens) / report_decode_stats(num_accepted_tokens=) parameter
  • Worker kwargs and local variables across eagle_worker, ngram_worker, dflash_worker, multi_layer_eagle_worker, ngram_info

External meta_info keys

  • meta_info["spec_accept_token_num"] → spec_accepted_drafts
  • meta_info["spec_draft_token_num"] → spec_proposed_drafts
  • meta_info["spec_accept_rate"], meta_info["spec_accept_length"], meta_info["spec_verify_ct"] unchanged
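Clients that read these keys may want to handle servers from either side of the rename. The following Python sketch is a hypothetical client-side helper, not part of SGLang. It assumes the renamed keys carry the same values as the old ones, which the rename implies but does not state explicitly, and it computes the strict drafts-only rate that should agree with meta_info["spec_accept_rate"].

```python
from typing import Any, Optional

# Key pairs from this PR; treating the old keys as value-compatible is an assumption.
NEW_TO_OLD = {
    "spec_accepted_drafts": "spec_accept_token_num",
    "spec_proposed_drafts": "spec_draft_token_num",
}


def get_spec_field(meta_info: dict[str, Any], new_key: str) -> Optional[Any]:
    """Read a renamed meta_info field, falling back to its pre-rename key."""
    if new_key in meta_info:
        return meta_info[new_key]
    return meta_info.get(NEW_TO_OLD[new_key])


def drafts_accept_rate(meta_info: dict[str, Any]) -> Optional[float]:
    """Strict drafts-accepted / drafts-proposed for one request."""
    accepted = get_spec_field(meta_info, "spec_accepted_drafts")
    proposed = get_spec_field(meta_info, "spec_proposed_drafts")
    if accepted is None or not proposed:
        return None  # not a speculative request, or nothing proposed yet
    return accepted / proposed


print(drafts_accept_rate({"spec_accepted_drafts": 6, "spec_proposed_drafts": 12}))  # 0.5
print(drafts_accept_rate({"spec_accept_token_num": 3, "spec_draft_token_num": 12}))  # 0.25
```

The meta_info dictionaries in the usage lines are made-up inputs for illustration.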

Preserved (follow convention: accept without draft → includes bonus)

  • Scheduler.spec_num_accepted_tokens / spec_total_num_accepted_tokens (per-log-interval / lifetime accumulators)
  • sglang:spec_accept_length Prometheus gauge (mean output tokens per forward, includes bonus)
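The interval/lifetime accumulator pattern behind these preserved names can be sketched in Python. Only the two accumulator names come from this PR; the class, the verify-step counter, and the method names are hypothetical. The sketch also assumes the accept-length gauge is normalized per request per verify step; whether the real gauge normalizes per request or per batch forward is not stated here.

```python
from dataclasses import dataclass


@dataclass
class SpecAccumulators:
    """Hypothetical stand-in for the scheduler's speculative-decoding counters."""

    spec_num_accepted_tokens: int = 0        # per log interval, includes bonus tokens
    spec_total_num_accepted_tokens: int = 0  # lifetime, includes bonus tokens
    interval_verify_steps: int = 0           # hypothetical: per-request verify steps this interval

    def record_verify(self, accepted_drafts_per_req: list[int]) -> None:
        # Each request in the batch outputs its accepted drafts plus one bonus token.
        tokens = sum(m + 1 for m in accepted_drafts_per_req)
        self.spec_num_accepted_tokens += tokens
        self.spec_total_num_accepted_tokens += tokens
        self.interval_verify_steps += len(accepted_drafts_per_req)

    def flush_interval(self) -> float:
        """Return mean output tokens per request per verify step, then reset the interval."""
        steps = self.interval_verify_steps
        accept_length = self.spec_num_accepted_tokens / steps if steps else 0.0
        self.spec_num_accepted_tokens = 0
        self.interval_verify_steps = 0
        return accept_length


acc = SpecAccumulators()
acc.record_verify([0, 2, 4])
print(acc.flush_interval())                # 3.0
acc.record_verify([1])
print(acc.flush_interval())                # 2.0
print(acc.spec_total_num_accepted_tokens)  # 11
```

Because these counters include the bonus token, an accept length of 1.0 means no drafts were accepted, not that every draft was.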

Breaking

  • sglang:spec_accept_rate Prometheus value changes: every rate below full acceptance shifts downward (zero-accept now reports 0 instead of 1/num_draft_tokens); full accept stays at 1
  • meta_info key renames above

Test Plan

  • test_eagle_infer.py
  • test_ngram_infer.py


@hnyls2002 force-pushed the lsyin/spec-metrics-rename branch from a65d933 to e0594c0 on April 23, 2026 20:06
@hnyls2002 changed the title from "fix spec_accept_rate; rename accepted_tokens -> accepted_drafts" to "[Spec] Fix spec_accept_rate and unify accept/draft naming" on Apr 24, 2026
@hnyls2002 (Collaborator, Author) commented:

/rerun-test registered/8-gpu-models/test_deepseek_v3_mtp.py test_step3p5_flash_chain_mtp.py test_dsa_models_mtp.py test_deepseek_v32_fp4_mtp_4gpu.py

@github-actions (Contributor) commented:

8-gpu-h200 (3 tests):

cd test/ && python3 registered/8-gpu-models/test_deepseek_v3_mtp.py
cd test/ && python3 registered/8-gpu-models/test_step3p5_flash_chain_mtp.py
cd test/ && python3 registered/8-gpu-models/test_dsa_models_mtp.py

4-gpu-b200 (1 test):

cd test/ && python3 registered/quant/test_deepseek_v32_fp4_mtp_4gpu.py

@hnyls2002 merged commit cf0061d into main on Apr 28, 2026
143 of 155 checks passed
@hnyls2002 deleted the lsyin/spec-metrics-rename branch on April 28, 2026 21:40
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026