[Speculative] Support penalty for spec v2 overlap scheduling#22049

Merged
hnyls2002 merged 3 commits into sgl-project:main from YMbmzy:spec-v2-penalty
Apr 9, 2026

YMbmzy commented Apr 3, 2026

Motivation

Closes the penalty support item in #11762.

Spec v2 (overlap scheduling) previously ignored frequency_penalty,
presence_penalty, repetition_penalty, and logit_bias during verification,
silently producing unpenalized outputs.

Modifications

Two changes in python/sglang/srt/speculative/eagle_info_v2.py:

  1. Apply penalties during verify sampling (sample())

    • Apply acc_additive_penalties, acc_scaling_penalties, and logit_bias
      directly to verify logits, each broadcast via repeat_interleave to match the
      (bs * draft_token_num, V) shape
    • Mirrors apply_logits_bias() logic with per-field expansion; follows the
      same relaxed approximation as spec v1
  2. Accumulate penalty state per decode round (prepare_for_decode())

    • Feed the last accepted token per request into
      penalizer_orchestrator.cumulate_output_tokens() to keep penalty counters
      up-to-date
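
The verify-time step described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the code in eagle_info_v2.py: the function name `apply_penalties_to_verify_logits` and its argument layout are hypothetical, the additive tensor is assumed to already carry its sign, and the scaling rule (divide positive logits, multiply negative ones, as is usual for repetition penalty) and the order of the three steps are assumptions rather than something this PR description states.

```python
from typing import Optional

import torch


def apply_penalties_to_verify_logits(
    logits: torch.Tensor,                        # (bs * draft_token_num, V)
    draft_token_num: int,
    additive: Optional[torch.Tensor] = None,     # (bs, V), assumed pre-signed
    scaling: Optional[torch.Tensor] = None,      # (bs, V), assumed repetition-style
    logit_bias: Optional[torch.Tensor] = None,   # (bs, V)
) -> torch.Tensor:
    """Return a penalized copy of the verify logits (hypothetical helper)."""
    out = logits.clone()

    if additive is not None:
        # Each request owns draft_token_num consecutive rows, so every
        # per-request row is repeated that many times along dim 0.
        out += additive.repeat_interleave(draft_token_num, dim=0)

    if scaling is not None:
        s = scaling.repeat_interleave(draft_token_num, dim=0)
        # Assumed rule: shrink positive logits, push negative ones further down.
        out = torch.where(out > 0, out / s, out * s)

    if logit_bias is not None:
        out += logit_bias.repeat_interleave(draft_token_num, dim=0)

    return out
```

Because each per-request tensor is expanded separately, any of the three fields can be absent without affecting the others, which is one way to read "per-field expansion" above.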

Test (test/registered/spec/eagle/test_eagle_infer_beta.py):

  • Added test_penalty() — concurrent requests with varied penalty combinations
    and differentiated max_new_tokens to exercise filter_batch
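
A rough sketch of how such a test might shape its requests is shown below. It is not the body of test_penalty(): the helper names, the penalty values, and the payload layout (a `text` field plus a `sampling_params` dict) are assumptions made for illustration, and `send` is a caller-supplied function so that nothing here depends on a running server. Giving every request a different max_new_tokens makes requests finish at different steps, which is presumably what causes the batch to be filtered while other requests are still decoding.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Hypothetical penalty combinations; the values are illustrative only.
PENALTY_CASES: List[Dict[str, Any]] = [
    {"frequency_penalty": 0.5},
    {"presence_penalty": 0.5},
    {"repetition_penalty": 1.2},
    {"logit_bias": {"100": -5.0}},
    {"frequency_penalty": 0.3, "presence_penalty": 0.3, "repetition_penalty": 1.1},
]


def build_payloads(prompt: str, base_max_new_tokens: int = 16) -> List[Dict[str, Any]]:
    """Build one request per penalty case, each with a different length limit."""
    payloads = []
    for i, penalties in enumerate(PENALTY_CASES):
        sampling_params: Dict[str, Any] = {
            "temperature": 0.0,
            # Distinct limits so requests leave the batch at different times.
            "max_new_tokens": base_max_new_tokens * (i + 1),
        }
        sampling_params.update(penalties)
        payloads.append({"text": prompt, "sampling_params": sampling_params})
    return payloads


def run_concurrently(
    payloads: List[Dict[str, Any]],
    send: Callable[[Dict[str, Any]], Any],
) -> List[Any]:
    """Submit all payloads at once; results come back in submission order."""
    with ThreadPoolExecutor(max_workers=len(payloads)) as pool:
        return list(pool.map(send, payloads))
```

In a real test, `send` would post each payload to the server under test and return the response for assertions.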

YMbmzy commented Apr 3, 2026

Hi @hnyls2002, could a maintainer help trigger CI? This PR adds penalty support (frequency_penalty, presence_penalty, logit_bias) for spec v2 overlap scheduling, as listed in #11762. Thanks!

gemini-code-assist Bot left a comment

Code Review

This pull request implements a relaxed version of penalty accumulation and application for Eagle speculative decoding (v2) and includes a new test case to verify these parameters. Feedback highlights a logic error where only the last token is accumulated instead of all newly accepted tokens, a runtime error caused by a typo in the attribute name acc_linear_penalties, and the omission of scaling penalties like repetition penalty in the current implementation.

Comment on lines +95 to +106

    output_ids = torch.tensor(
        [
            (
                req.output_ids[-1]
                if len(req.output_ids)
                else req.origin_input_ids[-1]
            )
            for req in batch.reqs
        ],
        dtype=torch.int64,
        device=batch.device,
    )

high

In speculative decoding, multiple tokens can be accepted in a single verify round. This logic only accumulates the last accepted token (req.output_ids[-1]) into the penalizer state. All newly accepted tokens from the previous round should be accumulated to ensure frequency_penalty and presence_penalty counters are accurate.
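
To make the concern concrete, here is a small pure-Python comparison of the two counting strategies. It is only an illustration: the function names are hypothetical, `rounds` stands in for the tokens accepted in each verify round of one request, and the frequency-penalty offset is taken to be the penalty times the token count, which is the usual definition rather than something stated in this thread.

```python
from collections import Counter
from typing import Dict, List


def count_last_only(rounds: List[List[int]]) -> Counter:
    """Count only the final accepted token of each verify round."""
    counts: Counter = Counter()
    for accepted in rounds:
        if accepted:
            counts[accepted[-1]] += 1
    return counts


def count_all_accepted(rounds: List[List[int]]) -> Counter:
    """Count every token accepted in each verify round."""
    counts: Counter = Counter()
    for accepted in rounds:
        counts.update(accepted)
    return counts


def frequency_offsets(counts: Counter, frequency_penalty: float) -> Dict[int, float]:
    """Amount subtracted from each token's logit under a frequency penalty."""
    return {tok: frequency_penalty * n for tok, n in counts.items()}


# Two verify rounds that accepted three tokens each.
rounds = [[5, 5, 7], [7, 7, 9]]
print(frequency_offsets(count_last_only(rounds), 0.5))
print(frequency_offsets(count_all_accepted(rounds), 0.5))
```

With these inputs, last-token counting never sees token 5 and counts token 7 once, while full counting sees token 5 twice and token 7 three times, so the two strategies apply different offsets to later logits.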

Additionally, creating a new tensor from a list in a loop can be inefficient for large batches; consider gathering the IDs more efficiently if they are already available in a tensor format.
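
If the accepted tokens were already available on the device as tensors, the per-request Python loop could be replaced with a single gather. The sketch below assumes two hypothetical inputs that this thread does not confirm exist in eagle_info_v2.py: `accepted_tokens`, a padded `(bs, max_accept)` tensor of accepted token IDs, and `accept_lens`, a `(bs,)` tensor holding how many entries in each row are valid.

```python
import torch


def last_accepted_tokens(
    accepted_tokens: torch.Tensor,  # (bs, max_accept), padded; hypothetical input
    accept_lens: torch.Tensor,      # (bs,), number of valid tokens per row
) -> torch.Tensor:
    """Pick the last valid token of every row without a Python loop."""
    # Index of the last valid column per row; clamp guards rows with length 0.
    idx = (accept_lens.to(torch.int64) - 1).clamp(min=0).unsqueeze(1)
    return accepted_tokens.gather(1, idx).squeeze(1)
```

The same index tensor could also be turned into a validity mask if all accepted tokens, not only the last one, need to be accumulated.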

Comment thread python/sglang/srt/speculative/eagle_info_v2.py Outdated
@merrymercy

/tag-and-rerun-ci

github-actions Bot added the run-ci label Apr 6, 2026

hnyls2002 commented Apr 9, 2026

/rerun-test registered/spec/eagle/test_eagle_infer_beta.py

github-actions Bot commented Apr 9, 2026

1-gpu-5090 (1 test): View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py

@hnyls2002

/rerun-test registered/spec/eagle/test_eagle_infer_b.py

github-actions Bot commented Apr 9, 2026

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_b.py

github-actions Bot commented Apr 9, 2026

1-gpu-5090 (1 test): View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py

hnyls2002 merged commit 8a67fb2 into sgl-project:main Apr 9, 2026
109 of 128 checks passed
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026