[Speculative] Support penalty for spec v2 overlap scheduling #22049

hnyls2002 merged 3 commits into sgl-project:main from
Conversation
Hi @hnyls2002, could a maintainer help trigger CI? This PR adds penalty support (frequency_penalty, presence_penalty, logit_bias) for spec v2 overlap scheduling, as listed in #11762. Thanks!
Code Review
This pull request implements a relaxed version of penalty accumulation and application for Eagle speculative decoding (v2) and includes a new test case to verify these parameters. Feedback highlights a logic error where only the last token is accumulated instead of all newly accepted tokens, a runtime error caused by a typo in the attribute name `acc_linear_penalties`, and the omission of scaling penalties like repetition penalty in the current implementation.
```python
output_ids = torch.tensor(
    [
        (
            req.output_ids[-1]
            if len(req.output_ids)
            else req.origin_input_ids[-1]
        )
        for req in batch.reqs
    ],
    dtype=torch.int64,
    device=batch.device,
)
```
In speculative decoding, multiple tokens can be accepted in a single verify round. This logic only accumulates the last accepted token (`req.output_ids[-1]`) into the penalizer state. All newly accepted tokens from the previous round should be accumulated to ensure `frequency_penalty` and `presence_penalty` counters are accurate.
Additionally, creating a new tensor from a list in a loop can be inefficient for large batches; consider gathering the IDs more efficiently if they are already available in a tensor format.
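To make the suggestion concrete, here is a hedged sketch of accumulating all accepted tokens in one vectorized step. The tensor names and padding convention are assumptions for illustration, not the actual sglang internals:

```python
import torch

# Hypothetical sketch: accumulate *all* newly accepted tokens from the last
# verify round into per-request frequency counters, rather than only the
# final token. `accepted` is a padded (bs, max_accept_len) tensor of token
# ids with -1 marking unused slots; `freq_counts` is (bs, vocab_size).
def accumulate_accepted(freq_counts: torch.Tensor, accepted: torch.Tensor) -> None:
    mask = (accepted >= 0).to(freq_counts.dtype)  # 0 for padding slots
    safe_ids = accepted.clamp(min=0)              # padding scatters a 0 into column 0
    freq_counts.scatter_add_(1, safe_ids, mask)

freq = torch.zeros(2, 8)
acc = torch.tensor([[3, 3, 5],     # request 0 accepted three tokens
                    [1, -1, -1]])  # request 1 accepted one token
accumulate_accepted(freq, acc)
```

A single `scatter_add_` also avoids the per-request Python loop the reviewer flags as inefficient for large batches.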
/tag-and-rerun-ci

/rerun-test registered/spec/eagle/test_eagle_infer_beta.py

✅

/rerun-test registered/spec/eagle/test_eagle_infer_b.py

✅

✅
Motivation
Closes the penalty support item in #11762.
Spec v2 (overlap scheduling) previously ignored
frequency_penalty,presence_penalty,repetition_penalty, andlogit_biasduring verification,silently producing unpenalized outputs.
Modifications
Two changes in `python/sglang/srt/speculative/eagle_info_v2.py`:

1. Apply penalties during verify sampling (`sample()`): apply `acc_additive_penalties`, `acc_scaling_penalties`, and `logit_bias` directly to the verify logits, each broadcast via `repeat_interleave` to match the `(bs * draft_token_num, V)` shape. Mirrors the `apply_logits_bias()` logic with per-field expansion; follows the same relaxed approximation as spec v1.
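The broadcast above can be sketched in a few lines. Shapes follow the PR description, but the variable names are assumptions rather than the real code:

```python
import torch

# Per-request additive penalties of shape (bs, V) are expanded with
# repeat_interleave so they line up row-for-row with verify logits of
# shape (bs * draft_token_num, V), then added in one shot.
bs, draft_token_num, V = 2, 3, 5
logits = torch.zeros(bs * draft_token_num, V)
additive = torch.tensor([[0.0, -1.0, 0.0, 0.0, 0.0],   # request 0 penalties
                         [0.0, 0.0, -2.0, 0.0, 0.0]])  # request 1 penalties
logits += additive.repeat_interleave(draft_token_num, dim=0)
```

`repeat_interleave(draft_token_num, dim=0)` repeats each request's row contiguously, matching the layout where each request owns a consecutive block of `draft_token_num` draft positions.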
2. Accumulate penalty state per decode round (`prepare_for_decode()`): call `penalizer_orchestrator.cumulate_output_tokens()` to keep penalty counters up-to-date.
Test (`test/registered/spec/eagle/test_eagle_infer_beta.py`):

`test_penalty()`: concurrent requests with varied penalty combinations and differentiated `max_new_tokens` to exercise `filter_batch`.

Checklist
max_new_tokensto exercisefilter_batchChecklist