[Bug] Blackwell EAGLE Deepseek MTP crashes when using rep/freq penalty #7585

@0xymoro

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Speculative decoding for DeepSeek crashes, seemingly only at higher batch sizes, when launched with flags such as:
--speculative-algorithm EAGLE \
--speculative-num-steps 2 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4

My hunch is that it is related to --speculative-num-steps.

Gemini suggests the cause is in eagle_utils.py: the RuntimeError comes from a mismatch between the number of requests being processed and the per-request data used during the verification step of speculative decoding. Specifically, when some requests in a batch are filtered out, the corresponding sampling_info (which holds the logit_bias for each request) is not updated. In Gemini's example the logits tensor has a batch size of 7 while the logit_bias tensor still has a size of 8; in the trace below the sizes are 19 and 24.

Gemini also suggests that the following change may help:

if bs != len(batch.reqs):
    sampling_info = copy.deepcopy(sampling_info)
    # NOTE: retrive_index are the indices of the requests that are kept.
    sampling_info.filter_batch(self.retrive_index.tolist(), self.retrive_index)
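The failure mode Gemini describes can be sketched in isolation. The snippet below is a toy illustration in plain Python, not sglang code: `ToySamplingInfo`, `verify_with_filter`, and `keep_indices` are hypothetical names, and nested lists stand in for tensors. It only shows why per-request bias rows must be filtered together with the batch; it does not confirm that the suggested patch is correct for sglang, or that `retrive_index` really holds request indices.

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class ToySamplingInfo:
    """Hypothetical stand-in for per-request sampling state (not sglang's class)."""

    logit_bias: List[List[float]]  # one row of biases per request

    def filter_batch(self, keep_indices: List[int]) -> None:
        # Keep only the rows that belong to requests still in the batch.
        self.logit_bias = [self.logit_bias[i] for i in keep_indices]

    def apply_logits_bias(self, logits: List[List[float]]) -> None:
        # Mimics the check that fails in the trace: both sides must have
        # the same number of rows (dimension 0) before adding in place.
        if len(logits) != len(self.logit_bias):
            raise ValueError(
                f"batch size mismatch: logits has {len(logits)} rows, "
                f"logit_bias has {len(self.logit_bias)}"
            )
        for row, bias in zip(logits, self.logit_bias):
            for j, b in enumerate(bias):
                row[j] += b


def verify_with_filter(info, logits, keep_indices):
    """Apply biases after aligning a copy of `info` with the kept requests."""
    if len(logits) != len(info.logit_bias):
        info = copy.deepcopy(info)  # leave the caller's state untouched
        info.filter_batch(keep_indices)
    info.apply_logits_bias(logits)
    return logits
```

With 8 bias rows and 7 surviving requests, calling `apply_logits_bias` directly raises, while `verify_with_filter` first filters a copy down to the 7 kept rows and then applies them. Copying before filtering mirrors the `copy.deepcopy` in the suggested change, so the original per-request state is not mutated.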

Trace:

[2025-06-26 19:44:51 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2647, in run_scheduler_process
    scheduler.event_loop_normal()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 732, in event_loop_normal
    result = self.run_batch(batch)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1701, in run_batch
    ) = self.draft_worker.forward_batch_speculative_generation(batch)
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker.py", line 323, in forward_batch_speculative_generation
    self.verify(batch, spec_info)
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker.py", line 685, in verify
    res: EagleVerifyOutput = spec_info.verify(
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_utils.py", line 381, in verify
    sampling_info.apply_logits_bias(linear_penalty)
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 223, in apply_logits_bias
    logits.add_(self.logit_bias)
RuntimeError: The size of tensor a (19) must match the size of tensor b (24) at non-singleton dimension 0

Reproduction

DeepSeek, using TP 8 and MTP:

--speculative-algorithm EAGLE \
--speculative-num-steps 2 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4

This happened on the lmsysorg/sglang:v0.4.8-cu128-b200 image, and only at higher batch sizes (around 20).

Environment

Blackwell, see above
