Checklist
Describe the bug
Running into some odd issues with spec decode for DeepSeek; it seems to crash only at higher batch sizes, when launched with:
--speculative-algorithm EAGLE \
--speculative-num-steps 2 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4
My hunch is that it's related to --speculative-num-steps.
According to Gemini, the RuntimeError in eagle_utils.py is caused by a mismatch between the number of requests being processed and the data used to process them during the verification step of speculative decoding. Specifically, when some requests in a batch are filtered out, the corresponding sampling_info (which holds the logit_bias for each request) is not updated, so the logits tensor ends up with a smaller batch size than the stale logit_bias tensor (Gemini's example used 7 vs. 8; the trace below shows 19 vs. 24).
Gemini also suggested that something like this may help:
import copy

if bs != len(batch.reqs):
    # Some requests were filtered out of the batch, so rebuild sampling_info
    # to keep logit_bias in sync with the new batch size.
    sampling_info = copy.deepcopy(sampling_info)
    # NOTE: retrive_index are the indices of the requests that are kept.
    sampling_info.filter_batch(self.retrive_index.tolist(), self.retrive_index)
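For intuition, here is a minimal standalone sketch of the same failure mode; the 19/24 shapes are taken from the trace below, and vocab_size is an arbitrary placeholder:

import torch

vocab_size = 32  # placeholder; the real vocab size doesn't matter here

# After filtering, logits has rows only for the 19 surviving requests,
# but the stale logit_bias still covers all 24 original requests.
logits = torch.zeros(19, vocab_size)
logit_bias = torch.zeros(24, vocab_size)

# Broadcasting fails at non-singleton dimension 0, reproducing:
# RuntimeError: The size of tensor a (19) must match the size of
# tensor b (24) at non-singleton dimension 0
logits.add(logit_bias)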
Trace:
[2025-06-26 19:44:51 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2647, in run_scheduler_process
    scheduler.event_loop_normal()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 732, in event_loop_normal
    result = self.run_batch(batch)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1701, in run_batch
    ) = self.draft_worker.forward_batch_speculative_generation(batch)
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker.py", line 323, in forward_batch_speculative_generation
    self.verify(batch, spec_info)
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker.py", line 685, in verify
    res: EagleVerifyOutput = spec_info.verify(
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_utils.py", line 381, in verify
    sampling_info.apply_logits_bias(linear_penalty)
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 223, in apply_logits_bias
    logits.add(self.logit_bias)
RuntimeError: The size of tensor a (19) must match the size of tensor b (24) at non-singleton dimension 0
Reproduction
DeepSeek, using tp=8 and MTP (a fuller launch command is sketched after the flags):
--speculative-algorithm EAGLE \
--speculative-num-steps 2 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4
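For completeness, a full launch command would look roughly like the following; the model path is a placeholder, and everything beyond the speculative flags is my assumption rather than copied from the original report:

python -m sglang.launch_server \
    --model-path <deepseek-model-path> \
    --tp 8 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 2 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4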
This happened on the lmsysorg/sglang:v0.4.8-cu128-b200 image, and only at higher batch sizes (around 20 or more).
Environment
Blackwell (B200); see the image tag above.