[Bug] logit_bias can cause `out of bounds for dimension` exception

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.

### Describe the bug

Hi,

A `logit_bias` token ID greater than or equal to the model's `vocab_size` leads to an `IndexError: index X is out of bounds for dimension 1 with size 129280` error for the DeepSeek models. I haven't tried other models.

I believe the issue started with version `lmsysorg/sglang:v0.4.8-rocm630`. With version `0.4.7`, I didn't see this.

```
[2025-07-01 02:38:17] INFO:     127.0.0.1:39434 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP5] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP0] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP4] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP1] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP7] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP3] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP2] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP6] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:26] INFO:     127.0.0.1:39034 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-07-01 02:38:26 TP0] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 5, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-07-01 02:38:26 TP4] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26] Received sigquit from a child process. It usually means the child failed.
[2025-07-01 02:38:26 TP6] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP2] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP5] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP3] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP7] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
```
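The failing statement in the traceback, `logit_bias[i, int(key)] = value`, indexes a buffer whose second dimension has size `vocab_size` with a key taken from the request. Below is a minimal pure-Python sketch of the kind of bounds check that would avoid the `IndexError`. The helper `split_logit_bias` is hypothetical and is not part of sglang; it only illustrates the check. Negative IDs are rejected as well, on the assumption that a negative index into a tensor would wrap around to the end rather than fail.

```python
from typing import Dict, List, Tuple


def split_logit_bias(
    logit_bias: Dict[str, float], vocab_size: int
) -> Tuple[Dict[int, float], List[str]]:
    """Separate usable logit_bias entries from ones that cannot index the logits.

    Hypothetical helper (not sglang code). Valid token IDs are 0..vocab_size-1.
    """
    valid: Dict[int, float] = {}
    rejected: List[str] = []
    for key, value in logit_bias.items():
        try:
            token_id = int(key)
        except (TypeError, ValueError):
            # Key is not an integer at all, e.g. "abc".
            rejected.append(key)
            continue
        if 0 <= token_id < vocab_size:
            valid[token_id] = float(value)
        else:
            # Out of range: this is the case that triggers the IndexError.
            rejected.append(key)
    return valid, rejected


VOCAB_SIZE = 129280  # size of dimension 1 reported in the IndexError above

valid, rejected = split_logit_bias({"129279": -100, "129280": -100}, VOCAB_SIZE)
print(valid)     # {129279: -100.0}
print(rejected)  # ['129280']
```

Whether rejected keys should be silently dropped or turned into a 400 response is a design choice for the maintainers. Either way, one bad request would no longer take down every scheduler process.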

### Reproduction

Any `logit_bias` token ID greater than or equal to `129280`, which is the `vocab_size` of a DeepSeek model, will cause an index `out of bounds` exception.

`curl -v -H 'Content-Type: application/json' localhost:8000/v1/chat/completions -d '{ "model": "deepseek-ai/DeepSeek-V3-0324", "messages": [ { "role": "user", "content": "hi" } ], "stream": true, "logit_bias": { "129280": -100 }, "temperature": 0, "max_tokens": 10 }'`
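For scripting the reproduction, here is a rough Python equivalent of the curl command above, using only the standard library. The URL, model name, and request fields are copied from the curl command. The function names `build_payload` and `send` are just illustrative, and the timeout value is an assumption.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
URL = "http://localhost:8000/v1/chat/completions"


def build_payload(token_id: int) -> dict:
    """Build the same request body as the curl command, for a given token ID."""
    return {
        "model": "deepseek-ai/DeepSeek-V3-0324",
        "messages": [{"role": "user", "content": "hi"}],
        "stream": True,
        "logit_bias": {str(token_id): -100},
        "temperature": 0,
        "max_tokens": 10,
    }


def send(token_id: int, timeout: float = 30.0) -> str:
    """POST the payload and return the raw response body as text."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(token_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `send(129280)` against a running server sends the same request as the curl command. `129279` would be the largest in-range ID for comparison.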

### Environment

```
root@ryoung-amd:/sgl-workspace# python3 -m sglang.check_env
Python: 3.12.8 (main, Dec  4 2024, 08:54:12) [GCC 11.4.0]
ROCM available: True
GPU 0,1,2,3,4,5,6,7: AMD Instinct MI300X
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.4
ROCM_HOME: /opt/rocm
HIPCC: HIP version: 6.3.42131-fa1d09cbd
ROCM Driver Version: 6.12.12
PyTorch: 2.6.0a0+git8d4926e
sglang: 0.4.8
sgl_kernel: 0.2.0
flashinfer_python: Module Not Found
triton: 3.2.0+gitcddf0fc3
transformers: 4.52.3
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.9
huggingface_hub: 0.33.0
interegular: 0.3.3
modelscope: 1.27.1
orjson: 3.10.18
outlines: 0.1.11
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.5
python-multipart: 0.0.20
pyzmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.7.dev2+g113274a0.rocm630
xgrammar: 0.1.19
openai: 1.91.0
tiktoken: 0.7.0
anthropic: 0.55.0
litellm: 1.73.0
decord: 0.6.0
AMD Topology:


============================ ROCm System Management Interface ============================
=============================== Link Type between two GPUs ===============================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7
GPU0   0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU1   XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU2   XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI
GPU3   XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI
GPU4   XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI
GPU5   XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI
GPU6   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI
GPU7   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0
================================== End of ROCm SMI Log ===================================

Hypervisor vendor: KVM
ulimit soft: 1048576
```

[Bug] logit_bias can cause `out of bounds for dimension` exception #7670
