[Bug] logit_bias can cause out of bounds for dimension exception #7670

@RenaultAI

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Hi,

A logit_bias token ID that is greater than or equal to the model's vocab_size causes IndexError: index X is out of bounds for dimension 1 with size 129280. I have seen this with the DeepSeek models; I haven't tried other models.

I believe the issue started with lmsysorg/sglang:v0.4.8-rocm630. With version 0.4.7, I didn't see this.

[2025-07-01 02:38:17] INFO:     127.0.0.1:39434 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP5] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP0] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP4] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP1] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP7] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP3] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP2] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP6] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:26] INFO:     127.0.0.1:39034 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-07-01 02:38:26 TP0] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 5, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-07-01 02:38:26 TP4] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26] Received sigquit from a child process. It usually means the child failed.
[2025-07-01 02:38:26 TP6] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP2] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP5] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP3] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280

[2025-07-01 02:38:26 TP7] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
    new_batch = self.get_new_batch_prefill()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
    new_batch.prepare_for_extend()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
    self.sampling_info = SamplingBatchInfo.from_schedule_batch(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
    logit_bias[i, int(key)] = value
    ~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
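
The failing statement in every traceback is logit_bias[i, int(key)] = value. Here is a minimal sketch of why index 129280 fails, assuming the bias buffer has one column per vocabulary entry, as the "size 129280" in the error message suggests. A plain Python list stands in for one row of the tensor, and apply_logit_bias is a hypothetical name, not sglang's function.

```python
VOCAB_SIZE = 129280  # size of dimension 1 reported in the IndexError


def apply_logit_bias(row, logit_bias):
    """Write each bias into `row`, mirroring `logit_bias[i, int(key)] = value`."""
    for key, value in logit_bias.items():
        # Valid positions are 0 .. len(row) - 1, so int(key) == len(row) fails.
        row[int(key)] = value
    return row


row = [0.0] * VOCAB_SIZE

# The last valid token ID is VOCAB_SIZE - 1.
apply_logit_bias(row, {"129279": -100})

# The token ID from the reproduction request is one past the end.
try:
    apply_logit_bias(row, {"129280": -100})
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```

So the largest token ID that can be biased is vocab_size - 1, and the request below uses exactly vocab_size.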

Reproduction

Any logit_bias token ID greater than or equal to 129280, the vocab_size of the DeepSeek model, causes an index out of bounds exception.

curl -v -H 'Content-Type: application/json' localhost:8000/v1/chat/completions -d '{ "model": "deepseek-ai/DeepSeek-V3-0324", "messages": [ { "role": "user", "content": "hi" } ], "stream": true, "logit_bias": { "129280": -100 }, "temperature": 0, "max_tokens": 10 }'
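
Until the server validates this itself, a client could filter logit_bias before sending the request. The sketch below is only a possible workaround, not sglang's behavior or API: split_logit_bias is a hypothetical helper, and the vocab_size value is taken from the error message above. Whether the server should reject such a request or drop the offending entries is a decision for the maintainers.

```python
def split_logit_bias(logit_bias, vocab_size):
    """Split logit_bias into entries that fit the vocabulary and entries that do not.

    Keys are token IDs given as strings (as in the JSON request body).
    """
    valid, rejected = {}, {}
    for key, value in logit_bias.items():
        try:
            token_id = int(key)
        except (TypeError, ValueError):
            # Not an integer token ID at all.
            rejected[key] = value
            continue
        if 0 <= token_id < vocab_size:
            valid[str(token_id)] = value
        else:
            # Negative, or >= vocab_size: would index outside the bias buffer.
            rejected[key] = value
    return valid, rejected


# 129280 is the dimension size reported in the IndexError above.
valid, rejected = split_logit_bias({"129279": -100, "129280": -100}, 129280)
```

Here valid keeps only the entry for token 129279, and rejected holds the entry for 129280 that triggers the crash.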

Environment

root@ryoung-amd:/sgl-workspace# python3 -m sglang.check_env
Python: 3.12.8 (main, Dec  4 2024, 08:54:12) [GCC 11.4.0]
ROCM available: True
GPU 0,1,2,3,4,5,6,7: AMD Instinct MI300X
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.4
ROCM_HOME: /opt/rocm
HIPCC: HIP version: 6.3.42131-fa1d09cbd
ROCM Driver Version: 6.12.12
PyTorch: 2.6.0a0+git8d4926e
sglang: 0.4.8
sgl_kernel: 0.2.0
flashinfer_python: Module Not Found
triton: 3.2.0+gitcddf0fc3
transformers: 4.52.3
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.9
huggingface_hub: 0.33.0
interegular: 0.3.3
modelscope: 1.27.1
orjson: 3.10.18
outlines: 0.1.11
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.5
python-multipart: 0.0.20
pyzmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.7.dev2+g113274a0.rocm630
xgrammar: 0.1.19
openai: 1.91.0
tiktoken: 0.7.0
anthropic: 0.55.0
litellm: 1.73.0
decord: 0.6.0
AMD Topology:


============================ ROCm System Management Interface ============================
=============================== Link Type between two GPUs ===============================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7
GPU0   0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU1   XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU2   XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI
GPU3   XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI
GPU4   XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI
GPU5   XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI
GPU6   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI
GPU7   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0
================================== End of ROCm SMI Log ===================================

Hypervisor vendor: KVM
ulimit soft: 1048576
