[2025-07-01 02:38:17] INFO: 127.0.0.1:39434 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP5] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP0] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP4] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:17 TP1] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP7] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP3] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP2] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[aiter] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:18 TP6] [fused_moe] using default for (6, 7168, 256, 256, 8, 'ActivationType.Silu', 'torch.bfloat16', 'torch.float8_e4m3fnuz', 'torch.float8_e4m3fnuz', 'QuantType.per_128x128', True, False)
[2025-07-01 02:38:26] INFO: 127.0.0.1:39034 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[2025-07-01 02:38:26 TP0] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 5, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-07-01 02:38:26 TP4] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
[2025-07-01 02:38:26] Received sigquit from a child process. It usually means the child failed.
[2025-07-01 02:38:26 TP6] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
[2025-07-01 02:38:26 TP2] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
[2025-07-01 02:38:26 TP5] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
[2025-07-01 02:38:26 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
[2025-07-01 02:38:26 TP1] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
[2025-07-01 02:38:26 TP3] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
[2025-07-01 02:38:26 TP7] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2645, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 751, in event_loop_overlap
batch = self.get_next_batch_to_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1444, in get_next_batch_to_run
new_batch = self.get_new_batch_prefill()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1605, in get_new_batch_prefill
new_batch.prepare_for_extend()
File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 1325, in prepare_for_extend
self.sampling_info = SamplingBatchInfo.from_schedule_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/sampling/sampling_batch_info.py", line 98, in from_schedule_batch
logit_bias[i, int(key)] = value
~~~~~~~~~~^^^^^^^^^^^^^
IndexError: index 129280 is out of bounds for dimension 1 with size 129280
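Every traceback above ends at the same statement, `logit_bias[i, int(key)] = value`, with index 129280 against a dimension of size 129280, so the largest valid index is 129279. The following is a minimal sketch of the kind of bounds check that would turn this into a per-request error rather than a scheduler crash. It is not SGLang's actual code: the names `validate_logit_bias` and `build_bias_row` are hypothetical, and a plain Python list stands in for one row of the bias tensor.

```python
from typing import Dict, List


def validate_logit_bias(logit_bias: Dict[str, float], vocab_size: int) -> Dict[int, float]:
    """Convert string token-id keys to ints, rejecting ids outside [0, vocab_size)."""
    checked: Dict[int, float] = {}
    for key, value in logit_bias.items():
        token_id = int(key)
        # Negative ids are rejected too: tensor indexing would wrap them
        # around silently instead of raising.
        if not 0 <= token_id < vocab_size:
            raise ValueError(
                f"logit_bias token id {token_id} is out of range for vocab_size {vocab_size}"
            )
        checked[token_id] = value
    return checked


def build_bias_row(logit_bias: Dict[str, float], vocab_size: int) -> List[float]:
    """Build one dense bias row of length vocab_size (hypothetical stand-in for a tensor row)."""
    row = [0.0] * vocab_size
    for token_id, value in validate_logit_bias(logit_bias, vocab_size).items():
        row[token_id] = value
    return row


VOCAB_SIZE = 129280  # dimension size reported in the IndexError above

# The largest valid token id is VOCAB_SIZE - 1.
row = build_bias_row({"129279": -100}, VOCAB_SIZE)

# A token id equal to VOCAB_SIZE is rejected with a descriptive error.
try:
    build_bias_row({"129280": -100}, VOCAB_SIZE)
    rejected = False
except ValueError as exc:
    rejected = True
    message = str(exc)
```

Raising a `ValueError` during request validation is only one option; silently dropping out-of-range keys would also avoid the crash, at the cost of hiding a client mistake.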
Describe the bug
Hi,
Token IDs passed in logit_bias are not checked against the vocab_size of the model, leading to an "IndexError: index X is out of bounds for dimension 1 with size 129280" error for the DeepSeek models. I haven't tried other models.
I believe the issue started with lmsysorg/sglang:v0.4.8-rocm630. When I used version 0.4.7, I didn't see this.
Reproduction
Any logit_bias token ID greater than or equal to 129280, which is the vocab_size of a DeepSeek model, will cause an index-out-of-bounds exception.
curl -v -H 'Content-Type: application/json' localhost:8000/v1/chat/completions -d '{ "model": "deepseek-ai/DeepSeek-V3-0324", "messages": [ { "role": "user", "content": "hi" } ], "stream": true, "logit_bias": { "129280": -100 }, "temperature": 0, "max_tokens": 10 }'
Environment