Checklist
Describe the bug
The server crashes with the following errors:
[2026-02-18 14:43:05 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3076, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1123, in event_loop_overlap
    batch_result = self.run_batch(batch)
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2279, in run_batch
    batch_result = self.model_worker.forward_batch_generation(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 675, in forward_batch_generation
    batch_output = self.verify(model_worker_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 701, in verify
    verify_input.prepare_for_v2_verify(
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_info_v2.py", line 250, in prepare_for_v2_verify
    target_worker.model_runner.graph_runner.replay_prepare(verify_forward_batch)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 826, in replay_prepare
    attn_backend.init_forward_metadata_replay_cuda_graph(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/nsa_backend.py", line 980, in init_forward_metadata_replay_cuda_graph
    metadata.page_table_1[:, :max_seqlen_k].copy_(page_indices)
RuntimeError: The size of tensor a (202752) must match the size of tensor b (202754) at non-singleton dimension 1
[2026-02-18 14:43:05 TP0] Scheduler hit an exception: (identical traceback on the TP0 rank, ending in the same RuntimeError: The size of tensor a (202752) must match the size of tensor b (202754) at non-singleton dimension 1)
According to https://huggingface.co/zai-org/GLM-5/blob/main/config.json, `max_position_embeddings` is 202752, so the page table (tensor a) only has 202752 columns, yet the code tries to copy 202754 page indices (tensor b) into it.
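A minimal sketch of what seems to be going wrong (the names `table_width`, `num_draft_tokens`, and the exact overshoot mechanism are my assumptions, not the actual sglang code): if the NSA backend's page table is allocated with width `max_position_embeddings`, but during EAGLE verify the batch's `max_seqlen_k` includes the appended draft tokens, a request near the context limit can make `max_seqlen_k` exceed the table width. Slicing `page_table_1[:, :max_seqlen_k]` then silently clamps to the allocated width, so `copy_` sees mismatched sizes:

```python
# Hypothetical reconstruction of the size mismatch; constants come from the
# report, variable names are illustrative only.
MAX_POSITION_EMBEDDINGS = 202752  # GLM-5 config.json, width of page_table_1
NUM_DRAFT_TOKENS = 4              # --speculative-num-draft-tokens

def copy_dst_width(max_seqlen_k: int, table_width: int) -> int:
    """Width of page_table_1[:, :max_seqlen_k]: slicing clamps to the
    allocated table width instead of raising."""
    return min(max_seqlen_k, table_width)

# A request close to the context limit plus draft tokens overshoots the table:
max_seqlen_k = 202750 + NUM_DRAFT_TOKENS                      # 202754
dst_width = copy_dst_width(max_seqlen_k, MAX_POSITION_EMBEDDINGS)  # 202752
src_width = max_seqlen_k  # length of page_indices              # 202754

# dst_width (202752) != src_width (202754): copy_() would raise the
# "size of tensor a ... must match the size of tensor b" RuntimeError.
print(dst_width, src_width)
```

If this reading is right, a fix would either cap `max_seqlen_k` (or the scheduled sequence length plus draft tokens) at the page-table width, or size the table to `max_position_embeddings + num_draft_tokens`.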
Reproduction
Image: lmsysorg/sglang:glm5-blackwell
Command used:
env flags:
SGLANG_ENABLE_SPEC_V2=1
SGLANG_NSA_FORCE_MLA=1
python3 -m sglang.launch_server \
--model-path zai-org/GLM-5-FP8 \
--tp-size 8 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--trust-remote-code \
--model-loader-extra-config='{"enable_multithread_load": "true","num_threads": 64}' \
--host 0.0.0.0 \
--port 7080
Environment
root@5b1f3303eba3:/sgl-workspace/sglang# python3 -m sglang.check_env
Python: 3.12.3 (main, Jan 22 2026, 20:57:42) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA B200
GPU 0,1,2,3,4,5,6,7 Compute Capability: 10.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 580.95.05
PyTorch: 2.9.1+cu129
sglang: 0.0.0.dev0
sgl_kernel: 0.3.21
flashinfer_python: 0.6.2
flashinfer_cubin: 0.6.2
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 5.2.0.dev0
torchao: 0.9.0
numpy: 2.3.5
aiohttp: 3.13.3
fastapi: 0.128.7
hf_transfer: 0.1.9
huggingface_hub: 1.4.1
interegular: 0.3.3
modelscope: 1.34.0
orjson: 3.11.7
outlines: 0.1.11
packaging: 25.0
psutil: 7.2.2
pydantic: 2.12.5
python-multipart: 0.0.22
pyzmq: 27.1.0
uvicorn: 0.40.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.79.0
litellm: Module Not Found
decord2: 3.0.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 56-111,168-223 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 56-111,168-223 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 56-111,168-223 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X 56-111,168-223 1 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
Hypervisor vendor: KVM
ulimit soft: 1024