
[Bug] GLM 5 Crashes at nsa_backend on B200 #18980

@ynwang007

Description

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

The server crashes with the following error:

[2026-02-18 14:43:05 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3076, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1123, in event_loop_overlap
    batch_result = self.run_batch(batch)
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2279, in run_batch
    batch_result = self.model_worker.forward_batch_generation(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 675, in forward_batch_generation
    batch_output = self.verify(model_worker_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker_v2.py", line 701, in verify
    verify_input.prepare_for_v2_verify(
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_info_v2.py", line 250, in prepare_for_v2_verify
    target_worker.model_runner.graph_runner.replay_prepare(verify_forward_batch)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 826, in replay_prepare
    attn_backend.init_forward_metadata_replay_cuda_graph(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/nsa_backend.py", line 980, in init_forward_metadata_replay_cuda_graph
    metadata.page_table_1[:, :max_seqlen_k].copy_(page_indices)
RuntimeError: The size of tensor a (202752) must match the size of tensor b (202754) at non-singleton dimension 1


An identical traceback is raised on TP0 at the same timestamp.

Per the model config (https://huggingface.co/zai-org/GLM-5/blob/main/config.json), max_position_embeddings is 202752, so the slice metadata.page_table_1[:, :max_seqlen_k] clamps to 202752 columns (tensor a), while the code tries to copy 202754 page indices (tensor b) into it, two more than the model's maximum context length.
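The mismatch is easy to reproduce in isolation. A minimal sketch, assuming page_table_1 is allocated with max_position_embeddings columns and that the verify batch's KV length overruns it by two entries (plausibly draft tokens appended for EAGLE verify; the trace does not confirm where the extra two come from):

# Minimal sketch of the failing copy (hypothetical sizes and names that
# mirror the nsa_backend.py line in the traceback; not the actual SGLang code).
import torch

max_position_embeddings = 202752   # page table width, from config.json
max_seqlen_k = 202754              # verify-batch KV length seen in the error

page_table_1 = torch.zeros(1, max_position_embeddings, dtype=torch.int32)
page_indices = torch.arange(max_seqlen_k, dtype=torch.int32).unsqueeze(0)

# Slicing past the end of the table silently clamps to 202752 columns, so the
# destination is narrower than page_indices and copy_ raises:
# RuntimeError: The size of tensor a (202752) must match the size of
# tensor b (202754) at non-singleton dimension 1
page_table_1[:, :max_seqlen_k].copy_(page_indices)

Either clamping page_indices to the table width or allocating the page table with headroom for the speculative draft tokens would avoid the failing copy; which is correct depends on where the two extra entries come from.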

Reproduction

Image: lmsysorg/sglang:glm5-blackwell

Command used, with the following environment flags set:

SGLANG_ENABLE_SPEC_V2=1
SGLANG_NSA_FORCE_MLA=1

python3 -m sglang.launch_server \
  --model-path zai-org/GLM-5-FP8 \
  --tp-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --trust-remote-code \
  --model-loader-extra-config='{"enable_multithread_load": "true","num_threads": 64}' \
  --host 0.0.0.0 \
  --port 7080
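
For completeness, a minimal client sketch against the server launched above, using the OpenAI-compatible /v1/chat/completions route that sglang.launch_server exposes; the prompt is a placeholder, since this report does not capture the exact request that triggered the crash (the failure suggests a sequence whose KV length approaches the 202752-token maximum):

# Hypothetical client sketch; the actual triggering request is not known.
import requests

resp = requests.post(
    "http://localhost:7080/v1/chat/completions",  # server launched above
    json={
        "model": "zai-org/GLM-5-FP8",
        "messages": [{"role": "user", "content": "..."}],  # long prompt near the context limit
        "max_tokens": 512,
    },
    timeout=600,
)
print(resp.json())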

Environment

root@5b1f3303eba3:/sgl-workspace/sglang# python3 -m sglang.check_env
Python: 3.12.3 (main, Jan 22 2026, 20:57:42) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA B200
GPU 0,1,2,3,4,5,6,7 Compute Capability: 10.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 580.95.05
PyTorch: 2.9.1+cu129
sglang: 0.0.0.dev0
sgl_kernel: 0.3.21
flashinfer_python: 0.6.2
flashinfer_cubin: 0.6.2
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 5.2.0.dev0
torchao: 0.9.0
numpy: 2.3.5
aiohttp: 3.13.3
fastapi: 0.128.7
hf_transfer: 0.1.9
huggingface_hub: 1.4.1
interegular: 0.3.3
modelscope: 1.34.0
orjson: 3.11.7
outlines: 0.1.11
packaging: 25.0
psutil: 7.2.2
pydantic: 2.12.5
python-multipart: 0.0.22
pyzmq: 27.1.0
uvicorn: 0.40.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.79.0
litellm: Module Not Found
decord2: 3.0.0
NVIDIA Topology: 
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NV18	0-55,112-167	0		N/A
GPU1	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NV18	0-55,112-167	0		N/A
GPU2	NV18	NV18	 X 	NV18	NV18	NV18	NV18	NV18	0-55,112-167	0		N/A
GPU3	NV18	NV18	NV18	 X 	NV18	NV18	NV18	NV18	0-55,112-167	0		N/A
GPU4	NV18	NV18	NV18	NV18	 X 	NV18	NV18	NV18	56-111,168-223	1		N/A
GPU5	NV18	NV18	NV18	NV18	NV18	 X 	NV18	NV18	56-111,168-223	1		N/A
GPU6	NV18	NV18	NV18	NV18	NV18	NV18	 X 	NV18	56-111,168-223	1		N/A
GPU7	NV18	NV18	NV18	NV18	NV18	NV18	NV18	 X 	56-111,168-223	1		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Hypervisor vendor: KVM
ulimit soft: 1024
