Describe the bug
[2026-03-17 08:01:28] Scheduler hit an exception: Traceback (most recent call last):
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 3130, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 368, in __init__
    self.init_model_worker()
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 564, in init_model_worker
    self.init_tp_model_worker()
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 522, in init_tp_model_worker
    self.tp_worker = TpModelWorker(
                     ^^^^^^^^^^^^^^
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 247, in __init__
    self._init_model_runner()
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 330, in _init_model_runner
    self._model_runner = ModelRunner(
                         ^^^^^^^^^^^^
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 413, in __init__
    self.initialize(min_per_gpu_memory)
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 609, in initialize
    self.init_device_graphs()
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 2156, in init_device_graphs
    self.graph_runner = graph_runners[self.device](self)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 370, in __init__
    self.capture()
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 526, in capture
    _capture_one_stream()
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 513, in _capture_one_stream
    ) = self.capture_one_batch_size(bs, forward, stream_idx)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 689, in capture_one_batch_size
    attn_backend.init_forward_metadata_capture_cuda_graph(
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/layers/attention/hybrid_linear_attn_backend.py", line 1546, in init_forward_metadata_capture_cuda_graph
    attn_backend.init_forward_metadata_capture_cuda_graph(
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/layers/attention/hybrid_linear_attn_backend.py", line 431, in init_forward_metadata_capture_cuda_graph
    self.forward_metadata = self._capture_metadata(
                            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/sglang-env/.venv/lib/python3.12/site-packages/sglang/srt/layers/attention/hybrid_linear_attn_backend.py", line 511, in _capture_metadata
    if forward_mode.is_target_verify() and spec_info.topk > 1:
       ^^^^^^^^^^^^^^
AttributeError: 'NgramVerifyInput' object has no attribute 'topk'
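The failing check in `_capture_metadata` assumes every `spec_info` carries a `topk` attribute, but the ngram speculative path passes an `NgramVerifyInput` that doesn't define one. A minimal, hypothetical sketch of a defensive guard is below (the stand-in classes and the function name are illustrative, not sglang's actual code; only the check itself is taken from the traceback) — a sketch, not a confirmed fix:

```python
# Hypothetical sketch: default topk to 1 when spec_info has no such
# attribute, so NgramVerifyInput stays on the topk == 1 code path
# instead of raising AttributeError during CUDA graph capture.

class NgramVerifyInput:
    """Stand-in for sglang's NgramVerifyInput, which lacks `topk`."""
    pass

class EagleVerifyInput:
    """Stand-in for a speculative input that does carry `topk`."""
    def __init__(self, topk: int):
        self.topk = topk

def needs_multi_topk_path(is_target_verify: bool, spec_info) -> bool:
    # Original check: forward_mode.is_target_verify() and spec_info.topk > 1
    # getattr(..., 1) makes missing-topk inputs behave like topk == 1.
    return is_target_verify and getattr(spec_info, "topk", 1) > 1

print(needs_multi_topk_path(True, NgramVerifyInput()))   # False: no crash
print(needs_multi_topk_path(True, EagleVerifyInput(4)))  # True
```

Alternatively, `NgramVerifyInput` could be given a `topk = 1` attribute; either way the hybrid linear attention backend's capture path no longer assumes the attribute exists.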
Reproduction
python -m sglang.launch_server --model-path Qwen/Qwen3.5-4B --speculative-algo NGRAM
Environment
Python: 3.12.9 (main, Mar 17 2025, 21:01:58) [Clang 20.1.0 ]
CUDA available: True
GPU 0: NVIDIA H20
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.8, V12.8.93
CUDA Driver Version: 570.124.06
PyTorch: 2.9.1+cu128
sglang: 0.5.9
sgl_kernel: 0.3.21
flashinfer_python: 0.6.3
flashinfer_cubin: 0.6.3
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.4.3
aiohttp: 3.13.3
fastapi: 0.135.1
hf_transfer: 0.1.9
huggingface_hub: 0.36.2
interegular: 0.3.3
modelscope: 1.34.0
orjson: 3.11.7
outlines: 0.1.11
packaging: 26.0
psutil: 7.2.2
pydantic: 2.12.5
python-multipart: 0.0.22
pyzmq: 27.1.0
uvicorn: 0.41.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.84.0
litellm: Module Not Found
decord2: 3.0.0
NVIDIA Topology:
GPU0 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NODE NODE NODE NODE PIX NODE NODE NODE NODE NODE 0-191 0 N/A
NIC0 NODE X NODE NODE NODE NODE NODE NODE NODE NODE NODE
NIC1 NODE NODE X NODE NODE NODE NODE NODE NODE NODE NODE
NIC2 NODE NODE NODE X NODE NODE NODE NODE NODE NODE NODE
NIC3 NODE NODE NODE NODE X NODE NODE NODE NODE NODE NODE
NIC4 PIX NODE NODE NODE NODE X NODE NODE NODE NODE NODE
NIC5 NODE NODE NODE NODE NODE NODE X NODE NODE NODE NODE
NIC6 NODE NODE NODE NODE NODE NODE NODE X NODE NODE NODE
NIC7 NODE NODE NODE NODE NODE NODE NODE NODE X NODE NODE
NIC8 NODE NODE NODE NODE NODE NODE NODE NODE NODE X NODE
NIC9 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_4
NIC1: mlx5_5
NIC2: mlx5_6
NIC3: mlx5_7
NIC4: mlx5_8
NIC5: mlx5_9
NIC6: mlx5_10
NIC7: mlx5_11
NIC8: mlx5_bond_0
NIC9: mlx5_bond_1
ulimit soft: 1048576