Checklist
Describe the bug
When launching SGLang with DeepSeek-V3.2 and speculative decoding (EAGLE), the scheduler crashes at startup with:
TypeError: repeat_interleave() received an invalid combination of arguments - got (Tensor, dim=int, repeats=list)
This happens inside python/sglang/srt/layers/attention/nsa_backend.py, during init_forward_metadata().
According to the PyTorch documentation, torch.repeat_interleave expects repeats to be a Tensor or an int, not a Python list.
The SGLang docs also state that DeepSeek V3.2 uses the NSA attention backend by default (unless overridden).
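A minimal standalone sketch of the failure, independent of SGLang. The tensor contents and the list values here are made up for illustration; only the call shape (a Python list passed as repeats) mirrors the traceback below.

```python
import torch

# Toy stand-in for the page table: one row per request (values are made up).
page_table = torch.arange(6).reshape(3, 2)
# Hypothetical per-request extend lengths, held as a plain Python list.
extend_seq_lens_cpu = [2, 1, 3]

# Passing the list directly should raise the TypeError shown in this report.
try:
    torch.repeat_interleave(page_table, repeats=extend_seq_lens_cpu, dim=0)
    list_accepted = True
except TypeError:
    list_accepted = False

# Converting the list to an integer tensor on the same device is accepted.
repeats = torch.tensor(
    extend_seq_lens_cpu, dtype=torch.int64, device=page_table.device
)
expanded = torch.repeat_interleave(page_table, repeats=repeats, dim=0)
```

With a tensor for repeats, row i of page_table is repeated repeats[i] times along dim 0, which is the per-request expansion the NSA backend appears to want here.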
Reproduction
Using Docker Compose:
version: "3.9"
services:
  deepseek-v32:
    image: lmsysorg/sglang:nightly-dev-20251218-d20699a3
    container_name: sglang-deepseek-v32
    restart: unless-stopped
    ports:
      - "40000:30000"
    shm_size: "256g"
    ipc: host
    volumes:
      - /data3/models/DeepSeek-V3.2:/models/DeepSeek-V3.2:ro
      - /data3/deepgemm-cache-20251219:/root/.cache/deep_gemm
    environment:
      - HF_HOME=/data3/hf-cache
      - HUGGINGFACE_HUB_CACHE=/data3/hf-cache
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    runtime: nvidia
    command: >
      python -m sglang.compile_deep_gemm
      --model-path /models/DeepSeek-V3.2
      --tp 8
      --dp 1
      --enable-dp-attention
      --host 0.0.0.0
      --port 30000
      --reasoning-parser deepseek-v3
      --tool-call-parser deepseekv32
      --speculative-algorithm EAGLE
      --speculative-num-steps 3
      --speculative-eagle-topk 1
      --speculative-num-draft-tokens 4
      --mem-fraction-static 0.85
      --max-running-requests 64
      --max-prefill-tokens 32768
      --chunked-prefill-size 8192
      --log-requests
      --log-requests-level 3
SGLang crashes with the following stack trace:
Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2798, in run_scheduler_process
    scheduler.event_loop_normal()
  ...
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/nsa_backend.py", line 436, in init_forward_metadata
    page_table = torch.repeat_interleave(..., dim=..., repeats=[...])
TypeError: repeat_interleave() received an invalid combination of arguments - got (Tensor, dim=int, repeats=list), but expected one of:
 * (Tensor input, Tensor repeats, int dim = None, *, int output_size = None)
 * (Tensor input, int repeats, int dim = None, *, int output_size = None)
The following patch fixes the crash:
diff --git a/python/sglang/srt/layers/attention/nsa_backend.py b/python/sglang/srt/layers/attention/nsa_backend.py
index 18b1b9daf..4202501b1 100644
--- a/python/sglang/srt/layers/attention/nsa_backend.py
+++ b/python/sglang/srt/layers/attention/nsa_backend.py
@@ -435,7 +435,7 @@ class NativeSparseAttnBackend(
# after verification. Lengths vary per request based on how many tokens
# were accepted.
page_table = torch.repeat_interleave(
- page_table, repeats=extend_seq_lens_cpu, dim=0
+ page_table, repeats=forward_batch.extend_seq_lens, dim=0
)
elif forward_batch.forward_mode.is_extend():
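An alternative shape for the fix, sketched under assumptions: if the CPU-side list has to stay in use, it can be converted to a tensor at the call site. The helper name expand_page_table is hypothetical and not part of SGLang. The output_size keyword is part of torch.repeat_interleave and, per the PyTorch docs, avoids the stream synchronization otherwise needed to compute the output shape.

```python
import torch


def expand_page_table(
    page_table: torch.Tensor, seq_lens_cpu: list[int]
) -> torch.Tensor:
    """Repeat row i of page_table seq_lens_cpu[i] times (hypothetical helper)."""
    repeats = torch.tensor(
        seq_lens_cpu, dtype=torch.int64, device=page_table.device
    )
    # The total row count is already known on the host, so pass it along
    # as output_size instead of letting PyTorch derive it from `repeats`.
    return torch.repeat_interleave(
        page_table, repeats, dim=0, output_size=sum(seq_lens_cpu)
    )


# Toy input: two requests with 1 and 2 tokens respectively.
table = torch.tensor([[10, 11], [20, 21]])
out = expand_page_table(table, [1, 2])
```

Whether this or the patch above is preferable depends on whether forward_batch.extend_seq_lens is guaranteed to hold the same values as the CPU list at this point, which this report does not establish.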
Environment
Python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H200
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 570.86.10
PyTorch: 2.9.1+cu129
sglang: 0.5.6
sgl_kernel: 0.3.18.post2
flashinfer_python: 0.5.3
flashinfer_cubin: 0.5.3
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.3.5
aiohttp: 3.13.2
fastapi: 0.123.5
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.32.0
orjson: 3.11.4
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.3
pydantic: 2.12.5
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.75.0
litellm: Module Not Found
decord2: 2.0.0
NVIDIA Topology:
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  NIC1  NIC2  NIC3  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     NV18  NV18  NV18  NV18  NV18  NV18  NV18  PIX   NODE  SYS   SYS   0,2,4,6,8,10  0              N/A
GPU1  NV18  X     NV18  NV18  NV18  NV18  NV18  NV18  NODE  NODE  SYS   SYS   0,2,4,6,8,10  0              N/A
GPU2  NV18  NV18  X     NV18  NV18  NV18  NV18  NV18  NODE  NODE  SYS   SYS   0,2,4,6,8,10  0              N/A
GPU3  NV18  NV18  NV18  X     NV18  NV18  NV18  NV18  NODE  PIX   SYS   SYS   0,2,4,6,8,10  0              N/A
GPU4  NV18  NV18  NV18  NV18  X     NV18  NV18  NV18  SYS   SYS   PIX   NODE  1,3,5,7,9,11  1              N/A
GPU5  NV18  NV18  NV18  NV18  NV18  X     NV18  NV18  SYS   SYS   NODE  NODE  1,3,5,7,9,11  1              N/A
GPU6  NV18  NV18  NV18  NV18  NV18  NV18  X     NV18  SYS   SYS   NODE  PIX   1,3,5,7,9,11  1              N/A
GPU7  NV18  NV18  NV18  NV18  NV18  NV18  NV18  X     SYS   SYS   NODE  NODE  1,3,5,7,9,11  1              N/A
NIC0  PIX   NODE  NODE  NODE  SYS   SYS   SYS   SYS   X     NODE  SYS   SYS
NIC1  NODE  NODE  NODE  PIX   SYS   SYS   SYS   SYS   NODE  X     SYS   SYS
NIC2  SYS   SYS   SYS   SYS   PIX   NODE  NODE  NODE  SYS   SYS   X     NODE
NIC3  SYS   SYS   SYS   SYS   NODE  NODE  PIX   NODE  SYS   SYS   NODE  X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
ulimit soft: 1048576