[Bug] TypeError in NSA backend: torch.repeat_interleave called with repeats=list during DeepSeek-V3.2 DEEPGEMM warm up (nightly docker) #15428

### Checklist

- [x] I searched related issues but found no solution.
- [ ] The bug persists in the latest version.
- [ ] Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- [ ] If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [ ] Please use English. Otherwise, it will be closed.

### Describe the bug

When launching SGLang with DeepSeek-V3.2 and speculative decoding (EAGLE), the scheduler crashes at startup with:
```
TypeError: repeat_interleave() received an invalid combination of arguments - got (Tensor, dim=int, repeats=list)
```
This happens in `init_forward_metadata()` in `python/sglang/srt/layers/attention/nsa_backend.py`. According to the PyTorch documentation, `torch.repeat_interleave` expects `repeats` to be a Tensor or an int, not a Python list.

The SGLang docs also state that DeepSeek V3.2 uses the NSA attention backend by default (unless overridden).
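
The type mismatch can be shown outside SGLang. The snippet below is a minimal sketch, not the actual backend code: the small `page_table` tensor and the values in `extend_seq_lens_cpu` are made up for illustration, and only the variable names mirror the ones in the traceback.

```python
import torch

# Illustrative stand-ins: 3 requests, 2 page slots each (values are made up).
page_table = torch.arange(6).reshape(3, 2)
extend_seq_lens_cpu = [1, 2, 1]  # a plain Python list, as in the failing call

# Passing the list directly matches neither overload of repeat_interleave.
try:
    torch.repeat_interleave(page_table, repeats=extend_seq_lens_cpu, dim=0)
    raised = False
except TypeError:
    raised = True

# Converting the list to an integer tensor first is accepted.
repeats = torch.tensor(extend_seq_lens_cpu, dtype=torch.int64)
expanded = torch.repeat_interleave(page_table, repeats=repeats, dim=0)
print(raised, expanded.shape)
```

Each row of `page_table` is repeated as many times as the matching entry in `repeats`, so the result has `sum(extend_seq_lens_cpu)` rows.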

### Reproduction

Using Docker Compose:
```
version: "3.9"
services:
  deepseek-v32:
    image: lmsysorg/sglang:nightly-dev-20251218-d20699a3
    container_name: sglang-deepseek-v32
    restart: unless-stopped
    ports:
      - "40000:30000"
    shm_size: "256g"
    ipc: host
    volumes:
      - /data3/models/DeepSeek-V3.2:/models/DeepSeek-V3.2:ro
      - /data3/deepgemm-cache-20251219:/root/.cache/deep_gemm
    environment:
      - HF_HOME=/data3/hf-cache
      - HUGGINGFACE_HUB_CACHE=/data3/hf-cache
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    runtime: nvidia
    command: >
      python -m sglang.compile_deep_gemm
      --model-path /models/DeepSeek-V3.2
      --tp 8
      --dp 1
      --enable-dp-attention
      --host 0.0.0.0
      --port 30000
      --reasoning-parser deepseek-v3
      --tool-call-parser deepseekv32
      --speculative-algorithm EAGLE
      --speculative-num-steps 3
      --speculative-eagle-topk 1
      --speculative-num-draft-tokens 4
      --mem-fraction-static 0.85
      --max-running-requests 64
      --max-prefill-tokens 32768
      --chunked-prefill-size 8192
      --log-requests
      --log-requests-level 3
```

SGLang crashes with the following stack trace:
```
Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2798, in run_scheduler_process
    scheduler.event_loop_normal()
  ...
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/nsa_backend.py", line 436, in init_forward_metadata
    page_table = torch.repeat_interleave(..., dim=..., repeats=[...])
TypeError: repeat_interleave() received an invalid combination of arguments - got (Tensor, dim=int, repeats=list), but expected one of:
 * (Tensor input, Tensor repeats, int dim = None, *, int output_size = None)
 * (Tensor input, int repeats, int dim = None, *, int output_size = None)
```

The following patch fixes the crash:
```
diff --git a/python/sglang/srt/layers/attention/nsa_backend.py b/python/sglang/srt/layers/attention/nsa_backend.py
index 18b1b9daf..4202501b1 100644
--- a/python/sglang/srt/layers/attention/nsa_backend.py
+++ b/python/sglang/srt/layers/attention/nsa_backend.py
@@ -435,7 +435,7 @@ class NativeSparseAttnBackend(
                 # after verification. Lengths vary per request based on how many tokens
                 # were accepted.
                 page_table = torch.repeat_interleave(
-                    page_table, repeats=extend_seq_lens_cpu, dim=0
+                    page_table, repeats=forward_batch.extend_seq_lens, dim=0
                 )
 
         elif forward_batch.forward_mode.is_extend():

```
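
The patch above swaps the list for `forward_batch.extend_seq_lens`, which is assumed here to already be a tensor on the right device. As an alternative sketch, a small normalizing helper could accept either form. Both `as_repeats_tensor` and `expand_page_table` are hypothetical names for illustration and are not part of SGLang.

```python
import torch


def as_repeats_tensor(repeats, device):
    """Hypothetical helper: return `repeats` as an int64 tensor on `device`."""
    if isinstance(repeats, torch.Tensor):
        return repeats.to(device=device, dtype=torch.int64)
    # Lists (or other sequences of ints) are converted explicitly.
    return torch.tensor(repeats, dtype=torch.int64, device=device)


def expand_page_table(page_table, extend_seq_lens):
    """Repeat each row of `page_table` by the matching per-request length."""
    repeats = as_repeats_tensor(extend_seq_lens, page_table.device)
    return torch.repeat_interleave(page_table, repeats=repeats, dim=0)


# Usage with made-up values: the list and tensor forms give the same result.
table = torch.arange(6).reshape(3, 2)
from_list = expand_page_table(table, [2, 1, 3])
from_tensor = expand_page_table(table, torch.tensor([2, 1, 3]))
```

Converting a CPU list on every call adds a host-to-device copy when `page_table` lives on the GPU, so using a tensor that is already on the device, as the patch does, is likely the cheaper choice in the hot path.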

### Environment

Python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H200
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 570.86.10
PyTorch: 2.9.1+cu129
sglang: 0.5.6
sgl_kernel: 0.3.18.post2
flashinfer_python: 0.5.3
flashinfer_cubin: 0.5.3
flashinfer_jit_cache: Module Not Found
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.3.5
aiohttp: 3.13.2
fastapi: 0.123.5
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.32.0
orjson: 3.11.4
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.3
pydantic: 2.12.5
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.75.0
litellm: Module Not Found
decord2: 2.0.0
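NVIDIA Topology:

| | GPU0 | GPU1 | GPU2 | GPU3 | GPU4 | GPU5 | GPU6 | GPU7 | NIC0 | NIC1 | NIC2 | NIC3 | CPU Affinity | NUMA Affinity | GPU NUMA ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU0 | X | NV18 | NV18 | NV18 | NV18 | NV18 | NV18 | NV18 | PIX | NODE | SYS | SYS | 0,2,4,6,8,10 | 0 | N/A |
| GPU1 | NV18 | X | NV18 | NV18 | NV18 | NV18 | NV18 | NV18 | NODE | NODE | SYS | SYS | 0,2,4,6,8,10 | 0 | N/A |
| GPU2 | NV18 | NV18 | X | NV18 | NV18 | NV18 | NV18 | NV18 | NODE | NODE | SYS | SYS | 0,2,4,6,8,10 | 0 | N/A |
| GPU3 | NV18 | NV18 | NV18 | X | NV18 | NV18 | NV18 | NV18 | NODE | PIX | SYS | SYS | 0,2,4,6,8,10 | 0 | N/A |
| GPU4 | NV18 | NV18 | NV18 | NV18 | X | NV18 | NV18 | NV18 | SYS | SYS | PIX | NODE | 1,3,5,7,9,11 | 1 | N/A |
| GPU5 | NV18 | NV18 | NV18 | NV18 | NV18 | X | NV18 | NV18 | SYS | SYS | NODE | NODE | 1,3,5,7,9,11 | 1 | N/A |
| GPU6 | NV18 | NV18 | NV18 | NV18 | NV18 | NV18 | X | NV18 | SYS | SYS | NODE | PIX | 1,3,5,7,9,11 | 1 | N/A |
| GPU7 | NV18 | NV18 | NV18 | NV18 | NV18 | NV18 | NV18 | X | SYS | SYS | NODE | NODE | 1,3,5,7,9,11 | 1 | N/A |
| NIC0 | PIX | NODE | NODE | NODE | SYS | SYS | SYS | SYS | X | NODE | SYS | SYS | | | |
| NIC1 | NODE | NODE | NODE | PIX | SYS | SYS | SYS | SYS | NODE | X | SYS | SYS | | | |
| NIC2 | SYS | SYS | SYS | SYS | PIX | NODE | NODE | NODE | SYS | SYS | X | NODE | | | |
| NIC3 | SYS | SYS | SYS | SYS | NODE | NODE | PIX | NODE | SYS | SYS | NODE | X | | | |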

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3

ulimit soft: 1048576
