[Bug] ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [8,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed. #8379

### Checklist

- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [ ] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [ ] 5. Please use English, otherwise it will be closed.

### Describe the bug

The server hits the following CUDA device-side assertion:

ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [8,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
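For context, this message appears to come from PyTorch's CUDA scatter/gather kernels, and the asserted condition is that every index value lies within the indexed dimension. A minimal pure-Python sketch of that check follows; the helper name `checked_gather` is hypothetical (it is not a PyTorch or SGLang API), and it only mirrors the condition for a 1-D gather rather than the actual kernel:

```python
def checked_gather(values, indices):
    """Hypothetical helper: 1-D gather with the same bounds check as the assertion."""
    index_size = len(values)
    out = []
    for pos, idx_dim in enumerate(indices):
        # Same condition as the failing assertion:
        # idx_dim >= 0 && idx_dim < index_size
        if not (0 <= idx_dim < index_size):
            raise IndexError(
                f"index out of bounds: indices[{pos}]={idx_dim}, "
                f"index_size={index_size}"
            )
        out.append(values[idx_dim])
    return out
```

In this sketch, `checked_gather([10, 20, 30], [2, 0])` succeeds, while an index of `3` or `-1` raises, which is the situation the kernel reports for thread 8.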

### Reproduction

Start the container:

docker run --entrypoint bash -it --gpus all \
--network host --shm-size 32g \
-v /data/:/data/ \
--privileged=true \
--ipc=host \
-e HF_HUB_OFFLINE=1 -e GLOO_SOCKET_IFNAME=eth0 -e NCCL_SOCKET_IFNAME=eth0 \
-e ENABLE_JIT_DEEPGEMM=1 -e NCCL_CUMEM_ENABLE=1 \
docker.1ms.run/lmsysorg/sglang:v0.4.9.post2-cu126

Node 0 (`--node-rank 0`):

python3 /sgl-workspace/sglang/python/sglang/launch_server.py \
--model-path /data/models/DeepSeek-V3-0324 \
--served-model-name deepseek \
--host 0.0.0.0 --port 8000 --trust-remote-code \
--tp 16 --max-running-requests 100 \
--dist-init-addr 172.31.16.3:20000 --nnodes 2 --node-rank 0 \
--mem-fraction-static 0.8 \
--context-length 32768 --enable-mixed-chunk --chunked-prefill-size 2048 --attention-backend fa3 \
--enable-eplb --enable-hierarchical-cache \
--enable-dp-attention --dp 16 --enable-dp-lm-head --cuda-graph-max-bs 8 --ep-size 8

Node 1 (`--node-rank 1`):

python3 /sgl-workspace/sglang/python/sglang/launch_server.py \
--model-path /data/models/DeepSeek-V3-0324 \
--served-model-name deepseek \
--host 0.0.0.0 --port 8000 --trust-remote-code \
--tp 16 --max-running-requests 100 \
--dist-init-addr 172.31.16.3:20000 --nnodes 2 --node-rank 1 \
--mem-fraction-static 0.8 \
--context-length 32768 --enable-mixed-chunk --chunked-prefill-size 2048 --attention-backend fa3 \
--enable-eplb --enable-hierarchical-cache \
--enable-dp-attention --dp 16 --enable-dp-lm-head --cuda-graph-max-bs 8 --ep-size 8

### Environment

2 nodes × 8 NVIDIA H20 GPUs (16 GPUs total)
