
[Bug] ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [8,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed. #8379

@wangkeya

Description


Checklist

  • 1. I have searched related issues but could not find a solution.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose; otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

The following CUDA device-side assertion fails:

ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [8,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
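For context, ScatterGatherKernel.cu is PyTorch's CUDA kernel behind gather/scatter-style ops such as torch.gather and Tensor.scatter_. The assertion fires when an index value falls outside [0, size) along the indexed dimension, where size is the length of the indexed tensor along that dimension; negative indices are rejected, not wrapped. Below is a minimal pure-Python sketch of that bounds check, useful for inspecting a suspect index tensor after converting it with .tolist(). The helper names gather_1d and find_out_of_bounds are hypothetical and are not sglang or PyTorch APIs. Because CUDA kernels run asynchronously, rerunning with CUDA_LAUNCH_BLOCKING=1 usually makes the Python traceback point at the op that actually failed.

```python
from typing import List, Sequence


def gather_1d(src: Sequence[float], index: Sequence[int]) -> List[float]:
    """Hypothetical pure-Python stand-in for torch.gather on 1-D inputs (dim=0).

    Mirrors the condition asserted in ScatterGatherKernel.cu:
    every index must satisfy 0 <= idx < len(src). Unlike Python list
    indexing, negative indices are rejected instead of wrapping around.
    """
    size = len(src)
    out = []
    for pos, idx in enumerate(index):
        if not (0 <= idx < size):
            raise IndexError(
                f"index {idx} at position {pos} is out of bounds for size {size}"
            )
        out.append(src[idx])
    return out


def find_out_of_bounds(index: Sequence[int], size: int) -> List[int]:
    """Return the positions whose index value would trip the assertion."""
    return [pos for pos, idx in enumerate(index) if not (0 <= idx < size)]


if __name__ == "__main__":
    src = [10.0, 20.0, 30.0]
    print(gather_1d(src, [2, 0, 1]))                      # [30.0, 10.0, 20.0]
    print(find_out_of_bounds([0, 3, -1, 2], len(src)))    # [1, 2]
```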

Reproduction

Start the container on each node:

docker run --entrypoint bash -it --gpus all \
  --network host --shm-size 32g \
  -v /data/:/data/ \
  --privileged=true \
  --ipc=host \
  -e HF_HUB_OFFLINE=1 -e GLOO_SOCKET_IFNAME=eth0 -e NCCL_SOCKET_IFNAME=eth0 -e ENABLE_JIT_DEEPGEMM=1 -e NCCL_CUMEM_ENABLE=1 \
  docker.1ms.run/lmsysorg/sglang:v0.4.9.post2-cu126

On node 0 (--node-rank 0):

python3 /sgl-workspace/sglang/python/sglang/launch_server.py \
  --model-path /data/models/DeepSeek-V3-0324 \
  --served-model-name deepseek \
  --host 0.0.0.0 --port 8000 --trust-remote-code --tp 16 --max-running-requests 100 --dist-init-addr 172.31.16.3:20000 --nnodes 2 --node-rank 0 --mem-fraction-static 0.8 \
  --context-length 32768 --enable-mixed-chunk --chunked-prefill-size 2048 --attention-backend fa3 \
  --enable-eplb --enable-hierarchical-cache \
  --enable-dp-attention --dp 16 --enable-dp-lm-head --cuda-graph-max-bs 8 --ep-size 8

On node 1 (--node-rank 1):

python3 /sgl-workspace/sglang/python/sglang/launch_server.py \
  --model-path /data/models/DeepSeek-V3-0324 \
  --served-model-name deepseek \
  --host 0.0.0.0 --port 8000 --trust-remote-code --tp 16 --max-running-requests 100 --dist-init-addr 172.31.16.3:20000 --nnodes 2 --node-rank 1 --mem-fraction-static 0.8 \
  --context-length 32768 --enable-mixed-chunk --chunked-prefill-size 2048 --attention-backend fa3 \
  --enable-eplb --enable-hierarchical-cache \
  --enable-dp-attention --dp 16 --enable-dp-lm-head --cuda-graph-max-bs 8 --ep-size 8
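The two launch commands are meant to be identical apart from the node rank. One quick sanity check is to parse both and diff their flags. This is a minimal sketch using Python's standard shlex module; parse_flags and diff_flags are hypothetical helpers, and the parser assumes every flag is either a bare boolean switch or followed by exactly one value. The command strings are written without shell line continuations so that shlex splits them cleanly.

```python
import shlex
from typing import Dict, Optional, Tuple, Union

Flags = Dict[str, Union[str, bool]]

NODE0 = (
    "python3 /sgl-workspace/sglang/python/sglang/launch_server.py "
    "--model-path /data/models/DeepSeek-V3-0324 --served-model-name deepseek "
    "--host 0.0.0.0 --port 8000 --trust-remote-code --tp 16 "
    "--max-running-requests 100 --dist-init-addr 172.31.16.3:20000 "
    "--nnodes 2 --node-rank 0 --mem-fraction-static 0.8 "
    "--context-length 32768 --enable-mixed-chunk --chunked-prefill-size 2048 "
    "--attention-backend fa3 --enable-eplb --enable-hierarchical-cache "
    "--enable-dp-attention --dp 16 --enable-dp-lm-head --cuda-graph-max-bs 8 "
    "--ep-size 8"
)

NODE1 = (
    "python3 /sgl-workspace/sglang/python/sglang/launch_server.py "
    "--model-path /data/models/DeepSeek-V3-0324 --served-model-name deepseek "
    "--host 0.0.0.0 --port 8000 --trust-remote-code --tp 16 "
    "--max-running-requests 100 --dist-init-addr 172.31.16.3:20000 "
    "--nnodes 2 --node-rank 1 --mem-fraction-static 0.8 "
    "--context-length 32768 --enable-mixed-chunk --chunked-prefill-size 2048 "
    "--attention-backend fa3 --enable-eplb --enable-hierarchical-cache "
    "--enable-dp-attention --dp 16 --enable-dp-lm-head --cuda-graph-max-bs 8 "
    "--ep-size 8"
)


def parse_flags(cmdline: str) -> Flags:
    """Map each --flag to its value, or True for a bare switch.

    Tokens that are neither flags nor flag values (e.g. the interpreter
    and script path) are skipped.
    """
    tokens = shlex.split(cmdline)
    flags: Flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and not nxt.startswith("--"):
                flags[tok] = nxt  # flag followed by a value
                i += 2
                continue
            flags[tok] = True  # boolean switch
        i += 1
    return flags


def diff_flags(a: Flags, b: Flags) -> Dict[str, Tuple[Optional[object], Optional[object]]]:
    """Return {flag: (value_in_a, value_in_b)} for every flag that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    print(diff_flags(parse_flags(NODE0), parse_flags(NODE1)))
    # {'--node-rank': ('0', '1')}
```

If anything other than --node-rank shows up in the diff, the nodes were launched with mismatched configurations.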

Environment

2 nodes × 8 NVIDIA H20 GPUs each (16 GPUs total)
