Checklist
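Describe the bug
The server crashes with a CUDA illegal memory access inside the RoPE kernel (apply_rope_with_cos_sin_cache_inplace) when running the reproduction below with EAGLE speculative decoding enabled. Tail of the traceback: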
  File "/root/sglang/python/sglang/srt/layers/rotary_embedding.py", line 228, in forward_cuda
    apply_rope_with_cos_sin_cache_inplace(
  File "/usr/local/lib/python3.12/dist-packages/sgl_kernel/elementwise.py", line 323, in apply_rope_with_cos_sin_cache_inplace
    torch.ops.sgl_kernel.apply_rope_pos_ids_cos_sin_cache.default(
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: BatchQKApplyRotaryPosIdsCosSinCache failed with error code an illegal memory access was encountered
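CUDA errors are reported asynchronously, so the frame in the traceback above is where the error surfaced and may not be the kernel that actually faulted. A minimal sketch for getting a more precise trace, assuming the standard CUDA_LAUNCH_BLOCKING environment variable is honored in this setup (it forces kernel launches to run synchronously, at a significant performance cost):

```shell
# Make CUDA kernel launches synchronous so the Python traceback
# points at the kernel that actually raised the error.
export CUDA_LAUNCH_BLOCKING=1
```

Set this in the same shell before launching the server with the command under Reproduction. With CUDA graphs enabled the reported location may still be imprecise.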
Reproduction
Launch the server
export MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct
export SPEC_MODEL=lmsys/sglang-EAGLE-LLaMA3-Instruct-8B
export SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
python -m sglang.launch_server \
  --dtype float16 \
  --model-path $MODEL \
  --attention-backend triton \
  --decode-log-interval 1 \
  --cuda-graph-bs $(seq -s ' ' 1 64) \
  --mem-fraction-static 0.75 \
  --disable-radix-cache \
  --speculative-algorithm EAGLE \
  --speculative-draft-model $SPEC_MODEL \
  --speculative-num-steps 5 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 6 \
  --host 127.0.0.1 \
  --port 23333
Run the benchmarking
export MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct
python3 -m sglang.bench_serving \
  --port 23333 \
  --model $MODEL \
  --dataset-name sharegpt \
  --backend sglang-oai \
  --random-range-ratio 0 \
  --random-input-len 1200 \
  --random-output-len 512 \
  --num-prompts 1000
Environment
H100, latest main and lmsysorg/sglang:dev image