[Bug] illegal memory of BatchQKApplyRotaryPosIdsCosSinCache when spec decoding #10713

@hnyls2002

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the issue you submit lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve it, reducing the likelihood of feedback.
  • 4. If the issue you raised is a question rather than a bug, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

  File "/root/sglang/python/sglang/srt/layers/rotary_embedding.py", line 228, in forward_cuda
    apply_rope_with_cos_sin_cache_inplace(
  File "/usr/local/lib/python3.12/dist-packages/sgl_kernel/elementwise.py", line 323, in apply_rope_with_cos_sin_cache_inplace
    torch.ops.sgl_kernel.apply_rope_pos_ids_cos_sin_cache.default(
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 829, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: BatchQKApplyRotaryPosIdsCosSinCache failed with error code an illegal memory access was encountered
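One plausible failure mode (an assumption, not a confirmed root cause): an illegal memory access in this kernel is consistent with position ids indexing past the end of the cos/sin cache, e.g. when `SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1` allows sequences beyond `max_position_embeddings`, or when speculative decoding advances positions by the draft-token count. A minimal host-side sanity check, as a sketch (the function name and shapes are hypothetical, not sglang API):

```python
# Hypothetical pre-flight check: verify every position id fits inside the
# cos/sin cache before launching the RoPE kernel. The kernel does no bounds
# checking, so an out-of-range id surfaces as an illegal memory access.
def pos_ids_in_bounds(pos_ids, cache_len):
    """Return True if all position ids index valid rows of the cache.

    pos_ids:   iterable of int position ids (one per token in the batch)
    cache_len: number of rows in the cos/sin cache
               (typically max_position_embeddings)
    """
    return all(0 <= p < cache_len for p in pos_ids)

# Example: with --speculative-num-draft-tokens 6, decode positions can run
# 6 past the verified sequence length; near the context limit that can
# exceed the cache and read out of bounds.
seq_len = 8190
draft_tokens = 6
cache_len = 8192
pos_ids = list(range(seq_len, seq_len + draft_tokens))  # 8190..8195
print(pos_ids_in_bounds(pos_ids, cache_len))  # False: 8192..8195 overflow
```

Logging such a check just before `apply_rope_with_cos_sin_cache_inplace` (or running with `CUDA_LAUNCH_BLOCKING=1`) would help confirm or rule out this hypothesis.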

Reproduction

Launch the server

export MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct
export SPEC_MODEL=lmsys/sglang-EAGLE-LLaMA3-Instruct-8B
export SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1
python -m sglang.launch_server \
    --dtype float16 \
    --model-path $MODEL \
    --attention-backend triton \
    --decode-log-interval 1 \
    --cuda-graph-bs $(seq -s ' ' 1 64) \
    --mem-fraction-static 0.75 \
    --disable-radix-cache \
    --speculative-algorithm EAGLE \
    --speculative-draft-model $SPEC_MODEL \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 6 \
    --host 127.0.0.1 \
    --port 23333

Run the benchmark

export MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct
python3 -m sglang.bench_serving \
    --port 23333 \
    --model $MODEL \
    --dataset-name sharegpt \
    --backend sglang-oai \
    --random-range-ratio 0 \
    --random-input-len 1200 \
    --random-output-len 512 \
    --num-prompts 1000

Environment

H100; latest main branch, running the lmsysorg/sglang:dev image
