Checklist
Describe the bug
Reproduces with the Grok-2 model:
https://huggingface.co/xai-org/grok-2
With radix cache enabled:
The GSM8K score drops to 0.8 instead of the expected 0.9.
The second send_one run outputs garbage:
<pad><pad><pad><pad><pad><pad><pad><pad>
<pad><pad><pad><pad><pad><pad><pad><pad>
<pad><pad><pad><pad><pad><pad><pad><pad>
<pad><pad><pad><pad><pad><pad><pad><pad>
With radix cache disabled:
The second send_one run still outputs garbage:
<pad><pad><pad><pad><pad><pad><pad><pad>
<pad><pad><pad><pad><pad><pad><pad><pad>
<pad><pad><pad><pad><pad><pad><pad><pad>
<pad><pad><pad><pad><pad><pad><pad><pad>
Reproduction
Download model from: https://huggingface.co/xai-org/grok-2
python3 -m sglang.launch_server --model /local/grok-2 --tokenizer-path /local/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton --enable-piecewise-cuda-graph --disable-radix-cache
# First Run
python /sgl-workspace/sglang/python/sglang/test/send_one.py
# Second Run
python /sgl-workspace/sglang/python/sglang/test/send_one.py
Once the server is up, either run send_one twice as shown above or run the GSM8K benchmark.
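To make the "second request returns garbage" check less manual, something like the following sketch could be used. It sends the same prompt several times and flags responses made up mostly of `<pad>` tokens. Assumptions: the server is reachable at `http://127.0.0.1:30000` and exposes SGLang's native `/generate` endpoint with a `text` field in the response; the helper names (`pad_fraction`, `is_garbage`, `generate`, `check_repeated_requests`) and the 0.5 threshold are hypothetical and not part of sglang.

```python
import json
import urllib.request

PAD_TOKEN = "<pad>"  # the token that fills the garbage output in this report


def pad_fraction(text: str) -> float:
    """Fraction of the characters in `text` that belong to <pad> tokens."""
    if not text:
        return 0.0
    return text.count(PAD_TOKEN) * len(PAD_TOKEN) / len(text)


def is_garbage(text: str, threshold: float = 0.5) -> bool:
    """Treat a response as garbage when it is mostly <pad> (threshold is arbitrary)."""
    return pad_fraction(text) >= threshold


def generate(prompt: str, base_url: str = "http://127.0.0.1:30000",
             max_new_tokens: int = 32) -> str:
    """Send one request to the server (URL and endpoint are assumptions)."""
    payload = json.dumps({
        "text": prompt,
        "sampling_params": {"temperature": 0, "max_new_tokens": max_new_tokens},
    }).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.loads(response.read())["text"]


def check_repeated_requests(prompt: str, runs: int = 2, send=generate) -> list:
    """Send the same prompt `runs` times; return one garbage flag per run."""
    return [is_garbage(send(prompt)) for _ in range(runs)]
```

With the server from the launch command running, `check_repeated_requests("The capital of France is")` would be expected to return `[False, True]` if the behavior described in this issue reproduces. The `send` parameter exists only so the check can be exercised without a live server.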
Environment
Latest main