Describe the bug
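The 20-shot GSM8K accuracy of zai-org/GLM-5-FP8 served by SGLang with the configuration below is lower than expected (0.919 vs. about 0.95).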
Reproduction
# Launch
sglang serve --model-path zai-org/GLM-5-FP8 --tp 8 --trust-remote-code --dp 8 --enable-dp-attention --kv-cache-dtype fp8_e4m3 --nsa-prefill-backend flashmla_sparse --nsa-decode-backend flashmla_kv
# Benchmark: 20-shot GSM8K
python3 benchmark/gsm8k/bench_sglang.py --num-shots 20 --num-questions 1319 --parallel 1319
The benchmark output is:
Accuracy: 0.919
Invalid: 0.000
Latency: 29.930 s
Output throughput: 4294.924 token/s
However, the expected accuracy is about 0.95.
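A rough way to check whether the 0.031 gap could be sampling noise is to express it in binomial standard errors. The sketch below assumes each of the 1319 questions is an independent trial and treats 0.95 as the reference accuracy; it does not model run-to-run nondeterminism from batched inference. The helper name `gap_in_std_errors` is hypothetical.

```python
import math


def gap_in_std_errors(observed: float, expected: float, n: int) -> float:
    """Hypothetical helper: how many binomial standard errors separate the
    observed accuracy from the expected one (normal approximation, with the
    standard error computed at the expected accuracy)."""
    se = math.sqrt(expected * (1.0 - expected) / n)
    return (expected - observed) / se


n_questions = 1319  # --num-questions from the benchmark command
observed = 0.919    # accuracy reported by bench_sglang.py
expected = 0.95     # reference accuracy stated in this issue

se = math.sqrt(expected * (1.0 - expected) / n_questions)
z = gap_in_std_errors(observed, expected, n_questions)
print(f"standard error: {se:.4f}")
print(f"gap: {expected - observed:.3f} ({z:.1f} standard errors)")
```

Under these assumptions the gap is roughly five standard errors (about 41 questions out of 1319), so it is unlikely to be explained by sampling noise alone.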
Environment
Latest main branch, 8x B200 GPUs