Checklist
Describe the bug
For the Qwen3-next model, gsm8k accuracy is normally well above the 0.93 threshold. In the PD disaggregation test, however, it sometimes falls below the threshold. There may be a data race when PD disaggregation is enabled. We need to find the root cause and fix it.
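Before hunting for a race, it may help to rule out plain sampling noise. The sketch below is not part of the sglang test suite. It assumes each gsm8k question is an independent pass/fail trial (a binomial model) and that the test fails when the observed accuracy is strictly below the threshold. The function name and the example numbers (200 questions, a true accuracy of 0.95) are hypothetical; substitute the values the CI job actually uses.

```python
import math


def prob_below_threshold(n_questions, true_accuracy, threshold):
    """Probability that observed accuracy < threshold, assuming independent
    questions with a fixed per-question success rate (binomial model)."""
    if not 0.0 < true_accuracy < 1.0:
        raise ValueError("true_accuracy must be strictly between 0 and 1")
    # Largest number of correct answers that still gives accuracy < threshold.
    k_max = min(math.ceil(threshold * n_questions) - 1, n_questions)
    log_p = math.log(true_accuracy)
    log_q = math.log1p(-true_accuracy)
    total = 0.0
    for k in range(k_max + 1):
        # Binomial pmf computed in log space to avoid overflow for large n.
        log_pmf = (
            math.lgamma(n_questions + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_questions - k + 1)
            + k * log_p
            + (n_questions - k) * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical inputs: 200 questions, true accuracy 0.95, threshold 0.93.
    print(prob_below_threshold(200, 0.95, 0.93))
```

If the printed probability is far smaller than the failure rate seen in CI, sampling noise alone is an unlikely explanation and a correctness problem in the disaggregation path becomes more plausible.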
Reproduction
https://github.com/sgl-project/sglang/actions/runs/23760688526/job/69259644818#step:9:2711
Environment
/