### Checklist
### Describe the bug

flashinfer-ai/flashinfer#969 claims that the flashinfer MLA backend can be sped up by removing

```python
with self.device as device:
    stream = torch.cuda.current_stream(device).cuda_stream
```

from `fast_mla_decode_plan` in `flashinfer_mla_backend.py`. We need to test its performance after this removal.
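For context, here is a minimal sketch of what the removal might look like, assuming the raw stream handle can be cached once outside the hot path; everything except `fast_mla_decode_plan` and the `torch.cuda` calls is hypothetical:

```python
import torch

class FlashInferMLABackendSketch:
    """Hypothetical stand-in for the backend class in flashinfer_mla_backend.py."""

    def __init__(self, device: torch.device):
        self.device = device
        # Assumption: query the raw CUDA stream handle once at init
        # instead of on every fast_mla_decode_plan call.
        self._cuda_stream = torch.cuda.current_stream(device).cuda_stream

    def fast_mla_decode_plan(self, *plan_args):
        # Before (per flashinfer-ai/flashinfer#969, per-call CPU overhead):
        #     with self.device as device:
        #         stream = torch.cuda.current_stream(device).cuda_stream
        # After: reuse the cached handle.
        stream = self._cuda_stream
        # ... `stream` would then be passed on to the flashinfer plan call.
        return stream
```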
### Reproduction

```shell
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-flashinfer-mla
```
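Besides the end-to-end server run above, a standalone microbenchmark (a sketch, not part of sglang; assumes PyTorch >= 2.0, where `torch.device` works as a context manager, and at least one CUDA device) can isolate the CPU-side cost of entering the device context and querying the stream on every call versus reusing a cached handle:

```python
import time
import torch

def query_each_call(device: torch.device, iters: int) -> float:
    """Mimic the current code: enter the device context and query
    the stream handle on every call."""
    t0 = time.perf_counter()
    for _ in range(iters):
        with device:
            _ = torch.cuda.current_stream(device).cuda_stream
    return time.perf_counter() - t0

def reuse_cached_handle(handle: int, iters: int) -> float:
    """Mimic the proposed code: the handle was queried once up front."""
    t0 = time.perf_counter()
    for _ in range(iters):
        _ = handle
    return time.perf_counter() - t0

if __name__ == "__main__":
    dev = torch.device("cuda:0")
    cached = torch.cuda.current_stream(dev).cuda_stream
    n = 100_000
    print(f"query per call: {query_each_call(dev, n):.4f} s")
    print(f"reuse cached:   {reuse_cached_handle(cached, n):.4f} s")
```

Any CPU-side savings shown here would still need to be confirmed against end-to-end decode throughput from the server run above.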
### Environment

- GPU: H200 × 8
- Latest versions of sglang and flashinfer
### Related PRs

#5208 #5538