[Bug] Remove stream sync in fast decode plan of flashinfer mla backend #4905

@Fridge003

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

flashinfer-ai/flashinfer#969 claims that the FlashInfer MLA backend can be sped up by removing

  with self.device as device:
      stream = torch.cuda.current_stream(device).cuda_stream

from `fast_mla_decode_plan` in `flashinfer_mla_backend.py`.

We need to verify the performance impact after this removal.
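For clarity, the proposed change amounts to deleting the stream query inside `fast_mla_decode_plan`; the rest of the function is elided and sketched here only to show where the deletion lands (the signature and surrounding logic are not spelled out in this issue):

```
 def fast_mla_decode_plan(self, ...):
-    # Querying the current CUDA stream here is the suspected sync point
-    with self.device as device:
-        stream = torch.cuda.current_stream(device).cuda_stream
     ...  # remainder of the plan logic unchanged
```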

Reproduction

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-flashinfer-mla

Environment

GPU: H200 * 8
Latest versions of sglang and flashinfer

Related PR

#5208 #5538

Labels

bug (Something isn't working)
