There is a recent PR (https://github.com/vllm-project/vllm/pull/2542) in vLLM that introduced fused kernels to accelerate Mixtral MoE models. We can bring it to our [code](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/mixtral.py) as well.
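Roughly, the change in `mixtral.py` would be to replace the per-expert Python loop in the MoE block with a single call to the fused kernel. A minimal sketch of what that could look like, assuming the kernel from the PR is importable as `fused_moe` from `vllm.model_executor.layers.fused_moe` (the exact import path, signature, and weight layout should be verified against the installed vLLM version):

```python
# Sketch only: calling vLLM's fused MoE kernel instead of looping over experts.
# The import path, signature, and packed weight layout below are assumptions
# based on vllm-project/vllm#2542 and may differ across vLLM versions.
import torch
from vllm.model_executor.layers.fused_moe import fused_moe  # assumed import path


class MixtralMoESketch(torch.nn.Module):
    """Hypothetical MoE block showing where the fused kernel would slot in."""

    def __init__(self, num_experts: int, top_k: int, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.top_k = top_k
        self.gate = torch.nn.Linear(hidden_size, num_experts, bias=False)
        # w1 packs the gate/up projections of all experts, w2 the down projections,
        # in the stacked per-expert layout the fused kernel expects.
        self.w1 = torch.nn.Parameter(torch.empty(num_experts, 2 * intermediate_size, hidden_size))
        self.w2 = torch.nn.Parameter(torch.empty(num_experts, hidden_size, intermediate_size))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_tokens, hidden_size)
        router_logits = self.gate(hidden_states)  # (num_tokens, num_experts)
        # One fused kernel call replaces the Python loop over experts.
        return fused_moe(
            hidden_states,
            self.w1,
            self.w2,
            router_logits,
            self.top_k,
            renormalize=True,
        )
```

The benefit is that routing, expert GEMMs, and the weighted combination happen in fused Triton kernels rather than per-expert PyTorch ops, which should reduce launch overhead and improve throughput for Mixtral-style models.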