Checklist
Describe the bug
I noticed that when running the MiMo-V2-Flash model on SGLang, there is a precision issue with the configuration CUDA Graph + MTP + page_size = 64.
However, the precision is correct when using CUDA Graph + MTP + page_size = 1.
Has anyone tried to fix this issue?
Reproduction
CUDA Graph + MTP + page_size = 64 (precision issue):
SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server --model-path /ssd3/MiMo-V2-Flash --max-total-tokens 835584 --disable-radix-cache --decode-log-interval 1 --host 0.0.0.0 --port 8806 --trust-remote-code --tp-size 8 --page-size 64 --cuda-graph-max-bs 64 --max-running-requests 64 --disable-overlap-schedule --attention-backend fa3 --mem-fraction-static 0.9 --dp-size 2 --enable-dp-attention --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-num-draft-tokens 4 --speculative-eagle-topk 1

CUDA Graph + MTP + page_size = 1 (precision is correct):
SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server --model-path /ssd3/MiMo-V2-Flash --max-total-tokens 835584 --disable-radix-cache --decode-log-interval 1 --host 0.0.0.0 --port 8806 --trust-remote-code --tp-size 8 --page-size 1 --cuda-graph-max-bs 64 --max-running-requests 64 --disable-overlap-schedule --attention-backend fa3 --mem-fraction-static 0.9 --dp-size 2 --enable-dp-attention --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-num-draft-tokens 4 --speculative-eagle-topk 1
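To make the difference between the two launches easier to check, here is a minimal sketch of a comparison script. It is not part of my original test setup. It assumes the server exposes SGLang's native `/generate` endpoint on port 8806 and that greedy decoding (`temperature` 0) is a fair basis for comparison. The helper names `generate`, `first_divergence`, and `compare` are hypothetical.

```python
import json
import urllib.request


def generate(base_url, prompt, max_new_tokens=64):
    """Send one greedy request to the server's /generate endpoint (assumed)."""
    payload = {
        "text": prompt,
        "sampling_params": {"temperature": 0, "max_new_tokens": max_new_tokens},
    }
    req = urllib.request.Request(
        base_url + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["text"]


def first_divergence(a, b):
    """Return the index of the first differing character, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a strict prefix of the other: they diverge where it ends.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def compare(outputs_a, outputs_b):
    """List (prompt index, divergence position) for every pair that differs."""
    return [
        (i, first_divergence(a, b))
        for i, (a, b) in enumerate(zip(outputs_a, outputs_b))
        if a != b
    ]
```

Both launch commands bind to the same port, so the runs have to be sequential: start the page_size = 1 server, collect `[generate("http://127.0.0.1:8806", p) for p in prompts]` and save the result, then restart with page_size = 64 and collect again before calling `compare`. Small numerical differences can cause greedy outputs to diverge even between correct runs, so a mismatch is a signal to investigate rather than proof of a bug.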

Environment
NVIDIA H200