bump flashinfer 0.2.5 #5208
Conversation
Ah, a big breaking change from FlashInfer.
The FlashInfer backend seems to work well, but the MLA version has some issues. @yzh119 @Fridge003 Could you please let me know how to test the MLA version locally before I fix and publish it? I tried to run test/srt/test_mla_flashinfer.py, but it is a private HF test that I cannot access.
The model used in that test is private. As an alternative, you can try launching DeepSeek-v3 or DeepSeek-Coder-V2-Lite-Instruct and use the commands in #4218 to test accuracy and performance.
Hi @AkazaAkane, can you point me to the failed case?
https://github.com/sgl-project/sglang/actions/runs/14369665459/job/40290462862
@zhyncs @yzh119 It now passes most of the tests. For the two that still fail, the error is: Child process unexpectedly failed with an exit code 9. pid=3803583. Is that related to this FlashInfer update? This exit code looks more like an out-of-memory kill or something similar. Could you please give me some suggestions on how to proceed? https://github.com/sgl-project/sglang/actions/runs/14372108772/job/40299535895?pr=5208 https://github.com/sgl-project/sglang/actions/runs/14372108772/job/40299535888?pr=5208
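One way to sanity-check the out-of-memory guess is to see which signal the number 9 maps to. The sketch below assumes the reported "exit code 9" means the child was terminated by signal 9, which the CI log does not confirm; the helper names `signal_name` and `run_child_killed_by_sigkill` are hypothetical and not part of sglang or FlashInfer. It only shows how a POSIX child that is killed with SIGKILL (the signal the Linux OOM killer sends) is reported by Python.

```python
import signal
import subprocess
import sys
from typing import Optional


def signal_name(exit_code: int) -> Optional[str]:
    """Map an exit code to a signal name, if its magnitude is a valid signal.

    Python's subprocess reports death-by-signal as a negative return code,
    so both 9 and -9 are treated as signal 9 here (an assumption).
    """
    try:
        return signal.Signals(abs(exit_code)).name
    except ValueError:
        return None


def run_child_killed_by_sigkill() -> int:
    """Start a child that sends SIGKILL to itself and return its return code (POSIX only)."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    result = subprocess.run([sys.executable, "-c", child_code])
    return result.returncode
```

If the failing jobs do map to SIGKILL, checking the runner's kernel log or memory usage during the test would be a reasonable next step; SIGKILL by itself does not prove an out-of-memory condition.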
Can you refer to this PR and update the Docker CI and other FlashInfer dependencies accordingly?
Sure, I will test it locally first before I push it. |
a753240 to bfa3922
Motivation
Modifications
Updated references to flashinfer_python in:
Updated the BatchDecodeWithPagedKVCacheWrapper plan call (API Document) and removed the CUDA stream sync in:
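Since the bump touches every place that pins flashinfer_python, a quick local check that the installed build is at least 0.2.5 can catch a stale environment before running the test suite. This is a minimal sketch, not code from the PR: the helper names are hypothetical, the distribution name `flashinfer-python` is an assumption based on the references above, and the version parsing is deliberately simplistic (it compares only the leading numeric release segment and ignores suffixes such as `.post1` or a local `+cu...` tag).

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '0.2.5.post1' -> (0, 2, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Compare release tuples; pre/post/local suffixes are ignored."""
    return parse_release(installed) >= parse_release(required)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

A possible use is `v = installed_version("flashinfer-python")` followed by `meets_minimum(v, "0.2.5")` when `v` is not None. For anything beyond a quick check, `packaging.version.Version` handles the full version grammar more robustly.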
Checklist