bump flashinfer 0.2.5 #5208

Closed
AkazaAkane wants to merge 0 commits into sgl-project:main from AkazaAkane:flashinfer-0.2.5-bump

@AkazaAkane
Contributor

@AkazaAkane AkazaAkane commented Apr 9, 2025

Motivation

Modifications

Updated references to flashinfer_python in:

  • docs/start/install.md
  • python/pyproject.toml
  • python/sglang/srt/entrypoints/engine.py
  • scripts/ci_install_dependency.sh

Updated the BatchDecodeWithPagedKVCacheWrapper plan call (see the API documentation) and removed the CUDA stream sync in:

  • python/sglang/srt/layers/attention/flashinfer_backend.py
  • python/sglang/srt/layers/attention/flashinfer_mla_backend.py
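
A version bump like this usually comes with a runtime guard on the installed package. The following is only a minimal sketch of such a guard, not the code in engine.py: the helper names `parse_version`, `meets_minimum`, and `check_package` are hypothetical, and the comparison is deliberately simplified (numeric release parts only, so suffixes such as `.post1` are ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '0.2.5' or '0.2.5.post1' into (0, 2, 5).

    Hypothetical helper: stops at the first non-numeric component and
    drops any local '+...' suffix. Not a full PEP 440 parser.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric release tuples, e.g. (0, 2, 10) >= (0, 2, 5)."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name: str, minimum: str) -> None:
    """Raise RuntimeError if `name` is missing or older than `minimum`."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        raise RuntimeError(f"{name} is not installed; need >= {minimum}")
    if not meets_minimum(installed, minimum):
        raise RuntimeError(
            f"{name} {installed} is installed; need >= {minimum}"
        )
```

Under these assumptions, calling `check_package("flashinfer_python", "0.2.5")` at startup would fail fast on an older install instead of surfacing later as an API mismatch.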

Checklist

@zhyncs
Collaborator

zhyncs commented Apr 9, 2025

Ah, a big breaking change from flashinfer.

@AkazaAkane
Contributor Author

The FlashInfer backend seems to work well, while the MLA version has some issues. @yzh119 @Fridge003 Could you please let me know how to test the MLA version locally before I fix and publish it? I tried to run test/srt/test_mla_flashinfer.py, but it is a private HF test that I cannot access.

@Fridge003
Collaborator

The FlashInfer backend seems to work well, while the MLA version has some issues. @yzh119 @Fridge003 Could you please let me know how to test the MLA version locally before I fix and publish it? I tried to run test/srt/test_mla_flashinfer.py, but it is a private HF test that I cannot access.

The model used in that test is private. As an alternative, you can try launching DeepSeek-v3 or DeepSeek-Coder-V2-Lite-Instruct and use the commands in #4218 to test accuracy and performance.

@yzh119
Collaborator

yzh119 commented Apr 10, 2025

Hi @AkazaAkane

The FlashInfer backend seems to work well, while the MLA version has some issues.

Can you point me to the failed case?

@AkazaAkane
Contributor Author

Hi @AkazaAkane

The FlashInfer backend seems to work well, while the MLA version has some issues.

Can you point me to the failed case?

https://github.com/sgl-project/sglang/actions/runs/14369665459/job/40290462862
I fixed it by removing the custom wrapper, but I have not tested it yet.

@AkazaAkane
Contributor Author

AkazaAkane commented Apr 10, 2025

@zhyncs @yzh119 It now passes most of the tests. For the two that still fail, the error is: Child process unexpectedly failed with an exit code 9. pid=3803583. I wonder whether that is related to this flashinfer update? This error code looks more like an out-of-memory kill or something similar. Could you please give me some suggestions on how to proceed?
https://github.com/sgl-project/sglang/actions/runs/14372108772/job/40299535895?pr=5208
https://github.com/sgl-project/sglang/actions/runs/14372108772/job/40299535888?pr=5208
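
One way to narrow this down is to check whether the 9 is a signal number rather than an ordinary exit status. This is an assumption, not something the CI log confirms: on POSIX systems a child terminated by SIGKILL (signal 9, which is what the kernel OOM killer sends) shows up in Python's `subprocess` as return code -9. The sketch below uses two hypothetical helpers, `describe_exit` and `run_and_kill`, to illustrate that convention.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Explain a subprocess return code (POSIX convention assumed).

    A negative value -N means the child was terminated by signal N.
    """
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name}"
    return f"exited with status {returncode}"


def run_and_kill() -> int:
    """Start a sleeping child, send it SIGKILL, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)  # POSIX only; mimics an external kill
    return child.wait()


if __name__ == "__main__":
    print(describe_exit(run_and_kill()))
```

If the failing CI child was killed this way, checking the runner's kernel log or memory usage around that time would be a more direct lead than the flashinfer changes themselves.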

@sleepcoo
Collaborator

Can you refer to this PR and update the Docker CI and other FlashInfer dependencies accordingly?

@AkazaAkane

@AkazaAkane
Contributor Author

Can you refer to this PR and update the Docker CI and other FlashInfer dependencies accordingly?

@AkazaAkane

Sure, I will test it locally first before I push it.

@AkazaAkane AkazaAkane closed this Apr 18, 2025
@AkazaAkane AkazaAkane force-pushed the flashinfer-0.2.5-bump branch from a753240 to bfa3922 Compare April 18, 2025 16:24
@Fridge003 Fridge003 mentioned this pull request Apr 29, 2025
6 tasks