Draft: Fix flashinfer swizzle enum name for flashinfer update. #23311
weire wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Code Review
This pull request updates an enum name in vllm/compilation/collective_fusion.py to reflect a change in the flashinfer library. Specifically, flashinfer_comm.FP4QuantizationSFLayout.SWIZZLED is renamed to flashinfer_comm.QuantizationSFLayout.SWIZZLED. This change is correct and necessary for compatibility with the updated dependency. The pull request is a draft, and the author has noted that additional changes related to the flashinfer update will be handled separately.
```diff
 scale_out=scale_out,
 # in vllm we only support swizzled layout
-layout_code=flashinfer_comm.FP4QuantizationSFLayout.SWIZZLED,
+layout_code=flashinfer_comm.QuantizationSFLayout.SWIZZLED,
```
`SWIZZLED` needs to be changed to `SWIZZLED_128x4`
(caused by flashinfer-ai/flashinfer@669ff33)
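Since the enum has been renamed twice (`FP4QuantizationSFLayout.SWIZZLED` → `QuantizationSFLayout.SWIZZLED` → `QuantizationSFLayout.SWIZZLED_128x4`), one way to stay compatible across flashinfer versions is to probe for whichever name exists. The enum names below come from this PR and the review comment; the fallback helper itself is only an illustrative sketch, not vLLM's actual code:

```python
# Hedged sketch: resolve the swizzled scale-factor layout enum member
# across flashinfer versions. Only the enum/member names are from the PR;
# the helper itself is illustrative.

def resolve_swizzled_layout(flashinfer_comm):
    """Return the swizzled layout code under whichever name this version has."""
    candidates = (
        ("QuantizationSFLayout", "SWIZZLED_128x4"),  # newest, per review comment
        ("QuantizationSFLayout", "SWIZZLED"),        # renamed enum (this PR)
        ("FP4QuantizationSFLayout", "SWIZZLED"),     # pre-rename flashinfer
    )
    for enum_name, member in candidates:
        enum_cls = getattr(flashinfer_comm, enum_name, None)
        if enum_cls is not None and hasattr(enum_cls, member):
            return getattr(enum_cls, member)
    raise ImportError("no known swizzled layout enum found in flashinfer_comm")
```

This avoids pinning vLLM to a single flashinfer release while the rename propagates.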
merged in #23537
Purpose
The new flashinfer release has changed the enum name.
In addition, updating flashinfer means flashinfer-ai/flashinfer#1475 requires the MoE scales to be 2-D for autotune; that is not handled in this PR.
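The 2-D scale requirement could be met with a small reshape shim before calling into flashinfer. A minimal sketch, using NumPy as a stand-in for torch tensors; the helper name and the assumed `(num_experts, 1)` layout are illustrative assumptions, not flashinfer's documented API:

```python
import numpy as np

def ensure_2d_moe_scale(scale: np.ndarray) -> np.ndarray:
    """Illustrative only: make a per-expert MoE scale array 2-D.

    flashinfer-ai/flashinfer#1475 reportedly expects 2-D scales for autotune;
    the exact expected layout is an assumption here.
    """
    if scale.ndim == 2:
        return scale           # already 2-D, pass through unchanged
    if scale.ndim == 1:
        return scale[:, None]  # (num_experts,) -> (num_experts, 1)
    raise ValueError(f"unexpected MoE scale shape {scale.shape}")
```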
Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.