[Attention] Register FLASHMLA_SPARSE #26441
Merged
LucasWilkinson merged 1 commit into vllm-project:main on Oct 8, 2025
Conversation
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Contributor
Code Review
This pull request correctly registers the FLASHMLA_SPARSE attention backend, and the changes are consistent with the existing structure for registering backends. I've reviewed the modifications in vllm/attention/backends/registry.py and vllm/v1/attention/backends/mla/flashmla_sparse.py and found no issues of high or critical severity. The new enum member and its corresponding entry in BACKEND_MAP are correctly added, and the backend name returned by get_name is now consistent with its registration key, following in-tree conventions.
yewentao256
approved these changes
Oct 8, 2025
Member
yewentao256
left a comment
LGTM, thanks for the work!
845473182
pushed a commit
to dsxsteven/vllm_splitPR
that referenced
this pull request
Oct 10, 2025
…to loader

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (778 commits)
  - [torchao] Add support for ModuleFqnToConfig using regex (vllm-project#26001)
  - Add: Support for multiple hidden layers in Eagle3 (vllm-project#26164)
  - Enable `RMSNorm` substitution for Transformers backend (vllm-project#26353)
  - [Model] Gemma3: Fix GGUF loading and quantization (vllm-project#26189)
  - Bump Flashinfer to v0.4.0 (vllm-project#26326)
  - Update Dockerfile and install runai-model-streamer[gcs] package (vllm-project#26464)
  - [Core] Relax the LoRA max rank (vllm-project#26461)
  - [CI/Build] Fix model nightly tests (vllm-project#26466)
  - [Hybrid]: Decouple Kernel Block Size from KV Page Size (vllm-project#24486)
  - [Core][KVConnector] Propagate all tokens on resumed preemptions (vllm-project#24926)
  - [MM][Doc] Add documentation for configurable mm profiling (vllm-project#26200)
  - [Hardware][AMD] Enable FlexAttention backend on ROCm (vllm-project#26439)
  - [Bugfix] Incorrect another MM data format in vllm bench throughput (vllm-project#26462)
  - [Bugfix] Catch and log invalid token ids in detokenizer #2 (vllm-project#26445)
  - [Minor] Change warning->warning_once in preprocess (vllm-project#26455)
  - [Bugfix] Set the minimum python version for gpt-oss (vllm-project#26392)
  - [Misc] Redact ray runtime env before logging (vllm-project#26302)
  - Separate MLAAttention class from Attention (vllm-project#25103)
  - [Attention] Register FLASHMLA_SPARSE (vllm-project#26441)
  - [Kernels] Modular kernel refactor (vllm-project#24812)
  - ...
Dhruvilbhatt
pushed a commit
to Dhruvilbhatt/vllm
that referenced
this pull request
Oct 14, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
lywa1998
pushed a commit
to lywa1998/vllm
that referenced
this pull request
Oct 20, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
alhridoy
pushed a commit
to alhridoy/vllm
that referenced
this pull request
Oct 24, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
0xrushi
pushed a commit
to 0xrushi/vllm
that referenced
this pull request
Oct 26, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi
pushed a commit
to 0xrushi/vllm
that referenced
this pull request
Oct 26, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman
pushed a commit
to rtourgeman/vllm
that referenced
this pull request
Nov 10, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
devpatelio
pushed a commit
to SumanthRH/vllm
that referenced
this pull request
Nov 29, 2025
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Purpose
Add FLASHMLA_SPARSE to the backend registry

Test Plan

CI should suffice
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.