
[Attention] Refactor CUDA attention backend selection logic #24794

Merged
mgoin merged 121 commits into vllm-project:main from MatthewBonanni:backend_selection_refactor on Nov 11, 2025
Conversation

@MatthewBonanni
Collaborator

@MatthewBonanni MatthewBonanni commented Sep 13, 2025

Purpose

CudaPlatformBase.get_attention_backend_cls has gotten complex and messy over time. This PR cleans up the logic (without changing the behavior) and standardizes the interface.
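For illustration only, here is a minimal sketch of the kind of priority-ordered, standardized selection flow a refactor like this aims for. The names `AttentionSelectorConfig`, `select_backend`, and `_supports`, as well as the specific support checks, are hypothetical and are not taken from this PR's diff.

```python
# Hypothetical sketch of a standardized backend-selection flow; names and
# support checks are illustrative, not the actual implementation in this PR.
from dataclasses import dataclass
from enum import Enum, auto


class AttentionBackendEnum(Enum):
    FLASHINFER = auto()
    FLASH_ATTN = auto()
    TRITON_ATTN = auto()


@dataclass
class AttentionSelectorConfig:
    head_size: int
    dtype: str
    use_mla: bool
    compute_capability: tuple[int, int]


def select_backend(cfg: AttentionSelectorConfig) -> AttentionBackendEnum:
    """Walk an ordered priority list and return the first supported backend."""
    priority = [
        AttentionBackendEnum.FLASHINFER,
        AttentionBackendEnum.FLASH_ATTN,
        AttentionBackendEnum.TRITON_ATTN,
    ]
    for backend in priority:
        if _supports(backend, cfg):
            return backend
    raise ValueError("No attention backend supports the current configuration")


def _supports(backend: AttentionBackendEnum, cfg: AttentionSelectorConfig) -> bool:
    # Placeholder checks standing in for per-backend validation
    # (head size, dtype, compute capability, MLA support, etc.).
    if backend is AttentionBackendEnum.FLASHINFER:
        return cfg.compute_capability >= (8, 0)
    if backend is AttentionBackendEnum.FLASH_ATTN:
        return cfg.head_size % 8 == 0
    return True
```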

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the rocm, speculative-decoding, v1, and tpu labels Sep 13, 2025
@mergify

mergify bot commented Sep 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 13, 2025
@MatthewBonanni MatthewBonanni changed the title [Attention] Refactor CUDA attention backend selection logic [WIP][Attention] Refactor CUDA attention backend selection logic Sep 13, 2025
@MatthewBonanni MatthewBonanni changed the title [WIP][Attention] Refactor CUDA attention backend selection logic [Attention] Refactor CUDA attention backend selection logic Sep 16, 2025
@MatthewBonanni MatthewBonanni marked this pull request as ready for review September 16, 2025 13:22
@mergify mergify bot removed the needs-rebase label Sep 16, 2025
Collaborator

@LucasWilkinson LucasWilkinson left a comment


Left a few comments; we should figure out who the owner of the plugin mechanism is and how to notify downstream HW plugins, since I think this will affect them pretty dramatically.

@MatthewBonanni
Collaborator Author

@LucasWilkinson thanks for your review! I've already notified @ILikeIneine, but I'm not sure whether there's anyone else we should reach out to.

@ILikeIneine
Contributor

@MatthewBonanni Hi, will this refactor be able to make it into v0.11.1?

@MatthewBonanni
Collaborator Author

@ILikeIneine we were planning on waiting until after v0.11.1; we don't want to risk further delaying the release, and since this changes the platform interface, it might be better for it to be part of v0.12.

@LucasWilkinson
Collaborator

@MatthewBonanni how hard would it be to keep backwards compatibility between _Backend and AttentionBackendEnum for one version, with a deprecation warning?

@LucasWilkinson
Collaborator

With #26487 potentially in the pipeline, what do we think about having a get_mla_attn_backend_cls instead of is_mla? @Yikun?
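For context, a rough sketch of the two interface shapes being discussed is below. The method names mirror the comment above, but the signatures and backend paths are illustrative only, not the actual vLLM platform API.

```python
# Hedged sketch contrasting an is_mla flag vs. a dedicated MLA hook;
# signatures and backend paths here are hypothetical.

class PlatformWithFlag:
    # Flag-style: one hook that branches on use_mla internally.
    @classmethod
    def get_attn_backend_cls(cls, head_size: int, dtype: str, use_mla: bool) -> str:
        if use_mla:
            return "vllm.attention.backends.mla.MLABackend"  # hypothetical path
        return "vllm.attention.backends.flash_attn.FlashAttentionBackend"  # hypothetical path


class PlatformWithDedicatedHook:
    # Hook-style: a separate method so MLA selection can evolve independently.
    @classmethod
    def get_attn_backend_cls(cls, head_size: int, dtype: str) -> str:
        return "vllm.attention.backends.flash_attn.FlashAttentionBackend"  # hypothetical path

    @classmethod
    def get_mla_attn_backend_cls(cls, head_size: int, dtype: str) -> str:
        return "vllm.attention.backends.mla.MLABackend"  # hypothetical path
```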

@MatthewBonanni
Collaborator Author

> @MatthewBonanni how hard would it be to keep backwards compatibility between _Backend and AttentionBackendEnum for one version, with a deprecation warning?

@LucasWilkinson done in d0f4698

@NickLucche
Collaborator

Discussed offline. Thanks for the work, @MatthewBonanni!

```python
        return AttentionBackendEnum[name]


class _Backend(metaclass=_BackendMeta):
```
Contributor


Nice change
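As a minimal sketch of what a compatibility shim along these lines could look like, assuming _BackendMeta redirects legacy _Backend member access to AttentionBackendEnum with a deprecation warning; this is illustrative and not the exact code from d0f4698:

```python
# Illustrative shim, assuming the metaclass forwards old _Backend attribute
# access to AttentionBackendEnum and warns; not the exact code from d0f4698.
import warnings
from enum import Enum


class AttentionBackendEnum(Enum):
    FLASH_ATTN = "FLASH_ATTN"
    FLASHINFER = "FLASHINFER"


class _BackendMeta(type):
    def __getattr__(cls, name: str):
        warnings.warn(
            "_Backend is deprecated; use AttentionBackendEnum instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return AttentionBackendEnum[name]


class _Backend(metaclass=_BackendMeta):
    """Deprecated alias kept around for one release for backwards compatibility."""


# Old code keeps working but now emits a DeprecationWarning:
backend = _Backend.FLASH_ATTN
assert backend is AttentionBackendEnum.FLASH_ATTN
```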

Member

@mgoin mgoin left a comment


Release has been cut, let's go for it on main

@hmellor
Member

hmellor commented Nov 11, 2025

The merge commit of this PR failed pre-commit because the base of the branch was out of date.


Labels

ci/build, deepseek, documentation, frontend, gpt-oss, kv-connector, multi-modality, new-model, nvidia, performance, qwen, ready, rocm, speculative-decoding, structured-output, tpu, v1

Projects

Status: Done

10 participants