[V1] [Hybrid] Move MiniMaxLinearAttention into layers/mamba by tdoublep · Pull Request #23831 · vllm-project/vllm

tdoublep · 2025-08-28T12:55:33Z

Purpose

This PR just moves the MinimaxLinearAttention layer into the layers/mamba directory, to be consistent with the other mamba-like layers (e.g., mamba1, mamba2, short_conv).

cc @heheda12345

Test Plan

Let's see if CI is OK (works locally).

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

gemini-code-assist

Code Review

This pull request successfully moves the MinimaxLinearAttention layer and its related components to the vllm/model_executor/layers/mamba/ directory, which improves code organization. However, this move introduces a critical circular dependency between the layers and models packages. Additionally, I've identified a pre-existing critical bug in the moved MiniMaxText01RMSNormTP class related to weight handling. Both issues should be addressed.

gemini-code-assist · 2025-08-28T12:57:20Z

+import torch
+import torch.distributed
+
+from vllm.model_executor.models.minimax_cache import MinimaxCacheParams


This import introduces a circular dependency. The layers module should not depend on the models module. Specifically, vllm.model_executor.models.minimax_text_01 now imports MiniMaxText01LinearAttention from this file (layers/mamba/linear_attn.py), which in turn imports MinimaxCacheParams from models/minimax_cache.py. This creates a models -> layers -> models dependency cycle, which can lead to module resolution issues and makes the codebase harder to maintain.

To resolve this, MinimaxCacheParams should be moved to a more foundational location that both layers and models can depend on, for example vllm/model_executor/layers/mamba/mamba_utils.py.
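To make the concern above concrete, here is a minimal sketch that models only the three import edges named in the comment and checks for a cycle. The graph is written by hand from the comment, not extracted from the vLLM source tree, and `to_package`, `package_graph`, and `find_cycle` are hypothetical helpers for illustration. With just these edges, the cycle shows up once modules are collapsed to their top-level packages (`models` and `layers`); the three files on their own form a chain.

```python
from typing import Dict, List, Optional

# Import edges taken from the review comment only (hand-written assumption,
# not extracted from the vLLM source tree).
MODULE_IMPORTS: Dict[str, List[str]] = {
    "models.minimax_text_01": ["layers.mamba.linear_attn"],
    "layers.mamba.linear_attn": ["models.minimax_cache"],
    "models.minimax_cache": [],
}


def to_package(module: str) -> str:
    """Collapse a dotted module name to its top-level package."""
    return module.split(".", 1)[0]


def package_graph(imports: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build package-level edges, dropping imports within one package."""
    graph: Dict[str, List[str]] = {}
    for src, targets in imports.items():
        edges = graph.setdefault(to_package(src), [])
        for dst in targets:
            pkg = to_package(dst)
            if pkg != to_package(src) and pkg not in edges:
                edges.append(pkg)
    return graph


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes (first == last), or None."""
    visiting: List[str] = []  # nodes on the current DFS path
    done = set()              # nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in visiting:
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle is not None:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle is not None:
            return cycle
    return None


module_cycle = find_cycle(MODULE_IMPORTS)
package_cycle = find_cycle(package_graph(MODULE_IMPORTS))
```

Moving `MinimaxCacheParams` under `layers`, as suggested above, would remove the `layers -> models` edge from this graph.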

gemini-code-assist · 2025-08-28T12:57:20Z

+        weight = self.weight
+        if x.size(-1) != self.weight.size(0):
+            if self.weight.size(0) < x.size(-1):
+                repeat_count = (x.size(-1) + self.weight.size(0)) // x.size(-1)


The calculation for repeat_count is incorrect. It should perform a ceiling division of x.size(-1) by self.weight.size(0) to determine how many times the weight tensor needs to be repeated to match the input tensor's dimension. The current logic will result in repeat_count being 1 if self.weight.size(0) < x.size(-1), which will lead to incorrect behavior or a runtime error due to shape mismatch during the multiplication.

Suggested change

-                repeat_count = (x.size(-1) + self.weight.size(0)) // x.size(-1)
+                repeat_count = (x.size(-1) + self.weight.size(0) - 1) // self.weight.size(0)
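The arithmetic behind this suggestion can be checked with plain integers. The sketch below is an illustration only: the function names and the sample sizes (`x_dim`, `weight_dim`) are made up for the example, standing in for `x.size(-1)` and `self.weight.size(0)`.

```python
def repeat_count_original(x_dim: int, weight_dim: int) -> int:
    """Formula from the moved code: (x + w) // x."""
    return (x_dim + weight_dim) // x_dim


def repeat_count_suggested(x_dim: int, weight_dim: int) -> int:
    """Ceiling division of x by w: (x + w - 1) // w."""
    return (x_dim + weight_dim - 1) // weight_dim


# Hypothetical sizes chosen for illustration: weight smaller than the input.
x_dim, weight_dim = 8, 4

original = repeat_count_original(x_dim, weight_dim)    # (8 + 4) // 8 = 1
suggested = repeat_count_suggested(x_dim, weight_dim)  # (8 + 4 - 1) // 4 = 2

# Repeating the weight `suggested` times covers the input dimension;
# repeating it `original` times does not.
covers_with_original = original * weight_dim >= x_dim
covers_with_suggested = suggested * weight_dim >= x_dim
```

Whenever `weight_dim < x_dim`, the sum `x_dim + weight_dim` is less than `2 * x_dim`, so the original expression always evaluates to 1, which matches the reviewer's description.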

heheda12345

I prefer to keep model-specific layers in model.py but open to further discussion. There is also a Plamo2MambaMixer in vllm/model_executor/models/plamo2.py (and maybe more, I didn't check the full list).

LucasWilkinson

LGTM! thanks for doing this; left one nit

LucasWilkinson · 2025-08-28T19:24:50Z


 if TYPE_CHECKING:
-    from vllm.attention.backends.abstract import AttentionBackend
+    pass


nit: can we get rid of this block if it's not used anymore?

LucasWilkinson · 2025-08-28T19:27:25Z

> I prefer to keep model-specific layers in model.py but open to further discussion. There is also a Plamo2MambaMixer in vllm/model_executor/models/plamo2.py (and maybe more, I didn't check the full list).

Oh fair; good point! ya idk do we expect more MiniMax models using this to come?

@heheda12345

wait for @heheda12345 to comment

heheda12345 · 2025-08-28T19:34:11Z

> Oh fair; good point! ya idk do we expect more MiniMax models using this to come?

I don't think this module will be reused unless the MiniMax team releases a new model.

tdoublep · 2025-08-29T07:48:30Z

> I prefer to keep model-specific layers in model.py but open to further discussion.

OK - if that is the preferred approach, should we then move short_conv into the Lfm2 modeling file?

My thinking here was it would be useful to have all of these ops in one place so we can then start to look for commonalities (e.g., to have something like unified_attention but for mamba-like ops).

> There is also a Plamo2MambaMixer in vllm/model_executor/models/plamo2.py (and maybe more, I didn't check the full list).

This model has a number of issues - it's on my to-do list to take a look at it.

heheda12345 · 2025-08-29T19:55:25Z

> My thinking here was it would be useful to have all of these ops in one place so we can then start to look for commonalities (e.g., to have something like unified_attention but for mamba-like ops).

Sounds good!

…ject#23831) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Move linear attention in layers/mamba

af4cae4

gemini-code-assist Bot reviewed Aug 28, 2025

tdoublep added 2 commits August 28, 2025 09:55

Merge branch 'main' into move-linear-attn

8260a34

Fix lint

e7c22cd

heheda12345 reviewed Aug 28, 2025

LucasWilkinson previously approved these changes Aug 28, 2025

heheda12345 approved these changes Aug 29, 2025

heheda12345 enabled auto-merge (squash) August 29, 2025 19:55

github-actions Bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Aug 29, 2025

DarkLight1337 disabled auto-merge August 29, 2025 23:54

Merge branch 'main' into move-linear-attn

f660ce1

DarkLight1337 enabled auto-merge (squash) August 29, 2025 23:54

vllm-bot merged commit 4071c76 into vllm-project:main Aug 30, 2025
37 of 43 checks passed

wangxiyuan mentioned this pull request Apr 28, 2026

[Attention] Mamba attention module refactor #41126

Merged

4 tasks

vllm-bot merged 4 commits into vllm-project:main from tdoublep:move-linear-attn
