[V1] [Hybrid] Move MiniMaxLinearAttention into layers/mamba#23831
Conversation
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
There was a problem hiding this comment.
Code Review
This pull request successfully moves the MinimaxLinearAttention layer and its related components to the vllm/model_executor/layers/mamba/ directory, which improves code organization. However, this move introduces a critical circular dependency between the layers and models packages. Additionally, I've identified a pre-existing critical bug in the moved MiniMaxText01RMSNormTP class related to weight handling. Both issues should be addressed.
| import torch | ||
| import torch.distributed | ||
|
|
||
| from vllm.model_executor.models.minimax_cache import MinimaxCacheParams |
There was a problem hiding this comment.
This import introduces a circular dependency. The layers module should not depend on the models module. Specifically, vllm.model_executor.models.minimax_text_01 now imports MiniMaxText01LinearAttention from this file (layers/mamba/linear_attn.py), which in turn imports MinimaxCacheParams from models/minimax_cache.py. This creates a models -> layers -> models dependency cycle, which can lead to module resolution issues and makes the codebase harder to maintain.
To resolve this, MinimaxCacheParams should be moved to a more foundational location that both layers and models can depend on, for example vllm/model_executor/layers/mamba/mamba_utils.py.
| weight = self.weight | ||
| if x.size(-1) != self.weight.size(0): | ||
| if self.weight.size(0) < x.size(-1): | ||
| repeat_count = (x.size(-1) + self.weight.size(0)) // x.size(-1) |
There was a problem hiding this comment.
The calculation for repeat_count is incorrect. It should perform a ceiling division of x.size(-1) by self.weight.size(0) to determine how many times the weight tensor needs to be repeated to match the input tensor's dimension. The current logic will result in repeat_count being 1 if self.weight.size(0) < x.size(-1), which will lead to incorrect behavior or a runtime error due to shape mismatch during the multiplication.
| repeat_count = (x.size(-1) + self.weight.size(0)) // x.size(-1) | |
| repeat_count = (x.size(-1) + self.weight.size(0) - 1) // self.weight.size(0) |
heheda12345
left a comment
There was a problem hiding this comment.
I prefer to keep model-specific layers in model.py but open to further discussion. There is also a Plamo2MambaMixer in vllm/model_executor/models/plamo2.py (and maybe more, I didn't check the full list).
LucasWilkinson
left a comment
There was a problem hiding this comment.
LGTM! thanks for doing this; left one nit
|
|
||
| if TYPE_CHECKING: | ||
| from vllm.attention.backends.abstract import AttentionBackend | ||
| pass |
There was a problem hiding this comment.
nit: can we get rid of this block if its not used anymore
Oh fair; good point! ya idk do we expect more MiniMax models using this to come? |
I don't think this module will be reuse unless minimax team has new model release. |
OK - if that is the preferred approach then should we move My thinking here was it would be useful to have all of these ops in one place so we can then start to look for commonalities (e.g., to have something like
This model has a number of issues - it's on my to-do list to take a look at it. |
Sounds good! |
…ject#23831) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…ject#23831) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…ject#23831) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…ject#23831) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…ject#23831) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…ject#23831) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Purpose
This PR just moves the
MinimaxLinearAttentionlayer into thelayers/mambadirectory, to be consistent with the other mamba-like layers (e.g., mamba1, mamba2, short_conv).cc @heheda12345
Test Plan
Let's see if CI is OK (works locally).
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.