Update Mamba types and pass through use_cache attr to MambaModel #29605
ArthurZucker merged 14 commits into huggingface:main
Conversation
Thanks for adding this @koayon. Pinging @gante for first review of the cache logic, as @ArthurZucker is off this week.
gante
left a comment
LGTM, thank you for the PR 🤗
I'd like a final check from @ArthurZucker, though -- there are some terminology updates in the docstrings, and I'm not very familiar with Mamba :)
Hey @ArthurZucker! Hope you had a great holiday 🙌
```diff
 is_fast_path_available = all(
-    (selective_state_update, selective_scan_fn, causal_conv1d_fn, causal_conv1d_update, mamba_inner_fn)
+    (
+        selective_state_update,
+        selective_scan_fn,
+        causal_conv1d_fn,
+        causal_conv1d_update,
+        mamba_inner_fn,
+    )
 )
```
this is unrelated and is styling, should be reverted!
Thanks, I've updated the styling 👌
```python
class MambaCache:
    def __init__(self, config, batch_size, dtype=torch.float16, device=None):
        self.seqlen_offset = 0
        self.dtype = dtype
        intermediate_size = config.intermediate_size
        ssm_state_size = config.state_size
        conv_kernel_size = config.conv_kernel

        self.conv_states = {
            i: torch.zeros(batch_size, intermediate_size, conv_kernel_size, device=device, dtype=dtype)
            for i in range(config.num_hidden_layers)
        }
        self.ssm_states = {
            i: torch.zeros(batch_size, intermediate_size, ssm_state_size, device=device, dtype=dtype)
            for i in range(config.num_hidden_layers)
        }
```
if moved, let's just keep the styling of this one please
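For readers following along, here is a minimal sketch of how a cache like the one above is constructed and what it holds once built. The import path, batch size, and dtype here are assumptions for illustration only, not part of the diff:

```python
import torch
from transformers import MambaConfig
from transformers.models.mamba.modeling_mamba import MambaCache  # import path assumed

config = MambaConfig()
cache = MambaCache(config, batch_size=2, dtype=torch.float32)

# One tensor per layer, keyed by layer index:
#   conv_states[i] -> (batch_size, intermediate_size, conv_kernel)
#   ssm_states[i]  -> (batch_size, intermediate_size, state_size)
print(cache.conv_states[0].shape)
print(cache.ssm_states[0].shape)
print(cache.seqlen_offset)  # 0 on construction, per __init__ above
```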
```diff
-            ssm_parameters, [self.time_step_rank, self.ssm_state_size, self.ssm_state_size], dim=-1
+            ssm_parameters,
+            [self.time_step_rank, self.ssm_state_size, self.ssm_state_size],
+            dim=-1,
```
same here, unrelated change
```python
        else:
            if cache_params is not None:
                conv_states = nn.functional.pad(
                    hidden_states, (self.conv_kernel_size - hidden_states.shape[-1], 0)
                )
                cache_params.conv_states[self.layer_idx].copy_(conv_states)
            hidden_states = causal_conv1d_fn(
                hidden_states, conv_weights, self.conv1d.bias, activation=self.activation
            )
```
same here, unrelated change
```diff
-        self.x_proj = nn.Linear(self.intermediate_size, self.time_step_rank + self.ssm_state_size * 2, bias=False)
+        self.x_proj = nn.Linear(
+            self.intermediate_size,
+            self.time_step_rank + self.ssm_state_size * 2,
+            bias=False,
+        )
```
```python
        if cache_params is None and use_cache:
            cache_params = MambaCache(
                self.config, inputs_embeds.size(0), device=inputs_embeds.device, dtype=inputs_embeds.dtype
            )
```
unrelated change, let's revert
```python
        return model_kwargs

    def prepare_inputs_for_generation(
        self, input_ids, cache_params=None, inputs_embeds=None, attention_mask=None, **kwargs
    ):
```

```python
            inputs_embeds=inputs_embeds,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            **kwargs,
```
why is this required? it should not. The cache params are passed right above
I believe it's the use_cache argument that needs to be passed in for this to work as expected - we could restrict to just passing that through?
Have amended this to only pass through the use_cache argument
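Concretely, the amended version forwards only use_cache rather than spreading **kwargs. A sketch of what that call looks like (keyword names taken from the hunk above; the surrounding code is assumed, not quoted from the diff):

```python
# Inside MambaForCausalLM.forward: pass only use_cache through to the backbone
# instead of arbitrary **kwargs (sketch; the exact merged code may differ).
mamba_outputs = self.backbone(
    input_ids,
    cache_params=cache_params,
    inputs_embeds=inputs_embeds,
    output_hidden_states=output_hidden_states,
    return_dict=return_dict,
    use_cache=use_cache,
)
```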
Hey @ArthurZucker, thanks for your review! 🙌 In terms of the image that you were sending, it's unfortunately not showing up for me. But without the change to pass in use_cache, I don't see the cache_params being returned. If there's a difference for you, I've just thought that it might be how it's running on CUDA vs MPS/CPU. I append the following to the file:

```python
import torch as t

from transformers import AutoTokenizer

if __name__ == "__main__":
    model = MambaForCausalLM(MambaConfig())
    tokeniser = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
    input_ids: t.Tensor = tokeniser("Hey how are you doing?", return_tensors="pt")["input_ids"]  # type: ignore
    out: MambaCausalLMOutput = model(input_ids=input_ids, use_cache=True)
    assert out.cache_params is not None
    print(out.cache_params.ssm_states)
```

If the use_cache argument isn't passed through to the backbone (either with kwargs or separately as in the newer version), there is no cache_params returned and running `python src/transformers/models/mamba/modeling_mamba.py` gives the error:

```
...
Traceback (most recent call last):
  File "/[PATH_TO_TRANSFORMERS]/transformers/src/transformers/models/mamba/modeling_mamba.py", line 688, in <module>
    assert out.cache_params is not None
AssertionError
```

whereas with the use_cache argument being passed through I get a tensor returned:

```
{0: tensor([[[-5.5237e-04,  9.6599e-04,  6.6771e-04,  ..., -5.3982e-04,
              -4.6061e-04, -7.0508e-04],
             [-3.7170e-05, -2.2089e-04, -1.0218e-04,  ...,  7.1232e-05,
...
```

It does seem like this would be required for the expected behaviour. Please let me know if you have any questions! 😄
Alright: when using from_pretrained, the cache is used and passed on subsequently, but not when initialising the model directly from a config.
ArthurZucker
left a comment
Almost good to go!
```python
        cache_params: Optional[MambaCache] = None,
        labels: Optional[torch.LongTensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
```
let's add use_cache as an arg here
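A sketch of the suggested signature change (only the arguments shown in the hunk above are taken from the diff; the remaining arguments, return annotation, and body are assumptions for illustration):

```python
def forward(
    self,
    input_ids: Optional[torch.LongTensor] = None,
    inputs_embeds: Optional[torch.FloatTensor] = None,
    cache_params: Optional[MambaCache] = None,
    labels: Optional[torch.LongTensor] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    use_cache: Optional[bool] = None,  # new explicit argument
    **kwargs,
) -> Union[Tuple, MambaCausalLMOutput]:
    ...
```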
ArthurZucker
left a comment
Sorry forgot about these!
@ArthurZucker great suggestion, didn't realise that was an attribute of the Config 👌
The failing test seems new, but it's because when training, use_cache should be disabled by the model.
I'll have a look.
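For reference, one common way to get that behaviour is to fall back to the config default only when the model is not training. A one-line sketch (not necessarily the exact fix that landed):

```python
# Inside MambaModel.forward: disable the cache by default during training.
use_cache = use_cache if use_cache is not None else (self.config.use_cache if not self.training else False)
```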
ArthurZucker
left a comment
Thanks for iterating!
* Update docstring for RMSNorm
* Update cache_params object to correct MambaCache type
* Update docstrings and type info
* Pass through use_cache
* ruff
* Reformat with 119 char limit per line (thanks Arthur)
* Pass through use_cache specifically to the backbone rather than all keyword arguments
* Update src/transformers/models/mamba/modeling_mamba.py
* Update src/transformers/models/mamba/modeling_mamba.py
* Update src/transformers/models/mamba/modeling_mamba.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/mamba/modeling_mamba.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update tab
* Update src/transformers/models/mamba/modeling_mamba.py
* Update src/transformers/models/mamba/modeling_mamba.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
* The cache type hints were previously a mix of MambaCache, torch.Tensor or list[torch.Tensor]. This PR updates this to MambaCache everywhere, which is in line with the attributes that are being accessed in the logic.
* use_cache was not being passed through to MambaModel. This PR fixes this as below: the use_cache information is allowed to be passed through, so that you can run the model with use_cache=True (see the sketch below) and get back the ssm_states, which was not previously possible.
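A minimal sketch of the intended usage, mirroring the reproduction snippet from the discussion above (the randomly initialised model and the 130m tokenizer are used purely for illustration):

```python
import torch
from transformers import AutoTokenizer, MambaConfig, MambaForCausalLM

model = MambaForCausalLM(MambaConfig())
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]

out = model(input_ids=input_ids, use_cache=True)
assert out.cache_params is not None
print(out.cache_params.ssm_states)  # per-layer SSM states are now returned
```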
Who can review?
@ArthurZucker
@gante