Mamba / FalconMamba: Fix mamba left padding #32677

ArthurZucker merged 10 commits into huggingface:main
Conversation
molbap
left a comment
Thanks @younesbelkada for tuning out the states! 😁 Left a couple of comments, mostly curious about some situations that were edge cases for Mamba 2.
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Can we propagate this to Jamba as well :D thx for this fix ❤️
molbap
left a comment
LGTM! pinging @ArthurZucker for merging 🙂
ArthurZucker
left a comment
Thanks for adding a test 🤗
```python
# In case cache is not used, manually add a new column in the attention mask
if not use_cache and attention_mask is not None and input_ids.shape != attention_mask.shape:
    pad_length = input_ids.shape[-1] - attention_mask.shape[-1]
    attention_mask = torch.cat([attention_mask, torch.ones_like(input_ids[:, :pad_length])], dim=-1)
```
Not sure I understand why we are adding a [1] x batch_size? (past_length is usually going to be 1 - current_generation_token, so imagine 20 input ids, then -19 to slice the input_ids? Unless the input_ids is 20, but then it always has the same shape as the mask.)
This is for users that run generation with use_cache=False; it makes sure to manually update the attention mask, because this is done nowhere else except here.
then this is more a problem with generate as it should pass the correct attention mask 😓
ArthurZucker
left a comment
Will include this in the patch 🤗
```python
# In case cache is not used, manually update the attention mask
if not use_cache and attention_mask is not None and input_ids.shape != attention_mask.shape:
    past_length = input_ids.shape[-1] - attention_mask.shape[-1]
    attention_mask = torch.cat([attention_mask, torch.ones_like(input_ids[:, :past_length])], dim=-1)
```
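As a hedged sketch of what this branch does (toy tensors, not the actual modeling code): when generation runs with `use_cache=False`, `input_ids` grows each step while the user-provided `attention_mask` does not, so columns of ones are appended for the newly generated tokens.

```python
import torch

# Illustrative sketch, assuming a left-padded prompt of length 4 and two
# already-generated tokens, so input_ids is two columns longer than the mask.
batch_size = 2
attention_mask = torch.tensor([[0, 0, 1, 1], [1, 1, 1, 1]])  # 0 = pad
input_ids = torch.ones(batch_size, 6, dtype=torch.long)

if attention_mask is not None and input_ids.shape != attention_mask.shape:
    past_length = input_ids.shape[-1] - attention_mask.shape[-1]
    # Append one column of ones per generated token.
    attention_mask = torch.cat(
        [attention_mask, torch.ones_like(input_ids[:, :past_length])], dim=-1
    )

print(attention_mask.shape)  # torch.Size([2, 6])
```

The generated tokens are always real (never padding), so extending with ones is safe; only the original left-pad columns stay zero.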
that's the only thing bothering me, as generate with use_cache=False should not alter the attention mask being passed
Yes, fixed it!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
def forward(
    self,
    input_ids: Optional[torch.LongTensor] = None,
    attention_mask: Optional[torch.LongTensor] = None,
```
this is breaking (having it as the second positional argument)
```python
if "attention_mask" in model_kwargs:
    attention_mask = model_kwargs["attention_mask"]
    model_kwargs["attention_mask"] = torch.cat(
        [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
    )
```
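For illustration, a minimal sketch of what this update does on each decoding step (toy values; the surrounding `model_kwargs` plumbing is assumed): exactly one column of ones is appended for the single token just generated.

```python
import torch

# Illustrative sketch: mask for a left-padded batch before one decoding step.
model_kwargs = {"attention_mask": torch.tensor([[0, 1, 1], [1, 1, 1]])}

if "attention_mask" in model_kwargs:
    attention_mask = model_kwargs["attention_mask"]
    # new_ones matches the mask's dtype/device; one new column per step.
    model_kwargs["attention_mask"] = torch.cat(
        [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
    )

print(model_kwargs["attention_mask"])
```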
* fix mamba left padding
* Apply suggestions from code review

  Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix copies
* test with `inputs_embeds`
* Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py

  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copies
* clairfy
* fix last comments
* remove

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This reverts commit 91b799b.
What does this PR do?
As pointed out in #32080 (comment), it is important to zero out the hidden states that correspond to the pad tokens before and after the causal convolution, so that the pad tokens have no impact on the computed hidden states.
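A minimal, illustrative sketch of the idea (not the actual FalconMamba modeling code; names and shapes are assumptions): the attention mask is broadcast over the hidden dimension and applied both before and after the causal convolution, so left-pad positions stay exactly zero.

```python
import torch

# Toy shapes: batch of 2, sequence of 4, hidden size 3.
batch, seq_len, hidden = 2, 4, 3
hidden_states = torch.ones(batch, seq_len, hidden)
attention_mask = torch.tensor([[0, 0, 1, 1], [1, 1, 1, 1]])  # 0 = left pad

# Before the conv: zero out pad positions so they contribute nothing.
hidden_states = hidden_states * attention_mask.unsqueeze(-1)

# ... the causal convolution over the sequence dimension would run here;
# its receptive field can smear nonzero values back into pad slots ...

# After the conv: mask again so pad positions are zero going forward.
hidden_states = hidden_states * attention_mask.unsqueeze(-1)

print(hidden_states[0, 0])  # pad position -> all zeros
```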
This can be empirically verified by comparing generation quality before and after this fix (note that by default FalconMamba uses left padding):
Before the fix:
After the fix:
Propagated the changes to Mamba1 as well.
cc @ArthurZucker @molbap