[Refactor Attention mask handling] Moves attention mask processing to the Attention class#28132
ArthurZucker wants to merge 20 commits into main from
…tor-attention-converesion
gante left a comment:
LGTM (yay, fewer config-dependent if/elses 🙌 )
BTW, for retrocompatibility, we may want to check whether the attention masks are 4D before expanding in the attention classes. As we've learned with the Cache refactor, other repos might rely on the interface of these internal classes, and this is technically an interface change (4D attention mask input -> 2D attention mask input).
We added support for 4D attention masks inside the converter, so it should be alright, but yeah, I will check related issues!
class LlamaAttention(nn.Module):
    """Multi-headed attention from 'Attention Is All You Need' paper"""

    cached_mask = None
As discussed with you offline, this cannot work as-is due to sharing across model instances.
Yes, I will either cache it at the instance level (a tril) rather than the class level, or pass it as kwargs. JAX does not seem to care, so it should not be too bad.
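An instance-level tril cache could be sketched roughly like this (hypothetical class, not this PR's code):

```python
import torch
from torch import nn

class AttentionWithCachedTril(nn.Module):
    """Illustrative sketch: cache the causal tril at the *instance* level
    (created in __init__), never as a class attribute shared across models."""

    def __init__(self, max_positions: int):
        super().__init__()
        # register_buffer keeps the mask per-instance and moves it with .to(device)
        self.register_buffer(
            "causal_mask",
            torch.tril(torch.ones(max_positions, max_positions, dtype=torch.bool)),
            persistent=False,
        )

    def get_mask(self, q_len: int, kv_len: int) -> torch.Tensor:
        # Slice the rows for the current query positions against all key positions.
        return self.causal_mask[kv_len - q_len : kv_len, :kv_len]
```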
Why can't we pass the attention_mask just into the `cache.update(...)` function?
I'll check that as well, but that doesn't help for all the attention types we have that need a pre-processed mask; it will work after the pre-processing though.
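Passing the mask through the cache update could look roughly like this toy sketch (`TinyCache` is made up for illustration; it is not the actual transformers `Cache` API):

```python
import torch

class TinyCache:
    """Toy sketch of the idea in this thread: let update() also carry the
    attention_mask as a kwarg, so the cache can hand back a mask that
    matches the full (cached + new) key length."""

    def __init__(self):
        self.keys, self.values, self.mask = None, None, None

    def update(self, key, value, attention_mask=None):
        # key/value: (batch, heads, seq, head_dim) -> grow along seq (dim=-2)
        cat_kv = lambda old, new: new if old is None else torch.cat([old, new], dim=-2)
        self.keys = cat_kv(self.keys, key)
        self.values = cat_kv(self.values, value)
        if attention_mask is not None:
            # attention_mask: (batch, seq) -> grow along seq (dim=-1)
            self.mask = (
                attention_mask
                if self.mask is None
                else torch.cat([self.mask, attention_mask], dim=-1)
            )
        return self.keys, self.values, self.mask
```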
Can you specify how this helps with the static cache? The static cache should also work with the attention_mask being passed at every forward call (it'll always have the same shape). I don't think it's a good idea to have the mask cached as a class variable.

It will not be a class variable; I forgot to update the PR. I'll follow what we do with JAX.

I can give more details, but basically the new cache + attention was not behaving properly. This is going to be my priority this week anyway!

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This is more aligned with our philosophy: it simplifies the code now and will simplify future changes.
Will help a lot with the static cache.
The only way to share the mask is to call `LlamaAttention`, but if you have a better way I'll update it! This makes the attention class self-contained, which is also pretty convenient for testing.
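As a rough illustration of why self-containment helps testing, such a module can be exercised in isolation without building the full model (sketch using `nn.MultiheadAttention` as a stand-in, not `LlamaAttention` itself):

```python
import torch
from torch import nn

class SelfContainedAttention(nn.Module):
    """Stand-in for an attention class that owns its own mask handling."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, hidden_states, padding_mask=None):
        # padding_mask: bool (batch, seq), True marks padded positions.
        out, _ = self.attn(
            hidden_states, hidden_states, hidden_states,
            key_padding_mask=padding_mask,
        )
        return out
```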
Ran the slow tests without FA2; will run them again on a DGX once approved.
cc @patrickvonplaten for visibility