Fix SDPA sliding window compatibility #30127
Merged
fxmarty merged 5 commits into huggingface:main on Apr 17, 2024
Co-authored-by: ehuaa <ehuamail@163.com>
Contributor
Author
@ehuaa Thank you, will do for sure! I am running the mistral, mixtral, and starcoder2 tests. Those that fail are already failing on main:
ArthurZucker
approved these changes
Apr 17, 2024
Collaborator
ArthurZucker
left a comment
Don't know if I mentioned it offline, we'll refactor this to a single function without inheritance similar to update_causal_mask. See Recurrent Gemma, as it supports sliding window!
Thanks for re-enabling sliding window.
Comment on lines +319 to +324

    ignore_causal_mask = False

    if attention_mask is None:
        if sliding_window is None or key_value_length < sliding_window:
            ignore_causal_mask = not is_tracing
    elif sliding_window is None or key_value_length < sliding_window:
Collaborator
There are basically 2 cases:
- You ignore the causal mask
- You don't ignore it.
Code is really not super clear but we will refactor this soon anyways.
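The two cases above can be sketched as a single function. This is only an illustration, not the code from this PR: the name should_ignore_causal_mask and the flag mask_is_trivial are hypothetical, and the handling of the branch where an attention mask is provided is an assumption, since the excerpt under review does not show that branch's body.

```python
from typing import Optional


def should_ignore_causal_mask(
    has_attention_mask: bool,
    key_value_length: int,
    sliding_window: Optional[int] = None,
    is_tracing: bool = False,
    mask_is_trivial: bool = False,  # hypothetical: True if the given mask hides nothing
) -> bool:
    """Return True when no explicit mask tensor has to be built for SDPA."""
    # The sliding window only changes the mask once the keys/values
    # are at least as long as the window.
    window_inactive = sliding_window is None or key_value_length < sliding_window

    if not has_attention_mask:
        # Mirrors the excerpt: skip the mask unless we are tracing.
        return window_inactive and not is_tracing

    if window_inactive:
        # Assumption (branch body not shown in the excerpt): a provided
        # mask can only be skipped when it masks nothing at all.
        return mask_is_trivial and not is_tracing

    # Window is active: the mask must be materialized.
    return False


# No mask, no sliding window: the causal mask can be ignored.
print(should_ignore_causal_mask(False, key_value_length=10))  # True
# No mask, but the sequence is longer than the window: keep the mask.
print(should_ignore_causal_mask(False, key_value_length=10, sliding_window=4))  # False
```

Returning a single boolean from one place makes the "ignore" and "don't ignore" cases explicit, which is the shape the planned refactor seems to be aiming for.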
ydshieh pushed a commit that referenced this pull request Apr 23, 2024

* fix sdpa + sliding window
* give credit

Co-authored-by: ehuaa <ehuamail@163.com>

* remove unnecessary warning
* fix typog
* add test

---------

Co-authored-by: ehuaa <ehuamail@163.com>
As per title, fixes #28980
Supersedes #29220 #29407 as the implementation ends up being different (added you as co-author here @ehuaa).
This bug dates back to #26572, where sliding_window was not properly accounted for in the _prepare_4d_causal_attention_mask_for_sdpa method. Since then, SDPA support was added to models that use sliding window, but this bug was not yet fixed.
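To illustrate why the window has to be accounted for, here is a minimal pure-Python sketch of a sliding-window causal mask. The function name is hypothetical, and the convention used (each query sees at most sliding_window of the most recent keys, itself included) is an assumption for illustration; the library's exact off-by-one convention may differ.

```python
from typing import List, Optional


def sliding_window_causal_mask(
    query_length: int,
    key_value_length: int,
    sliding_window: Optional[int] = None,
) -> List[List[bool]]:
    """Boolean mask of shape (query_length, key_value_length); True = may attend."""
    # Queries are the last `query_length` positions of the full sequence.
    past_length = key_value_length - query_length
    mask = []
    for q in range(query_length):
        q_pos = past_length + q
        row = []
        for k_pos in range(key_value_length):
            causal = k_pos <= q_pos  # never attend to the future
            # Assumed convention: keep only the `sliding_window` latest keys.
            in_window = sliding_window is None or q_pos - k_pos < sliding_window
            row.append(causal and in_window)
        mask.append(row)
    return mask


plain = sliding_window_causal_mask(4, 4)
windowed = sliding_window_causal_mask(4, 4, sliding_window=2)

# With the window active, older keys are dropped, so the plain causal
# mask (what is_causal=True would give) is no longer correct.
print(windowed != plain)  # True
# While key_value_length < sliding_window, the window has no effect.
print(sliding_window_causal_mask(4, 4, sliding_window=8) == plain)  # True
```

This matches the condition in the reviewed code: the causal mask can only be skipped in favor of SDPA's built-in causal handling while key_value_length is below sliding_window.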