[Mistral&Mixtral]Add sliding window param to sdpa after torch 2.2.0#29220
ehuaa wants to merge 55 commits into huggingface:main
Conversation
I modified the Mixtral model as well, because it has the same sliding window structure. Qwen2 failed with the tests below:
Hey @fxmarty @ArthurZucker,
ArthurZucker
left a comment
Looks alright but let's not add the unrelated change
I have deleted the unrelated changes; please review again. The failed tests are because of
ArthurZucker
left a comment
A small nit but LGTM otherwise.
Let's just declare _is_torch_version_greater_or_equal_than_2_2_0 in the src/transformers/utils/import_utils.py file
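For reference, the version gate could be sketched like this (a minimal pure-stdlib sketch with illustrative helper names, not the actual transformers code; note this simplified parser ignores pre-release suffixes such as `rc1`):

```python
def _parse_version(v: str) -> tuple:
    # Drop local build metadata like "+cu121" and keep up to three
    # numeric components; non-numeric parts (e.g. "rc1") are ignored.
    core = v.split("+")[0]
    nums = [int(p) for p in core.split(".") if p.isdigit()]
    return tuple((nums + [0, 0, 0])[:3])


def is_torch_greater_or_equal_than_2_2_0(torch_version: str) -> bool:
    # Sliding-window support in SDPA is only assumed for torch >= 2.2.0.
    return _parse_version(torch_version) >= (2, 2, 0)
```

For example, `"2.2.0+cu121"` passes the gate while `"2.1.2"` does not.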
I have resolved the conflict. As for the test you mentioned above, I think we can reuse the test you wrote before: https://github.com/huggingface/transformers/blob/main/tests/models/mistral/test_modeling_mistral.py#L534
When I tried to add a new test to check whether SDPA works with sliding_window, the test below failed. I modified your original test_model_7b_logits with a longer input (size 4097) to exercise the sliding window feature, and it fails as follows:
@slow
Thanks for this PR.
Probably yes. That's why it's a bit critical. Mistral only had
@ArthurZucker Yes, I tested the Mistral-7B logits of SDPA with sliding_window against eager with sliding_window, and the results are not the same.
That's less important than sdpa vs flash! |
Ok, I'll upload a new slow test of sdpa vs flash later. |
After I ran git push, something weird happened to the changed files above, so I closed this PR and will open a new pull request. @ArthurZucker
Alright! I think this fix is important, let's try to have this ready! Ping me on the next PR and link it with this one as well to get the full conversation!
#29407 @ArthurZucker Please review this PR, as mentioned above.
What does this PR do?
Adds the sliding window param as described in #28980. The solution is to check whether the torch version is greater than or equal to 2.2.0 (the release version), and to add the sliding window to the attention_mask if so.
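To illustrate the idea behind the masking (a minimal pure-Python sketch with an illustrative function name, not the actual transformers implementation, which builds a torch attention_mask tensor): each query position attends only to the most recent `sliding_window` key positions, on top of the usual causal constraint.

```python
def sliding_window_causal_mask(seq_len: int, sliding_window: int):
    # True means "may attend": position i sees position j only if j is
    # not in the future (j <= i) and inside the window (i - j < sliding_window).
    return [
        [(j <= i) and (i - j < sliding_window) for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

With `seq_len=4` and `sliding_window=2`, row 3 is `[False, False, True, True]`: the last token only sees itself and its immediate predecessor.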
Tests of SDPA and sliding window attention with torch 2.2.0 have passed in my local environment with `RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/mistral/test_modeling_mistral.py`.
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @fxmarty