FIX [quantization / ESM] Fix ESM 8bit / 4bit with bitsandbytes #29329
Merged
younesbelkada merged 3 commits into huggingface:main on Mar 1, 2024
younesbelkada commented Feb 28, 2024
  attention_probs = attention_probs * head_mask

- context_layer = torch.matmul(attention_probs, value_layer)
+ context_layer = torch.matmul(attention_probs.to(value_layer.dtype), value_layer)
Contributor
Author
This cast was needed for inference to run correctly; otherwise you get a dtype mismatch.
Member
What do we get if we don't apply this fix?
Contributor
Author
You get a dtype mismatch :/
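For readers who want to see the failure mode in isolation, here is a minimal sketch of the cast from the diff above. The tensor shapes and the float64/float32 pair are assumptions chosen so the snippet runs on CPU; they are not the dtypes from the actual quantized ESM run, which this thread does not state.

```python
import torch

# Hypothetical shapes: (batch, heads, seq_len, seq_len) and (batch, heads, seq_len, head_dim).
# float64 / float32 are stand-ins for whichever two dtypes diverge in the quantized model.
attention_probs = torch.softmax(torch.randn(1, 2, 4, 4, dtype=torch.float64), dim=-1)
value_layer = torch.randn(1, 2, 4, 8, dtype=torch.float32)

# Without the cast, torch.matmul does not promote dtypes and raises a RuntimeError.
mismatch_raised = False
try:
    torch.matmul(attention_probs, value_layer)
except RuntimeError:
    mismatch_raised = True

# With the cast from the diff, both operands share value_layer's dtype.
context_layer = torch.matmul(attention_probs.to(value_layer.dtype), value_layer)
```

Casting to `value_layer.dtype` rather than a hard-coded dtype keeps the line correct whatever precision the value projection ends up in.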
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Contributor
Thanks for the quick fix, everyone!
What does this PR do?
Fixes: #29323
Currently on main, simply running:
Fails with an error
This is because the model pushed in "facebook/esm2_t36_3B_UR50D" does not contain inv_freq. Maybe during the HfQuantizer refactor we did not properly deal with that specific scenario, leading to this bug for transformers > 4.37.

cc @SunMarc
I ran the quantization tests and they all seem to pass on my end.