Granite language models #31502
Conversation
younesbelkada
left a comment
Thanks a lot!
Isn't Granite support already added in #30031? If not, we could leverage the diff tool that was recently added (see #31211 for an example). I'll let @ArthurZucker comment on this.
hey @younesbelkada, let's just leave this PR for now.

@ArthurZucker this is ready for merge
ArthurZucker
left a comment
LGTM, 2 small nits and let's merge
```python
@slow
@require_torch_gpu
@require_read_token
def test_compile_static_cache(self):
```
Does this not work (or why was it removed)?
Yeah, it does not work.
The models use muP, and the numerical error is way too high to compare generated outputs.
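For context, here is a rough sketch of what muP-style attention scaling looks like. This is illustrative only: the function name `scaled_attention_scores` and the `attention_multiplier` parameter are assumptions loosely based on the PR's commit messages, not the actual Granite implementation.

```python
# Sketch: muP-style attention scaling (illustrative; not the Granite code).
# Standard attention divides scores by sqrt(head_dim); muP-trained models
# instead use a tuned multiplier, which shifts output magnitudes enough that
# exact-match generation tests against a reference become unreliable.
import math
import torch

def scaled_attention_scores(q, k, attention_multiplier=None):
    head_dim = q.shape[-1]
    # Fall back to the usual 1/sqrt(d) scaling when no multiplier is given.
    scale = attention_multiplier if attention_multiplier is not None else 1.0 / math.sqrt(head_dim)
    return torch.matmul(q, k.transpose(-2, -1)) * scale
```

With a custom multiplier the score scale (and hence logits after softmax and projection) differs from a vanilla Llama-style model, which is why the static-cache comparison test above was dropped.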
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
I have addressed the changes

Thanks for bearing with me 🤗

passed docs 🥳

thanks Arthur :)

Thank you as well! 🤗
hello! With torch < 2.4.0 this raises `AttributeError: module 'torch.nn' has no attribute 'RMSNorm'`, since `nn.RMSNorm` was only added in torch 2.4.0.
```diff
- ALL_LAYERNORM_LAYERS = [nn.LayerNorm]
+ ALL_LAYERNORM_LAYERS = [nn.LayerNorm, nn.RMSNorm]
```
Encountered the same issue, opened #33177 to fix it
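One possible shape of such a compatibility guard is sketched below. This is only an illustration of the idea (registering `nn.RMSNorm` solely when the installed torch provides it); the actual patch in #33177 may be implemented differently.

```python
# Sketch: only register nn.RMSNorm when the installed torch provides it
# (it was introduced in torch 2.4.0). Illustrative; the real fix in
# #33177 may differ.
import torch.nn as nn

ALL_LAYERNORM_LAYERS = [nn.LayerNorm]
if hasattr(nn, "RMSNorm"):
    # Safe on torch >= 2.4.0; skipped on older versions, avoiding the
    # AttributeError reported above.
    ALL_LAYERNORM_LAYERS.append(nn.RMSNorm)
```

Using `hasattr` rather than a version-string comparison keeps the guard robust to backports and avoids parsing `torch.__version__`.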
Squashed commit history:

- first commit
- drop tokenizer
- drop tokenizer
- drop tokenizer
- drop convert
- granite
- drop tokenization test
- mup
- fix
- reformat
- reformat
- reformat
- fix docs
- stop checking for checkpoint
- update support
- attention multiplier
- update model
- tiny drop
- saibo drop
- skip test
- fix test
- fix test
- drop
- drop useless imports
- update docs
- drop flash function
- copied from
- drop pretraining tp
- drop pretraining tp
- drop pretraining tp
- drop unused import
- drop code path
- change name
- softmax scale
- head dim
- drop legacy cache
- rename params
- cleanup
- fix copies
- comments
- add back legacy cache
- multipliers
- multipliers
- multipliers
- text fix
- fix copies
- merge
- multipliers
- attention multiplier
- drop unused imports
- fix
- fix
- fix
- move rope?
- Update src/transformers/models/granite/configuration_granite.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
- fix
- Update src/transformers/models/granite/modeling_granite.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
- fix
- fix
- fix
- fix
- fix-copies
- torch rmsnorm
- add authors
- change model path
- fix
- test
- drop static cache test
- update readme
- drop non-causal
- readme
- drop useless imports
- Update docs/source/en/model_doc/granite.md (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
- Update docs/source/en/model_doc/granite.md (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
- Update docs/source/en/model_doc/granite.md (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
This PR adds support for IBM's upcoming 3B and 8B LLMs (Granite).