Conversation
ArthurZucker left a comment
Thanks!
Let's keep BC (backward compatibility) for the layer norm name.
While we are at it, there's a missing `# Copied from` comment! 🤗
```diff
 class CohereLayerNorm(nn.Module):
-    def __init__(self, hidden_size, eps=1e-5, bias=False):
+    def __init__(self, param_shape=None, eps=1e-5, bias=False):
```
```suggestion
    def __init__(self, hidden_size=None, eps=1e-5, bias=False):
        """The hidden size can be a tuple or an int. If a tuple is used, the layer norm will be applied on both dims."""
```
```diff
+    dtype = q.dtype
+    q = q.float()
+    k = k.float()
     cos = cos.unsqueeze(unsqueeze_dim)
     sin = sin.unsqueeze(unsqueeze_dim)
     q_embed = (q * cos) + (rotate_half(q) * sin)
     k_embed = (k * cos) + (rotate_half(k) * sin)
-    return q_embed, k_embed
+    return q_embed.to(dtype=dtype), k_embed.to(dtype=dtype)
```
If this is done outside `apply_rotary_pos_emb`, we can keep the `# Copied from` 😉 but it's not an issue.
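A minimal sketch of that suggestion (the surrounding attention code is assumed, not taken from this PR): keep `apply_rotary_pos_emb` identical to the Llama version so the `# Copied from` marker stays valid, and do the fp32 round-trip at the call site instead.

```python
# Hypothetical call site inside the attention forward; `query_states`, `key_states`,
# `cos`, and `sin` are the usual rotary-embedding inputs.
dtype = query_states.dtype
query_states, key_states = apply_rotary_pos_emb(
    query_states.float(), key_states.float(), cos, sin
)
# Cast back so downstream matmuls run in the model's compute dtype.
query_states, key_states = query_states.to(dtype), key_states.to(dtype)
```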
```python
        )

        if self.use_qk_norm:
            # When sharding the model using Tensor Parallelism, need to be careful to use n_local_heads
```
Not sure if this comment is relevant, as the model is not sharded by default.
Oh, this is to warn others who port this from HF to other frameworks.
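To illustrate the warning (names and shapes here are assumptions for the sketch, not code from this PR): with QK-norm the layer-norm weight has one row per attention head, so a tensor-parallel port must slice it to each rank's local heads or the broadcast fails.

```python
import torch
import torch.nn as nn

num_heads, head_dim, tp_world_size, rank = 8, 64, 2, 0
n_local_heads = num_heads // tp_world_size  # heads owned by this rank

# Per-head QK-norm weight: shape (num_heads, head_dim).
q_norm_weight = torch.ones(num_heads, head_dim)
# Each tensor-parallel rank must keep only its slice of heads:
local_weight = q_norm_weight[rank * n_local_heads : (rank + 1) * n_local_heads]

q_local = torch.randn(1, 16, n_local_heads, head_dim)  # (batch, seq, local heads, head dim)
# Normalize over head_dim, then scale with the per-head weight (broadcasts over batch/seq).
q_normed = nn.functional.layer_norm(q_local, (head_dim,), eps=1e-5) * local_weight
```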
As a follow-up, adding integration tests for the new model would be nice.
I think there's an issue with the tokenizer.json (or with the way transformers is converting it): random multilingual output out of nowhere. See: https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions/15. I found this after converting to mlx-lm, but it is reproducible through HF alone. You can reproduce with:
Concretely: loading -> saving -> loading the tokenizer, all through HF.
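A minimal sketch of that round-trip (the exact repro script wasn't pasted in this thread, so the save path and probe string are assumptions):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
tok.save_pretrained("./c4ai-command-r-plus-tok")                           # save through HF
tok_reloaded = AutoTokenizer.from_pretrained("./c4ai-command-r-plus-tok")  # load it back

text = "Hello, world! こんにちは"
ids, ids_reloaded = tok(text)["input_ids"], tok_reloaded(text)["input_ids"]
assert ids == ids_reloaded, f"round-trip changed the encoding: {ids} vs {ids_reloaded}"
```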
Worth noting: the tokenizer.json in the bitsandbytes 4-bit repo linked from the main Cohere repo is also a different size and looks very different from the original model repo's tokenizer.json (https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit/blob/main/tokenizer.json).
Another interesting difference between the 4-bit bnb tokenizer and the original: in the original, token id 255001 (`<|END_OF_TURN_TOKEN|>`) has `special` set to `False`, while in the 4-bit bnb one it is `True`.
I have no idea what you are talking about. A 4-bit tokenizer? Could you open an issue with a repro and a description of the problem?
I think the difference between the two tokenizer.json files is the unicode encoding after saving.
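For illustration (a hedged guess at the kind of difference meant): the same content serialized with escaped vs. raw unicode yields byte-different files of different sizes that parse to identical data.

```python
import json

entry = {"こんにちは": 255000}  # a hypothetical vocab entry with non-ASCII content
escaped = json.dumps(entry, ensure_ascii=True)   # '{"\\u3053\\u3093\\u306b\\u3061\\u306f": 255000}'
raw = json.dumps(entry, ensure_ascii=False)      # '{"こんにちは": 255000}'
print(len(escaped), len(raw))                    # different file sizes...
assert json.loads(escaped) == json.loads(raw)    # ...same parsed content
```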
What @ahmetustun mentioned is the difference I meant: not a 4-bit tokenizer, but the tokenizer.json in https://huggingface.co/CohereForAI/c4ai-command-r-plus vs. the one in https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit. Not at my workstation now, but if nobody else is seeing this issue, I'll assume I've got something wrong on my end. Thanks for the clarification.
Please leave further comments in the model repo if the problem continues. Thanks @fblissjr.
What does this PR do?
Refactors the Cohere model.
Fixes # (issue)
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker and @younesbelkada
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.