Conversation
ArthurZucker left a comment
Thanks!
Let's keep BC (backward compatibility) for the layer norm name.
While we are at it, there's a missing `# Copied from` comment! 🤗
```diff
 class CohereLayerNorm(nn.Module):
-    def __init__(self, hidden_size, eps=1e-5, bias=False):
+    def __init__(self, param_shape=None, eps=1e-5, bias=False):
```
```suggestion
    def __init__(self, hidden_size=None, eps=1e-5, bias=False):
        """The hidden size can be a tuple or an int. If a tuple is used, the layer norm will be applied on both dims."""
```
```diff
+    dtype = q.dtype
+    q = q.float()
+    k = k.float()
     cos = cos.unsqueeze(unsqueeze_dim)
     sin = sin.unsqueeze(unsqueeze_dim)
     q_embed = (q * cos) + (rotate_half(q) * sin)
     k_embed = (k * cos) + (rotate_half(k) * sin)
-    return q_embed, k_embed
+    return q_embed.to(dtype=dtype), k_embed.to(dtype=dtype)
```
If this is done outside `apply_rotary_pos_emb`, we can keep the `# Copied from` 😉 but it's not an issue.
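A minimal sketch of that suggestion (the surrounding attention code is assumed, not taken from this PR): keep `apply_rotary_pos_emb` identical to the Llama version so the `# Copied from` marker stays valid, and do the fp32 round-trip at the call site instead.

```python
# Hypothetical call site inside the attention forward; `query_states`, `key_states`,
# `cos`, and `sin` are the usual rotary-embedding inputs.
dtype = query_states.dtype
query_states, key_states = apply_rotary_pos_emb(
    query_states.float(), key_states.float(), cos, sin
)
# Cast back so downstream matmuls run in the model's compute dtype.
query_states, key_states = query_states.to(dtype), key_states.to(dtype)
```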
```python
        )

        if self.use_qk_norm:
            # When sharding the model using Tensor Parallelism, need to be careful to use n_local_heads
```
Not sure if this comment is relevant, as the model is not sharded by default.
Oh, this is to warn others who port this from HF to other frameworks.
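To illustrate the warning (names and shapes here are assumptions for the sketch, not code from this PR): with QK-norm the layer-norm weight has one row per attention head, so a tensor-parallel port must slice it to each rank's local heads or the broadcast fails.

```python
import torch
import torch.nn as nn

num_heads, head_dim, tp_world_size, rank = 8, 64, 2, 0
n_local_heads = num_heads // tp_world_size  # heads owned by this rank

# Per-head QK-norm weight: shape (num_heads, head_dim).
q_norm_weight = torch.ones(num_heads, head_dim)
# Each tensor-parallel rank must keep only its slice of heads:
local_weight = q_norm_weight[rank * n_local_heads : (rank + 1) * n_local_heads]

q_local = torch.randn(1, 16, n_local_heads, head_dim)  # (batch, seq, local heads, head dim)
# Normalize over head_dim, then scale with the per-head weight (broadcasts over batch/seq).
q_normed = nn.functional.layer_norm(q_local, (head_dim,), eps=1e-5) * local_weight
```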
As a follow-up, adding integration tests for the new model would be nice.
I think there's an issue with the tokenizer.json (or with the way transformers is converting it): random multilingual output out of nowhere. See: https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions/15. I found this after converting to mlx-lm, but it is reproducible through HF alone. You can reproduce with:
Concretely: loading -> saving -> loading the tokenizer, all through HF.
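A minimal sketch of that round-trip (the exact repro script wasn't pasted in this thread, so the save path and probe string are assumptions):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
tok.save_pretrained("./c4ai-command-r-plus-tok")                           # save through HF
tok_reloaded = AutoTokenizer.from_pretrained("./c4ai-command-r-plus-tok")  # load it back

text = "Hello, world! こんにちは"
ids, ids_reloaded = tok(text)["input_ids"], tok_reloaded(text)["input_ids"]
assert ids == ids_reloaded, f"round-trip changed the encoding: {ids} vs {ids_reloaded}"
```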
Worth noting: the tokenizer.json in the bitsandbytes 4-bit repo linked from the main Cohere repo is also a different size and looks very different from the original model repo's tokenizer.json (https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit/blob/main/tokenizer.json).
Another interesting difference between the 4-bit bnb tokenizer and the original: in the original, token id 255001 (`<|END_OF_TURN_TOKEN|>`) has `special` set to `False`, while in the 4-bit bnb one it is `True`.
I have no idea what you are talking about. A 4-bit tokenizer? Could you open an issue with a repro and a description of the problem?
I think the difference between the two tokenizer.json files is the unicode encoding after saving.
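For illustration (a hedged guess at the kind of difference meant): the same content serialized with escaped vs. raw unicode yields byte-different files of different sizes that parse to identical data.

```python
import json

entry = {"こんにちは": 255000}  # a hypothetical vocab entry with non-ASCII content
escaped = json.dumps(entry, ensure_ascii=True)   # '{"\\u3053\\u3093\\u306b\\u3061\\u306f": 255000}'
raw = json.dumps(entry, ensure_ascii=False)      # '{"こんにちは": 255000}'
print(len(escaped), len(raw))                    # different file sizes...
assert json.loads(escaped) == json.loads(raw)    # ...same parsed content
```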
What @ahmetustun mentioned is the difference I meant: not a 4-bit tokenizer, but the tokenizer.json in https://huggingface.co/CohereForAI/c4ai-command-r-plus vs. the one in https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit. Not at my workstation now, but if nobody else is seeing this issue, I'll assume I've got something wrong on my end. Thanks for the clarification.
Please leave further comments in the model repo if the problem continues. Thanks @fblissjr.
What does this PR do?
Refactors the Cohere model.
Fixes # (issue)
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker and @younesbelkada
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.