Cohere Model Release #29622
LysandreJik merged 27 commits into huggingface:main from saurabhdash2512:main
Conversation
Cohere Model Release
Some cleanup
younesbelkada
left a comment
Huge work! Thanks a lot for this!
I left a couple of comments - mostly that we can leverage the `# Copied from` mechanism almost everywhere and simply put `# Ignore copy` statements at the places where the implementation differs from Llama. Please have a look at my comments.
Also, it seems the documentation and testing files are missing - they should be pretty easy to add since the model is mostly copied from Llama, so I'd expect all tests to pass out of the box! For a similar PR, see #29215: Starcoder was mostly copied from Mistral with minor architectural changes. Please have a look at that PR as well and take some inspiration from it - I really think it shouldn't be hard to make this PR mergeable ASAP! 🚀
Also, let's remove `pretraining_tp`: from the checkpoints I inspected, it's either set to 1 (https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/main/config.json#L27) or None, so we can remove it completely from the code base of this model.
To make the CI happy, first make sure that `make fixup` passes on your local machine (this will check the `# Copied from` mechanism).
Let me know if you need any help!
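For anyone unfamiliar with the mechanism, a minimal sketch of what this looks like in practice (the class names and bodies below are placeholders, not this PR's actual code):

```python
# Copied from transformers.models.llama.modeling_llama.LlamaMLP with Llama->Cohere
class CohereMLP(nn.Module):
    ...  # body is kept in sync with LlamaMLP by `make fix-copies`


# Copied from transformers.models.llama.modeling_llama.LlamaDecoderLayer with Llama->Cohere
class CohereDecoderLayer(nn.Module):
    # Ignore copy
    def forward(self, hidden_states, **kwargs):
        ...  # this method is allowed to diverge from the Llama implementation
```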
1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[Cohere](https://huggingface.co/docs/transformers/main/model_doc/cohere)** (from Cohere) released with the paper [Command-R: Retrieval Augmented Generation at Production Scale](https://txt.cohere.com/command-r/) by Cohere.
You need to run `make fix-copies` to propagate these changes to all the other READMEs.
```python
)


class LayerNorm(nn.Module):
```
Can you prepend the names of these modules with Cohere? -> CohereLayerNorm
```python
class LayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-5, bias=False):
```
This layer norm is different from the Llama layer norm because of the bias term - can we explicitly mention that in the docstring of this module?
```python
    def __init__(self, hidden_size, eps=1e-5, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
```
Actually `bias` seems to always be set to None (looking at the LayerNorm definitions below) - can we instead remove `bias` here, copy the LayerNorm module from Llama, and add a `# Copied from` statement on top of the module?
The Llama LayerNorm is a bit different from this one, which happens in FP32. I kept the bias in case we decide to add bias in the future.
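For reference, a minimal sketch of such a layer norm - normalizing in FP32, with an optional bias - reconstructed from the discussion above, not necessarily the exact code in the PR:

```python
import torch
from torch import nn


class CohereLayerNorm(nn.Module):
    """LayerNorm with an optional bias; the normalization is computed in FP32,
    unlike LlamaRMSNorm."""

    def __init__(self, hidden_size, eps=1e-5, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)  # normalize in FP32
        mean = hidden_states.mean(-1, keepdim=True)
        variance = (hidden_states - mean).pow(2).mean(-1, keepdim=True)
        hidden_states = (hidden_states - mean) * torch.rsqrt(variance + self.variance_epsilon)
        hidden_states = self.weight.to(torch.float32) * hidden_states
        if self.bias is not None:
            hidden_states = hidden_states + self.bias.to(torch.float32)
        return hidden_states.to(input_dtype)  # cast back to the input dtype
```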
```python
        return causal_mask


class CohereForCausalLM(CoherePreTrainedModel):
```
Suggested change:

```diff
-class CohereForCausalLM(CoherePreTrainedModel):
+# Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM with Llama->Cohere
+class CohereForCausalLM(CoherePreTrainedModel):
```
```python
    @add_start_docstrings_to_model_forward(COHERE_INPUTS_DOCSTRING)
    @replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
    def forward(
```
Suggested change:

```diff
-    def forward(
+    # Ignore copy
+    def forward(
```
Since here the difference with Llama is that we multiply the LM logits by `logits_scale`.
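Concretely, the end of the Cohere `forward` would differ from Llama roughly like this (a sketch; the attribute name for the scale is taken from the comment above and may differ in the final code):

```python
hidden_states = outputs[0]
logits = self.lm_head(hidden_states)
# unlike Llama, Cohere multiplies the LM logits by a scale read from the config
logits = logits * self.logits_scale
logits = logits.float()
```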
A file for modeling and tokenizer tests is missing
What kind of tests are required for modeling and tokenization, @younesbelkada?
You can copy over `tests/models/llama/test_modeling_llama.py` and `tests/models/llama/test_tokenization_llama.py` and adapt them for Cohere. Note that we also need an integration test, so you can remove the Llama integration tests and replace them with new ones.
```yaml
- local: model_doc/code_llama
  title: CodeLlama
- local: model_doc/cohere
  title: Cohere
```
The Cohere doc page is missing.
younesbelkada
left a comment
Here you need to add a field for the slow tokenizer in order for the failing CI to pass, e.g.:
| ("codegen", ("CodeGenTokenizer", "CodeGenTokenizerFast" if is_tokenizers_available() else None)), | ||
| ( | ||
| "cohere", | ||
| ("CohereTokenizerFast" if is_tokenizers_available() else None,), |
| ("CohereTokenizerFast" if is_tokenizers_available() else None,), | |
| (None, "CohereTokenizerFast" if is_tokenizers_available() else None,), |
```python
(
    "cohere",
    (
        "CohereTokenizerFast" if is_tokenizers_available() else None,
```
| "CohereTokenizerFast" if is_tokenizers_available() else None, | |
| None, "CohereTokenizerFast" if is_tokenizers_available() else None, |
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* tokenizer test
* format fix
Fixes in Tokenization Tests
younesbelkada
left a comment
Thanks for the huge work! This is much cleaner! 🤩 I left a few possible enhancements to make the code easier to maintain in the future. We should be really close to merging this! 🚀
docs/source/en/model_doc/cohere.md
Outdated
```python
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```
Suggested change:

```diff
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
```
These won't be needed once we merge the PR, right?
docs/source/en/model_doc/cohere.md
Outdated
```python
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```
Suggested change:

```diff
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
```
docs/source/en/model_doc/cohere.md
Outdated
```python
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config)
```
Suggested change:

```diff
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
```
```python
    def __init__(self, hidden_size, eps=1e-5, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
```
```python
        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
        t = t / self.scaling_factor
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.repeat_interleave(freqs, 2, dim=-1)
        self.register_buffer("_cos_cached", emb.cos().to(torch.get_default_dtype()), persistent=False)
        self.register_buffer("_sin_cached", emb.sin().to(torch.get_default_dtype()), persistent=False)
```
Suggested change (removing these lines):

```diff
-        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
-        t = t / self.scaling_factor
-        freqs = torch.outer(t, self.inv_freq)
-        emb = torch.repeat_interleave(freqs, 2, dim=-1)
-        self.register_buffer("_cos_cached", emb.cos().to(torch.get_default_dtype()), persistent=False)
-        self.register_buffer("_sin_cached", emb.sin().to(torch.get_default_dtype()), persistent=False)
```
`_sin_cached` and `_cos_cached` seem to be unused below, I think we can remove them!
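If the buffers go away, cos/sin can instead be computed on the fly in the rotary embedding's `forward`; a rough sketch of what that could look like (my reconstruction under that assumption - note the `repeat_interleave`, which is where Cohere differs from Llama's `torch.cat`):

```python
@torch.no_grad()
def forward(self, x, position_ids):
    # expand inv_freq to (batch, dim/2, 1) and positions to (batch, 1, seq_len)
    inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    position_ids_expanded = position_ids[:, None, :].float()
    freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)  # (batch, seq_len, dim/2)
    emb = torch.repeat_interleave(freqs, 2, dim=-1)  # interleaved, unlike Llama's torch.cat
    return emb.cos().to(dtype=x.dtype), emb.sin().to(dtype=x.dtype)
```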
```python
class CohereForCausalLM(CoherePreTrainedModel):
    _tied_weights_keys = ["model.embed_tokens.weight", "lm_head.weight"]

    def __init__(self, config):
```
Suggested change:

```diff
-    def __init__(self, config):
+    # Ignore copy
+    def __init__(self, config):
```
```python
        return causal_mask


class CohereForCausalLM(CoherePreTrainedModel):
```
Do you know why the copied from cannot be used here? It would be very useful for easily maintaining the methods below, such as `_prepare_inputs_for_generation`; otherwise, you can also try to put a copied from statement on the `_prepare_inputs_for_generation` method.
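As an illustration, a method-level statement would look roughly like this (a sketch with an abbreviated signature, assuming the Llama method name):

```python
    # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
    def prepare_inputs_for_generation(
        self, input_ids, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs
    ):
        ...
```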
```python
from transformers import CohereForCausalLM, CohereModel


class CohereModelTester:
```
Suggested change:

```diff
-class CohereModelTester:
+# Copied from transformers.tests.models.llama.test_modeling_llama.LlamaModelTester with Llama->Cohere
+class CohereModelTester:
```
Can we also add copied from on the tests as well?
There are differences in the tests.
```python
@require_torch
class CohereModelTest(unittest.TestCase):
```
same here for copied from
```python
class CohereIntegrationTest(unittest.TestCase):
    @unittest.skip("Logits are not exactly the same, once we fix the instabalities somehow, will update!")
    @slow
    def test_model_logits(self):
```
Are these the real values obtained?
Could you rather add some end-to-end generation tests, similar to:
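A minimal sketch of such an end-to-end generation test, assuming the Command-R checkpoint referenced in this PR and a placeholder expected string (not a real value - it would come from a trusted reference run):

```python
import unittest

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch, slow


@require_torch
class CohereGenerationIntegrationTest(unittest.TestCase):
    @slow
    def test_model_generation(self):
        # EXPECTED_TEXT is a placeholder - fill it in from a reference run
        EXPECTED_TEXT = "..."
        model_id = "CohereForAI/c4ai-command-r-v01"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
        inputs = tokenizer("Hi today I am going to tell you about", return_tensors="pt").to(model.device)
        # greedy decoding so the output is deterministic and comparable
        output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
        self.assertEqual(tokenizer.decode(output[0], skip_special_tokens=True), EXPECTED_TEXT)
```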
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @saurabhdash2512, this looks really good! We've just made some updates to the [...] (I realize you've already rebased quite recently - I'm sorry about this!)
@Rocketknight1 Done! Sync'ed with main.
younesbelkada
left a comment
Great work @saurabhdash and team!
Cohere Model Release
What does this PR do?
This PR adds the Cohere Model Family with the release of the Command-R model weights.
More information about the model can be found here.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@LysandreJik