Cohere Model Release #29622
LysandreJik merged 27 commits into huggingface:main from saurabhdash2512:main
Conversation
Cohere Model Release
Some cleanup
younesbelkada
left a comment
Huge work! Thanks a lot for this!
I left a couple of comments - mostly that we can leverage the `# Copied from` mechanism almost everywhere and simply put `# Ignore copy` statements at the places where the implementation differs from Llama. Please have a look at my comments.
Also, it seems the documentation and testing files are missing - they should be pretty easy to add since the model is mostly copied from Llama, so I'd expect all tests to pass out of the box! For a similar PR, see #29215: Starcoder was mostly copied from Mistral with minor architectural changes. Please have a look at that PR as well and take some inspiration from it - I really think it shouldn't be hard to make this PR mergeable ASAP! 🚀
Also, let's remove `pretraining_tp`: from the checkpoints I inspected, it's either set to 1 (https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/main/config.json#L27) or None, so we can remove it completely from the code base of this model.
To make the CI happy, first make sure that `make fixup` passes on your local machine (this will check the `# Copied from` mechanism).
Let me know if you need any help!
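For anyone unfamiliar with the mechanism, a minimal sketch of what this looks like in practice (the class names and bodies below are placeholders, not this PR's actual code):

```python
# Copied from transformers.models.llama.modeling_llama.LlamaMLP with Llama->Cohere
class CohereMLP(nn.Module):
    ...  # body is kept in sync with LlamaMLP by `make fix-copies`


# Copied from transformers.models.llama.modeling_llama.LlamaDecoderLayer with Llama->Cohere
class CohereDecoderLayer(nn.Module):
    # Ignore copy
    def forward(self, hidden_states, **kwargs):
        ...  # this method is allowed to diverge from the Llama implementation
```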
1. **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[Cohere](https://huggingface.co/docs/transformers/main/model_doc/cohere)** (from Cohere) released with the paper [Command-R: Retrieval Augmented Generation at Production Scale](https://txt.cohere.com/command-r/) by Cohere.
You need to run `make fix-copies` to propagate these changes to all the other READMEs.
```python
)


class LayerNorm(nn.Module):
```
Can you prepend the names of these modules with Cohere? -> CohereLayerNorm
```python
class LayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-5, bias=False):
```
This layer norm is different from the Llama layer norm because of the bias term - can we explicitly mention that in the docstring of this module?
```python
    def __init__(self, hidden_size, eps=1e-5, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
```
Actually `bias` seems to always be set to None (looking at the LayerNorm definitions below) - can we instead remove `bias` here, copy the LayerNorm module from Llama, and add a `# Copied from` statement on top of the module?
The Llama LayerNorm is a bit different from this one, which happens in FP32. I kept the bias in case we decide to add bias in the future.
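For reference, a minimal sketch of such a layer norm - normalizing in FP32, with an optional bias - reconstructed from the discussion above, not necessarily the exact code in the PR:

```python
import torch
from torch import nn


class CohereLayerNorm(nn.Module):
    """LayerNorm with an optional bias; the normalization is computed in FP32,
    unlike LlamaRMSNorm."""

    def __init__(self, hidden_size, eps=1e-5, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)  # normalize in FP32
        mean = hidden_states.mean(-1, keepdim=True)
        variance = (hidden_states - mean).pow(2).mean(-1, keepdim=True)
        hidden_states = (hidden_states - mean) * torch.rsqrt(variance + self.variance_epsilon)
        hidden_states = self.weight.to(torch.float32) * hidden_states
        if self.bias is not None:
            hidden_states = hidden_states + self.bias.to(torch.float32)
        return hidden_states.to(input_dtype)  # cast back to the input dtype
```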
```python
        return causal_mask


class CohereForCausalLM(CoherePreTrainedModel):
```
Suggested change:

```diff
-class CohereForCausalLM(CoherePreTrainedModel):
+# Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM with Llama->Cohere
+class CohereForCausalLM(CoherePreTrainedModel):
```
```python
    @add_start_docstrings_to_model_forward(COHERE_INPUTS_DOCSTRING)
    @replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
    def forward(
```
Suggested change:

```diff
-    def forward(
+    # Ignore copy
+    def forward(
```
Since here the difference with Llama is that we multiply the LM logits by `logits_scale`.
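Concretely, the end of the Cohere `forward` would differ from Llama roughly like this (a sketch; the attribute name for the scale is taken from the comment above and may differ in the final code):

```python
hidden_states = outputs[0]
logits = self.lm_head(hidden_states)
# unlike Llama, Cohere multiplies the LM logits by a scale read from the config
logits = logits * self.logits_scale
logits = logits.float()
```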
A file for modeling and tokenizer tests is missing
What kind of tests are required for modeling and tokenization, @younesbelkada?
You can copy over `tests/models/llama/test_modeling_llama.py` and `tests/models/llama/test_tokenization_llama.py` and adapt them for Cohere. Note that we also need an integration test, so you can remove the Llama integration tests and replace them with new ones.
```yaml
- local: model_doc/code_llama
  title: CodeLlama
- local: model_doc/cohere
  title: Cohere
```
The Cohere doc page is missing.
younesbelkada
left a comment
Here you need to add a field for the slow tokenizer in order for the failing CI to pass, e.g.:
| ("codegen", ("CodeGenTokenizer", "CodeGenTokenizerFast" if is_tokenizers_available() else None)), | ||
| ( | ||
| "cohere", | ||
| ("CohereTokenizerFast" if is_tokenizers_available() else None,), |
| ("CohereTokenizerFast" if is_tokenizers_available() else None,), | |
| (None, "CohereTokenizerFast" if is_tokenizers_available() else None,), |
```python
(
    "cohere",
    (
        "CohereTokenizerFast" if is_tokenizers_available() else None,
```
| "CohereTokenizerFast" if is_tokenizers_available() else None, | |
| None, "CohereTokenizerFast" if is_tokenizers_available() else None, |
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* tokenizer test
* format fix
Fixes in Tokenization Tests
younesbelkada
left a comment
Thanks for the huge work! This is much cleaner! 🤩 I left a few possible enhancements to make the code easier to maintain in the future. We should be really close to merging this! 🚀
docs/source/en/model_doc/cohere.md
Outdated
```python
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```
Suggested change:

```diff
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
```
These won't be needed once we merge the PR, right?
docs/source/en/model_doc/cohere.md
Outdated
```python
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```
Suggested change:

```diff
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
```
docs/source/en/model_doc/cohere.md
Outdated
```python
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config)
```
Suggested change:

```diff
-tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
```
```python
    def __init__(self, hidden_size, eps=1e-5, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size)) if bias else None
```
```python
        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
        t = t / self.scaling_factor
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.repeat_interleave(freqs, 2, dim=-1)
        self.register_buffer("_cos_cached", emb.cos().to(torch.get_default_dtype()), persistent=False)
        self.register_buffer("_sin_cached", emb.sin().to(torch.get_default_dtype()), persistent=False)
```
Suggested change (removing these lines):

```diff
-        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
-        t = t / self.scaling_factor
-        freqs = torch.outer(t, self.inv_freq)
-        emb = torch.repeat_interleave(freqs, 2, dim=-1)
-        self.register_buffer("_cos_cached", emb.cos().to(torch.get_default_dtype()), persistent=False)
-        self.register_buffer("_sin_cached", emb.sin().to(torch.get_default_dtype()), persistent=False)
```
`_sin_cached` and `_cos_cached` seem to be unused below, I think we can remove them!
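If the buffers go away, cos/sin can instead be computed on the fly in the rotary embedding's `forward`; a rough sketch of what that could look like (my reconstruction under that assumption - note the `repeat_interleave`, which is where Cohere differs from Llama's `torch.cat`):

```python
@torch.no_grad()
def forward(self, x, position_ids):
    # expand inv_freq to (batch, dim/2, 1) and positions to (batch, 1, seq_len)
    inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
    position_ids_expanded = position_ids[:, None, :].float()
    freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)  # (batch, seq_len, dim/2)
    emb = torch.repeat_interleave(freqs, 2, dim=-1)  # interleaved, unlike Llama's torch.cat
    return emb.cos().to(dtype=x.dtype), emb.sin().to(dtype=x.dtype)
```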
```python
class CohereForCausalLM(CoherePreTrainedModel):
    _tied_weights_keys = ["model.embed_tokens.weight", "lm_head.weight"]

    def __init__(self, config):
```
Suggested change:

```diff
-    def __init__(self, config):
+    # Ignore copy
+    def __init__(self, config):
```
```python
        return causal_mask


class CohereForCausalLM(CoherePreTrainedModel):
```
Do you know why the copied from cannot be used here? It would be very useful for easily maintaining the methods below, such as `_prepare_inputs_for_generation`; otherwise, you can also try to put a copied from statement on the `_prepare_inputs_for_generation` method.
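As an illustration, a method-level statement would look roughly like this (a sketch with an abbreviated signature, assuming the Llama method name):

```python
    # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
    def prepare_inputs_for_generation(
        self, input_ids, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs
    ):
        ...
```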
```python
from transformers import CohereForCausalLM, CohereModel


class CohereModelTester:
```
Suggested change:

```diff
-class CohereModelTester:
+# Copied from transformers.tests.models.llama.test_modeling_llama.LlamaModelTester with Llama->Cohere
+class CohereModelTester:
```
Can we also add copied from on the tests as well?
There are differences in the tests.
```python
@require_torch
class CohereModelTest(unittest.TestCase):
```
same here for copied from
```python
class CohereIntegrationTest(unittest.TestCase):
    @unittest.skip("Logits are not exactly the same, once we fix the instabalities somehow, will update!")
    @slow
    def test_model_logits(self):
```
Are these the real values obtained?
Could you rather add some end-to-end generation tests, similar to:
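A minimal sketch of such an end-to-end generation test, assuming the Command-R checkpoint referenced in this PR and a placeholder expected string (not a real value - it would come from a trusted reference run):

```python
import unittest

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch, slow


@require_torch
class CohereGenerationIntegrationTest(unittest.TestCase):
    @slow
    def test_model_generation(self):
        # EXPECTED_TEXT is a placeholder - fill it in from a reference run
        EXPECTED_TEXT = "..."
        model_id = "CohereForAI/c4ai-command-r-v01"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
        inputs = tokenizer("Hi today I am going to tell you about", return_tensors="pt").to(model.device)
        # greedy decoding so the output is deterministic and comparable
        output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
        self.assertEqual(tokenizer.decode(output[0], skip_special_tokens=True), EXPECTED_TEXT)
```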
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @saurabhdash2512, this looks really good! We've just made some updates to the [...] (I realize you've already rebased quite recently - I'm sorry about this!)
@Rocketknight1 Done! Sync'ed with main.
younesbelkada
left a comment
Great work @saurabhdash and team!
Cohere Model Release
What does this PR do?
This PR adds the Cohere Model Family with the release of the Command-R model weights.
More information about the model can be found here.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@LysandreJik