Internalise the NomicBERT model #43067
Co-authored-by: Felix Arkle <felixarkle@icloud.com>
…elfAttention signature
Implemented descriptions for the main nomic bert documentation and debugged modular_nomic_bert
Co-authored-by: Felix Arkle <felixarkle@icloud.com>
Add einops to setup and add availability checks for a more graceful exit if it is not available
previous version overwrote bert, leading to forward_unimplemented
Remove code which broke the encoder only assumption
16fa0de to f9763e1
Alter logic so smaller hidden dimensions are still computed correctly and not lost
Although NomicBERT is an encoder-only model, BertGeneration also requires it to have decoder capabilities
ddc59d6 to 9330347
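One of the commits above adds availability checks so that a missing einops install fails gracefully. The PR's actual helpers are not shown in this conversation, so the following is only a minimal sketch of the general pattern using the standard library; the function names `is_package_available` and `require_package` are hypothetical and are not taken from the PR.

```python
import importlib.util


def is_package_available(name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_package(name):
    """Raise a readable ImportError when an optional dependency is missing."""
    if not is_package_available(name):
        raise ImportError(
            f"This feature needs the optional package '{name}'. "
            f"Install it with `pip install {name}`."
        )


# Hypothetical usage: guard code that depends on an optional package.
HAS_EINOPS = is_package_available("einops")
```

Checking with `find_spec` avoids paying the import cost up front, and the explicit error message replaces a bare `ModuleNotFoundError` raised deep inside model code.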
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, jina_embeddings_v3, nomic_bert
run-slow: jina_embeddings_v3, nomic_bert
This comment contains models: ["models/jina_embeddings_v3", "models/nomic_bert"]
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43067&sha=da43bf
tomaarsen left a comment
My understanding is that this incorporates only the non-MoE path? The https://huggingface.co/nomic-ai/nomic-bert-2048/blob/main/modeling_hf_nomic_bert.py modeling code is used for various models, including:
- https://huggingface.co/nomic-ai/nomic-embed-text-v1
- https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
- https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe (but it uses MoE parameters, 8 experts, etc.)
These vision models:
- https://huggingface.co/nomic-ai/nomic-embed-vision-v1.5
- https://huggingface.co/nomic-ai/nomic-embed-vision-v1
And these research checkpoints:
- https://huggingface.co/nomic-ai/nomic-embed-text-v1-unsupervised
- https://huggingface.co/nomic-ai/nomic-embed-text-v1-ablated
- https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe-unsupervised
- https://huggingface.co/nomic-ai/nomic-xlm-2048
I assume that this work is only aiming for the text portion. That does mean that we're diverging from the original implementation a bit, which also supports vision and MoE. Not strictly an issue, just something to note.
If we move forward, let's try to support not just the v1.5 but also the v1, since it's also used a lot.
## Overview
The NomicBERT model currently has no academic papers specifically written about it, however, the [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) card clearly describes the model’s architecture and training approach: it extends BERT to a 2048 token context length, and modifies the BERT training procedure. Notable changes include:
Yes, it does: https://arxiv.org/abs/2402.01613
cc @zanussbaum
CI Results
The test failure analysis could not be completed. Please check the workflow run for details.
tomaarsen left a comment
Small nits, the general gist is solid I think.
hub has problems and the other test is unrelated, merging
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43067&sha=67fbce
Thanks a lot to everyone involved @ed22699 @tomaarsen 🤗
What does this PR do?
This PR internalises the NomicBERT model, following the basic structure of https://huggingface.co/nomic-ai/nomic-bert-2048
Fixes #42738
Problem
BERT-like models using RoPE are currently not internalized in our codebase, e.g. https://huggingface.co/nomic-ai/nomic-bert-2048
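For readers unfamiliar with RoPE (rotary position embeddings), here is a minimal pure-Python sketch of the idea: each pair of dimensions in a query or key vector is rotated by an angle proportional to the token position, so attention scores depend only on relative position. This is an illustration under assumptions, not the code in this PR: the "rotate half" pairing (dimension i with dimension i + d/2), the base of 10000, and the function names are all choices made for this sketch, and real implementations differ in layout and operate on batched tensors.

```python
import math


def rotary_angles(position, head_dim, base=10000.0):
    """Return one rotation angle per dimension pair for a token position."""
    half = head_dim // 2
    return [position * base ** (-2.0 * i / head_dim) for i in range(half)]


def apply_rope(vec, position, base=10000.0):
    """Rotate one attention-head vector according to its token position.

    Assumes the "rotate half" layout: dimension i is paired with i + half.
    """
    head_dim = len(vec)
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    half = head_dim // 2
    out = [0.0] * head_dim
    for i, theta in enumerate(rotary_angles(position, head_dim, base)):
        x1, x2 = vec[i], vec[i + half]
        # Standard 2D rotation of the pair (x1, x2) by angle theta.
        out[i] = x1 * math.cos(theta) - x2 * math.sin(theta)
        out[i + half] = x1 * math.sin(theta) + x2 * math.cos(theta)
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


# Two query/key placements with the same relative offset (3 positions apart).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]
score_near = dot(apply_rope(q, 5), apply_rope(k, 2))
score_far = dot(apply_rope(q, 103), apply_rope(k, 100))
```

Up to floating-point error, `score_near` and `score_far` agree, because rotating both vectors shifts only their relative angle. This relative-position property is what distinguishes RoPE from the learned absolute position embeddings of the original BERT.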
Solution
This PR creates a basic internalized version of nomic-bert-2048 with required modifications.
- modular_nomic_bert.py implemented and verified with python utils/modular_model_converter.py modular_nomic_bert.py
- convert_nomic_bert_to_hf.py added with usage examples
- docs/source/en/model_doc/
- make fixup passes with no errors
Who Can Review?
@ArthurZucker @Cyrilvallez (text models)