model: llama-embed-nemotron-8b #3407
Do you have plans to integrate your omni embed model? We're releasing v2 on Monday with better support for multimodality |
model_name,
revision,
max_seq_length=4096,
batch_size=4,
I added batch_size handling from encode_kwargs, but some of the benchmarks are getting GPU OOM now. Is it a user's responsibility to specify a proper encode_kwargs={"batch_size": 4} argument?
Well, it really depends on the system that they are on. Currently, the default is 32, but it might be ideal to lower that. Unsure if it is better to get the OOM and adjust it down to a reasonable level, rather than have it at a too low default. I might be leaning toward OOM being better
Right, I think 32 is a reasonable default. Actually, I was getting those OOMs with version 1.39.7, which had a default of 128 for some problem types
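For readers following the batch-size discussion, here is a minimal sketch of how a model wrapper could read `batch_size` from `encode_kwargs` and fall back to the default of 32 mentioned above. The helper names `resolve_batch_size` and `iter_batches` are hypothetical and are not taken from mteb or from this PR.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence

# Default discussed in this thread; an assumption for this sketch, not read from mteb.
DEFAULT_BATCH_SIZE = 32


def resolve_batch_size(
    encode_kwargs: Optional[Dict[str, Any]] = None,
    default: int = DEFAULT_BATCH_SIZE,
) -> int:
    """Return the user-supplied batch size, or the default when none is given."""
    batch_size = (encode_kwargs or {}).get("batch_size", default)
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError(f"batch_size must be a positive integer, got {batch_size!r}")
    return batch_size


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])
```

With this shape, a user who hits a GPU OOM at the default can pass `encode_kwargs={"batch_size": 4}` and the wrapper picks it up without any change to the model code.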
with torch.inference_mode():
    inputs = self.tokenizer(
        batch,
        max_length=self.max_seq_length,
I think it would be better to specify this in the tokenization config
Here is our current state regarding context length:
- Base Llama-3.1-8B supports 128k
- We've tested our llama-embed-nemotron-8b with context lengths up to 32k, which we report in the metadata
- We've run the evaluation with a 4k context length
So, our config has a theoretical 128k limit, but 4k is here for eval reproducibility
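The three limits described above can be kept apart explicitly. The sketch below is only an illustration: the `ContextLimits` class is hypothetical, and the token counts assume that "k" means 1024 tokens, which the thread does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextLimits:
    """Hypothetical container for the three context lengths from the thread."""

    base_model: int   # what the base model supports in theory
    tested: int       # what was tested and is reported in the metadata
    evaluation: int   # what the evaluation runs used

    def effective_max_length(self, requested: Optional[int] = None) -> int:
        """Default to the evaluation length; never exceed the base model limit."""
        if requested is None:
            return self.evaluation
        if requested < 1:
            raise ValueError("requested length must be positive")
        return min(requested, self.base_model)


# Assumption: 128k, 32k and 4k are read as multiples of 1024 tokens.
NEMOTRON_LIMITS = ContextLimits(
    base_model=128 * 1024,
    tested=32 * 1024,
    evaluation=4096,
)
```

Keeping the evaluation length as the default makes reported scores reproducible, while a caller who needs longer inputs can still request more, up to the base model limit.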
Hi @Samoed, do you mean v2 of M-MTEB? What will happen to the current M-MTEB leaderboard on Monday? By integrating, you mean combining |
I mean this library
Nothing, it will be unchanged
No, add it as a separate model, because omni is multimodal, but nemotron is text-only |
|
I've found the change log here: https://embeddings-benchmark.github.io/mteb/whats_new/, looks nice! @Samoed Which existing/upcoming Leaderboards would you suggest for the Omni model? |
|
Created issue about discussion of |
Thanks! I added a few changes + lint @Samoed Do you have any ETA when this model can make it to MMTEB Leaderboard? Do we have to wait for |
|
If you can wait a bit, it’ll be easier to add the model to v2, and it will appear on the leaderboard on Monday with the release of the second version. |
# Conflicts: # mteb/models/nvidia_models.py
|
@ybabakhin I've aligned your model with |
|
@Samoed I added a small fix to make it work with v2.0.0. New eval code works fine:
import mteb
model_name = "nvidia/llama-embed-nemotron-8b"
model = mteb.get_model(model_name)
tasks = mteb.get_tasks(tasks=["HagridRetrieval"])
mteb.evaluate(
    model,
    tasks,
    encode_kwargs={"batch_size": 4},
)
I'm only getting
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
I will run it on more tasks to check if there are any discrepancies. Shall I also update a |
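The tokenizers warning quoted above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming the variable is set at the top of the evaluation script before any tokenizer is used:

```python
import os

# Set this before any tokenizer runs, so the `tokenizers` library sees an
# explicit choice and does not print the fork warning.
# "false" turns tokenizer parallelism off; "true" keeps it on.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

The same effect can be had from the shell by exporting the variable before launching the script. Choosing "false" trades some tokenization speed for safety when worker processes are forked.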
Yes, this is for models that wouldn't use dataloaders directly (e.g. sentence transformers).
That would be nice, but this is minor |
|
@Samoed some tests are failing, but I don't think it is related to the changes in this PR |
|
Yes, I see. This is a flaky test that we’re currently working to fix. |
|
@Samoed, @KennethEnevoldsen can you please merge this PR now? Also, is the v2.0.0 release still planned for tomorrow? |
|
Yes, it will be released tomorrow. This PR will be merged, and Kenneth will finish reviewing the results |
Adds llama-embed-nemotron-8b model
mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)