[v2] New encoder interface#2415
Conversation
# Conflicts: # mteb/evaluation/evaluators/Image/ClassificationEvaluator.py # mteb/evaluation/evaluators/PairClassificationEvaluator.py # mteb/model_meta.py
|
@isaac-chung @KennethEnevoldsen Can you review? |
KennethEnevoldsen
left a comment
There was a problem hiding this comment.
A lot of good stuff here - sorry the second part of the review came so long after the first comments (the GitHub UI is simply excessively slow when working with large diffs - this is also why some of the comments are a bit terse)
KennethEnevoldsen
left a comment
There was a problem hiding this comment.
Looks good. I think this one is ready to merge
Hmm, if you can run every model in this screenshot that is runnable, it's probably a good enough check. Thanks again in advance! CC @gowitheflow-1998 this is a change in v2. |
Did you try this? |
|
Ah yes, yours is correct. e5-v is at the top. |
|
@orionw Hope you’ve successfully submitted to COLM! Could you take a look at the PR when you have a moment? Regarding the |
|
Thanks for the ping @Samoed, sorry for the delay. It seems like this is the same interface that the cross-encoders currently have: namely the I have missed a lot of the conversation here so apologies if this has been hashed out: the only thing I think is confusing is we lump cross-encoders into the same class as the bi-encoders (also same with the base Otherwise it looks good and complete w.r.t. cross-encoders! |
Yes, I understand your point, but we haven't implemented the cross-encoder interface anywhere yet, and it would be difficult to check each time if a model is a cross-encoder. I have made a change based on the |
|
@isaac-chung I've tested models. Results from this PR:
Results from leaderboard:
|
|
Odd. I was able to get 'main_score': 0.55203 on |
|
Yes, I'm getting similar score |
|
I got 'main_score': 0.55457 from |
|
For CVBench I agree, but what's for |
|
We previously overwrote LB scores to use |
|
That should be it, thanks so much, @Samoed ! |
|
Scores after main merge are matching for
|



Closes #1735
WrappertoAbsEncodertypesfileAll tasks now use the model's similarity implementation cosine by default. This allows for new approaches for similarity, but does mean that scores will change:
facebook/contriever-msmarco, that was added inv2For now, I don't know what to do with
CrossEncoders. Should we create another interface for them?Code Quality
make lintto maintain consistent style.Documentation
Testing
make test-with-coverage.make testormake test-with-coverageto ensure no existing functionality is broken.Adding datasets checklist
Reason for dataset addition: ...
mteb -m {model_name} -t {task_name}command.sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2intfloat/multilingual-e5-smallself.stratified_subsampling() under dataset_transform()make test.make lint.Adding a model checklist
mteb.get_model(model_name, revision)andmteb.get_model_meta(model_name, revision)