qzhou-embedding results #249
Model Results Comparison

Reference models:

Results for

| task_name | Kingsoft-LLM/QZhou-Embedding | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
|---|---|---|---|---|
| AFQMC | 0.72 | nan | 0.33 | 0.72 |
| ATEC | 0.58 | nan | 0.4 | 0.65 |
| AmazonCounterfactualClassification | 0.97 | 0.88 | 0.7 | 0.97 |
| ArXivHierarchicalClusteringP2P | 0.65 | 0.65 | 0.56 | 0.69 |
| ArXivHierarchicalClusteringS2S | 0.63 | 0.64 | 0.54 | 0.65 |
| ArguAna | 0.77 | 0.86 | 0.54 | 0.90 |
| AskUbuntuDupQuestions | 0.77 | 0.64 | 0.59 | 0.77 |
| BIOSSES | 0.97 | 0.89 | 0.85 | 0.97 |
| BQ | 0.84 | nan | 0.48 | 0.84 |
| Banking77Classification | 0.83 | 0.94 | 0.75 | 0.94 |
| BiorxivClusteringP2P.v2 | 0.70 | 0.54 | 0.37 | 0.70 |
| CLSClusteringP2P | 0.78 | nan | nan | 0.82 |
| CLSClusteringS2S | 0.76 | nan | nan | 0.76 |
| CMedQAv1-reranking | 0.95 | nan | 0.68 | 0.95 |
| CMedQAv2-reranking | 0.95 | nan | 0.67 | 0.95 |
| CQADupstackGamingRetrieval | 0.75 | 0.71 | 0.59 | 0.79 |
| CQADupstackUnixRetrieval | 0.70 | 0.54 | 0.4 | 0.72 |
| ClimateFEVERHardNegatives | 0.52 | 0.31 | 0.26 | 0.52 |
| CmedqaRetrieval | 0.52 | nan | 0.29 | 0.57 |
| Cmnli | 0.94 | nan | nan | 0.94 |
| CovidRetrieval | 0.93 | 0.79 | 0.76 | 0.96 |
| DuRetrieval | 0.82 | nan | 0.85 | 0.94 |
| EcomRetrieval | 0.75 | nan | 0.55 | 0.78 |
| FEVERHardNegatives | 0.83 | 0.89 | 0.84 | 0.95 |
| FiQA2018 | 0.63 | 0.62 | 0.44 | 0.80 |
| HotpotQAHardNegatives | 0.72 | 0.87 | 0.71 | 0.87 |
| IFlyTek | 0.72 | nan | 0.42 | 0.72 |
| ImdbClassification | 0.99 | 0.95 | 0.89 | 0.99 |
| JDReview | 0.95 | nan | 0.81 | 0.95 |
| LCQMC | 0.82 | nan | 0.76 | 0.82 |
| MMarcoReranking | 0.28 | nan | 0.29 | 0.47 |
| MMarcoRetrieval | 0.77 | nan | 0.79 | 0.90 |
| MTOPDomainClassification | 0.99 | 0.98 | 0.9 | 1.00 |
| MassiveIntentClassification | 0.74 | 0.82 | 0.6 | 0.92 |
| MassiveScenarioClassification | 0.88 | 0.87 | 0.7 | 0.99 |
| MedicalRetrieval | 0.73 | nan | 0.51 | 0.76 |
| MedrxivClusteringP2P | 0.74 | nan | 0.32 | 0.74 |
| MedrxivClusteringS2S | 0.72 | nan | 0.3 | 0.72 |
| MindSmallReranking | 0.35 | 0.33 | 0.3 | 0.35 |
| MultilingualSentiment | 0.97 | nan | 0.71 | 0.97 |
| Ocnli | 0.96 | nan | nan | 0.96 |
| OnlineShopping | 0.99 | nan | 0.9 | 0.99 |
| PAWSX | 0.80 | nan | 0.15 | 0.80 |
| QBQTC | 0.58 | nan | nan | 0.71 |
| SCIDOCS | 0.54 | 0.25 | 0.17 | 0.54 |
| SICK-R | 0.90 | 0.83 | 0.8 | 0.95 |
| STS12 | 0.92 | 0.82 | 0.8 | 0.95 |
| STS13 | 0.96 | 0.90 | 0.82 | 0.98 |
| STS14 | 0.93 | 0.85 | 0.78 | 0.98 |
| STS15 | 0.95 | 0.90 | 0.89 | 0.98 |
| STS17 | 0.91 | 0.89 | 0.82 | 0.93 |
| STS22.v2 | 0.87 | 0.72 | 0.64 | 0.87 |
| STSB | 0.96 | 0.85 | 0.82 | 0.96 |
| STSBenchmark | 0.96 | 0.89 | 0.87 | 0.96 |
| SprintDuplicateQuestions | 0.99 | 0.97 | 0.93 | 0.99 |
| StackExchangeClustering.v2 | 0.82 | 0.92 | 0.46 | 0.92 |
| StackExchangeClusteringP2P.v2 | 0.57 | 0.51 | 0.39 | 0.57 |
| SummEvalSummarization.v2 | 0.79 | 0.38 | 0.31 | 0.79 |
| T2Reranking | 0.68 | 0.68 | 0.66 | 0.73 |
| T2Retrieval | 0.79 | nan | 0.76 | 0.89 |
| TNews | 0.88 | nan | 0.49 | 0.88 |
| TRECCOVID | 0.76 | 0.86 | 0.71 | 0.95 |
| ThuNewsClusteringP2P | 0.91 | nan | nan | 0.91 |
| ThuNewsClusteringS2S | 0.88 | nan | nan | 0.88 |
| Touche2020Retrieval.v3 | 0.55 | 0.52 | 0.5 | 0.75 |
| ToxicConversationsClassification | 0.97 | 0.89 | 0.66 | 0.98 |
| TweetSentimentExtractionClassification | 0.94 | 0.70 | 0.63 | 0.94 |
| TwentyNewsgroupsClustering.v2 | 0.91 | 0.57 | 0.39 | 0.91 |
| TwitterSemEval2015 | 0.92 | 0.79 | 0.75 | 0.92 |
| TwitterURLCorpus | 0.93 | 0.87 | 0.86 | 0.96 |
| VideoRetrieval | 0.72 | nan | 0.58 | 0.84 |
| Waimai | 0.98 | nan | 0.86 | 0.98 |
| Average | 0.80 | 0.75 | 0.61 | 0.84 |
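The table does not say how `nan` entries enter the Average row. As a minimal sketch, assuming missing results are simply skipped (an assumption, not something stated in this PR), a per-model mean could be computed as below. Only three rows from the table are included for brevity, and `nan_aware_mean` is a hypothetical helper name.

```python
import math

# Three rows copied from the table above; float("nan") marks a missing result.
# Column order: QZhou-Embedding, gemini-embedding-001, multilingual-e5-large.
rows = {
    "AFQMC": (0.72, float("nan"), 0.33),
    "ATEC": (0.58, float("nan"), 0.40),
    "AmazonCounterfactualClassification": (0.97, 0.88, 0.70),
}
models = (
    "Kingsoft-LLM/QZhou-Embedding",
    "google/gemini-embedding-001",
    "intfloat/multilingual-e5-large",
)


def nan_aware_mean(values):
    """Mean over the values that are present (assumption: nan is skipped)."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else float("nan")


# One average per model, taken over the tasks that model has a score for.
averages = {
    model: nan_aware_mean([scores[i] for scores in rows.values()])
    for i, model in enumerate(models)
}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```

Under this reading, a model with many `nan` entries is averaged over fewer tasks, so its Average is not directly comparable to that of a model evaluated on every task.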

Hi @PennyYu123, I can find neither this model on Hugging Face nor an implementation within the mteb package.

@KennethEnevoldsen It was added in this PR, embeddings-benchmark/mteb#2965, but I can't find it on HF either.

Given that the model submission was closed, I will also close this. Feel free to re-open if you resubmit the model.

I have corrected the incorrect information in model_meta in the latest mteb PR and updated the model implementation code. Our Qingzhou model is currently private and will be made public on HF tomorrow.

Ah, that is great. Do feel free to reopen the PR once the implementation has been reviewed.

Checklist
- mteb/models/ this can be as an API. Instructions on how to add a model can be found here
- I have filled out the ModelMeta object to the extent possible
- I have ensured that my model can be loaded using `mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`
- I have tested that the implementation works on a representative set of tasks.
- The model is public, i.e., it is available either as an API or the weights are publicly available to download
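
The loadability item in the checklist can be checked with a few lines of Python. The following is a minimal sketch, not part of the mteb package: `check_model_loadable` is a hypothetical helper, and it only assumes the two calls named in the checklist, `mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`. The loaders are injectable so the helper can be exercised without installing mteb or downloading weights.

```python
from typing import Any, Callable, Optional


def check_model_loadable(
    model_name: str,
    revision: Optional[str] = None,
    get_model: Optional[Callable[..., Any]] = None,
    get_model_meta: Optional[Callable[..., Any]] = None,
) -> dict:
    """Try both loaders named in the checklist and report what happened."""
    if get_model is None or get_model_meta is None:
        # Imported lazily so the helper can be tested with stand-in loaders.
        import mteb

        get_model = get_model or mteb.get_model
        get_model_meta = get_model_meta or mteb.get_model_meta

    report = {"model_name": model_name, "revision": revision}
    # Metadata first: it is cheap and fails fast if the model is unknown.
    for label, loader in (("meta", get_model_meta), ("model", get_model)):
        try:
            loader(model_name, revision)
            report[label] = "ok"
        except Exception as exc:  # report the failure instead of crashing
            report[label] = f"failed: {type(exc).__name__}: {exc}"
    return report
```

Calling `check_model_loadable("Kingsoft-LLM/QZhou-Embedding")` with mteb installed would attempt a real load, which may download weights and would be expected to fail while the model is still private.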