Add VultronRetrieverPrime-Qwen3.5-8B ViDoRe V1/V2/V3 results #575

Merged
Samoed merged 1 commit into embeddings-benchmark:main from athrael-soju:add-vultron-prime-qwen35-8b-results on Jun 18, 2026

@athrael-soju

Contributor

Adds ViDoRe V1/V2/V3 result JSONs for athrael-soju/VultronRetrieverPrime-Qwen3.5-8B at revision e8f3104b743a04b0d5f715b67117d687ae99ce51.

  • 22 task JSONs (ViDoRe V1 ×10, V2 ×4, V3 ×8) + model_meta.json, under results/athrael-soju__VultronRetrieverPrime-Qwen3.5-8B/e8f3104b.../
  • Produced through the mteb library (mteb_version 2.12.30); each JSON carries a dataset_revision and full scores.test
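
As a rough illustration of how the per-task JSONs described above could be read back, here is a minimal sketch. It assumes each task file holds a `scores` object whose `test` entry is a list of per-subset dicts, and that each dict has a `main_score` field and the file a `task_name` field; those two field names and the function name `load_main_scores` are assumptions made for this example, not something this PR states. The results directory is passed in by the caller.

```python
import json
from pathlib import Path
from statistics import mean


def load_main_scores(results_dir, split="test"):
    """Map each task JSON in results_dir to its mean main_score on `split`.

    Assumed layout (not confirmed by the PR text):
    {"task_name": ..., "dataset_revision": ..., "scores": {"test": [{"main_score": ...}, ...]}}
    """
    scores = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        if path.name == "model_meta.json":
            # Metadata file, not a task result.
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        entries = data["scores"][split]
        # Fall back to the file name if the assumed "task_name" field is absent.
        task = data.get("task_name", path.stem)
        # Average over subsets; a single-subset task just returns its one score.
        scores[task] = mean(entry["main_score"] for entry in entries)
    return scores
```

Calling `load_main_scores(...)` on the revision directory would return a dict of task name to score, which can then be averaged per benchmark.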

Official means: V1 0.9208 (ndcg@5) / V2 0.6818 (ndcg@5) / V3 0.6472 (ndcg@10).

Depends on the ModelMeta PR embeddings-benchmark/mteb#4833 — please merge that first so the model name resolves.

Result JSONs for athrael-soju/VultronRetrieverPrime-Qwen3.5-8B at revision
e8f3104b743a04b0d5f715b67117d687ae99ce51: ViDoRe V1 0.9208 / V2 0.6818 / V3 0.6472.
Depends on the ModelMeta PR to embeddings-benchmark/mteb merging first.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@athrael-soju force-pushed the add-vultron-prime-qwen35-8b-results branch from 69c2641 to d91eb1a on June 18, 2026 19:49
@github-actions


Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: athrael-soju/VultronRetrieverPrime-Qwen3.5-8B

Results for athrael-soju/VultronRetrieverPrime-Qwen3.5-8B

| task_name | athrael-soju/VultronRetrieverPrime-Qwen3.5-8B | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| Vidore2BioMedicalLecturesRetrieval | .663 | .670 | DataScience-UIBK/Argus-Colqwen3.5-9b-v0-bf16 | False |
| Vidore2ESGReportsHLRetrieval | .768 | .791 | DataScience-UIBK/Argus-Colqwen3.5-9b-v0-bf16 | False |
| Vidore2ESGReportsRetrieval | .658 | .660 | OpenSearch-AI/Ops-Colqwen3-4B | False |
| Vidore2EconomicsReportsRetrieval | .638 | .658 | DataScience-UIBK/Argus-Colqwen3.5-9b-v0 | False |
| Vidore3ComputerScienceRetrieval | .798 | .809 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3EnergyRetrieval | .703 | .698 | nvidia/nemotron-colembed-vl-8b-v2 | False |
| Vidore3FinanceEnRetrieval | .690 | .685 | webAI-Official/webAI-ColVec1-4b | False |
| Vidore3FinanceFrRetrieval | .545 | .537 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3HrRetrieval | .668 | .700 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3IndustrialRetrieval | .574 | .572 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3PharmaceuticalsRetrieval | .682 | .673 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3PhysicsRetrieval | .517 | .508 | nvidia/nemotron-colembed-vl-8b-v2 | False |
| VidoreArxivQARetrieval | .932 | .938 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| VidoreDocVQARetrieval | .677 | .687 | webAI-Official/webAI-ColVec1-9b | True |
| VidoreInfoVQARetrieval | .941 | .952 | webAI-Official/webAI-ColVec1-9b | True |
| VidoreShiftProjectRetrieval | .946 | .947 | DataScience-UIBK/Argus-Colqwen3.5-4b-v0 | False |
| VidoreSyntheticDocQAAIRetrieval | .993 | 1.000 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAEnergyRetrieval | .968 | .980 | nvidia/llama-nemotron-colembed-vl-3b-v2 | True |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | .973 | .989 | nvidia/nemotron-colembed-vl-8b-v2 | True |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 1.000 | 1.000 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreTabfquadRetrieval | .964 | .981 | nvidia/nemotron-colembed-vl-4b-v2 | True |
| VidoreTatdqaRetrieval | .815 | .857 | DataScience-UIBK/Argus-Colqwen3.5-9b-v0-bf16 | True |
| Average | .778 | .786 | nan | - |

The model has high performance on these tasks: Vidore3EnergyRetrieval, Vidore3FinanceEnRetrieval, Vidore3PharmaceuticalsRetrieval, Vidore3IndustrialRetrieval, Vidore3FinanceFrRetrieval, Vidore3PhysicsRetrieval
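
The figures in the comparison above can be sanity-checked with a short script. The sketch below copies the three-decimal values from the table, groups tasks into V1/V2/V3 by name prefix, and lists the tasks where the new model's score is strictly above the previous maximum. The grouping rule, the function names, and the "strictly above the previous max" reading of the high-performance list are assumptions made for illustration; the bot's actual logic is not shown in this PR.

```python
from statistics import mean

# (model score, previous max) per task, copied from the comparison table.
TABLE = {
    "Vidore2BioMedicalLecturesRetrieval": (0.663, 0.670),
    "Vidore2ESGReportsHLRetrieval": (0.768, 0.791),
    "Vidore2ESGReportsRetrieval": (0.658, 0.660),
    "Vidore2EconomicsReportsRetrieval": (0.638, 0.658),
    "Vidore3ComputerScienceRetrieval": (0.798, 0.809),
    "Vidore3EnergyRetrieval": (0.703, 0.698),
    "Vidore3FinanceEnRetrieval": (0.690, 0.685),
    "Vidore3FinanceFrRetrieval": (0.545, 0.537),
    "Vidore3HrRetrieval": (0.668, 0.700),
    "Vidore3IndustrialRetrieval": (0.574, 0.572),
    "Vidore3PharmaceuticalsRetrieval": (0.682, 0.673),
    "Vidore3PhysicsRetrieval": (0.517, 0.508),
    "VidoreArxivQARetrieval": (0.932, 0.938),
    "VidoreDocVQARetrieval": (0.677, 0.687),
    "VidoreInfoVQARetrieval": (0.941, 0.952),
    "VidoreShiftProjectRetrieval": (0.946, 0.947),
    "VidoreSyntheticDocQAAIRetrieval": (0.993, 1.000),
    "VidoreSyntheticDocQAEnergyRetrieval": (0.968, 0.980),
    "VidoreSyntheticDocQAGovernmentReportsRetrieval": (0.973, 0.989),
    "VidoreSyntheticDocQAHealthcareIndustryRetrieval": (1.000, 1.000),
    "VidoreTabfquadRetrieval": (0.964, 0.981),
    "VidoreTatdqaRetrieval": (0.815, 0.857),
}


def suite_of(task):
    """Assumed grouping: Vidore2*/Vidore3* prefixes, everything else is V1."""
    if task.startswith("Vidore2"):
        return "V2"
    if task.startswith("Vidore3"):
        return "V3"
    return "V1"


def suite_means(table):
    """Mean model score per suite, computed from the rounded table values."""
    groups = {}
    for task, (score, _previous_max) in table.items():
        groups.setdefault(suite_of(task), []).append(score)
    return {suite: mean(values) for suite, values in groups.items()}


def beats_previous_max(table):
    """Tasks where the new model is strictly above the previous best score."""
    return sorted(t for t, (score, best) in table.items() if score > best)


if __name__ == "__main__":
    for suite, value in sorted(suite_means(TABLE).items()):
        print(suite, round(value, 3))
    print(beats_previous_max(TABLE))
```

Run on the table values, the per-suite means come out at roughly 0.921 (V1), 0.682 (V2), and 0.647 (V3), which agrees with the reported 0.9208 / 0.6818 / 0.6472 up to the three-decimal rounding of the table entries. The strictly-above filter returns the same six Vidore3 tasks that the bot lists.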


@athrael-soju

Contributor Author

@Samoed 🟢🟢🟢

@Samoed merged commit 3f6f77f into embeddings-benchmark:main on Jun 18, 2026
3 checks passed