Add ViDoRe V3 results: Verm1ion/ColTurk-VDR-Qwen3VL-4B-v1.0 #565

Merged: Samoed merged 11 commits into embeddings-benchmark:main from Verm1lion:add-colturk-vdr-results on Jun 13, 2026

@Verm1lion

Contributor

Self-reported ViDoRe V3 results (8 public retrieval tasks) for Verm1ion/ColTurk-VDR-Qwen3VL-4B-v1.0 — mean NDCG@10 = 0.5584 (full corpus, all queries, MaxSim). ModelMeta PR: embeddings-benchmark/mteb#4796.
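For readers unfamiliar with the two terms in the summary above, here is a minimal sketch of MaxSim late-interaction scoring and of NDCG@10. It is a textbook formulation written for illustration, not the code used by the author's harness or by mteb; the function names and the linear-gain choice in the DCG are assumptions.

```python
import math

import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score between one query and one document.

    query_emb has shape (n_query_tokens, dim) and doc_emb has shape
    (n_doc_tokens, dim). Each query token is matched to its most similar
    document token, and the per-token maxima are summed.
    """
    sim = query_emb @ doc_emb.T  # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())


def ndcg_at_k(ranked_relevances, all_relevances, k: int = 10) -> float:
    """NDCG@k with linear gain (an assumption, see the lead-in).

    ranked_relevances: relevance labels in the order the system ranked them.
    all_relevances: every relevance label for the query, in any order.
    """

    def dcg(rels):
        # rank is 0-based, so the discount is log2(rank + 2)
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(all_relevances, reverse=True))
    return dcg(list(ranked_relevances)) / ideal if ideal > 0 else 0.0
```

Under this sketch, a benchmark score would be obtained by ranking every corpus page by `maxsim_score` for each query, computing `ndcg_at_k` on that ranking, and averaging over queries.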

Files: results/Verm1ion__ColTurk-VDR-Qwen3VL-4B-v1.0/d56c7bbc278ba2fe4ac1c255fb0e55dd46b40bad/ — 8 task JSONs (live dataset_revisions, mteb_version 2.15.3) + model_meta.json.
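A rough sketch of how the task JSONs in such a revision folder could be read back is shown here. The JSON layout (`task_name`, and `scores` keyed by split with a list of per-subset rows carrying `main_score`) is an assumption about the mteb result format, and `load_main_scores` is a hypothetical helper, not part of mteb.

```python
import json
from pathlib import Path
from statistics import mean


def load_main_scores(revision_dir: Path, split: str = "test") -> dict[str, float]:
    """Map task name to the mean main_score over the subsets of one split."""
    scores = {}
    for path in sorted(revision_dir.glob("*.json")):
        if path.name == "model_meta.json":  # model metadata, not a task result
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumed schema: {"task_name": ..., "scores": {split: [{"main_score": ...}, ...]}}
        rows = data["scores"][split]
        scores[data["task_name"]] = mean(row["main_score"] for row in rows)
    return scores
```

Calling `load_main_scores` on the revision directory would then give one score per task, which can be averaged to compare against the reported mean.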

@KennethEnevoldsen added the "waiting for review of implementation" label on Jun 11, 2026. Label description: This PR is waiting for an implementation review before merging the results.

Member

The format of the results is not the mteb format. How did you run your model?

Contributor Author

These were reformatted from my own eval harness output - the mteb implementation (embeddings-benchmark/mteb#4796) wasn't merged yet when I opened this. It's merged now, so I'm re-running the tasks with mteb directly and will update the files here.

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Verm1ion/ColTurk-VDR-Qwen3VL-4B-v1.0
Tasks: Vidore3ComputerScienceRetrieval, Vidore3EnergyRetrieval, Vidore3FinanceEnRetrieval, Vidore3FinanceFrRetrieval, Vidore3HrRetrieval, Vidore3IndustrialRetrieval, Vidore3PharmaceuticalsRetrieval, Vidore3PhysicsRetrieval

Results for Verm1ion/ColTurk-VDR-Qwen3VL-4B-v1.0

| task_name | is_public | Verm1ion/ColTurk-VDR-Qwen3VL-4B-v1.0 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| Vidore3ComputerScienceRetrieval | 1.0000 | 0.7311 | 0.8092 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3EnergyRetrieval | 1.0000 | 0.6237 | 0.6982 | nvidia/nemotron-colembed-vl-8b-v2 | False |
| Vidore3FinanceEnRetrieval | 1.0000 | 0.5863 | 0.6849 | webAI-Official/webAI-ColVec1-4b | False |
| Vidore3FinanceFrRetrieval | 1.0000 | 0.4451 | 0.5372 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3HrRetrieval | 1.0000 | 0.5483 | 0.7004 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3IndustrialRetrieval | 1.0000 | 0.4605 | 0.5718 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3PharmaceuticalsRetrieval | 1.0000 | 0.6152 | 0.6732 | webAI-Official/webAI-ColVec1-9b | False |
| Vidore3PhysicsRetrieval | 1.0000 | 0.4552 | 0.5084 | nvidia/nemotron-colembed-vl-8b-v2 | False |
| Average | nan | 0.5582 | 0.6479 | nan | - |
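As a quick sanity check, the Average row can be reproduced from the eight per-task values in the table. The sketch below only re-does that arithmetic; the variable names are illustrative.

```python
from statistics import mean

# Per-task NDCG@10 values copied from the comparison table.
new_model = {
    "Vidore3ComputerScienceRetrieval": 0.7311,
    "Vidore3EnergyRetrieval": 0.6237,
    "Vidore3FinanceEnRetrieval": 0.5863,
    "Vidore3FinanceFrRetrieval": 0.4451,
    "Vidore3HrRetrieval": 0.5483,
    "Vidore3IndustrialRetrieval": 0.4605,
    "Vidore3PharmaceuticalsRetrieval": 0.6152,
    "Vidore3PhysicsRetrieval": 0.4552,
}
# "Max result" column, in the same task order.
max_result = [0.8092, 0.6982, 0.6849, 0.5372, 0.7004, 0.5718, 0.6732, 0.5084]

avg_new = round(mean(new_model.values()), 4)
avg_max = round(mean(max_result), 4)
print(avg_new)  # 0.5582
print(avg_max)  # 0.6479
```

Both values match the Average row, so the table is an unweighted mean over the eight tasks.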

Training datasets: JinaVDRArxivQARetrieval, JinaVDRDocQAAI, JinaVDRDocQAEnergyRetrieval, JinaVDRDocQAGovReportRetrieval, JinaVDRDocQAHealthcareIndustryRetrieval, JinaVDRDocVQARetrieval, JinaVDRInfovqaRetrieval, JinaVDRTatQARetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTatdqaRetrieval


@Verm1lion

Contributor Author

Updated to the native mteb output — the earlier files were reformatted from my own harness, which was the mismatch you spotted. The 8 ViDoRe V3 task JSONs are now generated by mteb.evaluate (mteb 2.15.4) with the registered ColQwen3EngineWrapper, one row per language subset. All checks are green — could you take another look? Thanks @Samoed!
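A run along the lines described in this comment might look roughly like the sketch below. It assumes mteb 2.x, where `mteb.get_model`, `mteb.get_tasks`, and `mteb.evaluate` are available, and it assumes the model name resolves through the registered ModelMeta; `run_vidore3` is a hypothetical wrapper, and the author's actual command and arguments are not shown in the thread.

```python
# Task names copied from the "Tasks:" line of the comparison comment.
VIDORE3_TASKS = [
    "Vidore3ComputerScienceRetrieval",
    "Vidore3EnergyRetrieval",
    "Vidore3FinanceEnRetrieval",
    "Vidore3FinanceFrRetrieval",
    "Vidore3HrRetrieval",
    "Vidore3IndustrialRetrieval",
    "Vidore3PharmaceuticalsRetrieval",
    "Vidore3PhysicsRetrieval",
]


def run_vidore3(model_name: str = "Verm1ion/ColTurk-VDR-Qwen3VL-4B-v1.0"):
    """Evaluate one model on the eight public ViDoRe V3 tasks (sketch)."""
    # Imported lazily so the task list can be inspected without installing mteb.
    import mteb

    model = mteb.get_model(model_name)  # assumes the model is registered in mteb
    tasks = mteb.get_tasks(tasks=VIDORE3_TASKS)
    return mteb.evaluate(model, tasks)


if __name__ == "__main__":
    run_vidore3()
```

Running it downloads the model and datasets, so it needs a suitable GPU and enough disk space.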

@Samoed merged commit 6777baf into embeddings-benchmark:main on Jun 13, 2026 (3 checks passed)