model: Add VIRTUE multimodal embedding models (Sony VIRTUE-2B/7B-SCaR) #4822

Merged: Samoed merged 2 commits into embeddings-benchmark:main from fzowl:fix/issue-4517 on Jun 17, 2026

Conversation

@fzowl (Contributor) commented Jun 17, 2026

Adds the Sony VIRTUE universal text-image embedders (VIRTUE-2B-SCaR and VIRTUE-7B-SCaR), built on Qwen2-VL. The wrapper uses left-padding last-token pooling with L2 normalization and supports text-only, image-only, and fused image+text inputs, matching the no-visual-prompt path of the reference implementation. A smoke evaluation on AILAStatutes ran successfully with finite scores. Fixes #4517.
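
The pooling described above can be illustrated with a minimal, framework-free sketch. It is not the wrapper's actual code: the function names `last_token_pool` and `l2_normalize` and the toy inputs are assumptions made for illustration, and a real implementation would operate on model tensors rather than nested lists.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Pick one vector per sequence: the hidden state of its final real token.

    hidden_states: list of sequences, each a list of per-token vectors.
    attention_mask: list of sequences of 0/1 flags (1 = real token, 0 = padding).
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        if mask[-1] == 1:
            # Left-padded (or unpadded) input: the last position is a real token.
            pooled.append(states[-1])
        else:
            # Right-padded fallback: locate the last position flagged as real.
            last_real = max(i for i, m in enumerate(mask) if m == 1)
            pooled.append(states[last_real])
    return pooled


def l2_normalize(vectors, eps=1e-12):
    """Scale each vector to unit Euclidean length (eps guards against zero norm)."""
    normalized = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        normalized.append([x / max(norm, eps) for x in vec])
    return normalized


# Toy batch (hypothetical values): two left-padded sequences, 2-dim hidden states.
hidden = [
    [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]],
    [[9.0, 9.0], [9.0, 9.0], [0.0, 5.0]],
]
mask = [[0, 1, 1], [0, 0, 1]]
embeddings = l2_normalize(last_token_pool(hidden, mask))
```

With left padding, every sequence ends on a real token, so pooling reduces to taking the final position for the whole batch. After normalization, the dot product of two embeddings equals their cosine similarity.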

@Samoed (Member) commented Jun 17, 2026

Can you try running the ViDoRe v1 & v2 tasks to reproduce the scores?

@fzowl (Contributor Author) commented Jun 17, 2026

Sure, I'll run ViDoRe v1 & v2 with both checkpoints and post the scores here.

Comment thread on mteb/models/model_implementations/virtue_models.py (outdated)

@fzoll (Contributor) commented Jun 17, 2026

Re: ViDoRe — I'll need to queue this on GPU and will post scores in a follow-up.

Re: CI — the test and 3.13 failures don't reproduce locally (both latest and lowest deps pass). Could you share the failure logs or re-trigger the runs?

@Samoed (Member) commented Jun 17, 2026

> the test and 3.13 failures don't reproduce locally (both latest and lowest deps pass). Could you share the failure logs or re-trigger the runs?

This is fine. It's just a flaky test. I think you can also check it yourself.

@fzoll (Contributor) left a comment

Done in 83d8c76: moved `show_progress_bar` to an explicit function argument.
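
The diff for 83d8c76 is not shown in this thread, so the following is only a sketch of the general pattern the comment names. The functions `encode_before` and `encode_after` and their return values are hypothetical and do not come from the PR.

```python
def encode_before(sentences, **kwargs):
    # Hypothetical "before": the option is buried in **kwargs, so a misspelled
    # keyword is silently ignored and the default is used instead.
    show_progress_bar = kwargs.get("show_progress_bar", False)
    return {"n": len(sentences), "progress": show_progress_bar}


def encode_after(sentences, *, show_progress_bar=False):
    # Hypothetical "after": the option is an explicit keyword-only argument,
    # so it appears in the signature and a misspelled keyword raises TypeError.
    return {"n": len(sentences), "progress": show_progress_bar}


result = encode_after(["query one", "query two"], show_progress_bar=True)
```

An explicit argument also makes the default value visible to callers and to type checkers, which is usually the motivation for this kind of review request.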

@Samoed added the "new model" label Jun 17, 2026

@fzowl (Contributor Author) commented Jun 17, 2026

ViDoRe(v1&v2) results for both VIRTUE checkpoints:

| Task | VIRTUE-2B-SCaR ndcg@5 | VIRTUE-7B-SCaR ndcg@5 |
| --- | --- | --- |
| VidoreArxivQARetrieval | 0.0114 | 0.1818 |
| VidoreDocVQARetrieval | 0.0052 | 0.1095 |
| VidoreInfoVQARetrieval | 0.0163 | 0.5171 |
| VidoreTabfquadRetrieval | 0.0423 | 0.2589 |
| VidoreTatdqaRetrieval | 0.0157 | 0.0418 |
| VidoreShiftProjectRetrieval | 0.0000 | 0.1361 |
| VidoreSyntheticDocQAAIRetrieval | 0.0000 | 0.1568 |
| VidoreSyntheticDocQAEnergyRetrieval | 0.0163 | 0.2577 |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | 0.0356 | 0.1854 |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 0.0113 | 0.2330 |
| Vidore2ESGReportsRetrieval | 0.0371 | 0.1762 |
| Vidore2ESGReportsHLRetrieval | 0.0090 | 0.1390 |
| Vidore2EconomicsReportsRetrieval | 0.0000 | 0.1641 |
| Vidore2BioMedicalLecturesRetrieval | 0.0035 | 0.0558 |
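
For readers unfamiliar with the metric in the table above, here is a minimal sketch of nDCG@k for a single query. It assumes linear gain and a log2 rank discount; the benchmark's own evaluator may differ in details such as the gain function or tie handling. The helper names `dcg_at_k` and `ndcg_at_k` are illustrative, not part of the PR.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    # enumerate() is 0-based, so position 0 is discounted by log2(2) = 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k):
    """nDCG@k for one query.

    ranked_relevances: relevance labels of all judged documents, in the order
    the model ranked them (assumed here to include every relevant document).
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# One relevant document, placed at ranks 1, 3, and 6 respectively.
top_hit = ndcg_at_k([1, 0, 0, 0, 0], k=5)
third_hit = ndcg_at_k([0, 0, 1, 0, 0], k=5)
missed = ndcg_at_k([0, 0, 0, 0, 0, 1], k=5)
```

A task-level score is the mean of this value over all queries. Under this definition, a query contributes zero whenever no relevant document appears in the top five.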

@Samoed (Member) commented Jun 17, 2026

Hm, it seems they don't report results on ViDoRe even though they evaluated on MMEB, which has ViDoRe v1 & v2 among its subtasks. I think we can merge then.

@Samoed merged commit 41f6c7e into embeddings-benchmark:main on Jun 17, 2026
13 checks passed
Labels

new model

Development

Successfully merging this pull request may close these issues.

Add model: VIRTUE

3 participants