MultiVox

MultiVox is a benchmark that assesses how well omni-modal language models integrate audio and visual cues to give a contextual response.

We provide a script to run the Qwen2.5-Omni baseline using vLLM:

python3 src/baseline_qwen.py

We use GPT-4.1 mini as the evaluator. To run the evaluation, use the following script:

python3 src/evaluate.py
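
The two commands above can also be chained so that evaluation only starts once the baseline has finished successfully. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes both scripts are run from the repository root with no arguments, exactly as shown above.

```python
import subprocess
import sys

# The two script paths come from this README. Running them back to back
# with no arguments is an assumption, not documented behavior.
STEPS = [
    ("baseline", "src/baseline_qwen.py"),
    ("evaluation", "src/evaluate.py"),
]


def build_commands(python=sys.executable):
    """Return one argv list per step, in the order the steps should run."""
    return [[python, script] for _, script in STEPS]


def run_pipeline(dry_run=False):
    """Run each step in order, stopping at the first failure."""
    for (name, _), cmd in zip(STEPS, build_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the evaluation step never runs after a failed baseline.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(dry_run=True)` only prints the commands, which is a cheap way to check the paths before starting a long model run. Calling `run_pipeline()` executes both steps.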
