This issue tracks the accuracy of all VLM models on the MMMU benchmark. It will be kept up to date as new results come in.
```bash
# Evaluate with the sglang backend
python benchmark/mmmu/bench_sglang.py

# Evaluate with the HuggingFace transformers backend (`model` = model path)
python benchmark/mmmu/bench_hf.py --model-path model
```
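As a minimal sketch, one model can be run on both backends like this. The `--model-path` flag for `bench_sglang.py` and the exact HF model id are assumptions, not confirmed by this issue; check each script's `--help` for the arguments your sglang version supports.

```bash
# Sketch: benchmark a single model on both backends to fill one table row.
# Assumptions: bench_sglang.py accepts --model-path, and the model is
# published under this HF id -- verify both before running.
MODEL=Qwen/Qwen2-VL-7B-Instruct

python benchmark/mmmu/bench_sglang.py --model-path "$MODEL"   # sglang column
python benchmark/mmmu/bench_hf.py --model-path "$MODEL"       # hf column
```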
| Model | sglang | hf |
| --- | --- | --- |
| Qwen2-VL-7B-Instruct | 0.485 | 0.255 |
| Qwen2.5-VL-7B-Instruct | 0.477 | 0.242 |
| MiniCPM-V-2_6 | 0.426 | |
| MiniCPM-O-2_6 | 0.481 | 0.49 |
| Deepseek-vl2 | 0.496 | 0.499 |
| Deepseek-vl2-small | 0.464 | 0.453 |
| Deepseek-vl2-tiny | 0.382 | 0.369 |
| Deepseek-Janus-Pro-7B | | |
| Llava + Llama | | |
| Llava + Qwen | | |
| Llava + Mistral | | |
| Mllama | | |
| Gemma-3-it-4B | 0.409 | 0.403 |
| InternVL2.5-38B | 0.61 | |