evaluation-llms topic
evaluation-llms repositories
AttrScore
52 Stars, 2 Forks
Code, datasets, and models for the paper "Automatic Evaluation of Attribution by Large Language Models"
MLLM-CompBench
41 Stars, 2 Forks, 41 Watchers
[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, tem...